Journal article details
BMC Medical Informatics and Decision Making
Semantic biomedical resource discovery: a Natural Language Processing framework
Manolis Tsiknakis [2], Kostas Marias [3], Norbert Graf [1], Giorgos Zacharioudakis [3], Galatia Iatraki [3], Stelios Sfakianakis [3], Lefteris Koumakis [3], Pepi Sfakianaki [3]
[1] Paediatric Haematology and Oncology, Saarland University Hospital, Homburg, Germany; [2] Department of Informatics Engineering, Technological Educational Institute, Heraklion, Crete, Greece; [3] Foundation for Research and Technology Hellas (FORTH), Institute of Computer Science, N. Plastira 100, Vassilika Vouton, Heraklion, Crete, Greece
Keywords: Natural language interface; Search engine; Biomedical informatics; Text mining; Information extraction; Biomedical text annotation; Resource discovery; Natural language processing; Semantic resource annotation
DOI  :  10.1186/s12911-015-0200-4
Received 2015-03-09; accepted 2015-09-21; published 2015
【 Abstract 】

Background

A plethora of publicly available biomedical resources currently exists and continues to grow at a fast rate. In parallel, specialized repositories are being developed that index numerous clinical and biomedical tools. The main drawback of such repositories is the difficulty of locating appropriate resources for a clinical or biomedical decision task, especially for users who are not Information Technology experts. Moreover, although NLP research in the clinical domain has been active since the 1960s, progress in the development of NLP applications has been slow and lags behind progress in the general NLP domain.

The aim of the present study is to investigate the use of semantics for annotating biomedical resources with domain-specific ontologies, and to exploit Natural Language Processing methods to empower users who are not Information Technology experts to search efficiently for biomedical resources using natural language.

Methods

A Natural Language Processing engine has been implemented that can “translate” free text into targeted queries, automatically transforming a clinical research question into a request description that contains only ontology terms. The implementation is based on information extraction techniques for natural language text, guided by integrated ontologies. Furthermore, knowledge from robust text mining methods has been incorporated to map descriptions onto suitable domain ontologies, in order to ensure that biomedical resource descriptions are domain oriented and to enhance the accuracy of service discovery. The framework is freely available as a web application at http://calchas.ics.forth.gr/.
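To make the idea of reducing a free-text question to ontology terms concrete, here is a minimal sketch using plain dictionary lookup. It is not the framework's implementation: the ontology, its `TOY:` identifiers, and the function names `annotate` and `to_query` are all hypothetical, and a real system would rely on full ontologies and an annotation service rather than a hand-written table.

```python
import re

# Hypothetical toy ontology: surface label -> concept identifier.
# The identifiers are placeholders, not codes from any real ontology.
TOY_ONTOLOGY = {
    "breast cancer": "TOY:0001",
    "gene expression": "TOY:0002",
    "microrna": "TOY:0003",
    "pathway": "TOY:0004",
}


def annotate(question, ontology=TOY_ONTOLOGY):
    """Return concept IDs whose labels occur in the question.

    Labels are tried longest first; a trailing plural "s" is tolerated.
    """
    # Normalise: lowercase, strip punctuation, collapse whitespace.
    text = re.sub(r"[^a-z0-9 ]+", " ", question.lower())
    text = re.sub(r"\s+", " ", text).strip()
    found = []
    for label in sorted(ontology, key=len, reverse=True):
        pattern = r"\b" + re.escape(label) + r"s?\b"
        if re.search(pattern, text):
            found.append(ontology[label])
    return found


def to_query(question):
    """Build a request description that contains only ontology terms."""
    return {"concepts": sorted(annotate(question))}


query = to_query("Which microRNAs regulate pathways in breast cancer?")
print(query)
```

Words with no entry in the table ("which", "regulate") simply drop out, which mirrors the stated goal of a request description made up only of ontology terms.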

Results

For our experiments, a range of clinical questions was established based on descriptions of clinical trials from the ClinicalTrials.gov registry as well as recommendations from clinicians. Domain experts manually identified the tools in a tools repository that are suitable for addressing the clinical questions at hand, either individually or as a set of tools forming a computational pipeline. These expert selections were compared with the results of an automated discovery of candidate biomedical tools, using precision and recall as evaluation measures. Our results indicate that the proposed framework has high precision and low recall: most of the results the system returns are relevant, although it does not retrieve every relevant resource.
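The two evaluation measures can be computed as in the short sketch below. The tool names and the resulting figures are invented for illustration and are not taken from the study; `precision_recall` is a hypothetical helper name.

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for one query.

    retrieved: tools returned by the automated discovery
    relevant:  tools the domain experts judged suitable
    """
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: the system returns two tools, both correct,
# but the experts had identified five suitable tools in total.
system_output = ["toolA", "toolB"]
expert_choice = ["toolA", "toolB", "toolC", "toolD", "toolE"]

precision, recall = precision_recall(system_output, expert_choice)
print(precision, recall)
```

In this made-up case every returned tool is relevant (high precision) while three suitable tools are missed (low recall), which is the pattern the abstract describes.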

Conclusions

Adequate biomedical ontologies, sufficient NLP tools, and biomedical annotation systems of suitable quality are already available for implementing a biomedical resource discovery framework based on the semantic annotation of resources and the use of NLP techniques. The results of the present study demonstrate the clinical utility of the proposed framework, which aims to bridge the gap between a clinical question posed in natural language and efficient, dynamic discovery of biomedical resources.

【 License 】

   
2015 Sfakianaki et al.
