Journal article

[Abstract]

Background

Many biomedical relation extraction systems are machine-learning based and have to be trained on large annotated corpora that are expensive and cumbersome to construct. We developed a knowledge-based relation extraction system that requires minimal training data, and applied it to the extraction of adverse drug events from biomedical text. The system consists of a concept recognition module that identifies drugs and adverse effects in sentences, and a knowledge-base module that establishes whether a relation exists between the recognized concepts. The knowledge base was filled with information from the Unified Medical Language System. The performance of the system was evaluated on the ADE corpus, which consists of 1644 abstracts with manually annotated adverse drug events. Fifty abstracts were used for training; the remaining abstracts were used for testing.
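The two-module design described above can be illustrated with a minimal sketch. This is not the authors' implementation: the term lists, the knowledge-base pairs, and the function names `recognize_concepts`, `co_occurrence_baseline`, and `extract_relations` are hypothetical stand-ins, and simple dictionary lookup replaces the actual concept recognition and UMLS-derived knowledge base.

```python
from itertools import product

# Hypothetical toy dictionaries (assumption); the real system uses UMLS concepts.
DRUG_TERMS = {"methotrexate", "lithium"}
EFFECT_TERMS = {"pneumonitis", "tremor", "nausea"}

# Hypothetical knowledge base of known (drug, adverse effect) pairs (assumption).
KNOWLEDGE_BASE = {("methotrexate", "pneumonitis"), ("lithium", "tremor")}


def recognize_concepts(sentence):
    """Concept recognition: find drug and effect terms by dictionary lookup."""
    tokens = [t.strip(".,;:()").lower() for t in sentence.split()]
    drugs = [t for t in tokens if t in DRUG_TERMS]
    effects = [t for t in tokens if t in EFFECT_TERMS]
    return drugs, effects


def co_occurrence_baseline(sentence):
    """Baseline: every drug/effect pair in the same sentence is a relation."""
    drugs, effects = recognize_concepts(sentence)
    return list(product(drugs, effects))


def extract_relations(sentence):
    """Knowledge-based step: keep only pairs supported by the knowledge base."""
    return [pair for pair in co_occurrence_baseline(sentence)
            if pair in KNOWLEDGE_BASE]


if __name__ == "__main__":
    text = "Methotrexate induced pneumonitis and nausea."
    print(co_occurrence_baseline(text))
    print(extract_relations(text))
```

In this toy setup the baseline proposes every co-occurring pair, while the knowledge-base filter discards pairs it has no evidence for, which is the intuition behind the precision gain over co-occurrence.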

Results

The knowledge-based system obtained an F-score of 50.5%, which was 34.4 percentage points better than the co-occurrence baseline. Increasing the training set to 400 abstracts improved the F-score to 54.3%. When the system was compared with a machine-learning system, jSRE, on a subset of the sentences in the ADE corpus, our knowledge-based system achieved an F-score that is 7 percentage points higher than the F-score of jSRE trained on 50 abstracts, and still 2 percentage points higher than jSRE trained on 90% of the corpus.
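As a small worked check of the figures above, the following sketch computes the F-score as the harmonic mean of precision and recall and derives the co-occurrence baseline implied by the reported numbers (50.5% minus 34.4 percentage points). The function name `f_score` and the sample precision and recall values are illustrative assumptions; the abstract reports only F-scores.

```python
def f_score(precision, recall):
    """Balanced F-score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures reported in the abstract (percent / percentage points).
SYSTEM_F = 50.5
GAIN_OVER_BASELINE = 34.4

# Baseline F-score implied by the two reported figures.
baseline_f = round(SYSTEM_F - GAIN_OVER_BASELINE, 1)

# Corpus split stated in the abstract: 1644 abstracts, 50 used for training.
test_abstracts = 1644 - 50

if __name__ == "__main__":
    print(f"implied baseline F-score: {baseline_f}%")
    print(f"abstracts left for testing: {test_abstracts}")
    # Hypothetical precision/recall values, for illustration only.
    print(f"example F-score: {f_score(0.5, 0.5):.3f}")
```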

Conclusion

A knowledge-based approach can be used successfully to extract adverse drug events from biomedical text without the need for a large training set. Whether the use of a knowledge base is equally advantageous for other biomedical relation-extraction tasks remains to be investigated.

[License]

2014 Kang et al.; licensee BioMed Central Ltd.

BMC Bioinformatics
Knowledge-based extraction of adverse drug events from biomedical text

Ning Kang¹, Bharat Singh¹, Chinh Bui¹, Zubair Afzal¹, Erik M van Mulligen¹, Jan A Kors¹
[1] Department of Medical Informatics, Erasmus University Medical Center, P.O. Box 2040, 3000 CA Rotterdam, The Netherlands
Keywords: Adverse drug effect; Knowledge base; Relation extraction
Others: 1087604 DOI: 10.1186/1471-2105-15-64

Received 2013-05-31, accepted 2014-02-21, published 2014
[Abstract]