Journal of Biomedical Semantics
A semantic-based method for extracting concept definitions from scientific publications: evaluation in the autism phenotype domain
Amar K Das¹, Martin J O’Connor², Saeed Hassanpour²
¹ Geisel School of Medicine at Dartmouth, 46 Centerra Drive, Suite 330, Lebanon, NH 03766, USA
² Stanford Center for Biomedical Informatics Research, Stanford, CA 94305, USA
Keywords: Autism phenotypes; Biomedical definitions; Rules; Ontologies; Knowledge acquisition
DOI: 10.1186/2041-1480-4-14
Received 2012-09-29; accepted 2013-07-04; published 2013.
【 Abstract 】

Background

A variety of informatics approaches have been developed that use information retrieval, NLP and text-mining techniques to identify biomedical concepts and relations within scientific publications or their sentences. These approaches have not typically addressed the challenge of extracting more complex knowledge such as biomedical definitions. In our efforts to facilitate knowledge acquisition of rule-based definitions of autism phenotypes, we have developed a novel semantic-based text-mining approach that can automatically identify such definitions within text.
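The contrast between a standard term-based match and a semantic-based match can be illustrated with a toy Python sketch. This is not the authors' method: the rule names, the rule texts, and the `CONCEPT_MAP` dictionary from surface terms to ontology concept IDs are all hypothetical stand-ins for an existing knowledge base. Candidate definitions are ranked by Jaccard overlap with a text snippet, computed either over raw tokens or over the concepts those tokens map to.

```python
from typing import Callable

# Hypothetical term-to-concept map standing in for domain knowledge from an ontology.
CONCEPT_MAP = {
    "language": "C_LANGUAGE",
    "speech": "C_LANGUAGE",
    "delay": "C_DELAY",
    "delayed": "C_DELAY",
    "onset": "C_ONSET",
    "age": "C_AGE",
}

# Hypothetical candidate rule definitions, keyed by rule name.
RULES = {
    "language_delay": "language delayed",
    "onset_age": "age onset delay",
}


def tokens(text: str) -> set[str]:
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    stripped = (t.strip(".,;:()").lower() for t in text.split())
    return {t for t in stripped if t}


def concepts(text: str) -> set[str]:
    """Map each token to its concept ID; unmapped tokens are kept as-is."""
    return {CONCEPT_MAP.get(t, t) for t in tokens(text)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection over size of the union (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank(snippet: str, rules: dict[str, str],
         features: Callable[[str], set[str]]) -> list[str]:
    """Rule names sorted by descending similarity to the snippet."""
    s = features(snippet)
    return sorted(rules, key=lambda name: jaccard(s, features(rules[name])),
                  reverse=True)


if __name__ == "__main__":
    snippet = "speech delay"
    print(rank(snippet, RULES, tokens))    # ['onset_age', 'language_delay']
    print(rank(snippet, RULES, concepts))  # ['language_delay', 'onset_age']
```

The snippet "speech delay" shares no surface terms with the `language_delay` rule, so token overlap ranks it last; once "speech" and "language" map to the same concept, it moves to the top. This is the general sense in which existing domain knowledge can improve matching.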

Results

Using an existing knowledge base of 156 autism phenotype definitions and an annotated corpus of 26 source articles containing such definitions, we evaluated and compared the average rank of the correctly identified rule definition or its corresponding rule template under both our semantic-based approach and a standard term-based approach. We examined three separate scenarios: (1) the snippet of text contained a definition already in the knowledge base; (2) the snippet contained an alternative definition for a concept in the knowledge base; and (3) the snippet contained a definition not in the knowledge base. Our semantic-based approach achieved a better (numerically lower) average rank than the term-based approach in each of the three scenarios (scenario 1: 3.8 vs. 5.0; scenario 2: 2.8 vs. 4.9; scenario 3: 4.5 vs. 6.2), and each comparison was significant at p < 0.05 using the Wilcoxon signed-rank test.
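The evaluation metric and the significance test can be sketched in Python as follows. This assumes, as a signed-rank test implies, that each snippet yields one paired observation: the 1-based rank of the correct definition under each approach, where a lower rank is better. The rank lists in the demo are hypothetical and are not data from the study. The exact p-value enumerates all 2^n sign patterns and is only practical for small n; `scipy.stats.wilcoxon` is a common library alternative.

```python
from itertools import product


def mean_rank(ranks: list[int]) -> float:
    """Average 1-based rank of the correct definition (lower is better)."""
    return sum(ranks) / len(ranks)


def signed_rank_stat(x: list[float], y: list[float]) -> tuple[float, list[float]]:
    """Wilcoxon statistic W = min(W+, W-) for paired samples x and y.

    Zero differences are dropped and tied |differences| receive average ranks.
    The ranks are also returned so the null distribution can be enumerated.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return min(w_plus, w_minus), ranks


def exact_p_value(x: list[float], y: list[float]) -> float:
    """Two-sided exact p-value by enumerating every sign pattern (small n only)."""
    w, ranks = signed_rank_stat(x, y)
    total = sum(ranks)
    hits = 0
    for signs in product((False, True), repeat=len(ranks)):
        w_plus = sum(r for r, positive in zip(ranks, signs) if positive)
        if min(w_plus, total - w_plus) <= w + 1e-9:
            hits += 1
    return hits / 2 ** len(ranks)


if __name__ == "__main__":
    # Hypothetical per-snippet ranks of the correct rule (NOT data from the study).
    semantic = [1, 2, 1, 3, 2, 1, 4, 2]
    term = [3, 4, 2, 5, 3, 2, 6, 4]
    print(mean_rank(semantic), mean_rank(term))  # 2.0 3.625
    print(exact_p_value(semantic, term))         # 0.0078125
```

In this toy input every snippet ranks better under the first approach, so W = 0 and only the two all-same-sign patterns are as extreme, giving p = 2/256.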

Conclusions

Our work shows that leveraging existing domain knowledge in the information extraction of biomedical definitions significantly improves the correct identification of such knowledge within sentences. Our method can thus help researchers rapidly acquire knowledge about biomedical definitions that are specified and evolving within an ever-growing corpus of scientific publications.

【 License 】

© 2013 Hassanpour et al.; licensee BioMed Central Ltd.
