Journal article details
BMC Bioinformatics
Automatic extraction of biomolecular interactions: an empirical approach
Lifeng Zhang [3], Daniel Berleant [1], Jing Ding [2], Eve Syrkin Wurtele [4]
[1] Department of Information Science, University of Arkansas at Little Rock, Little Rock, AR, USA
[2] Ohio State University Medical Center, Columbus, OH, USA
[3] Siemens Corporate Research, Princeton, NJ, USA
[4] Department of Genetics, Cell & Development Biology, Iowa State University, Ames, IA, USA
Keywords: Networks; Text mining; Information extraction; Biomolecular interactions
DOI  :  10.1186/1471-2105-14-234
Received 2012-10-23; accepted 2013-07-12; published 2013
【 Abstract 】

Background

We describe a method for extracting, from text, data about how pairs of biomolecules interact. The method relies on empirically determined characteristics of sentences. These characteristics are efficient to compute, which makes this approach to extracting biomolecular interactions scalable. The results of such interaction mining can support interaction network annotation, question answering, database construction, and other applications.

Results

We constructed a software system to search MEDLINE for sentences likely to describe interactions between given biomolecules. The system extracts a list of the interaction-indicating terms appearing in those sentences, then ranks those terms by their likelihood of correctly characterizing how the biomolecules interact. The ranking process uses a technique based on tf-idf (term frequency–inverse document frequency) that draws on empirically derived knowledge about sentences, and it was applied to the MEDLINE literature collection. The software was developed as part of the MetNet toolkit (http://www.metnetdb.org).
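
As a rough illustration of the kind of tf-idf term ranking described above, the following minimal Python sketch scores candidate interaction terms found in sentences that mention a biomolecule pair. It is not the authors' implementation: the function name `rank_interaction_terms`, the toy sentences, the whitespace tokenization, and the smoothed idf formula are all assumptions made for this example, and the empirically derived sentence characteristics used by the actual system are not modeled here.

```python
import math
from collections import Counter


def rank_interaction_terms(pair_sentences, background_sentences, interaction_terms):
    """Rank candidate interaction terms by a plain tf-idf score (illustrative only).

    pair_sentences: sentences mentioning both biomolecules of interest (hypothetical input).
    background_sentences: a reference collection used to estimate document frequency.
    interaction_terms: candidate interaction-indicating terms, lowercase.
    """
    terms = set(interaction_terms)
    # tf: how often each candidate term occurs in the sentences about the pair
    tf = Counter(
        tok for s in pair_sentences for tok in s.lower().split() if tok in terms
    )
    n_docs = len(background_sentences)
    scores = {}
    for term, freq in tf.items():
        # df: number of background sentences that contain the term
        df = sum(1 for s in background_sentences if term in s.lower().split())
        # Smoothed idf (an assumed variant); rarer terms receive a larger weight
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        scores[term] = freq * idf
    # Highest score first; ties broken alphabetically for determinism
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    pair = ["ATP binds myosin", "myosin binds ATP tightly", "ATP activates myosin"]
    background = [
        "protein binds dna",
        "kinase binds substrate",
        "enzyme binds cofactor",
        "ligand activates receptor",
    ]
    for term, score in rank_interaction_terms(pair, background, ["binds", "activates", "inhibits"]):
        print(f"{term}\t{score:.3f}")
```

Terms that never occur in the pair's sentences (here, "inhibits") receive no score and are omitted from the ranking. A real system would need proper tokenization and biomolecule name recognition, which this sketch leaves out.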

Conclusions

Specific, efficiently computable characteristics of sentences about biomolecular interactions were analyzed to better understand how to use these characteristics to extract how biomolecules interact.

The text empirics method that was investigated, though arising from a classical tradition, has yet to be fully explored for the task of extracting biomolecular interactions from the literature. The conclusions we reach about the sentence characteristics investigated in this work, as well as the technique itself, could be used by other systems to provide evidence about putative interactions, thus supporting efforts to maximize the ability of hybrid systems to support such tasks as annotating and constructing interaction networks.

【 License 】

2013 Zhang et al.; licensee BioMed Central Ltd.
