Journal article details
Journal of Clinical Bioinformatics
Semi-automated literature mining to identify putative biomarkers of disease from multiple biofluids
Vanathi Gopalakrishnan1  Shyam Visweswaran1  Rick Jordan2 
[1] Department of Computational & Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA; Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
Keywords: Biofluid; Biomarker; Breast cancer; Lung cancer; Text mining; Literature mining
DOI  :  10.1186/2043-9113-4-13
Received 2014-06-26; accepted 2014-10-02; published 2014
【 Abstract 】

Background

Computational literature mining can augment manual keyword searches of the biomedical literature for disease-specific biomarkers in biofluids. In this work, we develop and apply a semi-automated literature mining method that mines abstracts obtained from PubMed to discover putative biomarkers of breast and lung cancers in specific biofluids.

Methodology

A positive set of abstracts was defined by the terms ‘breast cancer’ and ‘lung cancer’ in conjunction with 14 separate ‘biofluids’ (bile, blood, breastmilk, cerebrospinal fluid, mucus, plasma, saliva, semen, serum, synovial fluid, stool, sweat, tears, and urine), while a negative set of abstracts was defined by the terms ‘(biofluid) NOT breast cancer’ or ‘(biofluid) NOT lung cancer.’ More than 5.3 million total abstracts were obtained from PubMed and examined for biomarker-disease-biofluid associations (34,296 positive and 2,653,396 negative for breast cancer; 28,355 positive and 2,595,034 negative for lung cancer). Biological entities such as genes and proteins were tagged using ABNER, and processed using Python scripts to produce a list of putative biomarkers. Z-scores were calculated, ranked, and used to determine significance of putative biomarkers found. Manual verification of relevant abstracts was performed to assess our method’s performance.
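
As a rough illustration of the query construction and scoring steps described above, the following Python sketch builds a positive and a negative search string for each biofluid and ranks tagged entities by a z-score. The abstract does not state the exact query syntax or the z-score formula, so both are assumptions here: the quoted Boolean query strings are one plausible rendering of the stated terms, and the pooled two-proportion z-test is a stand-in for whatever statistic the authors used. The names `build_queries`, `two_proportion_z`, and `rank_entities` are hypothetical.

```python
import math

# The 14 biofluids named in the Methodology paragraph.
BIOFLUIDS = [
    "bile", "blood", "breastmilk", "cerebrospinal fluid", "mucus",
    "plasma", "saliva", "semen", "serum", "synovial fluid",
    "stool", "sweat", "tears", "urine",
]


def build_queries(disease, biofluids=BIOFLUIDS):
    """Return {biofluid: (positive_query, negative_query)} for one disease.

    Assumption: the query strings are an illustrative rendering of
    '(disease) with (biofluid)' and '(biofluid) NOT (disease)'.
    """
    queries = {}
    for fluid in biofluids:
        positive = f'"{disease}" AND "{fluid}"'
        negative = f'"{fluid}" NOT "{disease}"'
        queries[fluid] = (positive, negative)
    return queries


def two_proportion_z(pos_hits, pos_total, neg_hits, neg_total):
    """Pooled two-proportion z-score (assumed statistic, not the paper's).

    Compares the fraction of positive-set abstracts mentioning an entity
    with the fraction of negative-set abstracts mentioning it.
    """
    p_pos = pos_hits / pos_total
    p_neg = neg_hits / neg_total
    pooled = (pos_hits + neg_hits) / (pos_total + neg_total)
    se = math.sqrt(pooled * (1 - pooled) * (1 / pos_total + 1 / neg_total))
    if se == 0:
        return 0.0  # entity appears in none (or all) of the abstracts
    return (p_pos - p_neg) / se


def rank_entities(pos_counts, neg_counts, pos_total, neg_total):
    """Sort tagged entities by descending z-score."""
    scored = [
        (entity, two_proportion_z(hits, pos_total,
                                  neg_counts.get(entity, 0), neg_total))
        for entity, hits in pos_counts.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    print(build_queries("breast cancer")["serum"])
    # Entity counts below are made up for illustration; only the set sizes
    # (34,296 positive and 2,653,396 negative) come from the abstract.
    pos = {"ENTITY_A": 1200, "ENTITY_B": 40}
    neg = {"ENTITY_A": 900, "ENTITY_B": 3000}
    for entity, z in rank_entities(pos, neg, 34296, 2653396):
        print(entity, round(z, 2))
```

Entities that are over-represented in the positive set receive large positive scores, while entities that are equally or more common in the negative set fall to the bottom of the ranking.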

Results

Biofluid-specific markers were identified from the literature, assigned relevance scores based on frequency of occurrence, and validated using known biomarker lists and/or databases for lung and breast cancer [NCBI’s Online Mendelian Inheritance in Man (OMIM), Cancer Gene annotation server for cancer genomics (CAGE), NCBI’s Genes & Disease, NCI’s Early Detection Research Network (EDRN), and others]. The specificity of each marker for a given biofluid was calculated, and the performance of our semi-automated literature mining method was assessed for breast and lung cancer.
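
The abstract does not define how biofluid specificity was computed. One simple reading, shown here purely as an assumption, is the share of a marker's positive-set abstracts that come from each biofluid. The function name `biofluid_specificity` and the input layout are hypothetical.

```python
from collections import defaultdict


def biofluid_specificity(counts):
    """Return {marker: {biofluid: fraction of that marker's abstracts}}.

    counts maps (marker, biofluid) to the number of positive-set abstracts
    in which the marker was tagged for that biofluid.
    Assumption: specificity = per-biofluid count / total count for the marker.
    """
    totals = defaultdict(int)
    for (marker, _fluid), n in counts.items():
        totals[marker] += n

    specificity = defaultdict(dict)
    for (marker, fluid), n in counts.items():
        if totals[marker] > 0:
            specificity[marker][fluid] = n / totals[marker]
    return dict(specificity)


if __name__ == "__main__":
    # Made-up counts for illustration only.
    example = {
        ("MARKER_A", "serum"): 30,
        ("MARKER_A", "plasma"): 10,
        ("MARKER_B", "urine"): 5,
    }
    print(biofluid_specificity(example))
```

Under this definition, a marker reported in only one biofluid has a specificity of 1.0 for that fluid, and a marker spread evenly across several fluids has low specificity for each.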

Conclusions

We developed a semi-automated process for determining a list of putative biomarkers for breast and lung cancer. New knowledge is presented in the form of biomarker lists; ranked, newly discovered biomarker-disease-biofluid relationships; and biomarker specificity across biofluids.

【 License 】

   
2014 Jordan et al.; licensee BioMed Central Ltd.
