Journal article details
BMC Bioinformatics
Discovery of novel biomarkers and phenotypes by semantic technologies
Svetlana Bureeva [2], Mark E Sharp [3], David Peregrim [3], Christoph Wälti [1], Carlo A Trugenberger [1]
[1] InfoCodex AG, Semantic Technologies, Bahnhofstrasse 50, Buchs (SG), CH-9470, Switzerland
[2] Thomson Reuters, 5901 Priestly Drive, STE 200, Carlsbad, CA, 92008, USA
[3] Merck Research Laboratories, 126 East Lincoln Avenue, Rahway, NJ 07065, USA
Keywords: Discovery of novel relationships; Biomedical ontologies; Text mining; Semantic technologies; In silico drug research
Others: 1087983
DOI: 10.1186/1471-2105-14-51
Received 2012-06-22, accepted 2013-02-01, published 2013
【 Abstract 】

Background

Biomarkers and target-specific phenotypes are important to targeted drug design and individualized medicine, and therefore constitute an important aspect of modern pharmaceutical research and development. Increasingly, the discovery of relevant biomarkers is aided by in silico techniques that apply data mining and computational chemistry to large molecular databases. However, an even larger source of valuable information can potentially be tapped for such discoveries: repositories of research documents.

Results

This paper reports on a pilot experiment to discover potential novel biomarkers and phenotypes for diabetes and obesity by self-organized text mining of about 120,000 PubMed abstracts, public clinical trial summaries, and internal Merck research documents. These documents were analyzed directly by the InfoCodex semantic engine, without prior human manipulation such as parsing. Measured against established but differing benchmarks, recall ranged up to 30% and precision up to 50%. The engine retrieved known entities that other, traditional approaches had missed. Finally, the InfoCodex semantic engine was shown to discover new diabetes and obesity biomarkers and phenotypes. Among these were many interesting, high-potential candidates, although noticeable noise (uninteresting or obvious terms) was also generated.
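As a reminder of what the recall and precision figures measure, the following is a minimal sketch of how the two quantities are computed for a set of extracted terms against a benchmark list. It is not the authors' evaluation procedure: the function name `precision_recall`, the case-insensitive exact matching, and the toy term lists are all assumptions made for illustration.

```python
def precision_recall(extracted, benchmark):
    """Return (precision, recall) of extracted terms against a benchmark.

    Matching is case-insensitive exact matching, an assumption made
    for this sketch; a real evaluation would likely need synonym handling.
    """
    extracted_set = {term.strip().lower() for term in extracted}
    benchmark_set = {term.strip().lower() for term in benchmark}
    true_positives = extracted_set & benchmark_set

    # Precision: share of extracted terms that appear in the benchmark.
    precision = len(true_positives) / len(extracted_set) if extracted_set else 0.0
    # Recall: share of benchmark terms that were extracted.
    recall = len(true_positives) / len(benchmark_set) if benchmark_set else 0.0
    return precision, recall


# Hypothetical toy data, not taken from the paper.
extracted = ["HbA1c", "leptin", "adiponectin", "table"]
benchmark = ["hba1c", "leptin", "proinsulin", "ghrelin", "adiponectin", "resistin"]

precision, recall = precision_recall(extracted, benchmark)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case three of the four extracted terms are in the benchmark, and three of the six benchmark terms were found. Because the paper's benchmarks differ from one another, figures computed this way are only comparable within a single benchmark.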

Conclusions

The reported approach of employing autonomous self-organizing semantic engines to aid biomarker discovery, supplemented by appropriate manual curation processes, shows promise. Conservatively, it offers a faster alternative to vocabulary processes that depend on humans reading and analyzing all the texts. More optimistically, it could impact pharmaceutical research, for example by shortening the time-to-market of novel drugs or by speeding up the early recognition of dead ends and adverse reactions.

【 License 】

   
2013 Trugenberger et al; licensee BioMed Central Ltd.
