BMC Medical Informatics and Decision Making
Mining biomarker information in biomedical literature
Technical Advance
Bernd Müller [1], Juliane Fluck [1], Erfan Younesi [2], Martin Hofmann-Apitius [2], Christoph M Friedrich [3], Alexander Scheer [4], Natalia Novac [5], Luca Toldo [5]
[1] Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), 53754, Schloss Birlinghoven, Sankt Augustin, Germany; [2] Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn, Bonn, Germany; [3] Department of Computer Science, University of Applied Science and Arts, Dortmund, Germany; [4] Informatics & Knowledge Management, Merck Serono, Merck KGaA, Geneva, Switzerland; [5] Knowledge Management, Operational Excellence & Site Coordination, Merck Serono, Merck KGaA, Darmstadt, Germany
Keywords: Text-mining; Biomarker discovery; Information retrieval; Terminology
DOI: 10.1186/1472-6947-12-148
Received: 2012-02-03; accepted: 2012-12-10; published: 2012
Source: Springer
【 Abstract 】

Background: For the selection and evaluation of potential biomarkers, incorporating already published information is of utmost importance. Despite significant advances in text- and data-mining techniques, the vast knowledge space of biomarkers in biomedical text has remained unexplored. Existing named entity recognition approaches are not sufficiently selective for retrieving biomarker information from the literature. The purpose of this study was to identify textual features that enhance the effectiveness of biomarker information retrieval for different indication areas and diverse end-user perspectives.

Methods: A biomarker terminology was created and organized into six concept classes. The performance of this terminology was optimized towards balanced selectivity and specificity. Information retrieval performance using the biomarker terminology was evaluated for various combinations of the terminology's six classes. These results were further validated on two independent corpora representing two different neurodegenerative diseases.

Results: In its current state, the biomarker terminology contains 119 entity classes supported by 1890 different synonyms. The information retrieval results show an improved retrieval rate of informative abstracts, achieved by including clinical management terms and evidence of gene/protein alterations (e.g. gene/protein expression status or certain polymorphisms) in combination with disease and gene name recognition. When additional filtering through other classes (e.g. diagnostic or prognostic methods) is applied, the typically high number of nonspecific search results is significantly reduced. The evaluation results suggest that this approach enables the automated identification of biomarker information in the literature.
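The retrieval strategy summarized in the abstract (combining terminology classes with disease and gene name recognition, then optionally filtering by further classes such as methods) can be illustrated with a minimal Python sketch. This is not the SCAIView implementation: the class names, the toy synonym lists, the whole-word regex matcher standing in for named entity recognition, the all-classes-required base query, and the example abstracts are all hypothetical assumptions made for illustration.

```python
import re
from typing import AbstractSet, Dict, List, Set

# Hypothetical toy terminology: class name -> synonyms. The paper's real
# terminology (119 entity classes, 1890 synonyms) is NOT reproduced here.
TERMINOLOGY: Dict[str, List[str]] = {
    "disease": ["alzheimer's disease", "parkinson's disease"],
    "gene": ["apoe", "app", "snca"],
    "clinical_management": ["biomarker", "diagnosis", "prognosis"],
    "alteration": ["mutation", "overexpression", "polymorphism"],
    "method": ["csf assay", "pet imaging"],
}

# Assumed base query: disease AND gene AND clinical management AND alteration.
BASE_CLASSES = frozenset({"disease", "gene", "clinical_management", "alteration"})


def annotate(text: str) -> Set[str]:
    """Return the classes with at least one synonym found as a whole word/phrase."""
    lowered = text.lower()
    found: Set[str] = set()
    for cls, synonyms in TERMINOLOGY.items():
        if any(re.search(r"\b" + re.escape(s) + r"\b", lowered) for s in synonyms):
            found.add(cls)
    return found


def is_candidate(classes: AbstractSet[str],
                 extra_filter: AbstractSet[str] = frozenset()) -> bool:
    """True if all base classes and all extra filter classes are present."""
    return BASE_CLASSES <= classes and extra_filter <= classes


# Hypothetical example abstracts.
abstracts = {
    "a1": "APOE polymorphism as a prognosis biomarker in Alzheimer's disease, measured by CSF assay.",
    "a2": "An APP mutation supports diagnosis of Alzheimer's disease.",
    "a3": "SNCA overexpression in Parkinson's disease models.",
}

base_hits = [doc_id for doc_id, text in abstracts.items()
             if is_candidate(annotate(text))]
filtered_hits = [doc_id for doc_id, text in abstracts.items()
                 if is_candidate(annotate(text), {"method"})]

print("base query:", base_hits)             # base query: ['a1', 'a2']
print("with method filter:", filtered_hits)  # with method filter: ['a1']
```

In this toy run, "a3" is dropped because it lacks a clinical management term, and adding the hypothetical `method` class as an extra filter narrows the hits further, loosely mirroring the reduction of nonspecific results the abstract reports. Whole-word matching keeps the short gene symbol `app` from matching words such as "apply".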
A demo version of the search engine SCAIView, including the biomarker retrieval, is publicly available at http://www.scaiview.com/scaiview-academia.html.

Conclusions: The approach presented in this paper demonstrates that using a dedicated biomarker terminology for automated analysis of the scientific literature may be helpful for finding biomarker information in text. Successful extraction of candidate biomarker information from published resources can be considered the first step towards developing novel hypotheses. These hypotheses will be valuable for early decision-making in the drug discovery and development process.

【 License 】

© Younesi et al.; licensee BioMed Central Ltd. 2012. This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
