BMC Bioinformatics
Semantically linking molecular entities in literature through entity relationships
Proceedings
Jari Björne [1], Tapio Salakoski [1], Bernard De Baets [2], Yves Van de Peer [3], Sofie Van Landeghem [3], Thomas Abeel [4]
[1] Department of Information Technology, University of Turku/Turku Centre for Computer Science (TUCS), Turku, Finland; [2] Department of Mathematical Modelling, Statistics and Bioinformatics, Ghent University, Gent, Belgium; [3] Department of Plant Systems Biology, Flanders Institute for Biotechnology (VIB), B-9052, Gent, Belgium, and Department of Biotechnology and Bioinformatics, Ghent University, B-9052, Gent, Belgium; [4] Broad Institute of MIT and Harvard, Cambridge, MA, USA
Keywords: Noun Phrase; Latent Semantic Analysis; Semantic Space; Molecular Entity; Shared Task
DOI: 10.1186/1471-2105-13-S11-S6
Source: Springer
【 Abstract 】
Background: Text mining tools have gained popularity for processing the vast number of research articles available in the biomedical literature. It is crucial that such tools extract information at a level of detail sufficient for real-life scenarios. Studies on mining non-causal molecular relations contribute to this goal by formally identifying the relations between genes, promoters, complexes and various other molecular entities found in text. More importantly, these studies help to improve the integration of text mining results with database facts.

Results: We describe, compare and evaluate two frameworks developed for the prediction of non-causal or 'entity' relations (REL) between gene symbols and domain terms. In the corresponding REL challenge of the BioNLP Shared Task of 2011, these systems ranked first (57.7% F-score) and second (41.6% F-score). In this paper, we investigate the performance discrepancy of 16 percentage points by benchmarking on a related and more extensive dataset, analysing the contribution of both the term detection and the relation extraction modules. We further construct a hybrid system that combines the two frameworks and experiment with intersection and union combinations, which achieve high-precision and high-recall results, respectively. Finally, we highlight extremely high performance (F-score > 90%) obtained for the specific subclass of embedded entity relations, which are essential for integrating text mining predictions with database facts.

Conclusions: The results from this study will enable us in the near future to annotate semantic relations between molecular entities in the entire scientific literature available through PubMed. The recent release of the EVEX dataset, which contains biomolecular event predictions for millions of PubMed articles, offers an exciting opportunity to overlay these entity relations with event predictions on a literature-wide scale.
【 License 】
CC BY
© Van Landeghem et al; licensee BioMed Central Ltd. 2012