Journal article

【Abstract】

Background: The vast amount of data published in the primary biomedical literature represents a challenge for the automated extraction and codification of individual data elements. Biological databases that rely solely on manual extraction by expert curators are unable to comprehensively annotate the information dispersed across the entire biomedical literature. The development of efficient tools based on natural language processing (NLP) systems is essential for the selection of relevant publications, identification of data attributes and partially automated annotation. One of the tasks of the BioCreative 2010 Challenge III was devoted to the evaluation of NLP systems developed to identify articles for curation and extraction of protein-protein interaction (PPI) data.

Results: The BioCreative 2010 competition addressed three tasks: gene normalization, article classification and interaction method identification. The BioGRID and MINT protein interaction databases both participated in the generation of the test publication set for gene normalization, annotated the development and test sets for article classification, and curated the test set for interaction method classification. These test datasets served as a gold standard for the evaluation of data extraction algorithms.

Conclusion: The development of efficient tools for extraction of PPI data is a necessary step to achieve full curation of the biomedical literature. NLP systems can in the first instance facilitate expert curation by refining the list of candidate publications that contain PPI data; more ambitiously, NLP approaches may be able to directly extract relevant information from full-text articles for rapid inspection by expert curators. Close collaboration between biological databases and NLP systems developers will continue to facilitate the long-term objectives of both disciplines.

【License】

CC BY
© Chatr-aryamontri et al; licensee BioMed Central Ltd. 2011


BMC Bioinformatics
Benchmarking of the 2010 BioCreative Challenge III text-mining competition by the BioGRID and MINT interaction databases
Research
Livia Perfetto¹ Luisa Castagnoli¹ Leonardo Briganti¹ Marta Iannuccelli¹ Luana Licata¹ Gianni Cesareni² Andrew Winter³ Andrew Chatr-aryamontri³ Mike Tyers⁴
[1] Department of Biology, University of Rome Tor Vergata, 00133, Rome, Italy; [2] IRCCS, Fondazione Santa Lucia, 00143, Rome, Italy; [3] School of Biological Sciences, University of Edinburgh, EH9 3JR, Edinburgh, UK; [4] Center for Systems Biology, Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, Canada
Keywords: Text Mining; Biomedical Literature; Evidence Code; Expert Curator; EMBO Journal
DOI: 10.1186/1471-2105-12-S8-S8
Source: Springer