Journal article

【Abstract】

Background

Named entity recognition (NER) is an essential step in automatic text processing pipelines. A number of solutions have been presented and evaluated against gold standard corpora (GSCs). Benchmarking against GSCs is crucial, but it is left to the individual researcher. Here we present a League Table web site that benchmarks NER solutions against selected public GSCs, maintains a ranked list, and archives the annotated corpora for future comparisons.

Results

The web site provides access to the different GSCs in a standardized format (IeXML). To submit, the user first describes the specification of the solution used and then uploads the annotated corpus for evaluation. The performance of the system is measured against one or more GSCs, and the results are then added to the web site (“League Table”). The site currently displays the results of publicly available NER solutions from the Whatizit infrastructure for future comparisons.
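The abstract does not state which measures or matching criteria the site applies, so the following is only a minimal sketch of how such a benchmark could work, assuming exact-boundary matching and precision, recall and F1 as measures. The names `Entity`, `score` and `league_table` are hypothetical, and parsing of IeXML files is omitted.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    """One annotated mention (hypothetical representation, not the IeXML schema)."""
    doc_id: str
    start: int      # character offset where the mention begins
    end: int        # character offset where the mention ends
    sem_type: str   # semantic type, e.g. "gene" or "disease"


def score(gold: set[Entity], predicted: set[Entity]) -> dict[str, float]:
    """Compare a submission with a gold standard corpus by exact match (assumed)."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def league_table(gold: set[Entity],
                 submissions: dict[str, set[Entity]]) -> list[tuple[str, dict[str, float]]]:
    """Score every submission against the same corpus and rank by F1, best first."""
    rows = [(name, score(gold, predicted)) for name, predicted in submissions.items()]
    return sorted(rows, key=lambda row: row[1]["f1"], reverse=True)
```

Ranking by F1 is one possible choice; a site could equally sort by precision or recall, or keep a separate table per GSC.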

Conclusion

The League Table enables the evaluation of NER solutions in a standardized infrastructure and monitors the results over the long term. For access, please go to http://wwwdev.ebi.ac.uk/Rebholz-srv/calbc/assessmentGSC/ (contact: rebholz@ifi.uzh.ch).

【License】

2013 Rebholz-Schuhmann et al.; licensee BioMed Central Ltd.

Journal of Biomedical Semantics
Monitoring named entity recognition: the League Table

Ian Lewin² Antonio Jimeno Yepes³ Jee-Hyub Kim¹ Senay Kafkas¹ Dietrich Rebholz-Schuhmann¹
[1] European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
[2] Linguamatics Ltd, 324 Cambridge Science Park, Milton Road, Cambridge CB4 0WG, UK
[3] NICTA Victoria Research Lab, Melbourne VIC 3010, Australia
Keywords: Named entity; Evaluation; Gold standard corpus; Text mining
Others: 807016; DOI: 10.1186/2041-1480-4-19

Received 2012-11-15; accepted 2013-07-25; published 2013