Journal article details
BMC Bioinformatics
USI: a fast and accurate approach for conceptual document annotation
Nicolas Fiorini [2], Sylvie Ranwez [2], Jacky Montmain [2], Vincent Ranwez [1]
[1] UMR AGAP, Montpellier SupAgro/CIRAD/INRA, 2 place Pierre Viala, Montpellier Cedex 1 34060, France
[2] LGI2P research center from the Ecole des mines d’Alès, Site de Nîmes, Parc scientifique G. Besse, Nîmes cedex 1 30035, France
Keywords: Benchmarking; Complexity; Algorithms; Semantic annotations
Others  :  1139038
DOI  :  10.1186/s12859-015-0513-4
Received 2014-10-23, accepted 2015-02-24, published 2015
【 Abstract 】

Background

Semantic approaches such as concept-based information retrieval rely on a corpus in which resources are indexed by concepts belonging to a domain ontology. To keep such applications up to date, new entities need to be annotated frequently to enrich the corpus. However, this task is time-consuming and requires a high level of expertise in both the domain and the related ontology. Different strategies have thus been proposed to ease this indexing process, each one taking advantage of the features of the document.

Results

In this paper we present USI (User-oriented Semantic Indexer), a fast and intuitive method for indexing tasks. We introduce a solution that suggests a conceptual annotation for a new entity based on related documents that are already indexed. Our results, compared to those obtained by previous authors using the MeSH thesaurus and a dataset of biomedical papers, show that the method surpasses text-specific methods in terms of both quality and speed. Evaluation uses standard metrics and semantic similarity.
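As an illustration of what set-based evaluation of a suggested annotation can look like, here is a minimal Python sketch. It assumes the "usual metrics" include precision, recall and F-measure computed over sets of concepts; the abstract does not name the metrics, so this choice, the function name `precision_recall_f1` and the example concept labels are all hypothetical.

```python
def precision_recall_f1(suggested, reference):
    """Compare a suggested concept set with a reference (e.g. curated) set.

    Assumption: exact-match set overlap; the semantic-similarity based
    evaluation mentioned in the abstract is not reproduced here.
    """
    suggested, reference = set(suggested), set(reference)
    true_positives = len(suggested & reference)
    precision = true_positives / len(suggested) if suggested else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical concept labels, for illustration only.
p, r, f = precision_recall_f1({"A", "B", "C"}, {"B", "C", "D", "E"})
```

Exact-match metrics of this kind penalise a suggestion that is close to, but not identical with, a reference concept, which is one reason an evaluation may also use semantic similarity.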

Conclusions

By relying only on neighbor documents, the User-oriented Semantic Indexer does not need a representative learning set. Yet it provides better results than the other approaches by producing a consistent annotation scored with a single global criterion rather than one score per concept.
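The idea of scoring a whole annotation with one global criterion, rather than ranking concepts one by one, can be sketched as follows. This is not the authors' algorithm: the Jaccard index stands in for an ontology-based semantic similarity, the greedy forward selection is an assumed search strategy, and all function names and concept labels are hypothetical.

```python
def jaccard(a, b):
    """Stand-in similarity between two concept sets (assumption, not USI's measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def global_score(annotation, neighbor_annotations):
    """One score for the whole annotation: mean similarity to the neighbors."""
    total = sum(jaccard(annotation, n) for n in neighbor_annotations)
    return total / len(neighbor_annotations)


def suggest_annotation(neighbor_annotations):
    """Greedily build a concept set that maximises the global score."""
    if not neighbor_annotations:
        return set(), 0.0
    candidates = set().union(*neighbor_annotations)
    chosen = set()
    best = global_score(chosen, neighbor_annotations)
    while True:
        trials = [
            (global_score(chosen | {c}, neighbor_annotations), c)
            for c in sorted(candidates - chosen)
        ]
        if not trials:
            break
        score, concept = max(trials)
        if score <= best:  # no single addition improves the global score
            break
        chosen.add(concept)
        best = score
    return chosen, best


# Annotations of three already indexed neighbor documents (hypothetical labels).
neighbors = [{"A", "B"}, {"A", "B", "C"}, {"A", "D"}]
annotation, score = suggest_annotation(neighbors)
```

Because the objective is evaluated on the set as a whole, a concept is kept only if it improves the annotation's overall agreement with the neighbors, which is what distinguishes this from thresholding independent per-concept scores.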

【 License 】

   
2015 Fiorini et al.; licensee BioMed Central.

