| Journal of Biomedical Semantics | |
| An effective method of large scale ontology matching | |
| Gayo Diallo [1] | |
| [1] University Bordeaux, ISPED, Centre INSERM U897, F-33000 Bordeaux, France | |
| Keywords: Semantic interoperability; Machine learning; Information retrieval; Entity similarity; Life sciences ontologies; Ontology matching | |
| Others: 1133524; DOI: 10.1186/2041-1480-5-44 | |
| Received 2013-05-23, accepted 2014-09-12, published 2014 | |
【 Abstract 】
Background
We are currently facing a proliferation of heterogeneous biomedical data sources accessible through various knowledge-based applications. These data are annotated by increasingly extensive and widely disseminated knowledge organisation systems, ranging from simple terminologies and structured vocabularies to formal ontologies. To address the interoperability problems that arise from the heterogeneity of these ontologies, an alignment task is usually performed. However, while significant effort has gone into tools that automatically align small ontologies containing hundreds or thousands of entities, little attention has been paid to matching large ontologies in the life sciences domain.
Results
We have designed and implemented ServOMap, an effective method for large scale ontology matching. It is a fast, high-precision system able to match input ontologies containing hundreds of thousands of entities. The system, which took part in the 2012 and 2013 editions of the Ontology Alignment Evaluation Initiative campaign, performed very well. It ranked among the top systems for matching large ontologies.
Conclusions
We proposed an approach for large scale ontology matching that relies on Information Retrieval (IR) techniques and on a combination of lexical and machine-learning-based contextual similarity computation to generate candidate mappings. It is particularly suited to the life sciences domain, as many ontologies in this domain include synonym terms taken from the Unified Medical Language System, which our IR strategy can exploit. The ServOMap system we implemented is able to handle hundreds of thousands of entities within an efficient computation time.
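The IR-based candidate generation summarised above can be illustrated with a minimal sketch. This is not the ServOMap implementation: it assumes a toy in-memory inverted index over entity labels and synonyms, and it uses a plain Jaccard token overlap as the lexical similarity. The function names, the 0.5 threshold, and the example entities are all hypothetical, and the machine-learning contextual step is left out.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase a term and split it on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_index(ontology):
    """Inverted index: token -> ids of entities whose label or synonyms contain it."""
    index = defaultdict(set)
    for entity_id, terms in ontology.items():
        for term in terms:
            for token in tokenize(term):
                index[token].add(entity_id)
    return index


def jaccard(tokens_a, tokens_b):
    """Jaccard overlap between two token lists (0.0 when both are empty)."""
    a, b = set(tokens_a), set(tokens_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_mappings(source, target, threshold=0.5):
    """Return (source_id, target_id, score) for the best-scoring target of each source entity."""
    index = build_index(target)
    mappings = []
    for src_id, src_terms in source.items():
        # Retrieval step: only target entities sharing at least one token are scored.
        hits = set()
        for term in src_terms:
            for token in tokenize(term):
                hits |= index.get(token, set())
        # Scoring step: best label/synonym pair by token overlap.
        best_id, best_score = None, 0.0
        for tgt_id in sorted(hits):
            score = max(
                jaccard(tokenize(s), tokenize(t))
                for s in src_terms
                for t in target[tgt_id]
            )
            if score > best_score:
                best_id, best_score = tgt_id, score
        if best_id is not None and best_score >= threshold:
            mappings.append((src_id, best_id, best_score))
    return mappings


# Hypothetical entities: each id maps to its label followed by synonyms.
source = {"A:1": ["Myocardial infarction", "Heart attack"], "A:2": ["Kidney"]}
target = {
    "B:10": ["Heart attack"],
    "B:11": ["Renal failure"],
    "B:12": ["Kidney", "Renal organ"],
}
print(candidate_mappings(source, target))
```

In this toy data the synonym "Heart attack" is what links A:1 to B:10, which illustrates why synonym-rich ontologies suit a retrieval-based strategy. The index also limits scoring to entities that share at least one token, so the full cross product of the two ontologies is never compared.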
【 License 】
2014 Diallo; licensee BioMed Central Ltd.
| Files | Size | Format | View |
|---|---|---|---|
| Figure 7. | 126KB | Image | |
| Figure 6. | 100KB | Image | |
| Figure 5. | 99KB | Image | |
| Figure 4. | 43KB | Image | |
| Figure 3. | 92KB | Image | |
| Figure 2. | 58KB | Image | |
| Figure 1. | 63KB | Image | |