| Journal of Biomedical Semantics | |
| A rule-based ontological framework for the classification of molecules | |
| Ian Horrocks1  Markus Krötzsch2  Despoina Magka1  | |
| [1] Department of Computer Science, University of Oxford, Oxford, UK; [2] Department of Computer Science, Technical University of Dresden, Dresden, Germany | |
| Keywords: Cheminformatics; Datalog extensions; Logic programming and answer set programming; Knowledge representation and reasoning; Semantic technologies | |
| Others: 804613; DOI: 10.1186/2041-1480-5-17 | |
| Received 2013-05-07, accepted 2014-01-15, published 2014 | |
【 Abstract 】
Background
A variety of key activities within life sciences research involve integrating and intelligently managing large amounts of biochemical information. Semantic technologies provide an intuitive way to organise and sift through these rapidly growing datasets via the design and maintenance of ontology-supported knowledge bases. To this end, OWL, a W3C standard declarative language, has been extensively used in the deployment of biochemical ontologies that can be conveniently organised using the classification facilities of OWL-based tools. One of the most established ontologies for the chemical domain is ChEBI, an open-access dictionary of molecular entities that supplies high-quality annotation and taxonomical information for biologically relevant compounds. However, ChEBI is expanded manually, and the limited availability of human resources hinders its potential to grow.
Results
In this work, we describe a prototype that performs automatic classification of chemical compounds. The software we present builds upon an off-the-shelf deductive database system and implements a sound and complete reasoning procedure for a formalism that extends datalog. We capture a wide range of chemical classes that are not expressible with OWL-based formalisms, such as cyclic molecules, saturated molecules and alkanes. Furthermore, we describe a surface ‘less-logician-like’ syntax that allows application experts to create ontological descriptions of complex biochemical objects without prior knowledge of logic. In terms of performance, we observe a noticeable improvement in comparison with previous approaches. Our evaluation has discovered subsumptions that are missing from the manually curated ChEBI ontology as well as discrepancies with respect to existing subclass relations. We thus illustrate the potential of an ontology language suitable for the life sciences domain that exhibits a favourable balance between expressive power and practical feasibility.
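To make the kind of structure-based class named above more concrete, the following is a minimal Python sketch, not the prototype described in this work (which runs rules on a deductive database system). It assumes a hypothetical encoding of a molecule as a dictionary of atom labels plus a list of `(atom, atom, bond_order)` triples, and the function names `has_cycle` and `classify` are illustrative only. Note that "alkane" depends on the absence of rings and of multiple bonds, a closed-world condition that is awkward to state under OWL's open-world semantics.

```python
def has_cycle(atoms, bonds):
    """Union-find check: a bond joining two already-connected atoms closes a ring.

    Assumes each bonded pair appears once; multiplicity is given by bond order.
    """
    parent = {atom: atom for atom in atoms}

    def find(x):
        # Follow parent links to the component root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, _order in bonds:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return True
        parent[root_a] = root_b
    return False


def classify(atoms, bonds):
    """Return the set of (illustrative) chemical classes a molecule belongs to."""
    classes = set()
    if has_cycle(atoms, bonds):
        classes.add("cyclic")
    if all(order == 1 for _, _, order in bonds):
        classes.add("saturated")
    elements = set(atoms.values())
    if "C" in elements and elements <= {"C", "H"}:
        classes.add("hydrocarbon")
    # Alkane: saturated hydrocarbon with no ring (uses the absence of "cyclic").
    if {"hydrocarbon", "saturated"} <= classes and "cyclic" not in classes:
        classes.add("alkane")
    return classes


# Hypothetical test molecules with explicit hydrogens.
ethane = (
    {"c1": "C", "c2": "C", "h1": "H", "h2": "H", "h3": "H",
     "h4": "H", "h5": "H", "h6": "H"},
    [("c1", "c2", 1), ("c1", "h1", 1), ("c1", "h2", 1), ("c1", "h3", 1),
     ("c2", "h4", 1), ("c2", "h5", 1), ("c2", "h6", 1)],
)
ethene = (
    {"c1": "C", "c2": "C", "h1": "H", "h2": "H", "h3": "H", "h4": "H"},
    [("c1", "c2", 2), ("c1", "h1", 1), ("c1", "h2", 1),
     ("c2", "h3", 1), ("c2", "h4", 1)],
)
cyclopropane = (
    {"c1": "C", "c2": "C", "c3": "C", "h1": "H", "h2": "H", "h3": "H",
     "h4": "H", "h5": "H", "h6": "H"},
    [("c1", "c2", 1), ("c2", "c3", 1), ("c3", "c1", 1),
     ("c1", "h1", 1), ("c1", "h2", 1), ("c2", "h3", 1),
     ("c2", "h4", 1), ("c3", "h5", 1), ("c3", "h6", 1)],
)

for name, (atoms, bonds) in [("ethane", ethane), ("ethene", ethene),
                             ("cyclopropane", cyclopropane)]:
    print(name, sorted(classify(atoms, bonds)))
```

In this sketch only ethane comes out as an alkane: ethene fails the saturation test and cyclopropane fails the ring test. A rule-based system would state the same conditions declaratively rather than as procedural checks.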
Conclusions
Our proposed methodology can form the basis of an ontology-mediated application to assist biocurators in the production of complete and error-free taxonomies. Moreover, such a tool could contribute to a more rapid development of the ChEBI ontology and to the efforts of the ChEBI team to make annotated chemical datasets available to the public. From a modelling point of view, our approach could stimulate the adoption of a different and expressive reasoning paradigm based on rules for which state-of-the-art and highly optimised reasoners are available; it could thus pave the way for the representation of a broader spectrum of life sciences and biomedical knowledge.
【 License 】
2014 Magka et al.; licensee BioMed Central Ltd.