Journal article details
Journal of Biomedical Semantics
Representing annotation compositionality and provenance for the Semantic Web
Karin Verspoor [1], Lawrence E Hunter [2], Michael Bada [2], Kevin M Livingston [2]
[1] Department of Computing and Information Systems, The University of Melbourne, Melbourne 3010 VIC, Australia; [2] Department of Pharmacology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
Keywords: RDF; OWL; Provenance; Markup; Annotation; Conceptual data modeling; Ontology
Others  :  1172368
DOI: 10.1186/2041-1480-4-38
Received 2013-04-12; accepted 2013-09-20; published 2013
【 Abstract 】

Background

Though the annotation of digital artifacts with metadata has a long history, the bulk of that work focuses on the association of single terms or concepts with single targets. As annotation efforts expand to capture more complex information, annotations will need to be able to refer to knowledge structures formally defined in terms of more atomic knowledge structures. Existing provenance efforts in the Semantic Web domain primarily focus on tracking provenance at the level of whole triples and do not provide enough detail to track how individual triple elements of annotations were derived from triple elements of other annotations.

Results

We present a task- and domain-independent ontological model for capturing annotations and their linkage to their denoted knowledge representations, which can be singular concepts or more complex sets of assertions. We have implemented this model as an extension of the Information Artifact Ontology in OWL and made it freely available, and we show how it can be integrated with several prominent annotation and provenance models. We present several application areas for the model, ranging from linguistic annotation of text to the annotation of disease associations in genome sequences.

Conclusions

With this model, progressively more complex annotations can be composed from other annotations, and the provenance of compositional annotations can be represented at the annotation level or at the level of individual elements of the RDF triples composing the annotations. This in turn allows for progressively richer annotations to be constructed from previous annotation efforts, the precise provenance recording of which facilitates evidence-based inference and error tracking.
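
The two levels of provenance described above can be illustrated with a minimal sketch in plain Python. This is not the published OWL model: the names `Annotation`, `ElementRef`, `based_on`, `element_sources`, and `trace`, as well as the `ex:` identifiers, are hypothetical and chosen only for illustration. The sketch assumes that provenance links form no cycles.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical labels for the three element positions of a triple.
POSITIONS = ("subject", "predicate", "object")


@dataclass(frozen=True)
class ElementRef:
    """Points at one element of one triple inside one annotation."""
    annotation_id: str
    triple_index: int
    position: str  # one of POSITIONS


@dataclass
class Annotation:
    annotation_id: str
    triples: List[Triple]
    # Annotation-level provenance: ids of the annotations this one was composed from.
    based_on: List[str] = field(default_factory=list)
    # Element-level provenance: (triple_index, position) -> source elements.
    element_sources: Dict[Tuple[int, str], List[ElementRef]] = field(default_factory=dict)

    def element(self, triple_index: int, position: str) -> str:
        """Return the subject, predicate, or object of the given triple."""
        return self.triples[triple_index][POSITIONS.index(position)]


def trace(store: Dict[str, Annotation], ref: ElementRef) -> List[ElementRef]:
    """Follow element-level links back to elements that have no recorded source.

    Assumes the provenance links are acyclic.
    """
    ann = store[ref.annotation_id]
    sources = ann.element_sources.get((ref.triple_index, ref.position), [])
    if not sources:
        return [ref]
    roots: List[ElementRef] = []
    for src in sources:
        roots.extend(trace(store, src))
    return roots


# Two simple annotations and one annotation composed from them.
a1 = Annotation("a1", [("ex:mention1", "ex:denotes", "ex:ProteinX")])
a2 = Annotation("a2", [("ex:mention2", "ex:denotes", "ex:Transport")])
a3 = Annotation(
    "a3",
    [("ex:event1", "ex:hasParticipant", "ex:ProteinX")],
    based_on=["a1", "a2"],
    element_sources={(0, "object"): [ElementRef("a1", 0, "object")]},
)
store = {a.annotation_id: a for a in (a1, a2, a3)}
roots = trace(store, ElementRef("a3", 0, "object"))
```

Here `based_on` records provenance for the annotation as a whole, while `element_sources` records that the object of the composed triple was taken from the object of the triple in `a1`. Tracing at the element level is what would let an error in one source element be followed to every annotation derived from it.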

【 License 】

   
2013 Livingston et al.; licensee BioMed Central Ltd.

【 Figures 】

Figures 1-4 (images not reproduced here).
