Journal article details
Journal of Biomedical Semantics
Automatically exposing OpenLifeData via SADI semantic Web Services
Mark D Wilkinson [4], Michel Dumontier [1], Mikel Egaña Aranguren [2], Adrian Garcia [4], José Cruz-Toledo [3], Alison Callahan [1], Alejandro Rodríguez González [4]
[1] Center for Biomedical Informatics Research, Stanford University, Stanford, CA, USA; [2] Genomic Resources Group, University of the Basque Country (UPV-EHU), Bilbao, Spain; [3] Department of Biology, Carleton University, Ottawa, ON, Canada; [4] Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid, Madrid, Spain
Keywords: Galaxy; Sentient Knowledge Explorer; SHARE; SPARQL; Semantic Web Services; SADI; Bio2RDF; OpenLifeData
DOI: 10.1186/2041-1480-5-46
Received 2014-07-17, accepted 2014-11-07, published 2014
【 Abstract 】

Background

Two distinct trends are emerging in how data are shared, collected, and analyzed within the bioinformatics community. First, Linked Data, exposed as SPARQL endpoints, promises to make data easier to collect and integrate by harmonizing data syntax, descriptive vocabularies, and identifiers, and by providing a standardized mechanism for data access. Second, Web Services, often linked together into workflows, normalize data access and create transparent, reproducible scientific methodologies that can, in principle, be re-used and customized to suit new scientific questions. Constructing queries that traverse semantically rich Linked Data requires substantial expertise, yet traditional RESTful or SOAP Web Services cannot adequately describe the content of a SPARQL endpoint. We propose that content-driven Semantic Web Services can enable facile discovery of Linked Data, independent of their location.

Results

We take a well-curated Linked Dataset, OpenLifeData, and use its descriptive metadata to automatically configure more than 22,000 Semantic Web Services that expose all of its content via the SADI set of design principles. The OpenLifeData SADI services are discoverable via queries to the SHARE registry and are easy to integrate into new or existing bioinformatics workflows and analytical pipelines. We demonstrate the utility of this system by comparing Web Service-mediated data access with traditional SPARQL, and note that this approach not only simplifies data retrieval but also protects against resource-intensive queries.
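As a rough illustration of what automatic configuration from metadata could look like, the sketch below assumes the dataset metadata can be reduced to (subject type, predicate, object type) patterns and derives one service descriptor per distinct pattern. The pattern shape, the `ServiceDescriptor` fields, the naming scheme, and the `example.org` URIs are all hypothetical; none of them is taken from the paper or from the SADI API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceDescriptor:
    """Hypothetical description of one generated service."""
    name: str
    input_class: str
    predicate: str
    output_class: str


def local_name(uri: str) -> str:
    """Return the part of a URI after its last '#', '/' or ':'."""
    for sep in ("#", "/", ":"):
        uri = uri.rsplit(sep, 1)[-1]
    return uri


def build_services(patterns):
    """Create one descriptor per distinct (subject type, predicate, object type)."""
    services = []
    for s_type, pred, o_type in sorted(set(patterns)):
        name = f"{local_name(s_type)}_{local_name(pred)}_{local_name(o_type)}"
        services.append(ServiceDescriptor(name, s_type, pred, o_type))
    return services


# Made-up patterns standing in for endpoint metadata (assumption).
EXAMPLE_PATTERNS = [
    ("http://example.org/vocab#Drug",
     "http://example.org/vocab#target",
     "http://example.org/vocab#Protein"),
    ("http://example.org/vocab#Drug",
     "http://example.org/vocab#target",
     "http://example.org/vocab#Protein"),  # duplicate, collapsed by set()
    ("http://example.org/vocab#Gene",
     "http://example.org/vocab#encodes",
     "http://example.org/vocab#Protein"),
]

if __name__ == "__main__":
    for svc in build_services(EXAMPLE_PATTERNS):
        print(svc.name, svc.input_class, "->", svc.output_class)
```

Under this assumption, the number of services grows with the number of distinct patterns in the metadata rather than with the number of records, which is one plausible reading of how a single dataset could yield tens of thousands of services.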

Conclusions

We show, through a variety of clients and examples of varying complexity, that data from the myriad OpenLifeData datasets can be recovered without any prior knowledge of the content or structure of the SPARQL endpoints. We also demonstrate that, via clients such as SHARE, the complexity of federated SPARQL queries is dramatically reduced.
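One way to picture access without prior knowledge of endpoint structure: the client supplies only an input URI, and the service fills in a fixed, narrowly bound query template. The sketch below is a hypothetical template builder, not the paper's implementation; the function name `bounded_query`, the default limit, and the `example.org` URIs are assumptions made for illustration.

```python
# Characters that must not appear inside an IRI written as <...> in SPARQL.
_FORBIDDEN = set('<>"{}|^`\\') | {" "}


def _check_iri(value: str) -> str:
    """Reject strings that could break out of the <...> IRI delimiters."""
    if not value or any(ch in _FORBIDDEN for ch in value):
        raise ValueError(f"not a safe IRI: {value!r}")
    return value


def bounded_query(subject_uri: str, predicate_uri: str,
                  object_type_uri: str, limit: int = 1000) -> str:
    """Build a SELECT query bound to one subject, one predicate, one object type."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    s = _check_iri(subject_uri)
    p = _check_iri(predicate_uri)
    t = _check_iri(object_type_uri)
    return (
        "SELECT ?o WHERE { "
        f"<{s}> <{p}> ?o . "
        f"?o a <{t}> . "
        f"}} LIMIT {limit}"
    )


if __name__ == "__main__":
    print(bounded_query(
        "http://example.org/drug/1",          # hypothetical input record
        "http://example.org/vocab#target",    # hypothetical predicate
        "http://example.org/vocab#Protein",   # hypothetical output type
        limit=10,
    ))
```

Because the subject is always bound and the result set is capped, a template of this kind cannot turn into an unbounded scan of the endpoint, which is in the spirit of the protection against resource-intensive queries noted in the Results.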

【 License 】

2014 González et al.; licensee BioMed Central Ltd.

【 Figures and Tables 】

Figure 1.

Figure 2.

Figure 3.
