Journal article details
GigaScience
Enhanced reproducibility of SADI web service workflows with Galaxy and Docker
Mark D. Wilkinson [1], Mikel Egaña Aranguren [2]
[1] Biological Informatics, Centre for Plant Biotechnology and Genomics (CBGP), Technical University of Madrid (UPM), Campus of Montegancedo, Pozuelo de Alarcón 28223, Spain
[2] Eurohelp Consulting, Maximo Aguirre 18, Bilbo, 48011, Spain
Keywords: Reproducibility; Docker; Galaxy; Workflow; Web service; SADI; RDF; Semantic Web
DOI: 10.1186/s13742-015-0092-3
Received 2015-02-06; accepted 2015-10-27; published 2015
【 Abstract 】

Background

Semantic Web technologies have been widely applied in the life sciences, for example by data providers such as OpenLifeData and through web services frameworks such as SADI. The recently reported OpenLifeData2SADI project offers access to the vast OpenLifeData data store through SADI services.

Findings

This article describes how to merge data retrieved from OpenLifeData2SADI with other SADI services using the Galaxy bioinformatics analysis platform, thus making this semantic data more amenable to complex analyses. This is demonstrated using a working example, which is made distributable and reproducible through a Docker image that includes SADI tools, along with the data and workflows that constitute the demonstration.

Conclusions

The combination of Galaxy and Docker offers a solution for faithfully reproducing and sharing complex data retrieval and analysis workflows based on the SADI Semantic Web service design patterns.

【 License 】

   
2015 Aranguren and Wilkinson.

【 Figures 】

Fig. 1 through Fig. 7 (images not reproduced).
