GigaScience
Enhanced reproducibility of SADI web service workflows with Galaxy and Docker
Mark D. Wilkinson [1], Mikel Egaña Aranguren [2]
[1] Biological Informatics, Centre for Plant Biotechnology and Genomics (CBGP), Technical University of Madrid (UPM), Campus of Montegancedo, Pozuelo de Alarcón 28223, Spain
[2] Eurohelp Consulting, Maximo Aguirre 18, Bilbo, 48011, Spain
Keywords: Reproducibility; Docker; Galaxy; Workflow; Web service; SADI; RDF; Semantic Web
DOI: 10.1186/s13742-015-0092-3
Received 2015-02-06, accepted 2015-10-27, published 2015
【 Abstract 】
Background
Semantic Web technologies have been widely applied in the life sciences, for example by data providers such as OpenLifeData and through web services frameworks such as SADI. The recently reported OpenLifeData2SADI project offers access to the vast OpenLifeData data store through SADI services.
Findings
This article describes how to merge data retrieved from OpenLifeData2SADI with other SADI services using the Galaxy bioinformatics analysis platform, thus making this semantic data more amenable to complex analyses. This is demonstrated using a working example, which is made distributable and reproducible through a Docker image that includes SADI tools, along with the data and workflows that constitute the demonstration.
Conclusions
The combination of Galaxy and Docker offers a solution for faithfully reproducing and sharing complex data retrieval and analysis workflows based on the SADI Semantic Web service design patterns.
【 License 】
2015 Aranguren and Wilkinson.