Journal article details
Journal of Biomedical Semantics
Structuring research methods and data with the research object model: genomics workflows as a case study
Marco Roos [6], Carole Goble [4], Sean Bechhofer [4], Peter A C ’t Hoen [6], Reinout van Schouwen [6], Graham Klyne [5], Oscar Corcho [2], David de Roure [5], Julian Garrido [3], Lourdes Verdes-Montenegro [3], Don Cruickshank [5], Mark Thompson [6], Eleni Mina [6], Stian Soiland-Reyes [4], Khalid Belhajjame [4], Katherine Wolstencroft [1], Jun Zhao [5], Harish Dharuri [6], Kristina M Hettne [6]
[1] Leiden Institute of Advanced Computer Science, Leiden University, Leiden, The Netherlands
[2] Ontology Engineering Group, Universidad Politécnica de Madrid, Madrid, Spain
[3] Instituto de Astrofísica de Andalucía, Granada, Spain
[4] School of Computer Science, University of Manchester, Manchester, UK
[5] Department of Zoology, University of Oxford, Oxford, UK
[6] Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
Keywords: Genome wide association study; Digital libraries; Scientific workflows; Semantic web models
DOI: 10.1186/2041-1480-5-41
Received 2013-05-13, accepted 2014-07-29, published 2014
【 Abstract 】

Background

One of the main challenges for biomedical research lies in the computer-assisted integrative study of large and increasingly complex combinations of data in order to understand molecular mechanisms. Preserving the materials and methods of such computational experiments with clear annotations is essential for understanding an experiment, and the bioinformatics community increasingly recognizes this. Our assumption is that offering a means of digital, structured aggregation and annotation of the objects of an experiment will provide the metadata a scientist needs to understand and recreate its results. To support this, we explored a model for the semantic description of a workflow-centric Research Object (RO), where an RO is defined as a resource that aggregates other resources, e.g., datasets, software, spreadsheets and text. We applied this model to a case study in which we analysed human metabolite variation using workflows.

Results

We present the application of the workflow-centric RO model for our bioinformatics case study. Three workflows were produced following recently defined Best Practices for workflow design. By modelling the experiment as an RO, we were able to automatically query the experiment and answer questions such as “which particular data was input to a particular workflow to test a particular hypothesis?”, and “which particular conclusions were drawn from a particular workflow?”.
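The kind of question quoted above can be illustrated with a small sketch. It assumes a plain in-memory stand-in for an RO: the class names, the `ex:` identifiers, and the relation labels `tests`, `usedInput` and `drawnFrom` are hypothetical and are not terms from the Wf4Ever Research Object model, which is a semantic web model rather than Python objects. The sketch only shows how aggregating resources and annotating the links between them makes such questions answerable by a query.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Resource:
    """One aggregated item, e.g. a dataset, workflow, hypothesis or conclusion."""
    uri: str
    kind: str  # hypothetical labels such as "workflow" or "dataset"


@dataclass
class ResearchObject:
    """A resource that aggregates other resources plus annotations about them."""
    uri: str
    resources: dict = field(default_factory=dict)
    # Each annotation is a (subject URI, relation, object URI) triple.
    annotations: set = field(default_factory=set)

    def aggregate(self, resource):
        """Add a resource to the aggregation and return it."""
        self.resources[resource.uri] = resource
        return resource

    def annotate(self, subject, relation, obj):
        """Record a relation between two resources that are already aggregated."""
        for res in (subject, obj):
            if res.uri not in self.resources:
                raise KeyError(f"{res.uri} is not aggregated by {self.uri}")
        self.annotations.add((subject.uri, relation, obj.uri))

    def subjects(self, relation, obj_uri):
        """URIs of all resources that point at obj_uri via relation."""
        return sorted(s for s, r, o in self.annotations
                      if r == relation and o == obj_uri)

    def objects(self, subj_uri, relation):
        """URIs of all resources that subj_uri points at via relation."""
        return sorted(o for s, r, o in self.annotations
                      if s == subj_uri and r == relation)


def inputs_for_hypothesis(ro, hypothesis_uri):
    """Map each workflow that tests the hypothesis to its input data URIs."""
    return {wf: ro.objects(wf, "usedInput")
            for wf in ro.subjects("tests", hypothesis_uri)}


def conclusions_from_workflow(ro, workflow_uri):
    """URIs of the conclusions annotated as drawn from the given workflow."""
    return ro.subjects("drawnFrom", workflow_uri)


# Hypothetical example content; none of these identifiers come from the paper.
ro = ResearchObject("ex:ro")
hyp = ro.aggregate(Resource("ex:hypothesis", "hypothesis"))
wf = ro.aggregate(Resource("ex:workflow-1", "workflow"))
data = ro.aggregate(Resource("ex:input-data", "dataset"))
concl = ro.aggregate(Resource("ex:conclusion", "conclusion"))
ro.annotate(wf, "tests", hyp)
ro.annotate(wf, "usedInput", data)
ro.annotate(concl, "drawnFrom", wf)

print(inputs_for_hypothesis(ro, hyp.uri))
print(conclusions_from_workflow(ro, wf.uri))
```

Keeping annotations as subject, relation, object triples mirrors the general shape of semantic web data, so the two helper functions read like simple graph pattern queries; a real RO would be queried through its own vocabulary and tooling instead.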

Conclusions

Applying a workflow-centric RO model to aggregate and annotate the resources used in a bioinformatics experiment allowed us to retrieve the conclusions of the experiment in the context of the driving hypothesis, the executed workflows and their input data. The RO model is an extendable reference model that can be used by other systems as well.

Availability

The Research Object is available at http://www.myexperiment.org/packs/428

The Wf4Ever Research Object Model is available at http://wf4ever.github.io/ro

【 License 】

   
2014 Hettne et al.; licensee BioMed Central Ltd.
