Journal of Biomedical Semantics
Implementation of linked data in the life sciences at BioHackathon 2011
Soichi Ogishima [7], Toshiaki Katayama [3], Shinobu Okamoto [3], Shuichi Kawashima [3], Mitsuteru Nakao [2], Takaaki Mori [4], Anna Kokubu [4], Takeo Katoda [4], Yukie Akune [4], Takatomo Fujisawa [8], Yasumasa Shigemoto [8], Yi-an Chen [5], Yoshinobu Igarashi [5], Mizuki Morita [1], Akira R Kinjo [6], Kiyoko F Aoki-Kinoshita [4]
[1] Center for Knowledge Structuring, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, Japan
[2] Next Generation Systems Core Function Unit, Eisai Product Creation Systems, Eisai Co., Ltd, Tsukuba, Ibaraki, Japan
[3] Database Center for Life Science, Research Organization of Information and Systems, 178-4-4 Wakashiba, Kashiwa-shi, Chiba 277-0871, Japan
[4] Department of Bioinformatics, Faculty of Engineering, Soka University, 1-236 Tangi-machi, Hachioji, Tokyo 192-8577, Japan
[5] National Institute of Biomedical Innovation, 7-6-8 Asagi Saito, Ibaraki-City, Osaka 567-0085, Japan
[6] Laboratory of Protein Informatics, Laboratory of Protein Databases, and Protein Data Bank Japan, Research Center for Structural and Functional Proteomics, Institute for Protein Research, Osaka University, 3-2 Yamadaoka, Suita, Osaka 565-0871, Japan
[7] Department of Bioclinical Informatics, Tohoku Medical Megabank Organization, Tohoku University, Seiryo-cho 4-1, Aoba-ku, Sendai-shi, Miyagi 980-8575, Japan
[8] DNA Data Bank of Japan, National Institute of Genetics, Yata 1111, Mishima, Shizuoka 411-8540, Japan
Keywords: Faceted search interface; Alzheimer’s disease; Glycobiology; DDBJ; PDBj; Data integration; Semantic Web
Others: 1133347; DOI: 10.1186/2041-1480-6-3
Received 2013-10-19, accepted 2014-11-27, published 2015
【 Abstract 】
Background
Linked Data has recently gained attention in the life sciences as an effective way to provide and share data. As a part of the Semantic Web, data are linked so that a person or machine can explore the web of data. Resource Description Framework (RDF) is the standard means of implementing Linked Data. In the process of generating RDF data, data are not merely linked to one another; the links themselves are characterized by ontologies, which allows the types of links to be distinguished. Although defining an ontology carries a high labor cost for data providers, the merit lies in the higher level of interoperability with data analysis and visualization software. This increase in interoperability facilitates the multi-faceted retrieval of data, so that the appropriate data can be quickly extracted and visualized. Such retrieval is usually performed using SPARQL (SPARQL Protocol and RDF Query Language), which is used to query RDF data stores. For the database provider, such interoperability will surely lead to an increase in the number of users.
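As a rough illustration of the idea of typed links and pattern-based retrieval described above, the following minimal sketch stores a few (subject, predicate, object) triples in plain Python and answers a two-step query over them. It is not taken from the article and does not use a real RDF library or SPARQL engine: the `http://example.org/` namespace, the `encodedBy` and `associatedWith` predicates, and the helper names `match` and `genes_for_disease` are all hypothetical, chosen only to show how distinguishing link types makes such a query possible.

```python
# Toy in-memory triple store (illustrative only; not a real RDF library).
from typing import Iterable, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]

EX = "http://example.org/"  # hypothetical namespace, not from the article

# Each triple is (subject, predicate, object); the predicate is the typed link.
TRIPLES: List[Triple] = [
    (EX + "protein/P1", EX + "vocab/encodedBy", EX + "gene/G1"),
    (EX + "protein/P1", EX + "vocab/associatedWith", EX + "disease/D1"),
    (EX + "protein/P2", EX + "vocab/encodedBy", EX + "gene/G2"),
]


def match(
    triples: Iterable[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield triples matching the pattern; None acts as a wildcard,
    loosely analogous to a variable in a SPARQL triple pattern."""
    for triple in triples:
        if all(want is None or want == got for want, got in zip((s, p, o), triple)):
            yield triple


def genes_for_disease(triples: List[Triple], disease: str) -> List[str]:
    """Join two patterns: proteins associated with the disease,
    then the genes that encode those proteins."""
    genes = []
    for protein, _, _ in match(triples, p=EX + "vocab/associatedWith", o=disease):
        for _, _, gene in match(triples, s=protein, p=EX + "vocab/encodedBy"):
            genes.append(gene)
    return genes


if __name__ == "__main__":
    print(genes_for_disease(TRIPLES, EX + "disease/D1"))
```

Because the two kinds of link carry different predicates, the query can follow only the relevant ones; with untyped links the two steps could not be told apart. A real deployment would express the same join as a SPARQL query against an RDF store.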
Results
This manuscript describes the experiences and discussions shared among participants of the week-long BioHackathon 2011, who developed RDF representations of their own data along with specific RDF and SPARQL use cases. Advice on what to consider when developing RDF representations is provided for bioinformaticians who are thinking of making their data available and interoperable.
Conclusions
Participants of the BioHackathon 2011 were able to produce RDF representations of their data and gain a better understanding of the requirements for producing such data in a period of just five days. We summarize the work accomplished with the hope that it will be useful for researchers involved in developing laboratory databases or data analysis, and those who are considering such technologies as RDF and Linked Data.
【 License 】
2014 Aoki-Kinoshita et al.; licensee BioMed Central.