Journal article details
BMC Bioinformatics
The BiSciCol Triplifier: bringing biodiversity data to the Semantic Web
Robert Guralnick[2], Nico Cellinese[4], Lukasz Ziemba[4], Tom Conlin[2], John Deck[1], Brian J Stucky[3]
[1]Berkeley Natural History Museums, University of California, Berkeley, California, USA
[2]Museum of Natural History, University of Colorado, Boulder, Colorado, USA
[3]Department of Ecology and Evolutionary Biology, University of Colorado, Boulder, Colorado, USA
[4]Florida Museum of Natural History, University of Florida, Gainesville, Florida, USA
Keywords: SPARQL; Semantic web; RDF; Ontology; Linked data; Darwin core; Biodiversity informatics; Biocollections
DOI: 10.1186/1471-2105-15-257
Received 2014-06-12; accepted 2014-07-22; published 2014
【 Abstract 】

Background

Recent years have brought great progress in efforts to digitize the world’s biodiversity data, but integrating data from many different providers, and across research domains, remains challenging. Semantic Web technologies have been widely recognized by biodiversity scientists for their potential to help solve this problem, yet these technologies have so far seen little use for biodiversity data. Such slow uptake has been due, in part, to the relative complexity of Semantic Web technologies along with a lack of domain-specific software tools to help non-experts publish their data to the Semantic Web.

Results

The BiSciCol Triplifier is new software that greatly simplifies the process of converting biodiversity data in standard, tabular formats, such as Darwin Core Archives, into Semantic Web-ready Resource Description Framework (RDF) representations. The Triplifier uses a vocabulary based on the popular Darwin Core standard, includes both Web-based and command-line interfaces, and is fully open-source software.
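
To make the idea of tabular-to-RDF conversion concrete, the following is a minimal sketch in Python of how rows of a Darwin Core-style table might be turned into N-Triples lines. It is not the Triplifier's code and does not reproduce its vocabulary or output. The base IRI `http://example.org/occurrence/`, the sample table, and the helper names `row_to_ntriples` and `table_to_ntriples` are assumptions made for illustration. The sketch also assumes every column header is a Darwin Core term name and every identifier is already IRI-safe.

```python
import csv
import io

# Darwin Core term namespace and the rdf:type predicate.
DWC = "http://rs.tdwg.org/dwc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Hypothetical base IRI for record identifiers (an assumption, not from the paper).
BASE = "http://example.org/occurrence/"


def escape_literal(value):
    """Escape the characters that N-Triples requires to be escaped in literals."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def row_to_ntriples(row, id_column="occurrenceID"):
    """Convert one table row (a dict) to a list of N-Triples lines."""
    subject = f"<{BASE}{row[id_column]}>"
    triples = [f"{subject} <{RDF_TYPE}> <{DWC}Occurrence> ."]
    for column, value in row.items():
        if column == id_column or not value:
            continue  # skip the identifier column and empty cells
        triples.append(
            f'{subject} <{DWC}{column}> "{escape_literal(value)}" .'
        )
    return triples


def table_to_ntriples(text):
    """Convert CSV text with a header row to a flat list of N-Triples lines."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        lines.extend(row_to_ntriples(row))
    return lines


# Made-up sample data for illustration only.
SAMPLE = (
    "occurrenceID,scientificName,country\n"
    "occ-1,Ochotona princeps,USA\n"
    "occ-2,Tamias alpinus,\n"
)

if __name__ == "__main__":
    print("\n".join(table_to_ntriples(SAMPLE)))
```

Each non-empty cell becomes one triple whose subject is the row's identifier and whose predicate is the column's term. That is the basic mapping any such converter has to perform, whatever richer modeling it adds on top.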

Conclusions

Unlike most other RDF conversion tools, the Triplifier does not require detailed familiarity with core Semantic Web technologies, and it is tailored to a widely popular biodiversity data format and vocabulary standard. As a result, the Triplifier can often fully automate the conversion of biodiversity data to RDF, thereby making the Semantic Web much more accessible to biodiversity scientists who might otherwise have relatively little knowledge of Semantic Web technologies. Easy availability of biodiversity data as RDF will allow researchers to combine data from disparate sources and analyze them with powerful linked data querying tools. However, before software like the Triplifier, and Semantic Web technologies in general, can reach their full potential for biodiversity science, the biodiversity informatics community must address several critical challenges, such as the widespread failure to use robust, globally unique identifiers for biodiversity data.
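
The point about combining data from disparate sources, and its dependence on shared identifiers, can be illustrated with a small Python sketch. Triples are modeled here as plain tuples and matched with a single wildcard pattern. This only imitates the simplest kind of SPARQL triple pattern and is not a SPARQL engine. The specimen and sequence IRIs, both data sets, and the `match` helper are hypothetical.

```python
DWC = "http://rs.tdwg.org/dwc/terms/"
# Hypothetical globally unique identifier shared by both providers.
SPECIMEN = "http://example.org/specimen/42"


def match(graph, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Two made-up providers describing the same specimen.
museum = {
    (SPECIMEN, DWC + "scientificName", "Ochotona princeps"),
}
sequence_lab = {
    (SPECIMEN, DWC + "associatedSequences", "http://example.org/sequence/abc"),
}

# Merging RDF graphs is a set union; statements about the same IRI line up.
merged = museum | sequence_lab

if __name__ == "__main__":
    for triple in match(merged, s=SPECIMEN):
        print(triple)
```

Because both providers use the same IRI for the specimen, the union answers a question neither source could answer alone. If each provider had minted its own local identifier, the two statements would remain unconnected after the merge.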

【 License 】

   
2014 Stucky et al.; licensee BioMed Central Ltd.
