Journal article details
Biodiversity Information Science and Standards
Trait Data Integration from the Perspective of a Data Aggregator
article
Jennifer Hammock [1], Katja S Schulz [1]
[1] Smithsonian Institution
Keywords: traits; data integration; graph data; identifiers
DOI: 10.3897/biss.3.38411
Source: Pensoft
【 Abstract 】

The Encyclopedia of Life (EOL) currently hosts ~8M attribute records for ~400k taxa (March 2019, not including geographic categories, Fig. 1). Our aggregation priorities include Essential Biodiversity Variables (Kissling et al. 2018) and other global scale research data priorities. Our primary strategy remains partnership with specialist open data aggregators. We are also developing tools for deploying evolutionarily conserved attribute values that scale quickly to global taxonomic coverage, for instance: tissue mineralization type (aragonite, calcite, silica...); trophic guild in certain clades; sensory modalities.

To support the aggregation and integration of trait information, data sets should be well structured, properly annotated and free of licensing or contractual restrictions so that they are ‘findable, accessible, interoperable, and reusable’ for both humans and machines (FAIR principles; Wilkinson et al. 2016). To this end, we are improving the documentation of protocols for the transformation, curation, and analysis of EOL data, and we make the associated scripts and software available to ensure reproducibility. Proper acknowledgement of contributors and tracking of credit through derived data products promote both open data sharing and the use of aggregated resources. By exposing unique identifiers for data products, people, and institutions, data providers and aggregators can stimulate the development of automated solutions for the creation of contribution metrics. Since different aspects of provenance will be significant depending on the intended data use, better standardization of contributor roles (e.g., author, compiler, publisher, funder) is needed, as well as more detailed attribution guidance for data users.

Global scale biodiversity data resources should resolve into a graph, linking taxa, specimens, occurrences, attributes, localities, and ecological interactions, as well as human agents, publications and institutions. Taxonomic and trait data will be two key data categories for ensuring rich connectivity in the graph. This graph can be supported by existing data hubs, if they share identifiers and/or create mappings between them, using standards and sharing practices developed by the biodiversity data community. Versioned archives of the combined graph could be published at intervals to appropriate open data repositories, and open source tools and training could be provided so that researchers can access the combined graph of biodiversity knowledge from all sources. To achieve this, good communication among data hubs will be needed. We will need to share information about preferred vocabularies and identifier management practices, and collaborate on identifier mappings.
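
As an illustration only, the idea of combining records from separate data hubs through an identifier mapping might be sketched as follows. This is a minimal sketch, not EOL's implementation: the hub names, identifiers, record values, and the helper functions `canonical` and `build_graph` are all hypothetical and invented for the example.

```python
from collections import defaultdict

# Hypothetical trait records from one hub: (taxon id, attribute, value).
hub_a_traits = [
    ("hubA:taxon/101", "trophic guild", "herbivore"),
    ("hubA:taxon/102", "tissue mineralization type", "aragonite"),
]

# Hypothetical occurrence links from a second hub that uses its own taxon ids.
hub_b_occurrences = [
    ("hubB:t-9", "occurrence", "hubB:occ/5501"),
    ("hubB:t-7", "occurrence", "hubB:occ/5502"),
]

# Hypothetical identifier mapping agreed between the two hubs.
id_mapping = {
    "hubB:t-9": "hubA:taxon/101",
    "hubB:t-7": "hubA:taxon/102",
}


def canonical(identifier, mapping):
    """Return the mapped identifier if one exists, else the identifier itself."""
    return mapping.get(identifier, identifier)


def build_graph(edge_lists, mapping):
    """Merge edge lists into one adjacency structure keyed by canonical ids."""
    graph = defaultdict(set)
    for edges in edge_lists:
        for subject, predicate, obj in edges:
            graph[canonical(subject, mapping)].add(
                (predicate, canonical(obj, mapping))
            )
    return graph


graph = build_graph([hub_a_traits, hub_b_occurrences], id_mapping)

# Trait and occurrence information now hang off the same taxon node.
for node in sorted(graph):
    print(node, sorted(graph[node]))
```

Without the shared mapping, the two hubs' records for the same taxon would remain disconnected nodes, which is why collaboration on identifier mappings matters for connectivity in the combined graph.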

【 License 】

Unknown   
