Journal article details
Biodiversity Information Science and Standards
OpenBiodiv: Linking Type Materials, Institutions, Locations and Taxonomic Names Extracted From Scholarly Literature
article
Mariya Dimitrova1  Viktor Senderov1  Teodor Georgiev1  Georgi Zhelezov1  Lyubomir Penev3 
[1] Pensoft Publishers;Bulgarian Academy of Sciences;Pensoft Publishers & Bulgarian Academy of Sciences
Keywords: Biodiversity Knowledge Graph; Semantic Technologies; Use Case; Ontology; Information Extraction
DOI  :  10.3897/biss.3.35089
Source: Pensoft
【 Abstract 】

OpenBiodiv is a knowledge management system containing biodiversity knowledge extracted from scholarly literature: both recently published articles in Pensoft's journals and legacy literature (taxon treatments extracted by Plazi) (Senderov et al. 2017). OpenBiodiv advances our understanding of the use of scientific names, collection codes and institutions within published literature by using semantic technologies, such as the conversion of XML-encoded text to RDF triples, linked via the OpenBiodiv-O ontology (Senderov et al. 2018). In this poster, we show how OpenBiodiv, currently containing more than 729 million statements, can be used to address a specific use case: finding institutions storing type material specimens of the genus Prosopistoma from various literature sources (Fig. 1). This use case is important for various groups of users: institutions, taxonomists, and curators. Answering this complex question is made possible through the application of semantic technologies within OpenBiodiv. Data extraction from taxonomic articles and treatments is enabled by the incorporation of common schemas and standards into the extraction process, whereas the conversion of XML-encoded scholarly literature into Resource Description Framework (RDF) is facilitated by OpenBiodiv-O. The code base for information extraction and data transformation is wrapped in the R packages rdf4r and ropenbio. The ontology makes it possible to model the structure of research articles and treatments, as well as their corresponding metadata. Thus, OpenBiodiv-O is used to represent not only the sections of treatments but also the various entities within them, for instance geographic coordinates and institution codes within the “Type materials” section of a treatment. Institution codes marked up within articles using the Darwin Core standard (Wieczorek et al. 2012) are mapped to GRBio's institution records.
Institutions which are not present in GRBio can often be extracted from the “Abbreviations” section of a given article, thus utilising the power of semantic publishing workflows to discover information hidden within scholarly literature (Penev et al. 2011, Agosti and Egloff 2009). Institutional codes (abbreviations) are then mapped to the narrative section containing the type materials information. The extraction of coordinates in the taxonomic treatment section makes it possible to establish the location of the collection event through reverse geocoding and enables the selection of treatments linked to a specific geographic region. Modelling of the “Nomenclature” section within OpenBiodiv-O helps to link taxonomic names, mapped to GBIF’s taxonomic backbone, to their type materials, thus facilitating the discovery of materials corresponding to species from a certain higher-rank taxon.
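
The pipeline described above (XML-encoded treatment, conversion to triples, then a query for institutions holding type material of a genus) can be illustrated with a minimal sketch. This is not the OpenBiodiv code base, which is implemented in the R packages rdf4r and ropenbio. The XML tag and attribute names, the `http://example.org/` namespace, the predicate names, the institution codes "ABC" and "XYZ", and both helper functions are hypothetical, chosen only to show the general idea using the Python standard library.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML fragment. Tag names, attributes and institution codes
# are illustrative only; they are not the schema used by Pensoft or Plazi.
TREATMENT_XML = """
<treatment id="t1">
  <nomenclature>
    <taxon-name genus="Prosopistoma">Prosopistoma sp.</taxon-name>
  </nomenclature>
  <type-materials>
    <material type-status="holotype" institution-code="ABC"/>
    <material type-status="paratype" institution-code="XYZ"/>
  </type-materials>
</treatment>
"""

# Placeholder namespace; these are NOT OpenBiodiv-O terms.
EX = "http://example.org/"


def treatment_to_triples(xml_text):
    """Turn one marked-up treatment into (subject, predicate, object) tuples."""
    root = ET.fromstring(xml_text)
    treatment = EX + "treatment/" + root.get("id")
    triples = []

    # Link the treatment to the genus named in its nomenclature section.
    name = root.find("./nomenclature/taxon-name")
    triples.append((treatment, EX + "mentionsGenus", name.get("genus")))

    # Link the treatment to each type material and its institution code.
    for i, mat in enumerate(root.findall("./type-materials/material")):
        material = f"{treatment}/material/{i}"
        triples.append((treatment, EX + "hasTypeMaterial", material))
        triples.append((material, EX + "typeStatus", mat.get("type-status")))
        triples.append((material, EX + "institutionCode", mat.get("institution-code")))
    return triples


def institutions_for_genus(triples, genus):
    """Follow genus -> treatment -> type material -> institution code."""
    treatments = {s for s, p, o in triples
                  if p == EX + "mentionsGenus" and o == genus}
    materials = {o for s, p, o in triples
                 if p == EX + "hasTypeMaterial" and s in treatments}
    return sorted(o for s, p, o in triples
                  if p == EX + "institutionCode" and s in materials)


if __name__ == "__main__":
    triples = treatment_to_triples(TREATMENT_XML)
    print(institutions_for_genus(triples, "Prosopistoma"))
```

In a real knowledge graph the three-step lookup in `institutions_for_genus` would normally be a single graph query over the triple store; the set comprehensions here only mimic that join on a handful of in-memory tuples.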

【 License 】

Unknown   
