Journal Article Details
Biodiversity Information Science and Standards
Improving the discoverability of biodiversity data using the Global Names Finder
article
Anne E Thessen, Dmitry Mozzherin, David Peter Shorthouse, David J Patterson
Affiliations: University of Colorado Anschutz Medical Campus; The Ronin Institute for Independent Scholarship; University of Illinois; Agriculture & Agri-Food Canada; University of Sydney
Keywords: taxonomic names; indexing; metadata; named entity recognition
DOI: 10.3897/biss.6.90026
Source: Pensoft
Abstract

Most biodiversity data are not findable, accessible, integratable, or reusable, partly because they lack metadata. Taxonomic names are useful as metadata but are not sufficient on their own, because names may be updated as knowledge progresses. There is a great need for tools and services that can scale up to create and maintain metadata for the vast and varied long tail of dark data. Here we examine the use of GNFinder as a tool for creating and maintaining metadata from mentions of taxa in the text of publications associated with data sets deposited in Dryad. Most studied taxa were mentioned in the publication using a properly formed scientific name; the few exceptions were studies that used only vernacular names or that mentioned taxa only in the corresponding files. GNFinder had a high F1 score (0.86), reflecting a balance between precision (0.91) and recall (0.82). GNFinder performed less well when a name string was an irregular abbreviation, had unexpected capitalization or punctuation, or contained a qualifier (such as aff. or cf.). Approximately 14% of the name strings identified in text published from 1996 to 2012 were outdated and were updated to a current, valid name. Automated metadata creation and maintenance at scale using GNFinder can make biodiversity publications easier to find, as demonstrated by the Biodiversity Heritage Library and HathiTrust.
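The F1 score reported above is the harmonic mean of precision P and recall R, F1 = 2PR / (P + R). The short Python sketch below is only an arithmetic check, not part of the authors' evaluation pipeline; the function name `f1_score` is chosen here for illustration. It uses just the precision and recall values stated in the abstract to show that they are consistent with the reported F1 of 0.86.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the standard F1 definition)."""
    if precision + recall == 0:
        # Degenerate case: no correct detections at all.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported for GNFinder in the abstract.
reported_precision = 0.91
reported_recall = 0.82

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.4f}")  # prints "F1 = 0.8627"
```

The computed value (about 0.8627) rounds to the reported 0.86. Because a harmonic mean is pulled toward the smaller of its inputs, the F1 lies slightly closer to the recall (0.82) than to the precision (0.91).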

License: Unknown
