Journal article

【Abstract】

The Smithsonian National Museum of Natural History (NMNH) Department of Paleobiology recently completed the first segment of a mass digitization project in support of the Eastern Pacific Invertebrate Communities of the Cenozoic (EPICC) thematic collections network. In collaboration with the Smithsonian Institution Digitization Project Office (DPO), the team imaged and transcribed labels from a portion of the Cenozoic Mollusca Collection. Once the labels were transcribed, further processing was required to clean and enhance the specimen data. We sought to ensure high-quality data for this project by: developing clear guidelines for the documentation and treatment of specific data points; updating records to match current taxonomic, lithostratigraphic, and chronostratigraphic information; and creating iterative workflows to maintain extensibility and to capture uncertainty in the data.

A significant challenge for any large collections digitization project is transcribing and cleaning analog information from specimen labels. These labels are often unstructured and vary in data quality and quantity, which makes the data difficult to interpret. The problems are compounded in a large-scale project that combines specimens from multiple collectors or research projects. During this digitization project, we developed methods for accounting for possibly unverified, poorly documented, or sparse analog data; for selecting tools and procedures that efficiently transform these data into standardized vocabularies and structures while ensuring data quality; and for maintaining transparency by clearly documenting the decisions and interpretations made by catalogers. To improve the efficiency of the process, we also used technologies such as Python scripting and OpenRefine to help clean and standardize the data. These steps enabled us to meet the challenges of translating analog collections data more than a hundred years old into modern standards for biodiversity information.
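The abstract does not show the scripts themselves, so the following is only a minimal sketch of the kind of cleaning step described above: mapping a transcribed label value to a standardized term while keeping the verbatim text, flagging uncertainty, and recording the interpretation that was made. The vocabulary table, the `CleanedValue` record, and the `standardize_epoch` function are hypothetical names chosen for illustration, not the project's actual code or vocabulary.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical lookup from label spellings to a controlled vocabulary.
# A real project would load this from a maintained authority file.
EPOCH_VOCAB = {
    "mio": "Miocene",
    "miocene": "Miocene",
    "plio": "Pliocene",
    "pliocene": "Pliocene",
    "pleist": "Pleistocene",
    "pleistocene": "Pleistocene",
}


@dataclass
class CleanedValue:
    verbatim: str                # text exactly as transcribed from the label
    standardized: Optional[str]  # controlled term, or None if unmatched
    uncertain: bool              # True when the label itself signals doubt
    remark: str                  # record of the interpretation that was made


def standardize_epoch(verbatim: str) -> CleanedValue:
    """Map a transcribed epoch string to a controlled term (illustrative only)."""
    # Assumption: a question mark on the label signals collector uncertainty.
    uncertain = "?" in verbatim
    # Normalize by lowercasing and dropping everything except letters.
    key = re.sub(r"[^a-z]", "", verbatim.lower())
    term = EPOCH_VOCAB.get(key)
    if term is None:
        remark = f"'{verbatim}': no vocabulary match; needs cataloger review"
    else:
        remark = f"'{verbatim}' interpreted as {term}"
        if uncertain:
            remark += "; flagged uncertain on label"
    return CleanedValue(verbatim, term, uncertain, remark)


if __name__ == "__main__":
    for label in ["Mio.?", "Pleist.", "upper Tertiary"]:
        print(standardize_epoch(label))
```

Keeping the verbatim string next to the standardized term means any interpretation can be audited or revised later, which is in the spirit of the transparency goal stated in the abstract. Unmatched values are left empty for review instead of being guessed.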

【License】

Unknown


Biodiversity Information Science and Standards
Digitizing EPICC Data: Trials and Tribulations in Translating 100 Year Old Data
article
Holly Little¹ Anna K Leary¹ Alexandra L Cano¹ Adam Mansur¹
[1] Smithsonian National Museum of Natural History
Keywords: Digitization; Paleontology; Data Standards; Transcription
DOI : 10.3897/biss.2.26222
Source: Pensoft

