Journal article details
Biodiversity Information Science and Standards
Cloud AI: A comparison of specimen image data extraction processes
article
Ben Scott [1]
[1] Natural History Museum
Keywords: machine learning; cloud computing; digitisation; natural history collections; specimen data
DOI: 10.3897/biss.6.90951
Source: Pensoft
【Abstract】

The Natural History Museum (NHM) of London has embarked on an ambitious programme to digitise the 80 million specimens in its collection, releasing them through the NHM data portal to the global biodiversity research community. As part of the digitisation process, data is transcribed from specimen labels to capture the vital taxonomic and collection event data. Accurate human transcription is slow, and the NHM, like many institutions, has been exploring machine learning (ML) for automated specimen analysis and label data capture. This process requires many different models, chained in series: semantic segmentation to identify specimen and label regions of interest; optical character recognition to identify text on labels; and natural language processing to extract entities from the text.

As part of SYNTHESYS+, the NHM has been building the Specimen Data Refinery (SDR) (Smith et al. 2019), a workflow engine for chaining ML models, each performing one atomic task in the data extraction process. The SDR is now in public beta, and we present evaluation metrics from our initial testing. Alongside the SDR project, the NHM has been exploring cloud-based artificial intelligence tools for specimen digitisation, using Google and Amazon technologies. We present an analysis of these different approaches, comparing the results from third-party AI services with models developed specifically for the biodiversity and natural history collection domains. With large corporates providing comparatively low-cost access to AI compute resources and models transferable to many specimen image digitisation tasks, is developing bespoke solutions still required?
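The chaining of models "in series" described in the abstract can be illustrated with a minimal Python sketch. This is an assumption-laden illustration only and not the SDR's actual interface: `SpecimenRecord`, `segment`, `make_ocr`, `extract_entities` and `run_pipeline` are hypothetical names, and each step is a stub standing in for a real model (the label text is made up, and the "NLP" step is just a regular expression for a year).

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SpecimenRecord:
    """Hypothetical container passed from one step to the next."""
    image_id: str
    regions: List[str] = field(default_factory=list)
    label_text: str = ""
    entities: Dict[str, str] = field(default_factory=dict)


Step = Callable[[SpecimenRecord], SpecimenRecord]


def segment(record: SpecimenRecord) -> SpecimenRecord:
    # Stand-in for semantic segmentation: report fixed regions of interest.
    record.regions = ["specimen", "label"]
    return record


def make_ocr(canned_text: Dict[str, str]) -> Step:
    # Stand-in for OCR: look up canned text instead of reading pixels.
    def ocr(record: SpecimenRecord) -> SpecimenRecord:
        if "label" in record.regions:
            record.label_text = canned_text.get(record.image_id, "")
        return record
    return ocr


def extract_entities(record: SpecimenRecord) -> SpecimenRecord:
    # Stand-in for NLP entity extraction: pull a four-digit year, if any.
    match = re.search(r"\b(1[6-9]\d{2}|20\d{2})\b", record.label_text)
    if match:
        record.entities["year"] = match.group(1)
    return record


def run_pipeline(record: SpecimenRecord, steps: List[Step]) -> SpecimenRecord:
    # Each step performs one atomic task and hands its output to the next.
    for step in steps:
        record = step(record)
    return record


# Made-up example input; no real specimen data is used here.
canned = {"example-001": "Made-up label, collected 1901"}
demo = run_pipeline(
    SpecimenRecord(image_id="example-001"),
    [segment, make_ocr(canned), extract_entities],
)
```

Because every step only reads and extends the shared record, a stub could in principle be swapped for a bespoke model or a third-party cloud service without changing the others, which is the kind of comparison the abstract describes. The order matters: in this sketch, running the OCR stub before segmentation yields no text, since no label region has been identified yet.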

【License】

Unknown   
