Journal Article Details
Biodiversity Information Science and Standards
A Workflow for Data Extraction from Digitized Herbarium Specimens
article
Sohaib Younis1  Marco Schmidt1  Bernhard Seeger2  Thomas Hickler1  Claus Weiland1 
[1] Senckenberg Biodiversity and Climate Research Centre;Department of Mathematics and Computer Science, Philipps University;Palmengarten;Goethe University
Keywords: Trait Recognition; Convolutional Neural Networks; Lifelong Learning; Herbarium specimens; Trait Semantics; Digitized Natural History Collections; Image Processing; Image Captioning; Object Detection; Object Annotation
DOI: 10.3897/biss.3.35190
Source: Pensoft
【 Abstract 】

Based on our own work on species and trait recognition and on complementary studies from other working groups, we present a workflow for data extraction from digitized herbarium specimens using convolutional neural networks. Digitized herbarium sheets contain preserved plant material as well as additional objects: the label containing information on the collection event; annotations such as revision labels or notes on material extraction; identifiers such as barcodes or numbers; envelopes for loose plant material; and often scale bars and color charts used in the digitization process. In order to treat these objects appropriately, segmentation techniques (Triki et al. 2018) will be applied to localize and identify the different kinds of objects for specific treatments. Detecting the presence of plant organs such as leaves, flowers or fruits is already a first step in data extraction and is potentially useful for phenological studies. Plant organs will be subject to quantitative (Gaikwad et al. 2018) and qualitative (Younis et al. 2018) trait recognition routines. Text-based objects can be treated as described by Kirchhoff et al. (2018), using OCR techniques and considering the many collection-specific terms and abbreviations described in Schröder (2019). Additionally, species recognition (Younis et al. 2018) will be applied to help further identify incompletely identified collection items or to detect possible misidentifications.
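The routing idea in the paragraph above (localize objects on the sheet, then hand each kind to its own treatment) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the class `DetectedObject`, the table `ROUTES`, the function `plan_treatments`, and all step names are hypothetical, and the treatments listed for identifiers, envelopes, scale bars and color charts are assumptions, since the abstract does not specify them.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class DetectedObject:
    """One object localized on a sheet by an upstream segmentation step."""
    kind: str                        # e.g. "plant", "label", "identifier"
    box: Tuple[int, int, int, int]   # (x_min, y_min, x_max, y_max) in pixels


# Hypothetical routing table: object kind -> names of downstream treatments.
# Plant and text routes follow the abstract; the remaining ones are assumed.
ROUTES: Dict[str, List[str]] = {
    "plant": ["organ_detection", "trait_recognition", "species_recognition"],
    "label": ["ocr"],
    "annotation": ["ocr"],
    "identifier": ["identifier_reading"],   # assumption
    "envelope": ["record_presence"],        # assumption
    "scale_bar": ["calibration"],           # assumption
    "color_chart": ["calibration"],         # assumption
}


def plan_treatments(objects: List[DetectedObject]) -> Dict[str, List[DetectedObject]]:
    """Group detected objects by the treatment step that should receive them.

    Objects of an unknown kind are sent to "manual_review" instead of being
    silently dropped.
    """
    plan: Dict[str, List[DetectedObject]] = {}
    for obj in objects:
        for step in ROUTES.get(obj.kind, ["manual_review"]):
            plan.setdefault(step, []).append(obj)
    return plan
```

Keeping the routing in a plain table means a new object kind can be supported by adding one entry, without touching the detection code.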
All steps described above need sufficient training data, including labelling, which may be obtained from collection metadata and trait databases. In order to deal with newly incoming digitized collections, unseen data or categories, we propose implementing a new Deep Learning approach, so-called Lifelong Learning: past knowledge of the network is dynamically saved in latent space using an autoencoder and generatively replayed while the network is trained on new tasks. This enables the network to solve complex image processing tasks and to incrementally learn new classes and knowledge without forgetting what it learned before.
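The replay idea can be illustrated with a small data-flow sketch. It rests on strong simplifying assumptions: latent codes are taken as already produced by some encoder (not shown), and a per-class Gaussian over those codes stands in for the generative model, which is a deliberate substitution for the autoencoder-based replay the abstract proposes. The names `LatentReplayMemory`, `remember`, `replay` and `mixed_batch` are hypothetical.

```python
import random
import statistics
from typing import Dict, List, Sequence, Tuple

Latent = List[float]
Sample = Tuple[Latent, str]  # (latent code, class label)


class LatentReplayMemory:
    """Keeps per-class statistics of latent codes instead of raw images."""

    def __init__(self, seed: int = 0) -> None:
        self._stats: Dict[str, List[Tuple[float, float]]] = {}
        self._rng = random.Random(seed)

    def remember(self, label: str, latents: Sequence[Latent]) -> None:
        """Summarize one class as a (mean, std) pair per latent dimension."""
        dims = list(zip(*latents))
        self._stats[label] = [
            (statistics.fmean(d), statistics.pstdev(d)) for d in dims
        ]

    def replay(self, n: int) -> List[Sample]:
        """Sample n pseudo-latents from the stored classes (Gaussian stand-in)."""
        labels = sorted(self._stats)
        if not labels:
            return []
        samples: List[Sample] = []
        for _ in range(n):
            label = self._rng.choice(labels)
            z = [self._rng.gauss(mu, sigma) for mu, sigma in self._stats[label]]
            samples.append((z, label))
        return samples


def mixed_batch(new_samples: Sequence[Sample],
                memory: LatentReplayMemory,
                replay_ratio: float = 0.5) -> List[Sample]:
    """Combine new-task samples with replayed samples of earlier classes."""
    n_replay = int(len(new_samples) * replay_ratio)
    return list(new_samples) + memory.replay(n_replay)
```

Training on such mixed batches is what lets earlier classes keep contributing to the loss while new ones are learned; `replay_ratio` is an assumed tuning knob, not a value from the source.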

【 License 】

Unknown   
