Journal article details
Journal of Cheminformatics
DECIMER 1.0: deep learning for chemical image recognition using transformers
Achim Zielesny [1], Kohulan Rajan [2], Christoph Steinbeck [2]
[1] Institute for Bioinformatics and Chemoinformatics, Westphalian University of Applied Sciences, August-Schmidt-Ring 10, 45665, Recklinghausen, Germany
[2] Institute for Inorganic and Analytical Chemistry, Friedrich-Schiller-University Jena, Lessingstr. 8, 07743, Jena, Germany
Keywords: Chemical data extraction; Deep learning; Neural networks; Optical chemical structure recognition
DOI: 10.1186/s13321-021-00538-8
Source: Springer
【 Abstract 】

The amount of data available on chemical structures and their properties has increased steadily over the past decades. However, articles published before the mid-1990s are available only in printed or scanned form. Extracting the data in those articles and storing it in a publicly accessible database is desirable, but doing this manually is slow and error-prone. Optical Chemical Structure Recognition (OCSR) tools were developed to extract chemical structure depictions and convert them into a computer-readable format; the best-performing OCSR tools are mostly rule-based. The DECIMER (Deep lEarning for Chemical ImagE Recognition) project was launched to address the OCSR problem with the latest computational intelligence methods and to provide an automated open-source software solution. Various current deep learning approaches were explored to find the best-fitting solution to the problem. In a preliminary communication, we outlined the prospect of predicting SMILES encodings of chemical structure depictions with about 90% accuracy using a dataset of 50–100 million molecules. This article presents the new DECIMER model, a transformer-based network that predicts SMILES with above 96% accuracy from depictions of chemical structures without stereochemical information and with above 89% accuracy from depictions with stereochemical information.

【 License 】

CC BY   
