Journal article details
Frontiers in Artificial Intelligence
Cracking the genetic code with neural networks
Artificial Intelligence
Marc Joiret [1], Liesbet Geris [2], Francesca Rapino [3], Pierre Close [3], Marine Leclercq [3], Gaspard Lambrechts [4], Gilles Louppe [4]
Affiliations: Biomechanics Research Unit, GIGA in Silico Medicine, Liège University, Liège, Belgium; Skeletal Biology and Engineering Research Center, KU Leuven, Leuven, Belgium; Biomechanics Section, KU Leuven, Heverlee, Belgium; Cancer Signaling, GIGA Stem Cells, Liège University, Liège, Belgium; Department of Electrical Engineering and Computer Science, Artificial Intelligence and Deep Learning, Montefiore Institute, Liège University, Liège, Belgium
Keywords: Artificial Intelligence; genetic code deciphering; codon usage; codon embedding; deep neural network; data efficiency; natural language processing
DOI: 10.3389/frai.2023.1128153
Received 2022-12-21, accepted 2023-03-21, published 2023
Source: Frontiers
【 Abstract 】

The genetic code is textbook scientific knowledge that was soundly established without resorting to Artificial Intelligence (AI). The goal of our study was to check whether a neural network could rediscover, on its own, the mapping between codons and amino acids and build the complete deciphering dictionary when presented with training pairs of transcripts and proteins. We compared different deep learning neural network architectures and quantitatively estimated the size of the human transcriptomic training set required to achieve the best possible accuracy in the codon-to-amino-acid mapping. We also investigated the effect of a codon embedding layer, which assesses the semantic similarity between codons, on the rate of increase of the training accuracy. We further investigated the benefit of quantifying and using the unbalanced representation of amino acids within real human proteins for faster deciphering of the codons of rare amino acids. Deep neural networks require huge amounts of data to train, and deciphering the genetic code with a neural network is no exception. A test accuracy of 100% and the unequivocal deciphering of rare codons, such as the tryptophan codon or the stop codons, require a training dataset on the order of 4–22 million cumulative codon and amino acid pairs presented to the neural network over around 7–40 training epochs, depending on the architecture and settings. We confirm that the wide generic capacities and modularity of deep neural networks allow them to be customized easily to learn the deciphering task of the genetic code efficiently.
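
To make the codon-to-amino-acid task concrete, the following is a minimal sketch and not the authors' architecture. It assumes a toy softmax regression on one-hot codons (which reduces to one row of logits per codon), uniformly sampled codons rather than real human transcripts, and hypothetical names such as `train` and `decode`. The reference table is the standard genetic code, written with the RNA alphabet.

```python
import math
import random

# Standard genetic code, bases ordered U, C, A, G; '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
GENETIC_CODE = dict(zip(CODONS, AMINO_ACIDS))
CLASSES = sorted(set(AMINO_ACIDS))  # 20 amino acids plus the stop symbol


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(pairs, lr=1.0):
    """One pass of SGD on the cross-entropy loss.

    With one-hot codon inputs, the weight matrix of a softmax regression
    is simply one row of class logits per codon (assumption of this toy).
    """
    logits = {codon: [0.0] * len(CLASSES) for codon in CODONS}
    for codon, amino_acid in pairs:
        probs = softmax(logits[codon])
        target = CLASSES.index(amino_acid)
        for k in range(len(CLASSES)):
            grad = probs[k] - (1.0 if k == target else 0.0)
            logits[codon][k] -= lr * grad
    return logits


def decode(logits):
    """Read the learned dictionary off the logits by taking the argmax."""
    return {
        codon: CLASSES[max(range(len(CLASSES)), key=row.__getitem__)]
        for codon, row in logits.items()
    }


random.seed(0)
# Synthetic training pairs: uniformly sampled codons with their true labels.
pairs = [(c, GENETIC_CODE[c]) for c in random.choices(CODONS, k=5000)]
learned = decode(train(pairs))
accuracy = sum(learned[c] == GENETIC_CODE[c] for c in CODONS) / len(CODONS)
print(f"codons decoded correctly: {accuracy:.0%}")
```

In this toy setting a codon is decoded correctly as soon as it has been seen once, so the amount of data needed is driven by how often the rarest codons appear. Real transcript and protein pairs are unbalanced and are presented as sequences rather than isolated labelled codons, which is one plausible reason the reported training sets are orders of magnitude larger.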

【 License 】

Unknown   
Copyright © 2023 Joiret, Leclercq, Lambrechts, Rapino, Close, Louppe and Geris.
