PeerJ
Automatic single- and multi-label enzymatic function prediction by machine learning
article
Shervine Amidi [1], Afshine Amidi [1], Dimitrios Vlachakis [2], Nikos Paragios [1], Evangelia I. Zacharaki [1]
[1] Department of Applied Mathematics, Center for Visual Computing, Ecole Centrale de Paris (CentraleSupélec), Châtenay-Malabry; MDAKM Group, Department of Computer Engineering and Informatics, University of Patras; Equipe GALEN, INRIA Saclay
Keywords: Enzyme classification; Single-label; Multi-label; Structural information; Amino acid sequence; Smith-Waterman algorithm
DOI: 10.7717/peerj.3095
Subject classification: Social Sciences, Humanities and Arts (General)
Source: Inra
【Abstract】

The number of protein structures in the PDB database has increased more than 15-fold since 1999. Computational models that predict enzymatic function are of major importance, since they provide the means to better understand how newly discovered enzymes behave when catalyzing chemical reactions. Until now, enzymatic function has been predicted mainly with single-label classification, which limits the application to enzymes that perform a single reaction and introduces errors when multi-functional enzymes are examined. Indeed, some enzymes perform several different reactions and can therefore be directly associated with multiple enzymatic functions. In the present work, we propose a multi-label enzymatic function classification scheme that combines structural and amino acid sequence information. We investigate two fusion approaches (at the feature level and at the decision level) and assess the methodology for general enzymatic function prediction, as indicated by the first digit of the Enzyme Commission (EC) code (six main classes), on 40,034 enzymes from the PDB database. The proposed single-label and multi-label models correctly predict the actual functional activities in 97.8% and 95.5% (based on Hamming loss) of the cases, respectively. In addition, the multi-label model predicts all possible enzymatic reactions in 85.4% of the multi-labeled enzymes when the number of reactions is unknown. Code and datasets are available at https://figshare.com/s/a63e0bafa9b71fc7cbd7.
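To make the multi-label terms in the abstract more concrete, here is a minimal Python sketch of how predictions over the six EC main classes might be fused and scored. It is not the authors' implementation (their code is in the figshare archive). Several choices are assumptions made only for illustration: each enzyme's labels are a 0/1 vector over the six classes; decision-level fusion is shown as a simple weighted average of per-class scores from a structure-based and a sequence-based classifier, followed by a 0.5 threshold, so the number of predicted classes is not fixed in advance; "95.5% based on Hamming loss" is read as 1 minus the Hamming loss; and the exact-match rate stands in for "predicts all possible enzymatic reactions". The function names `fuse_decisions`, `to_labels`, `hamming_loss` and `subset_accuracy` are hypothetical.

```python
from typing import List, Sequence

# The six EC main classes (first digit of the EC code) considered in the paper.
EC_CLASSES = ["Oxidoreductases", "Transferases", "Hydrolases",
              "Lyases", "Isomerases", "Ligases"]


def fuse_decisions(p_structure: Sequence[float],
                   p_sequence: Sequence[float],
                   weight: float = 0.5) -> List[float]:
    """Hypothetical decision-level fusion: weighted average of the per-class
    scores produced by two separately trained classifiers."""
    return [weight * a + (1.0 - weight) * b
            for a, b in zip(p_structure, p_sequence)]


def to_labels(scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Assumed thresholding rule: every class whose score reaches the
    threshold is predicted, so the number of labels is not fixed."""
    return [1 if s >= threshold else 0 for s in scores]


def hamming_loss(y_true: Sequence[Sequence[int]],
                 y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of (enzyme, class) pairs whose 0/1 label is predicted wrongly."""
    n_labels = len(y_true[0])
    errors = sum(t != p
                 for row_t, row_p in zip(y_true, y_pred)
                 for t, p in zip(row_t, row_p))
    return errors / (len(y_true) * n_labels)


def subset_accuracy(y_true: Sequence[Sequence[int]],
                    y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of enzymes whose complete label set is predicted exactly."""
    exact = sum(list(t) == list(p) for t, p in zip(y_true, y_pred))
    return exact / len(y_true)


if __name__ == "__main__":
    # Toy data (invented): three enzymes, the second one is multi-functional.
    y_true = [[1, 0, 0, 0, 0, 0], [0, 1, 1, 0, 0, 0], [0, 0, 0, 1, 0, 0]]
    y_pred = [[1, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0]]
    loss = hamming_loss(y_true, y_pred)
    print(f"Hamming loss: {loss:.3f}, 1 - loss: {1 - loss:.3f}")
    print(f"Exact label-set accuracy: {subset_accuracy(y_true, y_pred):.3f}")

    # Fusing two classifiers' scores for one enzyme, then thresholding.
    fused = fuse_decisions([0.9, 0.1, 0.2, 0.0, 0.0, 0.1],
                           [0.7, 0.3, 0.9, 0.0, 0.1, 0.1])
    predicted = to_labels(fused)
    print([name for name, y in zip(EC_CLASSES, predicted) if y])
```

In the toy example, missing one of the two functions of the second enzyme costs only 1 of 18 label decisions under Hamming loss, but it makes the whole label set wrong under exact matching. This gap between the two criteria is why a per-label score and an "all reactions found" score can differ so much.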

【License】

CC BY   
