Journal Article Details
Journal of Cheminformatics
Could graph neural networks learn better molecular representation for drug discovery? A comparison study of descriptor-based and graph-based models
Jian Wu [1], Zhenxing Wu [2], Chao Shen [2], Zhe Wang [2], Tingjun Hou [3], Dejun Jiang [4], Guangyong Chen [5], Chang-Yu Hsieh [6], Ben Liao [6], Dongsheng Cao [7]
[1]College of Computer Science and Technology, Zhejiang University, Hangzhou, China
[2]Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, 310058, Hangzhou, Zhejiang, China
[3]Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, 310058, Hangzhou, Zhejiang, China
[4]State Key Lab of CAD & CG, Zhejiang University, 310058, Hangzhou, Zhejiang, China
[5]Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, 310058, Hangzhou, Zhejiang, China
[6]State Key Lab of CAD & CG, Zhejiang University, 310058, Hangzhou, Zhejiang, China
[7]College of Computer Science and Technology, Zhejiang University, Hangzhou, China
[8]Shenzhen Institutes of Advanced Technology, 518055, Shenzhen, Guangdong, China
[9]Tencent Quantum Laboratory Tencent, 518057, Shenzhen, Guangdong, China
[10]Xiangya School of Pharmaceutical Sciences, Central South University, 410004, Changsha, Hunan, China
Keywords: Graph neural networks; Extreme gradient boosting; Ensemble learning; Deep learning; ADME/T prediction
DOI: 10.1186/s13321-020-00479-8
Source: Springer
【 Abstract 】
Graph neural networks (GNNs) have been considered an attractive modelling method for molecular property prediction, and numerous studies have shown that GNNs could yield more promising results than traditional descriptor-based methods. In this study, based on 11 public datasets covering various property endpoints, the predictive capacity and computational efficiency of the prediction models developed by eight machine learning (ML) algorithms, including four descriptor-based models (SVM, XGBoost, RF and DNN) and four graph-based models (GCN, GAT, MPNN and Attentive FP), were extensively tested and compared. The results demonstrate that on average the descriptor-based models outperform the graph-based models in terms of prediction accuracy and computational efficiency. SVM generally achieves the best predictions for the regression tasks. Both RF and XGBoost can achieve reliable predictions for the classification tasks, and some of the graph-based models, such as Attentive FP and GCN, can yield outstanding performance for a fraction of larger or multi-task datasets. In terms of computational cost, XGBoost and RF are the two most efficient algorithms and need only a few seconds to train a model, even for a large dataset. Model interpretation with the SHAP method can effectively explore the established domain knowledge for the descriptor-based models. Finally, we explored the use of these models for virtual screening (VS) towards HIV and demonstrated that different ML algorithms offer diverse VS profiles. All in all, we believe that the off-the-shelf descriptor-based models can still be directly employed to accurately predict various chemical endpoints with excellent computability and interpretability.
【 License 】

CC BY   
