Journal of Cheminformatics
Assessing the calibration in toxicological in vitro models with conformal prediction
Fredrik Svensson1, Ola Spjuth2, Staffan Arvidsson McShane2, Ulf Norinder3, Niharika Gauraha4, Andrea Morger5, Andrea Volkamer5
Alzheimer’s Research UK UCL Drug Discovery Institute, WC1E 6BT, London, UK
Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, 751 24, Uppsala, Sweden
Dept. Computer and Systems Sciences, Stockholm University, Box 7003, 164 07, Kista, Sweden
MTM Research Centre, School of Science and Technology, Örebro University, 70 182, Örebro, Sweden
Division of Computational Science and Technology, KTH, 100 44, Stockholm, Sweden
In Silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité Universitätsmedizin, Berlin, Germany
Keywords: Toxicity prediction; Conformal prediction; Data drifts; Applicability domain; Calibration plots; Tox21 datasets
DOI: 10.1186/s13321-021-00511-5
Source: Springer
【Abstract】

Machine learning methods are widely used in drug discovery and toxicity prediction. While they show good overall performance in cross-validation studies, their predictive power often drops when the query samples have drifted from the descriptor space of the training data. Thus, the assumption underlying the application of machine learning algorithms, namely that training and test data stem from the same distribution, might not always be fulfilled. In this work, conformal prediction is used to assess the calibration of the models: deviations from the expected error rate may indicate that training and test data originate from different distributions. Using the Tox21 datasets as an example, composed of the chronologically released subsets Tox21Train, Tox21Test and Tox21Score, we observed that although internally valid models could be trained using cross-validation on Tox21Train, predictions on the external Tox21Score data resulted in higher error rates than expected. To improve the predictions on the external sets, a strategy that exchanges the calibration set for more recent data, such as Tox21Test, was successfully introduced. We conclude that conformal prediction can be used to diagnose data drifts and other issues related to model calibration. The proposed improvement strategy, exchanging only the calibration data, is convenient because it does not require retraining of the underlying model.
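
The calibration check and the recalibration strategy described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a class-conditional (Mondrian) inductive conformal predictor wrapped around an already-trained binary classifier, and `toy_model`, `make_data`, `MondrianICP` and the one-dimensional Gaussian data are hypothetical stand-ins for a real toxicity model and Tox21 descriptors. A model counts as well calibrated when its observed error rate (the share of samples whose true label is missing from the prediction set) stays at or below the chosen significance level; a clear excess on new data hints at a drift. Because `calibrate` only replaces the stored calibration scores, swapping in more recent calibration data needs no retraining.

```python
import math
import random
from bisect import bisect_left
from typing import Callable, Dict, List, Sequence, Set, Tuple

Sample = Tuple[float, int]  # (descriptor value, binary label): hypothetical 1-D stand-in


def toy_model(x: float) -> float:
    """Hypothetical, already-trained classifier returning P(label == 1 | x)."""
    return 1.0 / (1.0 + math.exp(-x))


def nonconformity(prob_pos: float, label: int) -> float:
    """Nonconformity score: 1 - predicted probability of the candidate label."""
    return 1.0 - (prob_pos if label == 1 else 1.0 - prob_pos)


class MondrianICP:
    """Class-conditional (Mondrian) inductive conformal predictor.

    The wrapped model stays fixed; only the calibration scores are replaced.
    """

    def __init__(self, model: Callable[[float], float]) -> None:
        self.model = model
        self.cal_scores: Dict[int, List[float]] = {0: [], 1: []}

    def calibrate(self, cal_set: Sequence[Sample]) -> None:
        """Store sorted per-class calibration scores; the model is not refitted."""
        scores: Dict[int, List[float]] = {0: [], 1: []}
        for x, y in cal_set:
            scores[y].append(nonconformity(self.model(x), y))
        self.cal_scores = {y: sorted(s) for y, s in scores.items()}

    def p_value(self, x: float, label: int) -> float:
        """(same-class calibration scores >= test score, plus 1) / (n + 1)."""
        scores = self.cal_scores[label]
        alpha = nonconformity(self.model(x), label)
        n_ge = len(scores) - bisect_left(scores, alpha)
        return (n_ge + 1) / (len(scores) + 1)

    def predict_set(self, x: float, significance: float) -> Set[int]:
        """Keep every label whose p-value exceeds the significance level."""
        return {y for y in (0, 1) if self.p_value(x, y) > significance}


def error_rate(icp: MondrianICP, data: Sequence[Sample], significance: float) -> float:
    """Observed error rate: share of samples whose true label is not in the set."""
    misses = sum(1 for x, y in data if y not in icp.predict_set(x, significance))
    return misses / len(data)


def make_data(n: int, shift: float, rng: random.Random) -> List[Sample]:
    """Hypothetical data: class means at -1 and +1; `shift` simulates a drift."""
    data: List[Sample] = []
    for _ in range(n):
        y = rng.randint(0, 1)
        data.append((rng.gauss(2.0 * y - 1.0, 1.0) + shift, y))
    return data


if __name__ == "__main__":
    rng = random.Random(0)
    eps = 0.2
    icp = MondrianICP(toy_model)
    icp.calibrate(make_data(1000, 0.0, rng))  # calibration data from the training era
    drifted_test = make_data(2000, 2.0, rng)  # query samples that have drifted
    print("in-distribution:", error_rate(icp, make_data(2000, 0.0, rng), eps))
    print("drifted, old calibration:", error_rate(icp, drifted_test, eps))
    icp.calibrate(make_data(1000, 2.0, rng))  # swap in recent calibration data only
    print("drifted, new calibration:", error_rate(icp, drifted_test, eps))
```

Running the script prints three observed error rates at significance 0.2. Under these toy assumptions, the drifted samples are expected to give an error rate well above 0.2 with the old calibration set and to return close to 0.2 once `calibrate` is called with recent data, while `toy_model` itself is never changed; the exact values depend on the random draw. Per-class (Mondrian) calibration was chosen here because it keeps the error guarantee for each class separately, which matters for imbalanced toxicity labels.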

【License】

CC BY   
