Journal Article Details
Molecules
Benchmarking Ligand-Based Virtual High-Throughput Screening with the PubChem Database
Mariusz Butkiewicz [1], Edward W. Lowe [1], Ralf Mueller [1], Jeffrey L. Mendenhall [1], Pedro L. Teixeira [1], C. David Weaver [1]
[1] Department of Chemistry, Pharmacology, and Biomedical Informatics, Center for Structural Biology, Institute of Chemical Biology, Vanderbilt University, Nashville, TN 37232, USA
Keywords: virtual screening; machine learning; quantitative structure-activity relations (QSAR); high-throughput screening (HTS); cheminformatics; PubChem; BCL
DOI: 10.3390/molecules18010735
Source: MDPI
【 Abstract 】

With the rapidly increasing availability of High-Throughput Screening (HTS) data in the public domain, such as the PubChem database, methods for ligand-based computer-aided drug discovery (LB-CADD) have the potential to accelerate and reduce the cost of probe development and drug discovery efforts in academia. We assemble nine data sets from realistic HTS campaigns representing major families of drug target proteins for benchmarking LB-CADD methods. Each data set is in the public domain through PubChem and carefully collated through confirmation screens validating active compounds. These data sets provide the foundation for benchmarking a new cheminformatics framework, BCL::ChemInfo, which is freely available for non-commercial use. Quantitative structure-activity relationship (QSAR) models are built using Artificial Neural Networks (ANNs), Support Vector Machines (SVMs), Decision Trees (DTs), and Kohonen networks (KNs). Problem-specific descriptor optimization protocols are assessed, including Sequential Feature Forward Selection (SFFS) and various information content measures. Measures of predictive power and confidence are evaluated through cross-validation, and a consensus prediction scheme is tested that combines orthogonal machine learning algorithms into a single predictor. Enrichments ranging from 15 to 101 for a TPR cutoff of 25% are observed.
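
The abstract reports enrichment at a true positive rate (TPR) cutoff of 25% without defining it here. The following is a minimal sketch of one common way such a value could be computed, assuming enrichment means the fraction of actives among the top-ranked compounds divided by the fraction of actives in the whole data set, with the ranking cut at the point where 25% of all actives have been recovered. The function name `enrichment_at_tpr` and its parameters are hypothetical and are not taken from BCL::ChemInfo.

```python
import math


def enrichment_at_tpr(scores, labels, tpr_cutoff=0.25):
    """Enrichment at the rank where `tpr_cutoff` of all actives is recovered.

    Assumed definition (not confirmed by the abstract):
        enrichment = (TP / selected) / (actives / total)
    `scores` are predicted activities (higher = more likely active),
    `labels` are 1 for confirmed actives and 0 for inactives.
    Tied scores are not treated specially in this sketch.
    """
    if not 0.0 < tpr_cutoff <= 1.0:
        raise ValueError("tpr_cutoff must be in (0, 1]")
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")

    total = len(labels)
    n_active = sum(labels)
    if n_active == 0:
        raise ValueError("at least one active compound is required")

    # Number of actives that must be recovered to reach the TPR cutoff.
    target_tp = max(1, math.ceil(tpr_cutoff * n_active))

    # Rank compounds from highest to lowest predicted activity.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    true_positives = 0
    for selected, (_, label) in enumerate(ranked, start=1):
        true_positives += label
        if true_positives >= target_tp:
            precision = true_positives / selected
            base_rate = n_active / total
            return precision / base_rate

    # Unreachable because target_tp <= n_active, kept as a safeguard.
    raise RuntimeError("TPR cutoff was never reached")
```

Under this assumed definition, a value of 15 would mean that compounds selected at the cutoff are 15 times more likely to be active than compounds drawn at random from the screened set.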

【 License 】

CC BY   
This is an open access article distributed under the Creative Commons Attribution License (CC BY) which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
