Journal article details
BMC Bioinformatics
Efficient discovery of responses of proteins to compounds using active learning
Joshua D Kangas [1], Armaghan W Naik [1], Robert F Murphy [2]
[1] Lane Center for Computational Biology, Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, PA 15213, USA
[2] Freiburg Institute for Advanced Studies and Faculty of Biology, Albert Ludwig University, Freiburg, Germany
Keywords: Drug discovery; Computational biology; Polypharmacology; Drug development; Machine learning; Active learning
DOI  :  10.1186/1471-2105-15-143
Received 2013-11-26; accepted 2014-05-07; published 2014
【 Abstract 】

Background

Drug discovery and development has been aided by high throughput screening methods that detect compound effects on a single target. However, when using focused initial screening, undesirable secondary effects are often detected late in the development process after significant investment has been made. An alternative approach would be to screen against undesired effects early in the process, but the number of possible secondary targets makes this prohibitively expensive.

Results

This paper describes methods for making this global approach practical by constructing predictive models for many target responses to many compounds and using them to guide experimentation. We demonstrate for the first time that by jointly modeling targets and compounds using descriptive features and using active machine learning methods, accurate models can be built by doing only a small fraction of possible experiments. The methods were evaluated by computational experiments using a dataset of 177 assays and 20,000 compounds constructed from the PubChem database.
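
The loop described above (fit a joint model over compound and target features, use it to pick the next experiments, repeat) can be sketched as follows. This is only an illustrative sketch on synthetic data, not the authors' implementation: the outer-product pair features, the ridge regression model, the bootstrap-committee disagreement criterion, and all function names and sizes are assumptions chosen for brevity.

```python
import numpy as np

def pair_features(compounds, targets):
    """Joint descriptor for every (compound, target) pair: flattened outer product."""
    n_c, n_t = len(compounds), len(targets)
    joint = np.einsum("ci,tj->ctij", compounds, targets)
    return joint.reshape(n_c * n_t, -1)

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression weights (assumed model, for illustration)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ y)

def active_learning(X, y, n_init=20, batch=20, rounds=10, n_models=5, seed=0):
    """Pool-based loop: query the pairs on which a bootstrap committee disagrees most."""
    rng = np.random.default_rng(seed)
    n = len(y)
    observed = np.zeros(n, dtype=bool)
    observed[rng.choice(n, size=n_init, replace=False)] = True  # random seed experiments
    for _ in range(rounds):
        idx = np.flatnonzero(observed)
        preds = []
        for _ in range(n_models):
            # Each committee member is trained on a bootstrap resample of the observed pairs.
            boot = rng.choice(idx, size=len(idx), replace=True)
            preds.append(X @ fit_ridge(X[boot], y[boot]))
        disagreement = np.var(preds, axis=0)
        disagreement[observed] = -np.inf  # never repeat an experiment
        query = np.argsort(disagreement)[-batch:]
        observed[query] = True  # "run" the experiments by revealing y[query]
    return observed

# Synthetic stand-in for a compound x target activity matrix (hypothetical sizes).
rng = np.random.default_rng(1)
compounds = rng.normal(size=(40, 4))   # 40 compounds, 4 descriptors each
targets = rng.normal(size=(10, 3))     # 10 targets, 3 descriptors each
X = pair_features(compounds, targets)
true_w = rng.normal(size=X.shape[1])
y = (X @ true_w > 1.5).astype(float)   # 1.0 marks a hit

observed = active_learning(X, y)
print(f"explored {observed.mean():.0%} of the space, "
      f"found {int(y[observed].sum())} of {int(y.sum())} hits")
```

Because every pair shares one feature space, an observation on one compound-target pair informs predictions for all other pairs, which is what makes it plausible to skip most experiments. Other query strategies could be substituted for the disagreement criterion without changing the loop.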

Conclusions

An average of nearly 60% of all hits in the dataset were found after exploring only 3% of the experimental space, which suggests that active learning can enable a more complete characterization of compound effects than would otherwise be affordable. The methods described are also likely to find widespread application outside drug discovery, such as for characterizing the effects of a large number of compounds or inhibitory RNAs on a large number of cell or tissue phenotypes.
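
For a rough sense of scale, assume the experimental space consists of every compound-assay pair in the 177-assay, 20,000-compound dataset (an interpretation of the abstract, not a figure it states). Under that assumption:

```latex
\[
177 \times 20{,}000 = 3{,}540{,}000 \ \text{possible experiments}, \qquad
0.03 \times 3{,}540{,}000 = 106{,}200 \ \text{experiments explored}.
\]
```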

【 License 】

   
2014 Kangas et al.; licensee BioMed Central Ltd.
