Methodological Advances for Drug Discovery and Protein Engineering
drug discovery;protein engineering;protein-ligand docking;free energy calculation;machine learning;statistics;Biomedical Engineering;Chemical Engineering;Pharmacy and Pharmacology;Chemistry;Ecology and Evolutionary Biology;Physics;Statistics and Numeric Data;Engineering;Health Sciences;Science;Bioinformatics
Designing and engineering molecules with specified properties not only tests our understanding of nature but also plays an important role in improving both human health and industrial productivity. Two examples are drug discovery, which designs new molecules to treat or even cure diseases, and protein engineering, which develops useful proteins for medical purposes or for catalyzing industrial chemical reactions. However, drug discovery and protein engineering are time-consuming and expensive processes because they require multiple rounds of trial and error. For instance, developing a new drug costs on average one billion dollars and 10 years of effort. One effective way to reduce the cost and accelerate these processes is to develop computational methods that rationalize design and engineering. With both methodological development and growing computational resources, computational methods for drug discovery and protein engineering are becoming more and more effective. In this dissertation, I present new advances in computational methods for drug discovery and protein engineering.

Protein-ligand docking and free energy calculation are two widely used computational methods in drug discovery. In the dissertation, I first developed an accelerated version of the protein-ligand docking method CDOCKER by introducing two new features: fast Fourier transform based docking and parallel simulated annealing, both of which utilize the parallel computing power of graphics processing units. The two new features not only accelerate CDOCKER by at least one order of magnitude but also provide a way to calculate an upper bound on a scoring function's docking accuracy, which will be useful for optimizing the scoring function used in CDOCKER. Then I introduced two new methods for protein-ligand binding free energy calculation: Gibbs sampler lambda-dynamics (GSLD) and Rao-Blackwell estimators (RBE).
Compared with the original lambda-dynamics, GSLD is more flexible and easier to implement, and it retains the capacity to calculate free energies for multiple ligands simultaneously in a single simulation. Compared with the empirical estimator used in lambda-dynamics, RBE is unbiased, does not depend on the ad hoc cutoff values used in the empirical estimator, and has smaller variance.

Finally, for protein engineering, I investigated how variational auto-encoder models can be useful by inferring information about protein stability, evolution, and fitness landscapes from protein sequences. Variational auto-encoder models are probabilistic generative models that embed discrete information in a lower dimensional continuous latent space. Using a protein family's multiple sequence alignment as training data, variational auto-encoder models learn a probability distribution of sequences for the protein family. The probability distribution may then be employed to predict the change in protein stability upon mutation. The embedding of sequences in a low dimensional latent space not only provides a way to visualize a protein family's sequence space, but also captures evolutionary relationships between sequences. Together with experimental fitness data for the protein sequences, the embedding enables the visualization and expression of the protein's fitness landscape in a low dimensional continuous space. As the amount of protein sequence data keeps increasing rapidly due to advances in sequencing technology, these features of variational auto-encoder models are useful both for studying proteins and for guiding protein engineering efforts.
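
The idea of scoring a mutation with a sequence probability distribution learned from a multiple sequence alignment can be illustrated with a small sketch. This is not the dissertation's model: a site-independent frequency model with pseudocounts stands in for the variational auto-encoder, and the alignment, the function names (`fit_site_model`, `log_prob`, `mutation_score`), and the pseudocount value are all hypothetical. The only point shown is that a mutation is scored by the log-probability ratio between the mutant and the wild-type sequence, with more negative scores read as a less favorable change.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def fit_site_model(msa, pseudocount=1.0):
    """Estimate per-column amino acid probabilities from aligned sequences.

    Assumption: a site-independent model is used here as a simple
    stand-in for the variational auto-encoder described in the abstract.
    """
    model = []
    for pos in range(len(msa[0])):
        # Count only the 20 standard amino acids; gaps are ignored.
        counts = Counter(seq[pos] for seq in msa if seq[pos] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        model.append(
            {aa: (counts.get(aa, 0) + pseudocount) / total for aa in AMINO_ACIDS}
        )
    return model

def log_prob(model, seq):
    """Log-probability of a sequence under the site-independent model."""
    return sum(math.log(site[aa]) for site, aa in zip(model, seq))

def mutation_score(model, wild_type, pos, new_aa):
    """Log-probability ratio of the point mutant versus the wild type."""
    mutant = wild_type[:pos] + new_aa + wild_type[pos + 1:]
    return log_prob(model, mutant) - log_prob(model, wild_type)

# Hypothetical toy alignment: column 0 is fully conserved, column 2 varies.
toy_msa = ["MKVL", "MKIL", "MRVL", "MKVL", "MKVI"]
model = fit_site_model(toy_msa)

# A substitution already seen in the family at a variable position.
conservative = mutation_score(model, "MKVL", 2, "I")
# A substitution never seen at a fully conserved position.
disruptive = mutation_score(model, "MKVL", 0, "W")
```

In this toy alignment the unseen substitution at the conserved position receives a lower score than the substitution already observed in the family. A variational auto-encoder would replace `log_prob` with an estimate of the sequence's probability that also accounts for dependencies between positions.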