Dissertation Details
Development and application of computational tools for metabolic engineering
Hamedi Rad, Sam
Keywords: Machine Learning, Automation, Synthetic Biology, Metabolic Engineering, Pathway Optimization
Others: https://www.ideals.illinois.edu/bitstream/handle/2142/101318/HAMEDIRAD-DISSERTATION-2018.pdf?sequence=1&isAllowed=y
United States | English
Source: The Illinois Digital Environment for Access to Learning and Scholarship
【 Abstract 】
Modeling and computational tools have proven extremely successful in streamlining the design and engineering of systems in many areas of science and engineering. However, because of the complexity of biological systems, modeling in biology is at an early stage, and many more years of development are required before biological models earn the level of confidence that exists today in software packages such as the SOLIDWORKS suite and COMSOL. For instance, even with the most sophisticated supercomputers, a small protein can be modeled reliably only over nanosecond-to-microsecond timescales, let alone an entire microorganism over timescales of a few hours. One way to overcome this limitation is to treat a specific system as a black box, increase the number of inputs or evaluations, and learn the system by observing the output corresponding to each input. This solution requires large amounts of data to achieve a reliable predictive model of the system, and such data can be acquired where data acquisition is inexpensive and high-throughput screening methods are available. When evaluations are expensive, however, other tools such as automation or machine learning algorithms can be used both to reduce the cost of the evaluations and to get more information from fewer evaluations. Here, computational tools and methods were developed to enable the generation of large-scale inputs to biosystems and to provide useful insights from the output. For cases with difficult evaluations, a machine learning algorithm was used to choose the most informative inputs, generating more insight from fewer experiments and evaluations.

First, xylose utilization was chosen to show the effectiveness of large-scale screening in improving a phenotype. Xylose is a major component of lignocellulosic biomass, one of the most abundant feedstocks for biofuel production. Therefore, efficient and rapid conversion of xylose to ethanol is crucial to the viability of lignocellulosic biofuel plants. Here, RNAi Assisted Genome Evolution (RAGE) was used to improve the xylose utilization rate in SR8, one of the most efficient publicly available xylose-utilizing Saccharomyces cerevisiae strains. To identify gene targets for further improvement, we created a genome-scale library consisting of both genetic over-expression and down-regulation mutations in SR8. After screening in media containing xylose as the sole carbon source, yeast mutants with 29% faster xylose utilization and 45% higher ethanol productivity relative to the parent strain were obtained. Two known and two new effector genes were identified in these mutant strains. Notably, down-regulation of CDC11, an essential gene, resulted in faster xylose utilization; this gene target cannot be identified in genetic knock-out screens.

This type of large-scale screening is possible only where a high-throughput screening method is available, which was the case for xylose utilization. Unfortunately, most phenotypes lack this type of high-throughput assay, and each evaluation is relatively expensive and time-consuming, making the evaluation of millions of variants practically impossible. This problem can be solved by taking advantage of machine learning algorithms, automation, and smart library design. A successful example of reducing the number of evaluations is presented here. The general approach of this method is to engineer the regulatory elements, typically promoters, to modify the expression levels of the genes involved in the biosynthetic pathway, then to evaluate some points in the expression-production landscape and use a computational model to estimate the rest of the landscape. The entire process was performed using the Illinois Biological Foundry for Advanced Biomanufacturing (iBioFAB): the initial system was set up, and the algorithm picked the points to be evaluated and returned them to the iBioFAB. The iBioFAB then performed the experiments dictated by the algorithm and returned the results to the algorithm to obtain the next experiments to perform. This machine-algorithm hybrid Robo-Scientist generated hypotheses, tested them, and used the results as the basis for generating new hypotheses, keeping an eye on its goal of optimizing the biosystem of interest. Bayesian optimization with a Gaussian process as the prior was used for learning from the outputs of the expression-production landscape. Using three rounds of machine-learning-driven pathway optimization while evaluating less than 1% of the total possibilities, this Robo-Scientist outperformed random screening by 1.77-fold.

Golden Gate assembly was the method of choice for construction of the plasmid library in this project. However, efficient and high-fidelity assembly was key to successful implementation of this workflow on the iBioFAB. It has been shown that the DNA sequences used as linkers in Golden Gate assembly are an important factor in its efficiency. Here, a DNA assembly and sequencing scheme was designed and tested to assess the efficiency of different linkers for this application. Two hundred linker sets with sizes from 10 to 50 and with high efficiency and fidelity were found and are reported for use in achieving high efficiency in Golden Gate assembly. These optimized linkers were also used to achieve scar-less Golden Gate assembly and BsaI removal. In this method, optimized overhangs are found in the areas next to the ends of the DNA parts, and a BsaI recognition site is added to the optimized linkers using PCR amplification.

For efficient implementation of these and similar projects, other computational tools are required to design both large-scale libraries and smaller smart libraries that are adaptively designed based on the data acquired from the experiments. Two guide RNA design tools were developed and used to design more than one million guide RNAs, of which more than one hundred thousand were synthesized and used in multiple projects. These molecules performed a variety of functions, from large-scale gene activation, interference, and deletion to genome editing with base-pair resolution. The iBioCAD design tool was also developed to enable large-scale library design and construction using a variety of methods, including the novel scar-less Golden Gate assembly method described above.

Finally, to solve one of the main bottlenecks in the automation of DNA assembly, a high-throughput DNA separation device was designed and tested. This high-throughput, compact vertical gel electrophoresis system would enable separation of 48 samples at a time and is compatible with most automated liquid handling systems because it uses standard formats. The vertical gel electrophoresis system was set up and tested, but a robust device with automatic size fractionation and detection requires further software and hardware engineering, which is beyond the scope of this dissertation.
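
The closed loop described in the abstract (a Gaussian process model proposes expression-level combinations, they are evaluated, and the results feed the next round) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the dissertation: the 5 x 5 grid of promoter strengths, the `toy_titer` function standing in for a real experiment, the RBF kernel, the expected-improvement acquisition rule, and the simple "top-k by expected improvement" batch selection.

```python
import math
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between two sets of points (unit prior variance)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length_scale**2)


def gp_posterior(x_obs, y_obs, x_new, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean Gaussian process."""
    k = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_s = rbf_kernel(x_obs, x_new)
    solved = np.linalg.solve(k, k_s)            # K^-1 K_s
    mean = solved.T @ y_obs
    var = 1.0 - np.sum(k_s * solved, axis=0)    # prior variance is 1 for this kernel
    return mean, np.sqrt(np.clip(var, 1e-12, None))


def expected_improvement(mean, std, best):
    """Expected improvement over the best observed value (maximization)."""
    z = (mean - best) / std
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    return (mean - best) * cdf + std * pdf


def toy_titer(x):
    """Hypothetical stand-in for a wet-lab measurement; peak at levels (2, 3)."""
    return -np.sum((x - np.array([2.0, 3.0])) ** 2, axis=1)


def optimize(n_rounds=3, batch=4, seed=0):
    """Run a few design-test-learn rounds over a small discrete design space."""
    rng = np.random.default_rng(seed)
    levels = np.arange(5.0)                      # assumed promoter strength levels
    designs = np.array([[a, b] for a in levels for b in levels])
    tested = list(rng.choice(len(designs), size=batch, replace=False))
    for _ in range(n_rounds):
        x_obs = designs[tested]
        y_obs = toy_titer(x_obs)                 # "run the experiments"
        scale = y_obs.std() or 1.0
        y_norm = (y_obs - y_obs.mean()) / scale  # standardize for the zero-mean GP
        mean, std = gp_posterior(x_obs, y_norm, designs)
        ei = expected_improvement(mean, std, y_norm.max())
        ei[tested] = -np.inf                     # never repeat a tested design
        tested.extend(np.argsort(ei)[-batch:])   # next batch: highest EI
    x_all = designs[tested]
    return x_all, toy_titer(x_all)


if __name__ == "__main__":
    x, y = optimize()
    print("designs evaluated:", len(x))
    print("best design found:", x[np.argmax(y)], "score:", y.max())
```

In this sketch the model only ever sees the designs it has chosen to evaluate, which is the property the abstract relies on when evaluations are expensive. Picking the top few candidates by expected improvement is a simplification; a batch-aware acquisition rule would account for the correlation between points selected in the same round.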
Attachments
Files | Size | Format | View
Development and application of computational tools for metabolic engineering | 2475 KB | PDF | download
Document metrics
Downloads: 14; Views: 22