The advent of high-throughput technologies has enabled large-scale measurements of the genome, transcriptome, proteome and metabolome of tissue samples, serum and even single cells. Additionally, prior biological knowledge is increasingly curated into accessible databases and reconstructed into computable models. My research aims to integrate high-throughput data and prior knowledge to improve disease diagnosis and our understanding of biological systems, by leveraging the power of both statistical learning and mechanistic modeling approaches.

The first part of my Ph.D. work applies increasingly mechanistic biological constraints in the analysis of high-throughput gene expression data to identify molecular signatures of disease phenotypes. Chapter 2 discusses the statistical issues and recommended steps to generate accurate and reproducible molecular signatures. Chapter 3 presents a new computational method that uses the relative expression level of interacting gene pairs as accurate molecular signatures. By incorporating prior knowledge about the relations between genes, this method increases molecular signature reproducibility compared with previous methods.

Metabolic networks reconstructed from known reaction stoichiometry and gene-protein-reaction associations provide a mechanistic context to analyze gene expression data. In Chapter 4, I developed a new analysis pipeline that identifies perturbations at metabolic branch points (i.e., structures where two reactions consume the same metabolite). Different phenotypes (e.g., cancer vs. normal) can be accurately distinguished by transcriptional changes at metabolic branch points. Combining reaction expression state (high/low), mass conservation and thermodynamic constraints, I identified additional perturbed branch point reaction pairs that are not apparent from expression data alone.

The second part of my Ph.D. work contextualizes and refines prior knowledge by integration with context-specific high-throughput data. In Chapter 5, I developed a novel computational method, mCADRE, to reconstruct tissue-specific metabolic models. This method can use transcriptomic, proteomic and metabolomic data to infer the metabolic network of a given tissue or cell type. It can be viewed as using tissue-specific omic data to refine and contextualize prior knowledge of metabolism. Using this new method, I reconstructed genome-scale metabolic models for 126 human tissues, providing a tissue-specific encyclopedia of metabolism. In Chapter 6, I applied mCADRE to reconstruct metabolic networks of commonly used breast cancer cell lines. Systematic comparison of model predictions and experimental results revealed different types of inconsistencies that call for further model curation and the development of new modeling approaches.
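The notion of a metabolic branch point used in Chapter 4 (two reactions that consume the same metabolite) can be illustrated with a minimal sketch. This is not the thesis pipeline: the function name `find_branch_points`, the dictionary representation of stoichiometry and the toy network are all assumptions made for illustration, and reaction reversibility, expression state and thermodynamic constraints are ignored.

```python
from itertools import combinations


def find_branch_points(stoichiometry):
    """Map each metabolite to the reaction pairs that both consume it.

    stoichiometry: dict of reaction id -> {metabolite id: coefficient},
    where a negative coefficient means the metabolite is consumed.
    (Hypothetical representation, chosen for this sketch only.)
    """
    consumers = {}
    for rxn, coeffs in stoichiometry.items():
        for met, coeff in coeffs.items():
            if coeff < 0:
                consumers.setdefault(met, []).append(rxn)
    # A metabolite is a branch point if at least two reactions consume it.
    return {
        met: list(combinations(sorted(rxns), 2))
        for met, rxns in consumers.items()
        if len(rxns) >= 2
    }


# Hypothetical toy network: B is consumed by both R2 and R3.
toy_network = {
    "R1": {"A": -1, "B": 1},
    "R2": {"B": -1, "C": 1},
    "R3": {"B": -1, "D": 1},
    "R4": {"C": -1, "D": -1, "E": 1},
}
branch_points = find_branch_points(toy_network)
print(branch_points)
```

In this toy network only metabolite B qualifies, with the reaction pair (R2, R3). A fuller analysis would then compare expression-derived states of the two reactions in each pair across phenotypes.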
Integrating statistical and mechanistic modeling to analyze disease omic data