Thesis details
Cis-regulatory module analysis: inferring regulatory networks and underlying mechanisms
Kazemian, Abdol Majid
Keywords: Cis-regulatory module; enhancer; transcription factor; transcription factor (TF) interaction; interacting TF signatures (iTFs)
Others  :  https://www.ideals.illinois.edu/bitstream/handle/2142/42177/Abdol%20Majid_Kazemian.pdf?sequence=1&isAllowed=y
United States | English
Source: The Illinois Digital Environment for Access to Learning and Scholarship
【 Abstract 】

A major challenge in understanding metazoan genomes is to find and annotate the regions that control the precise spatial and temporal expression of genes. Cis-regulatory modules (CRMs, or enhancers), the main players in this regulatory process, are typically short (<1 kb) sequences embedded in non-coding regions of the genome. They harbor cis-elements (binding sites/motifs) for one or more related transcription factors (TFs) and mediate a discrete aspect of the expression pattern of their nearby gene. Although decades of research in biology have provided scientists with hundreds of such sequences, we are far from completing the search and from understanding the underlying mechanisms of these regulatory regions. The goal of this thesis is to use computational and statistical methods to guide the search for novel CRMs, reveal the mechanisms of this regulatory action, and elucidate specific biological networks using the developed methodology.

The first part of my thesis develops several statistical methods to find novel enhancers using existing enhancers as training data. Current computational enhancer prediction methods rely on prior knowledge of the relevant transcription factors. We introduce a novel computational paradigm for enhancer discovery in the common scenario where the relevant transcription factors and/or motifs are unknown. Beginning with a small set of enhancers mediating a common gene expression pattern, our methods search genome-wide for enhancers with similar functionality. Our methods employ word-based statistical and machine learning techniques and do not require (or rely on) known motifs or accurate motif discovery. We apply these approaches to a wide range of less-studied networks in fruit fly and mouse.

The second part of my thesis develops a qualitative model to predict the function of enhancers. A long-standing question in transcriptional gene regulation is how a gene’s sequence encodes its expression (function).
In fruit flies, the segmentation of the body plan along the anterior-posterior (A/P) axis is achieved through a well-characterized transcriptional regulatory network that consists of several known enhancers. Using these enhancers as training data, we learn a generalized linear model that combines the relevant TF occupancies (the product of each TF's binding strength and its corresponding concentration) to predict their function. We show that this model can capture the physical roles (activation or repression) of transcription factors as well as predict enhancer function. We use this model to scan the fly genome for segments that drive an A/P pattern similar to that of their neighboring genes and construct a quantitative network of fruit fly embryo anteroposterior patterning.

The third part of my thesis develops a model to simultaneously locate enhancers and annotate the expression pattern they drive. The model does not rely on already characterized enhancers. In that sense, it can be thought of as an extension of the second project, where knowledge of the enhancers was available. The model iteratively samples a “more reliable” set of enhancers from a large pool of computationally predicted enhancers and re-learns a “more reliable” logistic regression model from these enhancers, ready to be used in the next iteration of enhancer sampling. In other words, by defining an objective function as “how well enhancers recapitulate one or more aspects of their nearby gene expression pattern”, we iteratively sample from a collection of candidate enhancers to maximize this objective function.

The last part of my thesis develops a statistical framework for finding sequence signatures of TF-TF interaction. We search for two types of sequence signatures: overlap/depletion among the bound regions of pairs of transcription factors, and orientation and/or distance bias among transcription factor binding sites.
These sequence signatures explain various distinct mechanisms of combinatorial gene regulation, such as protein-protein interaction, short-range repression, and co-regulation. As a set of informative features, these signatures can also advance methods for discovering enhancers and predicting their functions. We use our framework to search genome-wide for these signatures among a large collection of characterized TFs (>300) in fruit fly.
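
The first signature type, overlap or depletion among the bound regions of a TF pair, can be illustrated with a small sketch. The abstract does not say which statistical test the thesis uses; the hypergeometric test below is an assumption chosen for illustration, and the fixed-size genomic bins, the function names, and the inputs are all hypothetical.

```python
from math import comb


def hypergeom_pmf(k, n_bins, n_a, n_b):
    """P(exactly k shared bins) if n_a and n_b bound bins are placed at random."""
    return comb(n_a, k) * comb(n_bins - n_a, n_b - k) / comb(n_bins, n_b)


def overlap_pvalues(n_bins, bound_a, bound_b):
    """Return (observed overlap, enrichment p-value, depletion p-value).

    bound_a and bound_b are sets of bin indices bound by TF A and TF B
    (hypothetical inputs; the genome is assumed to be cut into n_bins bins).
    """
    k = len(bound_a & bound_b)
    n_a, n_b = len(bound_a), len(bound_b)
    # Smallest and largest overlap that are possible at all.
    lo, hi = max(0, n_a + n_b - n_bins), min(n_a, n_b)
    # Upper tail: overlap at least as large as observed (co-binding).
    p_enrich = sum(hypergeom_pmf(i, n_bins, n_a, n_b) for i in range(k, hi + 1))
    # Lower tail: overlap at most as large as observed (avoidance).
    p_deplete = sum(hypergeom_pmf(i, n_bins, n_a, n_b) for i in range(lo, k + 1))
    return k, p_enrich, p_deplete


if __name__ == "__main__":
    # Toy example: 10 bins, two TFs bound to exactly the same 4 bins.
    print(overlap_pvalues(10, {0, 1, 2, 3}, {0, 1, 2, 3}))
```

A small enrichment p-value would point toward co-binding, and a small depletion p-value toward mutual avoidance. Across several hundred TFs the pairwise p-values would also need a multiple-testing correction, which this sketch leaves out.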

【 Preview 】
Attachment list
Files Size Format View
Cis-regulatory module analysis: inferring regulatory networks and underlying mechanisms 11839KB PDF download
  Document metrics
  Downloads: 6  Views: 8