Epigenetic changes are chemical and structural modifications of DNA and its associated proteins that do not change the DNA sequence. These modifications mark and package DNA in different ways and help to establish cell types, which are distinct and heritable gene expression states. Understanding the determinants of epigenetic modifications and how these modifications affect gene expression is a major challenge with important implications for developmental biology and medicine. Meeting this challenge requires methods for predicting biologically relevant events from a large number of degrees of freedom that interact via unknown rules. The nature of this problem, along with the large amounts of data provided by high-throughput sequencing techniques, motivates a machine-learning approach. This work uses artificial neural networks to predict the binding of proteins involved in the three-dimensional organization of DNA, as well as the locations of methylation marks deposited by DNA methyltransferase enzymes. To understand the rules underlying the sequence-based predictions of our models, we apply interpretation methods based on sampling from constrained maximum entropy distributions. We consider the biological and biophysical implications of the important sequence patterns revealed by interpretation. In the case of DNA methylation, our statistical methods help us understand how methylation affects gene expression, as well as how cells in our engineered yeast system respond to DNA methylation stress. Finally, we study the diversity of single-cell gene expression across the cell types of the human skin and demonstrate coordination between changes in epigenetically modified loci and changes in the expression of transcription factor proteins predicted to bind these loci.
Understanding machine learning models of the epigenome with statistics and statistical mechanics