Dissertation details
Integrative Statistical Models for Genomic Signal Detection
Keywords: Bayes hierarchical model; EM algorithm; Multiple datasets; Data integration; Transcription factor; Allele-specific binding; Microarray; Next generation sequencing
Wei, Yingying; Scott, Alan L.
Johns Hopkins University
Full text (PDF): https://jscholarship.library.jhu.edu/bitstream/handle/1774.2/37932/WEI-DISSERTATION-2014.pdf?sequence=1&isAllowed=y
United States | English
Source: Johns Hopkins DSpace Repository
Abstract
Although the cost of high-throughput technologies has decreased dramatically, it is still expensive to obtain a large number of biological replicates. At the same time, with the wide adoption of high-throughput biology, multiple related genomic datasets are often available. The first two chapters tackle the challenging problem of simultaneously borrowing information across multiple datasets, allowing for context specificity, and controlling the exponential growth of the parameter space, in order to improve signal detection in noisy genomic data. Chapter 1 proposes a flexible Bayesian hierarchical mixture model that captures the latent correlation structures embedded in the data, called "correlation motifs", and uses this information to improve signal detection. The approach is illustrated by detecting differential gene expression when each expression dataset has only a small number of replicate samples. Chapter 2 demonstrates that a generalized version of the correlation motif approach can also help detect allele-specific protein-DNA binding from ChIP-seq data, which often suffers from low statistical power because only a limited number of sequence reads map to heterozygous SNPs. In both cases, the correlation motif approach substantially improves signal detection for data with a low signal-to-noise ratio.
Moreover, current technologies for studying protein-DNA interactions, such as chromatin immunoprecipitation (ChIP) followed by high-throughput sequencing (ChIP-seq) or by tiling array hybridization (ChIP-chip), are "high-throughput" in the sense that they map the binding sites of a given transcription factor (TF) genome-wide. However, mapping the genome-wide binding sites of all TFs in all biological contexts is a critical step toward understanding gene regulation, and from this perspective ChIP-seq and ChIP-chip are low-throughput, since each experiment typically profiles only one TF.
Recent advances in genome-wide chromatin profiling, including the development of technologies such as DNase-seq, FAIRE-seq, and ChIP-seq for histone modifications, make it possible to predict in vivo TF binding sites for many TFs simultaneously by analyzing chromatin features at computationally determined DNA motif sites. Chapter 3 compares different models and discusses various issues arising from this new approach.
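As a rough illustration of the correlation motif idea from Chapter 1, the following is a minimal sketch of a simplified mixture fitted by EM. Each gene belongs to one of K hypothetical motifs; given motif k, the gene carries signal in dataset d with probability q_kd, independently across datasets. The sketch assumes that, for every gene and dataset, the likelihood of the observed statistic under "no signal" (f0) and "signal" (f1) is already known. The dissertation's own model also has to estimate these densities, which is not shown here. The function names `fit_correlation_motifs` and `_e_step`, the Gaussian densities in the demo, and all simulated numbers are illustrative assumptions, not the author's implementation.

```python
import numpy as np


def normal_pdf(x, mean, sd=1.0):
    """Gaussian density, used only to build toy likelihoods for the demo."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def _e_step(f0, f1, pi, Q, eps):
    """Motif weights w (G, K) and P(signal in d | motif k, data) b (G, K, D)."""
    m1 = Q[None, :, :] * f1[:, None, :]                   # q_kd * f1(t_gd)
    marg = m1 + (1.0 - Q[None, :, :]) * f0[:, None, :]    # per-dataset marginal under motif k
    log_w = np.log(pi + eps)[None, :] + np.log(marg + eps).sum(axis=2)
    log_w -= log_w.max(axis=1, keepdims=True)             # stabilise before exponentiating
    w = np.exp(log_w)
    w /= w.sum(axis=1, keepdims=True)
    b = m1 / (marg + eps)
    return w, b


def fit_correlation_motifs(f0, f1, n_motifs, n_iter=200, seed=0, eps=1e-12):
    """EM for a simplified correlation-motif mixture (hypothetical helper).

    f0, f1: (G, D) likelihoods of gene g's statistic in dataset d under
    "no signal" and "signal", assumed known in advance.
    Returns motif weights pi (K,), motif patterns Q (K, D) and the posterior
    probability of signal for every gene/dataset pair (G, D).
    """
    rng = np.random.default_rng(seed)
    pi = np.full(n_motifs, 1.0 / n_motifs)
    Q = rng.uniform(0.1, 0.9, size=(n_motifs, f0.shape[1]))
    for _ in range(n_iter):
        w, b = _e_step(f0, f1, pi, Q, eps)
        pi = w.mean(axis=0)                                   # M-step: motif weights
        Q = (w[:, :, None] * b).sum(axis=0) / (w.sum(axis=0)[:, None] + eps)
        Q = np.clip(Q, eps, 1.0 - eps)                        # keep probabilities valid
    w, b = _e_step(f0, f1, pi, Q, eps)                        # final E-step at fitted values
    post = (w[:, :, None] * b).sum(axis=1)                    # average over motifs
    return pi, Q, post


# Toy demo: 2000 genes, 4 datasets, two hypothetical motifs.
rng = np.random.default_rng(1)
G, D = 2000, 4
true_Q = np.array([[0.02, 0.02, 0.02, 0.02],     # signal almost nowhere
                   [0.90, 0.90, 0.05, 0.05]])    # signal shared by datasets 1 and 2
z = rng.choice(2, size=G, p=[0.8, 0.2])           # hidden motif of each gene
a = rng.uniform(size=(G, D)) < true_Q[z]          # hidden signal indicators
t = rng.normal(loc=3.0 * a, scale=1.0)            # observed statistics
pi, Q, post = fit_correlation_motifs(normal_pdf(t, 0.0), normal_pdf(t, 3.0), n_motifs=2)
print(np.round(pi, 2))
print(np.round(Q, 2))
```

Because each gene's posterior is averaged over the motifs it plausibly belongs to, a moderate statistic in dataset 2 is weighed differently when dataset 1 also shows signal; this is the sense in which information is borrowed across datasets. The number of motifs K is fixed in this sketch; choosing it would need a separate model-selection step (for example an information criterion), which is omitted.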