Although the cost of high-throughput technologies has decreased dramatically, it is still expensive to obtain a large number of biological replicates. On the other hand, with the wide adoption of high-throughput biology, multiple related genomic datasets are often available. The first two chapters tackle the challenging problem of borrowing information across multiple datasets while allowing for context specificity and overcoming the exponential growth of the parameter space, in order to improve signal detection in noisy genomic data. Chapter 1 proposes a flexible Bayesian hierarchical mixture model that captures the latent correlation structures embedded in the data, termed "correlation motifs", and uses that information to improve signal detection. The application is illustrated by differential gene expression detection when the expression datasets have only a small number of replicate samples. Chapter 2 demonstrates that a generalized version of the correlation motif approach can also help detect allele-specific protein-DNA binding from ChIP-seq data, an analysis that often suffers from low statistical power because only a limited number of sequence reads map to heterozygous SNPs. In both cases, the correlation motif approach substantially improves signal detection for data with a low signal-to-noise ratio.
Moreover, current technologies for studying protein-DNA interactions, such as chromatin immunoprecipitation (ChIP) coupled with high-throughput sequencing (ChIP-seq) or tiling array hybridization (ChIP-chip), are "high-throughput" only in the sense that they map a given type of transcription factor (TF) genome-wide. Nevertheless, mapping the genome-wide binding sites of all TFs in all biological contexts is a critical step toward understanding gene regulation. From this perspective, ChIP-seq and ChIP-chip are low-throughput with respect to surveying many TFs.
Recent advances in genome-wide chromatin profiling, including development of technologies such as DNase-seq, FAIRE-seq and ChIP-seq for histone modifications, make it possible to predict in vivo TF binding sites by analyzing chromatin features at computationally determined DNA motif sites for many TFs simultaneously. Chapter 3 compares different models and discusses various issues arising from this new approach.
Integrative Statistical Models for Genomic Signal Detection