Dissertation Details
Computational Approaches for Analyzing High-Throughput Genomic Data
Lee, Yeji; Scott, Laura Jean
University of Michigan
Keywords: Statistical Genetics; High-throughput Data; Genetics; Health Sciences; Biostatistics
Others: https://deepblue.lib.umich.edu/bitstream/handle/2027.42/147631/yejilee_1.pdf?sequence=1&isAllowed=y
United States | English
Source: The Illinois Digital Environment for Access to Learning and Scholarship
【 Abstract 】

With improvements in high-throughput technologies, association studies of molecular phenotypes have become increasingly important. Genetic variants identified in studies based on high-throughput omics experiments provide valuable information for understanding the biological mechanisms behind complex traits. While analyses of high-throughput data can play a crucial role in studying complex traits, many analytical challenges remain unresolved.

This dissertation primarily focuses on two outstanding issues in the genetic association analysis of high-throughput sequence data. First, when functional annotations are incorporated into multi-SNP association analyses, the computational burden grows as the number of candidate SNPs increases. Second, there is a need to identify signals that are reproducible across studies. Measuring the reproducibility between assays in high-throughput experiments, and between association results from different studies, is crucial for assessing the quality of the overall procedure and the strength of the association evidence.

In Chapter 2, we propose an algorithm, based on a probabilistic hierarchical model, that incorporates functional annotations into Bayesian multi-SNP analysis. The proposed algorithm, named deterministic approximation of posteriors (DAP), shows superior accuracy and computational efficiency compared with existing methods, including Markov Chain Monte Carlo (MCMC) algorithms for fitting a sparse Bayesian variable selection model.

In Chapter 3, we propose a probabilistic quantification of association evidence that accounts for linkage disequilibrium (LD). By identifying sets of SNPs that are in LD and represent a single association signal, we are able to construct credible sets and perform appropriate false discovery rate (FDR) control in Bayesian multi-SNP association analysis.
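As a rough illustration of how signal-level inference can work once SNPs have been grouped into LD clusters, the Python sketch below takes per-SNP posterior inclusion probabilities (PIPs) as given. It then (i) sums them within a cluster, capped at 1, to obtain a signal-level probability, (ii) builds a credible set by ranking SNPs by PIP, and (iii) applies a standard Bayesian FDR rule that rejects signals while the average posterior probability of being null stays at or below alpha. This is a generic sketch under those stated assumptions, not the DAP implementation or the exact procedure of the dissertation. The function names, the capping rule, the coverage definition, and the example PIPs are all hypothetical.

```python
# Hypothetical helpers for illustration only; not the DAP software's API.
# Assumes SNPs are already grouped into LD clusters and that each SNP has a
# posterior inclusion probability (PIP) from a Bayesian multi-SNP model.
from typing import Dict, List


def signal_probability(pips: Dict[str, float]) -> float:
    """Signal-level probability for one LD cluster.

    Assumption for this sketch: sum the SNP PIPs in the cluster, capped at 1.
    """
    return min(1.0, sum(pips.values()))


def credible_set(pips: Dict[str, float], level: float = 0.95) -> List[str]:
    """Smallest set of top-ranked SNPs covering `level` of the cluster's total PIP.

    Assumption for this sketch: coverage is measured relative to the cluster
    total, i.e. conditional on the cluster harboring a signal.
    """
    total = sum(pips.values())
    ranked = sorted(pips.items(), key=lambda kv: kv[1], reverse=True)
    chosen: List[str] = []
    mass = 0.0
    for snp, p in ranked:
        chosen.append(snp)
        mass += p
        if mass >= level * total:
            break
    return chosen


def bayesian_fdr_select(signal_probs: Dict[str, float], alpha: float = 0.05) -> List[str]:
    """Standard Bayesian FDR rule on signal-level probabilities.

    Signals are rejected in decreasing order of probability for as long as
    the estimated FDR, the mean of (1 - p) over rejected signals, is <= alpha.
    """
    ranked = sorted(signal_probs.items(), key=lambda kv: kv[1], reverse=True)
    selected: List[str] = []
    false_sum = 0.0  # expected number of false discoveries among those selected
    for name, p in ranked:
        if (false_sum + (1.0 - p)) / (len(selected) + 1) > alpha:
            break
        selected.append(name)
        false_sum += 1.0 - p
    return selected
```

For example, with made-up PIPs `{"rs1": 0.7, "rs2": 0.2, "rs3": 0.05}`, `credible_set(..., level=0.8)` returns `["rs1", "rs2"]`. Because signals are visited in decreasing order of probability, the running mean of (1 - p) can only grow. Stopping at the first violation therefore yields the largest rejection set that satisfies the bound.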
We also derive a set of sufficient summary statistics that yield inference results equivalent to those obtained from individual-level data.

In Chapter 4, we propose a set of computational methods for measuring reproducibility among high-throughput sequencing experiments. In particular, we propose a statistical approach that exploits the fact that a strong, genuine signal is expected to show the same direction of effect in multiple studies. We design a novel Bayesian hierarchical model and estimate the posterior probability that each testing unit (e.g., a SNP) is reproducible under a proposed set of prior probabilities. We also propose visualization tools and quantitative measures for assessing the overall reproducibility among multiple experiments.

Across these three chapters, we discuss several issues in studies that use high-throughput data and propose computational methods to address them.
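To make the idea of a per-unit posterior probability of reproducibility concrete, the sketch below uses a deliberately simplified two-study model. This model is an assumption of the example and is not the dissertation's Bayesian hierarchical model. Each unit's pair of z-scores is assigned to one of three classes: null in both studies, irreproducible (independent true effects drawn from N(0, phi^2)), or reproducible (a single shared true effect). The shared effect induces positive covariance between the two z-scores, so concordant signs raise the reproducible posterior and discordant signs lower it. The prior weights, the effect-size scale `phi`, and all function names are hypothetical. In practice such hyperparameters would typically be estimated across all units, which this sketch does not do.

```python
# Simplified two-study stand-in for a reproducibility model (an assumption
# of this example, not the dissertation's Bayesian hierarchical model).
import math
from typing import Tuple


def bvn_logpdf(x: float, y: float, var: float, cov: float) -> float:
    """Log density of a zero-mean bivariate normal with covariance [[var, cov], [cov, var]]."""
    det = var * var - cov * cov
    # Quadratic form v' S^{-1} v, using the closed-form 2x2 inverse.
    quad = (var * x * x - 2.0 * cov * x * y + var * y * y) / det
    return -math.log(2.0 * math.pi) - 0.5 * math.log(det) - 0.5 * quad


def posterior_reproducible(
    z1: float,
    z2: float,
    phi: float = 2.0,
    priors: Tuple[float, float, float] = (0.8, 0.1, 0.1),
) -> Tuple[float, float, float]:
    """Posterior probabilities of (null, irreproducible, reproducible) for one unit.

    z1, z2 : z-scores of the same unit (e.g. a SNP) in two studies.
    phi    : hypothetical prior SD of a true effect on the z-score scale.
    priors : hypothetical prior weights for the three classes.
    """
    pi_null, pi_irr, pi_rep = priors
    s = 1.0 + phi ** 2  # marginal variance of a z-score that carries an effect
    log_terms = [
        # Null: no effect in either study, z ~ N(0, I).
        math.log(pi_null) + bvn_logpdf(z1, z2, 1.0, 0.0),
        # Irreproducible: independent effects, so no covariance between studies.
        math.log(pi_irr) + bvn_logpdf(z1, z2, s, 0.0),
        # Reproducible: one shared effect; covariance phi^2 favors matching signs.
        math.log(pi_rep) + bvn_logpdf(z1, z2, s, phi ** 2),
    ]
    m = max(log_terms)  # log-sum-exp shift for numerical stability
    weights = [math.exp(t - m) for t in log_terms]
    total = sum(weights)
    return (weights[0] / total, weights[1] / total, weights[2] / total)
```

With these illustrative settings, `posterior_reproducible(4.0, 4.0)` places most of its mass on the reproducible class, whereas `posterior_reproducible(4.0, -4.0)` favors the irreproducible class. This matches the intuition that a genuine signal should point the same way in both studies.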
