Dissertation Details
Statistical Methods and Analysis to Identify Disease-Related Variants from Genetics Studies
Statistical genetics;Genetics;Science;Bioinformatics
Chen, Sai; Li, Jun
University of Michigan
Keywords: Statistical genetics; Genetics; Science; Bioinformatics
Others  :  https://deepblue.lib.umich.edu/bitstream/handle/2027.42/138617/saichen_1.pdf?sequence=1&isAllowed=y
Switzerland|English
Source: The Illinois Digital Environment for Access to Learning and Scholarship
【 Abstract 】

Advances in genotyping and sequencing technologies have revolutionized the analytic methods used in genetics research. Because the per-genotype cost has decreased dramatically, millions of variants have been detected and genotyped from population-scale data. These findings provide new insight into the human genome and continue to shape our understanding of the genetic basis of disease. In this dissertation, I focus on three topics related to discovering disease-related variants in genetics studies, covering both method development and dataset analysis.

In chapter 2, I develop a likelihood-based method, LIME, to detect and genotype mobile element insertions (MEIs), a specific type of large insertion, from sequencing data. The method generates genotype likelihoods for each MEI using simulation that mimics the distribution of reads in regions with and without MEIs. On both simulated and real sequence data, our method shows better sensitivity than existing methods, especially in low-coverage data.

In chapter 3, I present genome-wide association studies and a whole-genome sequencing effort aimed at discovering potentially novel loci for colorectal cancer. Using an imputation-based meta-analysis strategy, I replicate many previous findings and provide a list of novel variants and genes for colorectal cancer. In collaboration with the Fred Hutchinson Cancer Research Center, we additionally sequenced ~3,000 individuals and generated a variant call set. By incorporating gene annotation, sequence function prediction, and online gene expression databases, I highlight potentially functional loci for colorectal cancer in the known region 12q12 and the novel region 6q21.31. Although it is difficult to obtain new significant variants without an extremely large dataset, our analysis provides practical examples of how to incorporate functional genomics data into association analysis and how to prioritize potentially functional candidates under a limited sample size. Additionally, from the variant calling of the whole-genome sequencing samples, we identified over 50 million variants, half of which are novel relative to the dbSNP database.

In chapter 4, I describe a major update to the meta-analysis software RAREMETAL that brings software engineering improvements and several useful new methods for rare variant analysis. The engineering improvements make RAREMETAL more computationally efficient. In addition, the new methods preserve the ability to perform meta-analysis in unbalanced studies, at multi-allelic sites, and with generalized linear mixed models.

【 Preview 】
Attachment List
Files Size Format View
Statistical Methods and Analysis to Identify Disease-Related Variants from Genetics Studies 13513KB PDF download
Document Metrics
Downloads: 17  Views: 19