Dissertation Details
Statistical Methods and Analysis in Next Generation Sequencing.
Keywords: Next Generation Sequencing; Ancestral Inference; Age-related Macular Degeneration; Imputation; Targeted Sequencing; Genetic Association Studies; Public Health; Health Sciences; Biostatistics
Zhan, Xiaowei; Kang, Hyun Min
University of Michigan
Others  :  https://deepblue.lib.umich.edu/bitstream/handle/2027.42/107129/zhanxw_1.pdf?sequence=1&isAllowed=y
Switzerland | English
Source: The Illinois Digital Environment for Access to Learning and Scholarship
【 Abstract 】

Next generation sequencing (NGS) is a technology that advances our knowledge of human medical genetics with an unprecedented amount of data. This vast amount of data presents challenges to existing statistical methods. In this dissertation, I present three studies that demonstrate methods for efficiently analyzing NGS data using both simulated and real data. In the first study, I develop an ancestry inference method that uses small amounts of sequence data. In comparison to microarray experiments, sequencing data produce uneven coverage and genotypes with higher error rates than those traditionally used for principal components analysis (PCA) of genetic ancestry. I overcome some of these challenges using a novel statistical method that models sequence data directly, without relying on intermediate genotype calls. My method achieves high accuracy in simulated data based on the Human Genome Diversity Panel as well as in a targeted sequencing study of age-related macular degeneration. In the age-related macular degeneration study, my approach helps discover a high-risk rare variant in the Complement 3 gene. In the second study, I develop a model-based ancestry inference method that improves upon the work described in the first study. It is based on a likelihood model of ancestral location and takes sequencing data as input. It increases computational efficiency without losing accuracy. For each sample, a parallelizable optimization algorithm can infer ancestry using a fraction of the computational resources required for PCA-based methods. Evaluation using the Human Genome Diversity Panel and the age-related macular degeneration data set demonstrates its accuracy and efficiency. In the final study, I develop an improved genotype calling method for low-coverage sequencing data. As high-quality reference panels grow, it is helpful to incorporate them into genotype calling of new samples. Using a coalescent-based simulation and real data from the 1000 Genomes Project, I evaluate the utility of my method (which uses a panel of previously sequenced samples) to improve analyses of samples sequenced at various depths. The improvement in accuracy and computation time will be measured as a function of reference panel size. This work will be useful to investigators undertaking sequencing and analysis of new human samples.

【 Preview 】
Attachments
Files | Size | Format | View
Statistical Methods and Analysis in Next Generation Sequencing. | 5073KB | PDF | download
  Document metrics
  Downloads: 27  Views: 49