Thesis details
Methods for statistical and population genetics analyses.
Keywords: Statistical Genetics; Population Genetics; Admixture Mapping; Site Frequency Spectrum Estimation; TagSNP Selection; Next Generation Sequence Read Remapping; Genetics; Statistics and Numeric Data; Health Sciences; Science; Biostatistics
Gopalakrishnan, Shyam S.; Rosenberg, Noah A.
University of Michigan
Others: https://deepblue.lib.umich.edu/bitstream/handle/2027.42/89609/gopalakr_1.pdf?sequence=1&isAllowed=y
Switzerland | English
Source: The Illinois Digital Environment for Access to Learning and Scholarship
Abstract

Genetics studies have advanced rapidly, from candidate region studies to genome-wide association studies (GWAS) and next-generation sequencing projects. The emergence of new technologies has brought with it an array of statistical challenges. In this thesis, we propose methods for statistical and population genetics in our effort to better understand the underlying architecture of our genomes.

GWAS rely on indirect association, testing a reduced set of representative markers (tagSNPs) instead of all variants present in the genome. In the first chapter, we propose a graph-based method to select the optimal set of tagSNPs. We apply our method to chromosome-wide data and show that it outperforms the widely used greedy approach, selecting fewer tagSNPs while maintaining high correlation with non-tagSNP variants.

Alignment to a reference sequence is an integral step in many sequencing studies. Multiply mapped reads, that is, reads that align to multiple locations in the reference, are discarded from downstream analyses, resulting in a loss of information. We develop a Gibbs sampling approach to identify the true location of multiply mapped reads obtained from the alignment step. We validate our method using simulation studies, and we use the improvement in variant discovery to quantify the effect of including multiply mapped reads in downstream analyses.

In the third chapter, we explore the feasibility of admixture mapping, a population genetics tool, for identifying regions harboring rare susceptibility variants. We compare the power of admixture mapping with that of single-marker association studies in detecting causal regions, and find that admixture mapping performs better over a wide range of risk allele frequencies.

The site frequency spectrum (SFS) is an important summary statistic in population genetics, encompassing information on selection and demographic history. We show that estimates of the SFS obtained from genotype calling methods underestimate the number of rare variants, especially singletons and doubletons. We derive a maximum likelihood estimate for the SFS and demonstrate, using both simulated and real data examples, that it performs better than SFS estimates obtained from genotype calling algorithms.
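For readers unfamiliar with the statistic, the following is a minimal sketch of how an SFS is tallied from already-called diploid genotypes, which is the baseline the abstract compares against. It is not the maximum likelihood estimator derived in the thesis. The function name `site_frequency_spectrum` and the toy genotype matrix are hypothetical and chosen only for illustration.

```python
def site_frequency_spectrum(genotypes, n_individuals):
    """Tally an unfolded SFS from called diploid genotypes.

    genotypes: list of sites; each site is a list with one entry per
        individual giving the number of non-reference alleles (0, 1 or 2).
    Returns a list sfs of length 2 * n_individuals + 1, where sfs[k] is the
    number of sites at which exactly k non-reference alleles were called.
    """
    n_chromosomes = 2 * n_individuals
    sfs = [0] * (n_chromosomes + 1)
    for site in genotypes:
        if len(site) != n_individuals:
            raise ValueError("each site needs one genotype per individual")
        if any(g not in (0, 1, 2) for g in site):
            raise ValueError("diploid genotypes must be 0, 1 or 2")
        sfs[sum(site)] += 1
    return sfs


# Toy data (hypothetical): 6 sites genotyped in 4 diploid individuals.
example_genotypes = [
    [0, 0, 1, 0],  # singleton: one copy in the sample
    [0, 1, 1, 0],  # doubleton: two heterozygotes
    [0, 0, 0, 1],  # singleton
    [2, 0, 0, 0],  # doubleton: one homozygote
    [1, 1, 1, 0],  # three copies
    [0, 0, 0, 0],  # monomorphic site
]

sfs = site_frequency_spectrum(example_genotypes, n_individuals=4)
print(sfs)  # sfs[1] counts singletons, sfs[2] counts doubletons
```

Because each site contributes a single hard genotype call per individual, any rare allele that a caller misses at low coverage simply drops out of `sfs[1]` or `sfs[2]`, which is the kind of underestimation the abstract describes.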