Dissertation Details
Statistical Methods for Analyzing Human Genetic Variation in Diverse Populations.
Wang, Chaolong; Burmeister, Margit
University of Michigan
Keywords: Allelic Dropout; EM Algorithm; Genetic Variation; Population Structure; Principal Components Analysis; Procrustes Analysis; Genetics; Science; Bioinformatics
Others  :  https://deepblue.lib.umich.edu/bitstream/handle/2027.42/96024/chaolong_1.pdf?sequence=1&isAllowed=y
Switzerland | English
Source: The Illinois Digital Environment for Access to Learning and Scholarship
【 Abstract 】

The recent expansion of genetic datasets in diverse populations has allowed researchers to investigate human genetic structure and evolutionary history with unprecedented resolution. The huge amount of data also poses new statistical challenges, in both quality control and data analysis. In this dissertation, I develop statistical methods to address some challenges arising from recent population-genetic studies, and apply the methods to study the geographic structure of human genetic variation.

First, I develop a method to correct for allelic dropout, a common source of genotyping error in microsatellite data. Traditional solutions for allelic dropout often require replicate genotyping, which is costly and often impossible in population-genetic studies. To address this problem, I propose a maximum likelihood approach to estimate dropout rates from nonreplicated microsatellite genotypes. Based on simulations and empirical data, I show that this method is both accurate and fairly robust to some violations of model assumptions. Next, I introduce a Procrustes analysis approach to compare spatial maps of genetic variation. Multivariate techniques, such as principal components analysis (PCA), have been widely used to summarize population structure, typically in two-dimensional maps, which often resemble the geographic maps of sampling locations. Using the Procrustes approach, I quantitatively demonstrate that genetic coordinates based on SNPs and CNVs are similar to each other, and are highly concordant with the geographic coordinates.

Finally, applying PCA and Procrustes analysis to SNP data from worldwide populations, I perform a systematic study to compare genes and geography across the globe. By considering examples in different regions, I find that significant similarity between genes and geography exists in general. Further, the similarity is highest in Asia and, once isolated populations have been removed, in Sub-Saharan Africa. The results provide a quantitative assessment of the geographic structure of human genetic variation worldwide.

In summary, this dissertation contributes both statistical tools for analyzing large-scale genetic data and biological insights on the spatial patterns of human genetic variation. Results from this dissertation provide a basis for evaluating the role of geography in giving rise to human population structure, and can facilitate statistical methods for inferring individual geographic origin from genetic variation.
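
As a rough illustration of the Procrustes comparison mentioned in the abstract, the sketch below aligns a two-dimensional map of genetic coordinates (for example, the first two principal components) to a map of geographic coordinates and returns a similarity score between 0 and 1. It is a minimal sketch of generic Procrustes analysis, not the dissertation's implementation: the function name `procrustes_similarity`, the simulated inputs, and the choice to report the square root of one minus the Procrustes disparity are all assumptions made for this example.

```python
import numpy as np


def procrustes_similarity(geo, pcs):
    """Similarity between two point maps after optimal alignment.

    Hypothetical helper for illustration. Both inputs are (n, k) arrays
    with one row per individual or population. Each map is centered and
    scaled to unit Frobenius norm; the second map is then implicitly
    rotated/reflected and rescaled to best fit the first.
    """
    X = np.asarray(geo, dtype=float)
    Y = np.asarray(pcs, dtype=float)
    if X.shape != Y.shape:
        raise ValueError("both maps need the same shape")

    # Remove translation and overall scale from each map.
    X0 = X - X.mean(axis=0)
    Y0 = Y - Y.mean(axis=0)
    X0 = X0 / np.linalg.norm(X0)
    Y0 = Y0 / np.linalg.norm(Y0)

    # For unit-norm maps, the minimized sum of squared distances
    # (the disparity) equals 1 - (sum of singular values of Y0^T X0)^2.
    s = np.linalg.svd(Y0.T @ X0, compute_uv=False)
    disparity = 1.0 - min(1.0, s.sum() ** 2)

    # Report sqrt(1 - disparity) as the similarity score (an assumption).
    return float(np.sqrt(1.0 - disparity))


# Simulated data (assumption): "genetic" coordinates that are a rotated,
# rescaled, shifted copy of the "geographic" ones.
rng = np.random.default_rng(0)
geo = rng.normal(size=(50, 2))
theta = 0.7
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta), np.cos(theta)]])
pcs = 3.0 * geo @ rotation + 5.0

score_aligned = procrustes_similarity(geo, pcs)
score_random = procrustes_similarity(geo, rng.normal(size=(50, 2)))
```

Here `score_aligned` is essentially 1 because the two maps differ only by rotation, scaling, and translation, while `score_random` is much lower because the second map is unrelated noise. In practice, the statistical significance of an observed score is often assessed with a permutation test that shuffles the row labels of one map.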

【 Preview 】
Attachments
Files Size Format View
Statistical Methods for Analyzing Human Genetic Variation in Diverse Populations. 6627KB PDF download
  Document metrics
  Downloads: 31  Views: 51