Source Code for Biology and Medicine
AdmixKJump: identifying population structure in recently diverged groups
Timothy D O’Connor [1]
[1] Institute for Genome Sciences, Program in Personalized and Genomic Medicine, Department of Medicine, University of Maryland School of Medicine, 801 W Baltimore St, Baltimore, MD 21201, USA
Keywords: Fine-scale population structure; 1000 Genomes Project; Population genetics; Admixture
DOI: 10.1186/s13029-014-0031-1
Abstract
Motivation
Correctly modeling population structure is important for understanding recent evolution and for association studies in humans. While pre-existing knowledge of population history can be used to specify expected levels of subdivision, objective metrics for detecting population structure are important and may even be preferable for identifying groups in some situations. One such metric for genomic-scale data is the cross-validation procedure implemented in the program ADMIXTURE, but it has not been evaluated on recently diverged and potentially cryptic levels of population structure. Here, I develop a new method, AdmixKJump, and test both metrics under this scenario.
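ADMIXTURE's cross-validation metric is usually applied by running the program with its `--cv` option over a range of K and keeping the K with the lowest reported CV error. The following is a minimal Python sketch of that selection step, assuming log lines of the form `CV error (K=2): 0.5174`. The helper names `parse_cv_errors` and `choose_k_by_cv` are hypothetical, and the numbers in the usage example are invented for illustration, not results from this study.

```python
import re

# Assumed log format: one line per K, e.g. "CV error (K=2): 0.5174".
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9.eE+-]+)")


def parse_cv_errors(log_text):
    """Return {K: cv_error} for every 'CV error' line found in log_text."""
    errors = {}
    for match in CV_LINE.finditer(log_text):
        errors[int(match.group(1))] = float(match.group(2))
    return errors


def choose_k_by_cv(log_text):
    """Pick the K with the lowest cross-validation error."""
    errors = parse_cv_errors(log_text)
    if not errors:
        raise ValueError("no 'CV error' lines found")
    # min over the K keys, ranked by their CV error values
    return min(errors, key=errors.get)


if __name__ == "__main__":
    # Invented values for illustration only.
    example_log = """
    CV error (K=1): 0.5210
    CV error (K=2): 0.5174
    CV error (K=3): 0.5189
    """
    print(choose_k_by_cv(example_log))  # → 2
```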
Findings
Using both realistic simulations and European genomic data from the 1000 Genomes Project, I show that AdmixKJump is more sensitive to recent population divisions than the cross-validation metric. With two populations of 50 individuals each, AdmixKJump detects the two populations with 100% accuracy when they split at least 10 thousand years ago (KYA), whereas cross-validation reaches 100% accuracy only at 14 KYA. I also show that AdmixKJump is more accurate with fewer samples per population. Furthermore, in contrast to the cross-validation approach, AdmixKJump detects the population split between the Finnish and Tuscan populations of the 1000 Genomes Project.
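The abstract does not describe how AdmixKJump scores each K. Its name points to the "jump" method of Sugar and James (2003) for choosing the number of clusters, which transforms a per-K distortion d_K as d_K^(-Y) and picks the K with the largest jump J_K = d_K^(-Y) - d_(K-1)^(-Y), with d_0^(-Y) defined as 0. The Python sketch below illustrates only that generic method, assuming a positive distortion has already been computed for each K. It is not the AdmixKJump implementation; the function name `jump_select_k` and the distortion values are hypothetical.

```python
def jump_select_k(distortions, y):
    """Choose K with the Sugar-James jump statistic (generic sketch).

    distortions: distortions[i] is the distortion d_K for K = i + 1;
                 every value must be positive.
    y: transformation power Y > 0 (Sugar and James suggest Y = p / 2
       for p-dimensional data).
    Returns (best_k, jumps), where jumps[i] is J_K for K = i + 1.
    """
    if not distortions:
        raise ValueError("need at least one distortion value")
    if any(d <= 0 for d in distortions):
        raise ValueError("distortions must be positive")
    transformed = [d ** (-y) for d in distortions]
    jumps = []
    previous = 0.0  # d_0^(-Y) is defined as 0
    for value in transformed:
        jumps.append(value - previous)
        previous = value
    # index of the largest jump; ties go to the smaller K
    best_index = max(range(len(jumps)), key=jumps.__getitem__)
    return best_index + 1, jumps


if __name__ == "__main__":
    # Hypothetical distortions for K = 1..4 with Y = 1.
    best_k, _ = jump_select_k([1.0, 0.4, 0.38, 0.37], y=1)
    print(best_k)  # → 2
```

Because d_0^(-Y) is fixed at 0, K = 1 is selected when adding clusters yields only small reductions in distortion, so the method can also report that no structure is present.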
Conclusion
AdmixKJump has more power than cross-validation to detect the number of populations in a cohort, particularly with smaller sample sizes and shorter divergence times.
Availability
A Java implementation is available at https://sites.google.com/site/igsevolgenomicslab/home/downloads
License
© 2015 O’Connor; licensee BioMed Central.
Figures
Figure 1.