Journal article details
BMC Bioinformatics
Fast individual ancestry inference from DNA sequence data leveraging allele frequencies for multiple populations
Ondrej Libiger [1], Vikas Bansal [2]
[1]Current address: MD Revolution, San Diego, CA, USA
[2]Scripps Translational Science Institute, 3344 N Torrey Pines Court, La Jolla 92037, CA, USA
Keywords: BFGS algorithm; Ancestry; Maximum likelihood; Allele frequencies; High-throughput sequencing; Admixture estimation
DOI  :  10.1186/s12859-014-0418-7
Received 2014-10-08, accepted 2014-12-10, published 2015
【 Abstract 】

Background

Estimation of individual ancestry from genetic data is useful for the analysis of disease association studies, understanding human population history and interpreting personal genomic variation. New, computationally efficient methods are needed for ancestry inference that can effectively utilize existing information about allele frequencies associated with different human populations and can work directly with DNA sequence reads.

Results

We describe a fast method for estimating the relative contribution of known reference populations to an individual’s genetic ancestry. Our method uses allele frequencies from the reference populations, together with individual genotype or sequence data, to obtain a maximum likelihood estimate of the global admixture proportions using the BFGS optimization algorithm. It accounts for the genotype uncertainty present in sequence data by using genotype likelihoods, and it does not require individual genotype data from external reference panels. Simulation studies and application of the method to real datasets demonstrate that our method is significantly faster than previous methods and has comparable accuracy. Using data from the 1000 Genomes Project, we show that estimates of the genome-wide average ancestry for admixed individuals are consistent between exome sequence data and low-coverage whole-genome sequence data. Finally, we demonstrate that our method can estimate admixture proportions from pooled sequence data, making it a valuable tool for controlling for population stratification in sequencing-based association studies that use DNA pooling.
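
To make the estimation problem concrete, the following is a minimal sketch and not the authors' implementation. It assumes independent SNPs, Hardy-Weinberg genotype probabilities at the individual's mixed allele frequency (the admixture-weighted average of the reference frequencies), and genotype likelihoods given as three numbers per SNP. To stay dependency-free it maximizes the likelihood with a simple EM update instead of the BFGS optimizer described above. The function names `estimate_admixture` and `simulate`, and the toy read model used for simulation, are hypothetical and chosen only for illustration.

```python
import random


def genotype_priors(p):
    """Hardy-Weinberg probabilities of carrying 0, 1, or 2 alternate alleles."""
    return ((1 - p) ** 2, 2 * p * (1 - p), p ** 2)


def estimate_admixture(freqs, geno_liks, n_iter=100, eps=1e-6):
    """EM estimate of admixture proportions (assumed model, not BFGS).

    freqs[j][k]  : alternate-allele frequency of SNP j in reference population k
    geno_liks[j] : (L0, L1, L2), likelihood of the reads given 0/1/2 alternate alleles
    """
    n_pops = len(freqs[0])
    # Keep frequencies strictly inside (0, 1) to avoid division by zero.
    freqs = [[min(max(x, eps), 1 - eps) for x in f] for f in freqs]
    q = [1.0 / n_pops] * n_pops  # start from equal proportions
    for _ in range(n_iter):
        counts = [0.0] * n_pops
        for f, lik in zip(freqs, geno_liks):
            # Mixed allele frequency for this individual at this SNP.
            p = sum(qk * fk for qk, fk in zip(q, f))
            # Posterior over genotypes combines read likelihoods and HWE prior.
            post = [l * g for l, g in zip(lik, genotype_priors(p))]
            exp_alt = (post[1] + 2 * post[2]) / sum(post)
            # Expected number of the two alleles attributed to each population.
            for k in range(n_pops):
                counts[k] += exp_alt * q[k] * f[k] / p
                counts[k] += (2 - exp_alt) * q[k] * (1 - f[k]) / (1 - p)
        q = [c / (2 * len(freqs)) for c in counts]
    return q


def simulate(q_true, n_snps, depth=8, err=0.01, seed=1):
    """Toy data: random reference frequencies and read-based genotype likelihoods."""
    rng = random.Random(seed)
    read_p = [err, 0.5, 1 - err]  # chance a read shows the alternate allele for g = 0, 1, 2
    freqs, geno_liks = [], []
    for _ in range(n_snps):
        f = [rng.uniform(0.05, 0.95) for _ in q_true]
        p = sum(qk * fk for qk, fk in zip(q_true, f))
        g = sum(rng.random() < p for _ in range(2))  # true genotype
        alt = sum(rng.random() < read_p[g] for _ in range(depth))
        # Binomial likelihood of the reads under each genotype (constant factor dropped).
        liks = tuple(rp ** alt * (1 - rp) ** (depth - alt) for rp in read_p)
        freqs.append(f)
        geno_liks.append(liks)
    return freqs, geno_liks
```

Calling `estimate_admixture(*simulate([0.6, 0.3, 0.1], 3000))` should return proportions close to the simulated ones. A quasi-Newton optimizer such as BFGS would typically need far fewer passes over the data than EM, which is consistent with the speed emphasis of the method described above.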

Conclusions

Our method is an efficient and versatile tool for estimating ancestry from DNA sequence data and is available from https://sites.google.com/site/vibansal/software/iAdmix.

【 License 】

2015 Bansal and Libiger; licensee BioMed Central.

【 Figures 】

Figure 1.

Figure 2.
