Journal article details
BMC Evolutionary Biology
Sampling strategies for frequency spectrum-based population genomic inference
Ryan N Gutenkunst [3], Michael J Hickerson [1], Alec J Coffman [3], John D Robinson [2]
[1]Division of Invertebrate Zoology, American Museum of Natural History, New York 10024, NY, USA
[2]Current Address: South Carolina Department of Natural Resources, Marine Resources Research Institute, Charleston 29412, SC, USA
[3]Department of Molecular and Cellular Biology, University of Arizona, Tucson 85721, AZ, USA
Keywords: Parameter uncertainty; Model selection; SNP; Demographic history; Allele frequency spectrum
DOI: 10.1186/s12862-014-0254-4
Received 2014-07-24; accepted 2014-11-24; published 2014
【 Abstract 】

Background

The allele frequency spectrum (AFS) consists of counts of single nucleotide polymorphism (SNP) loci with derived variants present at each given frequency in a sample. Multiple approaches have recently been developed for parameter estimation and calculation of model likelihoods based on the joint AFS from two or more populations. We conducted a simulation study of one of these approaches, implemented in the Python module ∂a∂i, to compare parameter estimation and model selection accuracy given different sample sizes under one- and two-population models.
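
To make the definition of the AFS concrete, the following is a minimal sketch in plain Python, not the implementation used in the study. The function names `allele_frequency_spectrum` and `joint_afs` and the toy counts are hypothetical and chosen only for illustration. The sketch assumes each SNP locus has already been reduced to the number of sampled chromosomes carrying the derived allele.

```python
def allele_frequency_spectrum(derived_counts, n_chromosomes):
    """Return afs, where afs[k] is the number of SNP loci at which the
    derived allele is seen on exactly k of n_chromosomes sampled chromosomes."""
    afs = [0] * (n_chromosomes + 1)
    for k in derived_counts:
        if not 0 <= k <= n_chromosomes:
            raise ValueError(f"derived count {k} outside 0..{n_chromosomes}")
        afs[k] += 1
    return afs


def joint_afs(paired_counts, n1, n2):
    """Return a 2D spectrum, where spectrum[i][j] is the number of loci with
    i derived copies in population 1 and j derived copies in population 2."""
    spectrum = [[0] * (n2 + 1) for _ in range(n1 + 1)]
    for k1, k2 in paired_counts:
        if not (0 <= k1 <= n1 and 0 <= k2 <= n2):
            raise ValueError(f"derived counts ({k1}, {k2}) out of range")
        spectrum[k1][k2] += 1
    return spectrum


# Hypothetical toy data: derived-allele counts at 8 loci, 4 sampled chromosomes.
toy_counts = [1, 1, 2, 1, 3, 2, 1, 1]
toy_afs = allele_frequency_spectrum(toy_counts, 4)

# Hypothetical toy data for two populations with 2 sampled chromosomes each.
toy_pairs = [(1, 0), (1, 0), (0, 2), (1, 1)]
toy_joint = joint_afs(toy_pairs, 2, 2)
```

Because the spectrum has one entry per possible derived-allele count, sampling more chromosomes produces a larger AFS with more low-frequency bins.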

Results

Our simulations included a variety of demographic models and two parameterizations that differed in the timing of events (divergence or size change). Using a number of SNPs reasonably obtained through next-generation sequencing approaches (10,000 to 50,000), accurate parameter estimates and model selection were possible for models with more ancient demographic events, even given relatively small numbers of sampled individuals. However, for recent events, larger numbers of individuals were required to achieve accuracy and precision in parameter estimates similar to those seen for models with older divergence or population size changes. We quantify i) the uncertainty in model selection, using tools from information theory, and ii) the accuracy and precision of parameter estimates, using the root mean squared error, as a function of the timing of demographic events, sample sizes used in the analysis, and complexity of the simulated models.
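
The two summary quantities named above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the abstract does not say which information-theoretic tool was used, so Akaike weights computed from AIC are assumed here, and the function names `rmse` and `akaike_weights` and all numeric values are hypothetical.

```python
import math


def rmse(estimates, true_value):
    """Root mean squared error of replicate estimates around the simulated value."""
    squared_errors = [(e - true_value) ** 2 for e in estimates]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def akaike_weights(log_likelihoods, n_params):
    """Akaike weights for candidate models (assumed tool, see lead-in).

    AIC = 2k - 2 ln L; each weight is the relative likelihood
    exp(-0.5 * delta_AIC) normalized to sum to one across models.
    """
    aic = [2 * k - 2 * ll for ll, k in zip(log_likelihoods, n_params)]
    best = min(aic)
    relative = [math.exp(-0.5 * (a - best)) for a in aic]
    total = sum(relative)
    return [r / total for r in relative]


# Hypothetical replicate estimates of a divergence time simulated at 2.0.
example_rmse = rmse([1.0, 3.0], 2.0)

# Hypothetical maximized log-likelihoods for a 3-parameter and a 2-parameter model.
example_weights = akaike_weights([-100.0, -105.0], [3, 2])
```

A weight near one for the simulated model indicates low uncertainty in model selection, while a smaller RMSE indicates estimates that are both accurate and precise.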

Conclusions

Here, we illustrate the utility of the genome-wide AFS for estimating demographic history and provide recommendations to guide sampling in population genomics studies that seek to draw inference from the AFS. Our results indicate that larger samples of individuals (and thus larger AFS) provide greater power for model selection and parameter estimation for more recent demographic events.

【 License 】

2014 Robinson et al.; licensee BioMed Central Ltd.
