Journal article details
BMC Genetics
Fast imputation using medium or low-coverage sequence data
Jeffrey R. O’Connell [1], Chuanyu Sun [2], Paul M. VanRaden [3]
[1] University of Maryland School of Medicine, Baltimore 21201, Maryland, USA
[2] National Association of Animal Breeders, Columbia 65205, Missouri, USA
[3] Animal Genomics and Improvement Laboratory, Agricultural Research Service, United States Department of Agriculture, Beltsville 20705-2350, MD, USA
Keywords: Allele probability; Sequence read depth; Genotype; Imputation
DOI: 10.1186/s12863-015-0243-7
Received 2015-03-17, accepted 2015-06-29, published 2015
【 Abstract 】

Background

Accurate genotype imputation can greatly reduce costs and increase benefits by combining whole-genome sequence data of varying read depth and array genotypes of varying densities. For large populations, an efficient strategy chooses the two haplotypes most likely to form each genotype and updates posterior allele probabilities from prior probabilities within those two haplotypes as each individual’s sequence is processed. Directly using allele read counts can improve imputation accuracy and reduce computation compared with calling or computing genotype probabilities first and then imputing.
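The idea of updating a prior with allele read counts can be illustrated with a minimal sketch. It is not findhap's code: findhap updates allele probabilities within the two chosen haplotypes, whereas this example works at the simpler genotype level. It assumes a binomial read model with a symmetric per-read error rate. The function names, the Hardy-Weinberg prior and the default error rate are assumptions made for illustration only.

```python
from math import comb


def hardy_weinberg_prior(alt_freq):
    """Prior P(genotype) for 0, 1, 2 copies of the alternate allele (assumed prior)."""
    q = alt_freq
    return [(1.0 - q) ** 2, 2.0 * q * (1.0 - q), q ** 2]


def genotype_posterior(n_ref, n_alt, prior, error_rate=0.01):
    """Posterior P(genotype | read counts) under an assumed binomial read model.

    n_ref, n_alt: counts of reads showing the reference / alternate allele.
    prior: three prior probabilities for 0, 1, 2 alternate alleles.
    error_rate: assumed probability that a read reports the wrong allele.
    """
    # Probability that one read shows the alternate allele, per genotype.
    p_alt = (error_rate, 0.5, 1.0 - error_rate)
    n = n_ref + n_alt
    likelihood = [comb(n, n_alt) * p ** n_alt * (1.0 - p) ** n_ref for p in p_alt]
    unnormalized = [pr * lk for pr, lk in zip(prior, likelihood)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("read counts are impossible under the given prior and error rate")
    return [u / total for u in unnormalized]


def expected_alt_dosage(posterior):
    """Expected number of alternate alleles (0 to 2) given the posterior."""
    return posterior[1] + 2.0 * posterior[2]


if __name__ == "__main__":
    prior = hardy_weinberg_prior(0.5)
    # One reference read and one alternate read: heterozygote becomes most likely.
    post = genotype_posterior(n_ref=1, n_alt=1, prior=prior)
    print(post, expected_alt_dosage(post))
```

With no reads at a locus the posterior equals the prior, which is why low-coverage data still benefit from good prior information such as haplotypes shared with relatives.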

Results

A new algorithm was implemented in findhap (version 4) software and tested using simulated bovine and actual human sequence data with different combinations of reference population size, sequence read depth and error rate. Read depths of ≥8× may be desired for direct investigation of sequenced individuals, but for a given total cost, sequencing more individuals at read depths of 2× to 4× gave more accurate imputation from array genotypes. Imputation accuracy improved further if reference individuals had both low-coverage sequence and high-density (HD) microarray data, and remained high even with a read error rate of 16%. With read depths of ≤4×, findhap (version 4) had higher accuracy than Beagle (version 4), and computation was up to 400 times faster with findhap than with Beagle. For 10,000 sequenced individuals plus 250 with HD array genotypes to test imputation, findhap used 7 hours, 10 processors and 50 GB of memory for 1 million loci on one chromosome. Computing times increased in proportion to population size but less than proportionally to the number of variants.
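The trade-off between read depth and number of sequenced individuals can be made concrete with a small back-of-the-envelope sketch. It is not taken from the paper. It assumes that cost scales linearly with depth, that read counts at a locus are Poisson distributed with mean equal to the depth, that each read comes from either allele with equal probability, and that reads are error free. The budget and cost units are hypothetical.

```python
from math import exp


def individuals_for_budget(total_budget, cost_per_1x, depth):
    """Number of individuals affordable if cost is linear in read depth (assumption)."""
    return int(total_budget // (cost_per_1x * depth))


def prob_both_alleles_seen(depth):
    """P(both alleles of a heterozygote are observed at least once).

    Under the Poisson assumption, reads from each allele are independent
    Poisson counts with mean depth / 2, so each allele is missed with
    probability exp(-depth / 2).
    """
    per_allele_mean = depth / 2.0
    return (1.0 - exp(-per_allele_mean)) ** 2


if __name__ == "__main__":
    # Hypothetical budget of 8000 cost units at 1 unit per 1x of coverage.
    for depth in (2, 4, 8, 16):
        n = individuals_for_budget(total_budget=8000, cost_per_1x=1, depth=depth)
        print(depth, n, round(prob_both_alleles_seen(depth), 3))
```

Under these assumptions a heterozygote is often not directly observable at 2× but almost always is at 8×, while the same budget sequences four times as many individuals at 2×. The missing genotypes at low depth are what imputation from shared haplotypes has to recover.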

Conclusions

Simultaneous genotype calling from low-coverage sequence data and imputation from array genotypes of various densities is done very efficiently within findhap by updating allele probabilities within the two haplotypes for each individual. Accuracies of genotype calling and imputation were high with both simulated bovine and actual human genomes reduced to low-coverage sequence and HD microarray data. More efficient imputation allows geneticists to locate and test effects of more DNA variants from more individuals and to include those in future prediction and selection.

【 License 】

   
2015 VanRaden et al.
