Journal article details
BMC Genomics
A new approach for efficient genotype imputation using information from relatives
Flavio S Schenkel [2], Jacques P Chesnais [1], Mehdi Sargolzaei [1]
[1] Semex Alliance, 130 Stone Road West, Guelph, ON, Canada
[2] Centre for Genetic Improvement of Livestock, Animal and Poultry Science Department, University of Guelph, 50 Stone Road East, Guelph, ON, Canada
Keywords: Sliding window; Rare variant; Haplotype; Imputation; Family
DOI  :  10.1186/1471-2164-15-478
Received 2013-08-20, accepted 2014-06-10, published 2014
【 Abstract 】

Background

Genotype imputation can help reduce genotyping costs, particularly for the implementation of genomic selection. In applications entailing large populations, recovering the genotypes of untyped loci using information from reference individuals genotyped with a higher-density panel is computationally challenging. Popular imputation methods are based on the hidden Markov model and are computationally constrained by an intensive sampling process. A fast, deterministic approach, which makes use of both family and population information, is presented here. All individuals are related and, therefore, share haplotypes that may differ in length and frequency depending on their relationships. The method starts with family imputation if pedigree information is available, and then exploits close relationships by searching for long haplotype matches in the reference group using overlapping sliding windows. The search continues as the window size is shrunk in each chromosome sweep in order to capture more distant relationships.
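
The window search described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes already-phased 0/1 haplotypes, skips the family imputation step, and uses hypothetical names (`impute_sliding_window`, `matches`, `min_typed`) that do not come from the paper. It sweeps the chromosome with overlapping windows, largest first, and fills an untyped locus only when every matching reference haplotype agrees on the allele.

```python
from typing import List, Optional, Sequence

# Assumption: a haplotype is a list of 0/1 alleles; None marks an untyped locus.
Haplotype = Sequence[Optional[int]]


def matches(target: Haplotype, ref: Sequence[int],
            start: int, end: int, min_typed: int) -> bool:
    """True if ref agrees with target at every typed locus in [start, end)
    and the window holds at least min_typed typed loci."""
    typed = 0
    for i in range(start, end):
        if target[i] is None:
            continue
        if target[i] != ref[i]:
            return False
        typed += 1
    return typed >= min_typed


def impute_sliding_window(target: Haplotype,
                          reference: Sequence[Sequence[int]],
                          window_sizes: Sequence[int],
                          min_typed: int = 2) -> List[Optional[int]]:
    """Fill untyped loci of target from reference haplotypes.

    One sweep is made per (positive) window size, from longest to shortest.
    Matching uses only the originally typed loci, so alleles filled in an
    earlier window never influence later matches.
    """
    result = list(target)
    n = len(result)
    for size in sorted(window_sizes, reverse=True):
        size = min(size, n)
        step = max(1, size // 2)  # half-window step gives overlapping windows
        starts = list(range(0, n - size + 1, step))
        if starts[-1] != n - size:  # make sure the chromosome end is covered
            starts.append(n - size)
        for start in starts:
            end = start + size
            hits = [r for r in reference
                    if matches(target, r, start, end, min_typed)]
            if not hits:
                continue
            for i in range(start, end):
                if result[i] is None:
                    alleles = {r[i] for r in hits}
                    if len(alleles) == 1:  # fill only when all matches agree
                        result[i] = alleles.pop()
    return result


# Toy data: the target is a recombinant of reference haplotypes 0 and 1,
# typed at every second locus.
reference = [
    [0, 0, 1, 1, 0, 1, 0, 0],
    [1, 0, 1, 0, 1, 1, 0, 1],
    [0, 1, 0, 1, 0, 0, 1, 0],
]
target = [0, None, 1, None, 1, None, 0, None]

long_only = impute_sliding_window(target, reference, window_sizes=[8])
shrinking = impute_sliding_window(target, reference, window_sizes=[8, 4])
```

In this toy case no reference haplotype matches the target across the full 8-locus window, so `long_only` stays unimputed, whereas the second sweep with 4-locus windows recovers the whole haplotype. That mirrors the idea that shorter windows capture segments shared with more distant relatives.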

Results

The proposed method gave imputation accuracy higher than or similar to that of Beagle and Impute2 in cattle data sets when all available information was used. When close relatives of target individuals were present in the reference group, the method resulted in higher accuracy than the other two methods even when the pedigree was not used. Rare variants were also imputed with higher accuracy. Finally, computing requirements were considerably lower than those of Beagle and Impute2. The presented method took 28 minutes to impute from 6k to 50k genotypes for 2,000 individuals with a reference size of 64,429 individuals.

Conclusions

The proposed method efficiently makes use of information from close and distant relatives for accurate genotype imputation. In addition to its high imputation accuracy, the method is fast, owing to its deterministic nature and, therefore, it can easily be used in large data sets where the use of other methods is impractical.

【 License 】

   
2014 Sargolzaei et al.; licensee BioMed Central Ltd.
