Journal Article Details
BMC Genetics
Effect of genotype imputation on genome-enabled prediction of complex traits: an empirical study with mice data
Vivian PS Felipe [3], Hayrettin Okut [2], Daniel Gianola [3], Martinho A Silva [1], Guilherme JM Rosa [3]
[1] Department of Animal Sciences, Federal University of Jequitinhonha and Mucuri Valleys, Minas Gerais, Brazil
[2] Department of Animal Sciences, Biometry and Genetics Branch, University of Yuzuncu Yil, Van 65080, Turkey
[3] Department of Animal Sciences, University of Wisconsin, Madison 53706, USA
Keywords: Non-linear models; Complex traits; Genome-enabled prediction; Genotype imputation
DOI: 10.1186/s12863-014-0149-9
Received 2014-09-04, accepted 2014-12-10, published 2014
【 Abstract 】

Background

Genotype imputation is an important tool for whole-genome prediction because it reduces the cost of genotyping individuals. However, the benefits of genotype imputation have been evaluated mostly for linear additive genetic models. In this study we investigated the impact of employing imputed genotypes when using more elaborate models for phenotype prediction. Our hypothesis was that such models would be able to track genetic signals using the observed genotypes only, with no additional information to be gained from imputed genotypes.

Results

For the present study, an outbred mouse population containing 1,904 individuals with genotypes for 1,809 pre-selected markers was used. The effect of imputation was evaluated for a linear model (the Bayesian LASSO, BL), a semi-parametric model (Reproducing Kernel Hilbert Spaces regression, RKHS), and a non-parametric model (Bayesian Regularized Artificial Neural Networks, BRANN). The RKHS method had the best predictive accuracy. Genotype imputation had a similar impact on the effectiveness of BL and RKHS, whereas BRANN predictions were apparently more sensitive to imputation errors. In scenarios where the masking rates were 75% and 50%, genotype imputation was not beneficial. However, genotype imputation incorporated information about important markers and improved predictive ability when genotype information was sparse (90% masking), especially for body mass index (BMI), and, for body weight (BW), when the reference sample for imputation was weakly related to the target population.
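
The masking scenarios mentioned above (50%, 75% and 90% of markers hidden, then imputed) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the genotypes are simulated, the function names are hypothetical, and the imputation step is a naive "most common genotype in the reference panel" rule standing in for a real imputation program, which the abstract does not name.

```python
import random
from collections import Counter


def simulate_genotypes(n_individuals, allele_freqs, rng):
    """Draw 0/1/2 allele counts per marker (simulated data, not the mouse data)."""
    return [
        [(rng.random() < p) + (rng.random() < p) for p in allele_freqs]
        for _ in range(n_individuals)
    ]


def choose_masked_markers(n_markers, masking_rate, rng):
    """Pick the marker indices that are treated as ungenotyped."""
    n_masked = round(n_markers * masking_rate)
    return sorted(rng.sample(range(n_markers), n_masked))


def impute_by_reference_mode(reference, target, masked):
    """Naive stand-in for imputation: fill each masked marker in the target
    with the most common genotype observed in the reference panel."""
    imputed = [row[:] for row in target]
    for j in masked:
        mode = Counter(row[j] for row in reference).most_common(1)[0][0]
        for row in imputed:
            row[j] = mode
    return imputed


def concordance(truth, imputed, masked):
    """Share of masked genotypes that were imputed correctly."""
    total = len(truth) * len(masked)
    hits = sum(
        truth[i][j] == imputed[i][j] for i in range(len(truth)) for j in masked
    )
    return hits / total


rng = random.Random(0)
allele_freqs = [rng.uniform(0.1, 0.9) for _ in range(100)]
reference = simulate_genotypes(200, allele_freqs, rng)
target = simulate_genotypes(50, allele_freqs, rng)

results = {}
for rate in (0.50, 0.75, 0.90):
    masked = choose_masked_markers(len(allele_freqs), rate, rng)
    imputed = impute_by_reference_mode(reference, target, masked)
    results[rate] = concordance(target, imputed, masked)

for rate, acc in results.items():
    print(f"masking rate {rate:.0%}: imputation concordance {acc:.2f}")
```

In a study like the one summarized here, the imputed genotypes would then be passed to each prediction model and compared against predictions from the observed markers alone; the sketch stops at imputation accuracy.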

Conclusions

Genotype imputation is not always helpful for phenotype prediction, so it should be considered on a case-by-case basis. Factors that can affect the usefulness of genotype imputation for predicting yet-to-be-observed traits are the imputation accuracy itself, the structure of the population, the genetic architecture of the target trait, and the model used for phenotype prediction.

【 License 】

   
2014 Felipe et al.; licensee BioMed Central.
