BMC Genetics
A comparison of statistical methods for genomic selection in a mice population
Sandra A Queiroz2, Roberto Carvalheiro1, Haroldo HR Neves2
[1] GenSys Consultores Associados S/S Ltda., Porto Alegre, Rio Grande do Sul, Brazil; Departamento de Zootecnia, FCAV, UNESP, Jaboticabal, CEP: 14.884-900, SP, Brazil
Keywords: Subset selection; SNP; Ridge regression; Random forest; LASSO; Kernel regression
Others: 1089673
DOI: 10.1186/1471-2156-13-100
Received 2012-08-14, accepted 2012-10-31, published 2012
【 Abstract 】
Background
The availability of high-density panels of SNP markers has opened new perspectives for marker-assisted selection strategies in which genotypes for these markers are used to predict the genetic merit of selection candidates. Because the number of markers is often much larger than the number of phenotypes, estimating marker effects is not a trivial task. The objective of this research was to compare the predictive performance of ten different statistical methods employed in genomic selection by analyzing data from a heterogeneous stock mouse population.
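To make the "more markers than phenotypes" problem concrete, the following is a minimal sketch of ridge regression on marker genotypes, one of the method families named in the keywords. It is not the authors' implementation: the simulated genotypes, the penalty value `lam`, and the function names are all assumptions chosen for illustration. With p markers and n animals (p much larger than n), the ordinary least-squares system is singular, while the ridge penalty makes it solvable, and the estimates can be obtained from an n x n system instead of a p x p one.

```python
import numpy as np

def ridge_marker_effects(Z, y, lam):
    """Ridge estimates of marker effects, solved through the n x n dual system.

    Z   : (n, p) genotype matrix coded 0/1/2 (hypothetical coding)
    y   : (n,) phenotypes
    lam : ridge penalty (an assumed value; in practice tuned or derived
          from variance components)
    """
    n = Z.shape[0]
    y_mean = y.mean()
    col_means = Z.mean(axis=0)
    Zc = Z - col_means                      # centre each marker column
    # Dual form: beta = Zc' (Zc Zc' + lam I)^-1 (y - mean(y))
    alpha = np.linalg.solve(Zc @ Zc.T + lam * np.eye(n), y - y_mean)
    beta = Zc.T @ alpha
    return y_mean, col_means, beta

def predict(Z_new, y_mean, col_means, beta):
    """Predicted phenotype for new genotypes from the fitted marker effects."""
    return y_mean + (Z_new - col_means) @ beta

# Synthetic data only: 50 animals, 500 markers, 10 markers with real effects.
rng = np.random.default_rng(0)
n, p = 50, 500
Z = rng.integers(0, 3, size=(n, p)).astype(float)
true_beta = np.zeros(p)
true_beta[:10] = rng.normal(size=10)
y = Z @ true_beta + rng.normal(size=n)

y_mean, col_means, beta = ridge_marker_effects(Z, y, lam=100.0)
y_hat = predict(Z, y_mean, col_means, beta)
```

The dual form gives the same estimates as the usual p x p ridge equations but only requires inverting a matrix whose size is the number of phenotyped animals, which is why it is attractive when markers greatly outnumber records.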
Results
For the five traits analyzed (W6W: weight at six weeks, WGS: growth slope, BL: body length, %CD8+: percentage of CD8+ cells, CD4+/CD8+: ratio between CD4+ and CD8+ cells), within-family predictions were more accurate than across-family predictions, although the size of this advantage varied markedly across traits. For within-family prediction, two kernel methods, Reproducing Kernel Hilbert Spaces Regression (RKHS) and Support Vector Regression (SVR), were the most accurate for W6W, while a polygenic model had comparable performance. A form of ridge regression assuming that all markers contribute to the additive variance (RR_GBLUP) was among the most accurate for WGS and BL, while two variable selection methods (LASSO and Random Forest, RF) had the greatest predictive abilities for %CD8+ and CD4+/CD8+. RF, RKHS, SVR and RR_GBLUP outperformed the remaining methods in terms of bias and inflation of predictions.
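The distinction between within-family and across-family prediction, and the reporting of accuracy, bias and inflation, can be sketched as follows. This is an illustration under stated assumptions, not the validation scheme of the paper: the abstract does not give the fold design or the exact metric definitions, so the fold-assignment functions are hypothetical, accuracy is taken as the correlation between observed and predicted values, and the intercept and slope of the regression of observed on predicted values are used as one common way to gauge bias and inflation.

```python
import numpy as np

def within_family_folds(family, k, rng):
    """Fold labels such that each family's members are spread over the folds,
    so test animals have relatives in the training set."""
    family = np.asarray(family)
    folds = np.empty(len(family), dtype=int)
    for fam in np.unique(family):
        idx = rng.permutation(np.flatnonzero(family == fam))
        folds[idx] = np.arange(len(idx)) % k
    return folds

def across_family_folds(family, k, rng):
    """Fold labels such that whole families go to a single fold,
    so test animals have no family members in the training set."""
    family = np.asarray(family)
    fams = rng.permutation(np.unique(family))
    fam_fold = {f: i % k for i, f in enumerate(fams)}
    return np.array([fam_fold[f] for f in family])

def prediction_metrics(y_obs, y_hat):
    """Return (accuracy, intercept, slope).

    accuracy  : Pearson correlation between observed and predicted values
    intercept : intercept of the regression of observed on predicted
    slope     : slope of that regression (a slope of 1 means no inflation
                or deflation under this assumed convention)
    """
    accuracy = np.corrcoef(y_obs, y_hat)[0, 1]
    slope, intercept = np.polyfit(y_hat, y_obs, 1)
    return accuracy, intercept, slope

# Hypothetical layout: 10 families of 8 animals, 4 folds.
rng = np.random.default_rng(1)
family = np.repeat(np.arange(10), 8)
wf = within_family_folds(family, k=4, rng=rng)
af = across_family_folds(family, k=4, rng=rng)
```

Any of the compared methods could then be trained on the animals outside a fold and evaluated with `prediction_metrics` on the animals inside it, once with `wf` and once with `af`.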
Conclusions
Methods with large conceptual differences reached very similar predictive abilities, and a clear re-ranking of methods was observed depending on the trait analyzed. Variable selection methods were more accurate than the other methods for %CD8+ and CD4+/CD8+, and these traits are likely to be influenced by a smaller number of QTL than the other traits. Judged by their overall performance across traits and their computational requirements, RR_GBLUP, RKHS and SVR are particularly appealing for application in genomic selection.
【 License 】
2012 Neves et al.; licensee BioMed Central Ltd.