Journal Article Details
BMC Genetics
Multi-population genomic prediction using a multi-task Bayesian learning model
Flavio Schenkel [1], Stephen Miller [1], Changxi Li [2], Liuhong Chen [3]
[1] Department of Animal and Poultry Science, University of Guelph, Guelph, ON N1G 2W1, Canada
[2] Agriculture and Agri-Food Canada, Lacombe Research Centre, 6000 C&E Trail, Lacombe, AB T4L 1W1, Canada
[3] Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB T6G 2P5, Canada
Keywords: Stochastic search variable selection; Genomic prediction; Multi-population; Bayesian model; Multi-task learning
DOI  :  10.1186/1471-2156-15-53
Received 2013-08-07, accepted 2014-04-28, published 2014
【 Abstract 】

Background

Genomic prediction in multiple populations can be viewed as a multi-task learning problem in which the tasks are to derive prediction equations for each population, and multi-task learning performance can be improved by sharing information across populations. The goal of this study was to develop a multi-task Bayesian learning model for multi-population genomic prediction with a strategy to share information effectively across populations. Simulation studies and real data from Holstein and Ayrshire dairy breeds with phenotypes on five milk production traits were used to evaluate the proposed multi-task Bayesian learning model and to compare it with a single-task model and a simple data pooling method.

Results

A multi-task Bayesian learning model was proposed for multi-population genomic prediction. Information was shared across populations through a common set of latent indicator variables, while SNP effects were allowed to vary between populations. Both simulation studies and real data analysis showed the effectiveness of the multi-task model in improving genomic prediction accuracy for the smaller Ayrshire breed. Simulation studies suggested that the multi-task model was most effective when the number of QTL was small (n = 20), with an increase in accuracy of up to 0.09 when QTL effects were weakly correlated between the two populations (ρ = 0.2) and up to 0.16 when QTL effects were highly correlated (ρ = 0.8). When QTL genotypes were included for training and validation, the improvements were 0.16 and 0.22 for the low- and high-correlation scenarios, respectively. When the number of QTL was large (n = 200), the improvement was small, with a maximum of 0.02 when QTL genotypes were not included for genomic prediction. A reduction in accuracy was observed for the simple pooling method when the number of QTL was small and the correlation of QTL effects between the two populations was low. For the real data, the multi-task model achieved an increase in accuracy of between 0 and 0.07 in the Ayrshire validation set when 28,206 SNPs were used, while the simple data pooling method reduced accuracy for all traits except protein percentage. When 246,668 SNPs were used, the accuracy achieved with the multi-task model increased by 0 to 0.03, while the pooling method reduced accuracy by 0.01 to 0.09. In the Holstein population, the three methods performed similarly.
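
The sharing strategy in the Results can be illustrated with a small simulation sketch. This is not the authors' implementation. It only shows what it means for two populations to share one set of latent indicator variables (which SNPs are non-zero) while their effect sizes differ and are correlated with coefficient ρ. The function name, the 1,000-SNP panel, and the unit-variance normal effects are assumptions made for illustration.

```python
import math
import random


def simulate_shared_indicator_effects(n_snps, n_qtl, rho, seed=0):
    """Draw SNP effects for two populations that share the same non-zero loci.

    Hypothetical helper: one indicator per SNP is shared by both populations,
    and the non-zero effects are bivariate normal with correlation rho.
    """
    rng = random.Random(seed)
    # Shared latent indicators: the same n_qtl loci are "on" in both populations.
    qtl = set(rng.sample(range(n_snps), n_qtl))
    gamma = [1 if j in qtl else 0 for j in range(n_snps)]

    effects_a, effects_b = [], []
    for g in gamma:
        if g:
            # Correlated standard normals: b = rho * a + sqrt(1 - rho^2) * z.
            a = rng.gauss(0.0, 1.0)
            z = rng.gauss(0.0, 1.0)
            b = rho * a + math.sqrt(1.0 - rho * rho) * z
        else:
            # Loci switched off by the shared indicator have no effect anywhere.
            a = b = 0.0
        effects_a.append(a)
        effects_b.append(b)
    return gamma, effects_a, effects_b


# Scenario resembling the small-QTL, high-correlation case (n = 20, rho = 0.8).
gamma, eff_a, eff_b = simulate_shared_indicator_effects(n_snps=1000, n_qtl=20, rho=0.8)
```

Setting `rho=0.2` gives the weakly correlated scenario. In both cases the positions of the non-zero effects are identical across populations, which is the information a shared-indicator model can borrow even when the effect sizes themselves differ.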

Conclusions

Results in this study suggest that the proposed multi-task Bayesian learning model for multi-population genomic prediction is effective and has the potential to improve the accuracy of genomic prediction.

【 License 】

   
2014 Chen et al.; licensee BioMed Central Ltd.
