Journal Article Details
BMC Genomics
Assessing models for genetic prediction of complex traits: a comparison of visualization and quantitative methods
Jo Knight [3], Michael E. Weale [1], Andrew D. Paterson [2], Sarah A. Gagliano [4]
[1] Department of Medical & Molecular Genetics, King’s College London, Guy’s Hospital, London, UK
[2] Epidemiology Division, Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
[3] Biostatistics Division, Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
[4] Department of Psychiatry, University of Toronto, Toronto, Ontario, Canada
Keywords: Receiver operating characteristic curve; Genetic prediction; Predictive accuracy
DOI: 10.1186/s12864-015-1616-z
Received: 2014-12-23; accepted: 2015-05-05; published: 2015
【 Abstract 】

Background

In silico models have recently been developed to predict, from their functional characteristics, which genetic variants are more likely to contribute to the risk of a complex trait. However, no comprehensive review has yet examined which predictive accuracy measures and data visualization techniques are most useful for assessing these models.

Methods

We assessed how well the models predict risk using several methods, including receiver operating characteristic (ROC) curves, histograms of classification probability, and a novel application of the quantile-quantile (QQ) plot. How interpretable these measures are depends on factors such as whether the dataset is balanced, that is, whether it contains similar numbers of variants classified as risk variants and variants that are not.
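The sketch below makes the ROC/AUC idea and the balance caveat concrete. It is a minimal Python example and is not taken from the paper. The scores and labels are invented, and `roc_points`, `auc` and `precision_at` are hypothetical helper names. Label 1 marks a risk variant and 0 marks a non-risk variant. The second dataset keeps the same per-class scores but repeats every non-risk variant ten times.

```python
from typing import List, Tuple

def roc_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs obtained by lowering the threshold one distinct score
    at a time. Labels are 1 for risk variants, 0 otherwise; both must occur."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Count every variant tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

def precision_at(scores: List[float], labels: List[int], threshold: float) -> float:
    """Share of variants scoring >= threshold that are true risk variants."""
    called = [y for s, y in zip(scores, labels) if s >= threshold]
    return sum(called) / len(called)

# Invented toy data: 4 risk variants (1) and 4 non-risk variants (0).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0, 1, 0]

# Same per-class scores, but every non-risk variant repeated 10 times.
pos = [s for s, y in zip(scores, labels) if y == 1]
neg = [s for s, y in zip(scores, labels) if y == 0]
imb_scores = pos + neg * 10
imb_labels = [1] * len(pos) + [0] * (len(neg) * 10)

for name, s, y in (("balanced", scores, labels), ("imbalanced", imb_scores, imb_labels)):
    print(name, "AUC:", auc(roc_points(s, y)),
          "precision@0.5:", round(precision_at(s, y, 0.5), 3))
```

The ROC curve is built from within-class rates (true and false positive rates). As a result, the curve and its AUC come out identical for both datasets. Precision at a 0.5 threshold, by contrast, falls from 0.6 to about 0.13 once non-risk variants dominate. Even with tied scores, this trapezoidal AUC equals the probability that a randomly chosen risk variant scores higher than a randomly chosen non-risk variant, with ties counted as one half.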

Results

We conclude that the area under the ROC curve (AUC) is a suitable starting point. For models with similar AUCs, violin plots are particularly useful for examining the distribution of risk scores.
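A small sketch shows why similar AUCs can hide different score distributions. The scores are invented, this is not the authors' analysis, and `auc_rank` and `five_number` are hypothetical helpers. The AUC depends only on how scores rank, so any strictly increasing transformation leaves it unchanged. Here, squaring scores in [0, 1] does exactly that while reshaping the per-class distributions a violin plot would display. To stay within the standard library, the sketch prints per-class five-number summaries instead of drawing violins. Drawing the violins would need a plotting library, for example matplotlib's `violinplot`.

```python
import statistics
from typing import List, Tuple

def auc_rank(scores: List[float], labels: List[int]) -> float:
    """AUC as the probability that a randomly chosen risk variant (label 1)
    outscores a randomly chosen non-risk variant (label 0); ties count 1/2."""
    risk = [s for s, y in zip(scores, labels) if y == 1]
    other = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if r > o else 0.5 if r == o else 0.0
               for r in risk for o in other)
    return wins / (len(risk) * len(other))

def five_number(values: List[float]) -> Tuple[float, ...]:
    """Min, quartiles and max: summary statistics that box and violin
    plots typically mark."""
    q1, med, q3 = statistics.quantiles(values, n=4)
    return (min(values), q1, med, q3, max(values))

labels = [1, 1, 0, 1, 0, 0, 1, 0]
model_a = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]  # invented scores
model_b = [s ** 2 for s in model_a]  # same ranking, different shape

for name, scores in (("A", model_a), ("B", model_b)):
    risk = [s for s, y in zip(scores, labels) if y == 1]
    other = [s for s, y in zip(scores, labels) if y == 0]
    print(name, "AUC =", auc_rank(scores, labels))
    print("  risk     ", [round(v, 3) for v in five_number(risk)])
    print("  non-risk ", [round(v, 3) for v in five_number(other)])
```

Both models print an AUC of 0.75. Model B, however, shifts the non-risk scores toward 0 (median about 0.23 versus 0.475) and spreads the risk scores over a wider range. A violin plot makes this kind of difference visible, while the AUC alone does not.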

【 License 】

© 2015 Gagliano et al.; licensee BioMed Central.

【 Figures 】

Fig. 1.

Fig. 2.

Fig. 3.

Fig. 4.

Fig. 5.

Fig. 6.

Fig. 7.

Fig. 8.
