BMC Genetics
Genome wide association studies in presence of misclassified binary responses
Romdhane Rekaya [3], Nourhene Farhat [2], El Hamidi Hay [1], Shannon Smith [1]
[1] Department of Animal and Dairy Science, The University of Georgia, Athens, GA, USA
[2] PCOM, Suwanee, Athens, GA, USA
[3] Institute of Bioinformatics, The University of Georgia, Athens, GA, USA
Keywords: Discrete responses; Genome wide association; Misclassification
DOI  :  10.1186/1471-2156-14-124
Received 2013-05-06; accepted 2013-12-17; published 2013
Abstract

Background

Misclassification has been shown to be highly prevalent in binary responses in both livestock and human populations. Leaving these errors uncorrected before analysis undermines the overall goals of genome-wide association studies (GWAS), for example by reducing predictive power. A liability threshold model that accounts for misclassification was developed to assess the effects of misdiagnosis errors on GWAS. Case–control datasets were simulated under four scenarios. Each dataset consisted of 2000 individuals and was analyzed with varying odds ratios for the influential SNPs and misclassification rates of 5% and 10%.
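The simulation setup can be illustrated with a minimal Python sketch. It assumes a single biallelic SNP acting additively on a normally distributed liability, counts an individual as a true case when liability exceeds a threshold, and then flips every label with the same probability `pi` in cases and controls. The function names, allele frequency, effect size and threshold are hypothetical: the abstract does not give the authors' simulation parameters, and it specifies SNP effects as odds ratios rather than on the liability scale.

```python
import random


def simulate_liability_threshold(n, maf, effect, threshold, seed=0):
    """Hypothetical sketch: one SNP acting on a normal liability.

    Genotypes are Binomial(2, maf) (Hardy-Weinberg), liability is
    effect * genotype + N(0, 1) noise, and an individual is a true
    case (1) when liability exceeds `threshold`, otherwise a control (0).
    """
    rng = random.Random(seed)
    genotypes, labels = [], []
    for _ in range(n):
        g = sum(rng.random() < maf for _ in range(2))  # risk-allele count 0/1/2
        liability = effect * g + rng.gauss(0.0, 1.0)
        genotypes.append(g)
        labels.append(1 if liability > threshold else 0)
    return genotypes, labels


def misclassify(labels, pi, seed=1):
    """Flip each 0/1 label independently with probability `pi`.

    The same rate is used for cases and controls (symmetric,
    non-differential misclassification). Returns the observed labels
    and a per-individual flag marking which labels were flipped.
    """
    rng = random.Random(seed)
    flipped = [rng.random() < pi for _ in labels]
    observed = [1 - y if f else y for y, f in zip(labels, flipped)]
    return observed, flipped


# Hypothetical parameters; 2000 individuals as in the abstract.
genotypes, true_labels = simulate_liability_threshold(2000, maf=0.3, effect=0.2, threshold=0.5)
observed_labels, flipped = misclassify(true_labels, pi=0.05)
```

Calling `misclassify` with `pi=0.05` or `pi=0.10` mimics the two misclassification rates studied, and the `flipped` flags record which observations are truly misclassified, which is the quantity a misclassification-aware model tries to recover.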

Results

Analyses of binary responses subject to misclassification underestimated the effects of influential SNPs and failed to recover the true magnitude and direction of those effects. Applying the proposed misclassification model increased accuracy by 12% to 29% and substantially reduced bias. The proposed method also captured the majority of the top significant SNPs that were missed in the analysis of the misclassified data. In one simulation scenario, 33% of the influential SNPs detected in the analysis of the data without misclassification were not detected in the analysis of the misclassified data; with the proposed method, only 13% were missed. Furthermore, the proposed method identified a large proportion of the truly misclassified observations with high probability.
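Why a naive analysis underestimates SNP effects can be seen with a short calculation. Assuming equal numbers of true cases and controls and a label-flip probability `pi` shared by both groups, each observed group is a (1 − pi) : pi mixture of its own true group and the opposite one, which pulls the allelic odds ratio toward 1. The allele frequencies are hypothetical, and this single-SNP calculation is an illustration only, not the model proposed in the paper.

```python
def allelic_odds_ratio(f_case, f_ctrl):
    """Allelic odds ratio from risk-allele frequencies in cases and controls."""
    return (f_case / (1.0 - f_case)) / (f_ctrl / (1.0 - f_ctrl))


def observed_allele_freqs(f_case, f_ctrl, pi):
    """Expected risk-allele frequencies in the observed case and control groups.

    Assumes equal numbers of true cases and controls and that every label is
    flipped with probability pi, so each observed group is a (1 - pi) : pi
    mixture of its own true group and the opposite group.
    """
    obs_case = (1.0 - pi) * f_case + pi * f_ctrl
    obs_ctrl = (1.0 - pi) * f_ctrl + pi * f_case
    return obs_case, obs_ctrl


f_case, f_ctrl = 0.30, 0.20  # hypothetical risk-allele frequencies
print(f"true OR: {allelic_odds_ratio(f_case, f_ctrl):.3f}")
for pi in (0.05, 0.10):
    obs = observed_allele_freqs(f_case, f_ctrl, pi)
    print(f"pi = {pi:.2f}: observed OR = {allelic_odds_ratio(*obs):.3f}")
```

With these inputs the true odds ratio of about 1.71 shrinks to about 1.62 at `pi = 0.05` and about 1.54 at `pi = 0.10`; at `pi = 0.5` the labels carry no information and the odds ratio collapses to 1.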

Conclusions

The proposed model provides a statistical tool to correct, or at least attenuate, the negative effects of misclassified binary responses in GWAS. The model proved robust across different misclassification probabilities and odds ratios of the significant SNPs: SNP effects and the misclassification probability were accurately estimated, and truly misclassified observations received high probabilities of misclassification compared with correctly classified responses. This study was limited to situations in which the misclassification probability was assumed to be the same in cases and controls, which real human disease data show is not always true. Evaluating the performance of the proposed model when misclassification rates differ between cases and controls is therefore of interest and is the current focus of our research.
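How a misclassification model can flag individual observations follows from Bayes' rule. The sketch below assumes a probit liability threshold model in which an individual's liability has mean `mu` and unit residual variance, so the probability of being a true case is Φ(`mu`), and a single misclassification probability `pi` applies to both cases and controls, as in this study. The function names and inputs are hypothetical; the abstract does not describe how the authors' model computes these probabilities, and in practice `mu` and `pi` would be estimated rather than fixed.

```python
import math


def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def prob_misclassified(observed, mu, pi):
    """Posterior probability that an observed 0/1 label is wrong.

    Assumes liability = mu + e with e ~ N(0, 1) and a true case when
    liability > 0, so P(true case) = Phi(mu); every label is flipped
    with the same probability pi regardless of true class.
    """
    p_case = normal_cdf(mu)
    if observed == 1:
        wrong = pi * (1.0 - p_case)          # true control recorded as a case
        right = (1.0 - pi) * p_case          # true case recorded correctly
    else:
        wrong = pi * p_case                  # true case recorded as a control
        right = (1.0 - pi) * (1.0 - p_case)  # true control recorded correctly
    return wrong / (wrong + right)


for mu in (-2.0, 0.0, 2.0):
    print(mu, round(prob_misclassified(1, mu, 0.05), 3))
```

An observed case with a low predicted liability (`mu = -2`) and `pi = 0.05` gets a posterior misclassification probability of roughly 0.69, whereas one with `mu = 2` gets about 0.001. Allowing unequal misclassification in cases and controls would require a separate flip probability for each true class in place of the single `pi`.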

License

© 2013 Smith et al.; licensee BioMed Central Ltd.

Figures

Figure 1.

Figure 2.

Figure 3.

Figure 4.
