BioData Mining
A robustness study of parametric and non-parametric tests in model-based multifactor dimensionality reduction for epistasis detection
Jestinah M Mahachie John1, François Van Lishout1, Elena S Gusareva1, Kristel Van Steen1
[1] Bioinformatics and Modeling, GIGA-R, University of Liege, Liège, Belgium
Keywords: Data transformation; Model violations; Epistasis; Model-based multifactor dimensionality reduction
Others: 797193 DOI: 10.1186/1756-0381-6-9
Received 2012-07-11, accepted 2013-04-20, published 2013
【 Abstract 】
Background
Applying a statistical method implies identifying its underlying (model) assumptions and checking their validity in the particular context. One of these contexts is association modeling for epistasis detection. Here, depending on the technique used, violation of model assumptions may result in increased type I error, power loss, or biased parameter estimates. Remedial measures for violated assumptions include transforming the data or selecting a more relaxed modeling or testing strategy. Model-Based Multifactor Dimensionality Reduction (MB-MDR) for epistasis detection relies on association testing between a trait and a factor consisting of multilocus genotype information. For quantitative traits, the framework is essentially Analysis of Variance (ANOVA), which decomposes the variability in the trait among the different factors. In this study, we use simulations to assess the cumulative effect of deviations from normality and homoscedasticity on the overall performance of quantitative MB-MDR in detecting 2-locus epistasis signals in the absence of main effects.
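As a reminder of what the ANOVA decomposition looks like, the following is a generic one-way sketch rather than the authors' own notation. The symbols are assumptions introduced here: $y_{gi}$ is the trait value of individual $i$ in multilocus genotype group $g$, $n_g$ is the size of that group, $G$ is the number of groups (at most nine for two biallelic loci), $N = \sum_g n_g$, $\bar{y}_g$ is the group mean and $\bar{y}$ the overall mean.

```latex
\underbrace{\sum_{g=1}^{G}\sum_{i=1}^{n_g}\left(y_{gi}-\bar{y}\right)^2}_{SS_{\text{total}}}
= \underbrace{\sum_{g=1}^{G} n_g\left(\bar{y}_g-\bar{y}\right)^2}_{SS_{\text{between}}}
+ \underbrace{\sum_{g=1}^{G}\sum_{i=1}^{n_g}\left(y_{gi}-\bar{y}_g\right)^2}_{SS_{\text{within}}},
\qquad
F = \frac{SS_{\text{between}}/(G-1)}{SS_{\text{within}}/(N-G)}
```

The classical F-test built on this decomposition assumes normally distributed errors with equal variance across groups, which are the two assumptions whose violation the study examines.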
Methodology
Our simulation study focuses on pure epistasis models with varying degrees of genetic influence on a quantitative trait. Conditional on a multilocus genotype, we consider quantitative trait distributions that are normal, chi-square or Student’s t with constant or non-constant phenotypic variances. All data are analyzed with MB-MDR using the built-in Student’s t-test for association, as well as a novel MB-MDR implementation based on Welch’s t-test. Traits are either left untransformed or are transformed into new traits via logarithmic, standardization or rank-based transformations, prior to MB-MDR modeling.
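To make the difference between the two association tests concrete, here is a minimal sketch of the textbook two-sample Student's t and Welch's t statistics in plain Python. It is not the MB-MDR implementation; the function names and the toy data (one hypothetical genotype cell compared against the remaining individuals) are assumptions made for illustration only.

```python
import math
from statistics import mean, variance


def student_t(x, y):
    """Two-sample Student's t statistic using a pooled variance estimate."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / math.sqrt(pooled * (1 / nx + 1 / ny))
    return t, nx + ny - 2  # statistic and degrees of freedom


def welch_t(x, y):
    """Welch's t statistic with Welch-Satterthwaite degrees of freedom."""
    nx, ny = len(x), len(y)
    vx, vy = variance(x) / nx, variance(y) / ny  # per-group variance of the mean
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df


# Hypothetical trait values: one genotype cell versus all remaining individuals.
cell = [5.1, 6.3, 4.8, 7.0]
rest = [4.0, 4.2, 3.9, 4.1, 4.3, 3.8, 4.0, 4.1]

t_student, df_student = student_t(cell, rest)
t_welch, df_welch = welch_t(cell, rest)
```

With equal group sizes the two statistics coincide and only the degrees of freedom can differ. With unbalanced groups and unequal variances, as in the toy data, Welch's test does not pool the variances and uses fewer effective degrees of freedom.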
Results
Our simulation results show that MB-MDR controls type I error and false positive rates irrespective of the association test considered. Empirical power estimates for MB-MDR with Welch’s t-tests are generally lower than those for MB-MDR with Student’s t-tests. Trait transformations involving ranks tend to lead to increased power compared to the other data transformations considered.
Conclusions
When performing MB-MDR screening for gene-gene interactions with quantitative traits, we recommend first rank-transforming traits to normality and then applying MB-MDR modeling with Student’s t-tests as internal tests for association.
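One common way to rank-transform a trait to normality is a rank-based inverse normal transformation, sketched below in plain Python. The abstract does not specify the exact formula, so the offset of 0.5, the average-rank handling of ties, and the function names are all assumptions made for this illustration.

```python
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2 + 1  # mean of the 1-based positions in the run
        for k in range(start, end + 1):
            ranks[order[k]] = avg
        start = end + 1
    return ranks


def rank_to_normal(values, offset=0.5):
    """Map each rank r to the standard normal quantile of (r - offset) / n."""
    n = len(values)
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - offset) / n) for r in average_ranks(values)]


# Hypothetical right-skewed trait values.
trait = [0.2, 1.5, 0.7, 9.8, 0.4]
transformed = rank_to_normal(trait)
```

The transformed values depend only on the ordering of the original trait, so the extreme value 9.8 no longer dominates the group means and variances that enter the t-test.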
【 License 】
2013 Mahachie John et al.; licensee BioMed Central Ltd.