BioData Mining
gammaMAXT: a fast multiple-testing correction algorithm
François Van Lishout [2], Francesco Gadaleta [2], Jason H. Moore [1], Louis Wehenkel [2], Kristel Van Steen [2]
[1] Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia 19104-6021, PA, USA
[2] Bioinformatics and Modeling, GIGA-R, Avenue de l’Hôpital 1, Sart-Tilman 4000, Belgium
Keywords: Algorithmic; 3-order interactions; SNP-environment interactions; Gamma distribution; MaxT; Genome-wide interaction studies; Multiple testing
DOI: 10.1186/s13040-015-0069-x
Received 2015-06-07; accepted 2015-11-08; published 2015
【 Abstract 】
Background
The MaxT algorithm is a significance test that controls the family-wise error rate (FWER) during simultaneous hypothesis testing. However, the computing time and memory this procedure requires are proportional to the number of hypotheses investigated. The memory issue was solved in 2013 by Van Lishout’s implementation of MaxT, which makes memory usage independent of the size of the dataset. That implementation is part of MBMDR-3.0.3, software that effectively identifies genetic interactions for a variety of SNP-SNP based epistasis models. It nevertheless turned out to be less suitable for genome-wide interaction analysis studies, because of its prohibitive computational burden.
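To make the cost argument concrete, here is a minimal sketch of a permutation-based maxT correction. It is not the MB-MDR implementation: it uses the simpler single-step variant, a toy statistic (absolute difference in means between cases and controls), and the common convention of counting the observed data as one extra permutation. The function names `mean_diff_stats` and `maxt_adjusted_pvalues` are hypothetical.

```python
import random


def mean_diff_stats(features, labels):
    """Toy statistic (an assumption): |mean(cases) - mean(controls)| per feature."""
    n_cases = sum(labels)
    n_controls = len(labels) - n_cases
    stats = []
    for column in features:
        cases = sum(x for x, y in zip(column, labels) if y == 1)
        controls = sum(x for x, y in zip(column, labels) if y == 0)
        stats.append(abs(cases / n_cases - controls / n_controls))
    return stats


def maxt_adjusted_pvalues(features, labels, n_perm=999, seed=0):
    """Single-step maxT: compare each observed statistic with the
    permutation distribution of the maximum over all tests."""
    rng = random.Random(seed)
    observed = mean_diff_stats(features, labels)
    exceed = [0] * len(observed)
    shuffled = list(labels)
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Every permutation recomputes all m statistics, so the running
        # time grows with the number of hypotheses m.
        max_stat = max(mean_diff_stats(features, shuffled))
        for j, t in enumerate(observed):
            if max_stat >= t:
                exceed[j] += 1
    # The +1 terms count the observed labelling as one permutation.
    return [(1 + e) / (n_perm + 1) for e in exceed]
```

Only the maximum of each permutation is kept, so memory stays small in this sketch, but the inner call that recomputes every statistic for every permutation is what becomes prohibitive at genome-wide scale.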
Results
In this work we introduce gammaMAXT, a novel implementation of the MaxT algorithm for multiple testing correction. The algorithm is implemented in the software MBMDR-4.2.2, as part of the MB-MDR framework, to screen for SNP-SNP, SNP-environment or SNP-SNP-environment interactions at a genome-wide level. We show that, in the absence of interaction effects, test statistics produced by the MB-MDR methodology follow a mixture distribution with a point mass at zero and a shifted gamma distribution for the top 10% of the strictly positive values. We show that the gammaMAXT algorithm has power comparable to MaxT and maintains the FWER, but requires fewer computational resources and less time. We analyze a dataset composed of 10^6 SNPs and 1000 individuals within one day on a 256-core computer cluster. The same analysis would take about 10^4 times longer with MBMDR-3.0.3.
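The mixture described above suggests how a tail probability can be read off a fitted curve instead of being counted over many permutations. The sketch below illustrates that idea under stated assumptions and is not the gammaMAXT procedure itself: the shift is taken as the smallest value in the top 10% of strictly positive null statistics, the gamma parameters are fitted by the method of moments (the paper's estimator may differ), and the gamma survival function is computed with a plain series that is adequate only for moderate arguments. All function names are hypothetical.

```python
import math


def gamma_sf(x, shape, scale):
    """P(X > x) for a gamma variable, via the series for the regularized
    lower incomplete gamma function (limited precision for tiny tails)."""
    if x <= 0:
        return 1.0
    z = x / scale
    term, total, n = 1.0, 1.0, 0
    while term > 1e-15 * total and n < 10000:
        n += 1
        term *= z / (shape + n)
        total += term
    lower = math.exp(-z + shape * math.log(z) - math.lgamma(shape + 1)) * total
    return min(1.0, max(0.0, 1.0 - lower))


def fit_tail(null_stats, top_fraction=0.10):
    """Fit a shifted gamma to the top fraction of strictly positive values."""
    positive = sorted(s for s in null_stats if s > 0)
    n_tail = max(2, int(round(top_fraction * len(positive))))
    tail = positive[-n_tail:]
    shift = tail[0]  # assumption: shift = smallest value in the tail
    excess = [s - shift for s in tail]
    mean = sum(excess) / len(excess)
    var = sum((e - mean) ** 2 for e in excess) / (len(excess) - 1)
    # Method-of-moments estimates (an assumption, not the paper's choice).
    shape, scale = mean * mean / var, var / mean
    # Share of all null statistics that fall in the fitted tail; this
    # absorbs both the point mass at zero and the 90% of positive values
    # below the shift.
    weight = len(tail) / len(null_stats)
    return shift, shape, scale, weight


def tail_pvalue(t, fit):
    """Approximate P(T >= t) under the null for t inside the fitted tail."""
    shift, shape, scale, weight = fit
    if t < shift:
        raise ValueError("approximation only covers the fitted tail")
    return weight * gamma_sf(t - shift, shape, scale)
```

Because the fitted curve extrapolates beyond the largest simulated value, small p-values can be estimated from far fewer null draws than direct counting would need. In practice, a numerical library's incomplete gamma routine would be a safer choice than the series used here.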
Conclusions
These results are promising for future genome-wide association interaction studies (GWAIs). More broadly, the proposed gammaMAXT algorithm offers a general approach to significance assessment and multiple testing, applicable to any context that requires performing hundreds of thousands of tests. It offers new perspectives for fast and efficient permutation-based significance assessment in large-scale (integrated) omics studies.
【 License 】
2015 Van Lishout et al.