BMC Bioinformatics
Cloud computing for detecting high-order genome-wide epistatic interaction via dynamic clustering
Xuan Guo [1], Yu Meng [1], Ning Yu [1], Yi Pan [1]
[1] Department of Computer Science, Georgia State University, 34 Peachtree Street, Atlanta, USA
Keywords: Dynamic clustering; Genome-wide association studies; Cloud computing
Others: 818677; DOI: 10.1186/1471-2105-15-102
Received 2013-10-28; accepted 2014-03-17; published 2014
【 Abstract 】
Background
Taking advantage of high-throughput single nucleotide polymorphism (SNP) genotyping technology, large genome-wide association studies (GWASs) are considered to hold promise for unravelling complex relationships between genotype and phenotype. At present, traditional single-locus methods are insufficient to detect multi-locus interactions, which are widespread in complex traits. In addition, statistical tests for high-order epistatic interactions involving more than two SNPs pose computational and analytical challenges, because the computation increases exponentially as the cardinality of the SNP combinations grows.
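To make the scale of that search space concrete, the following minimal sketch counts the SNP combinations an exhaustive scan would have to test at each interaction order. The panel size of 100,000 SNPs is a hypothetical figure chosen for illustration, and the function names are not taken from the paper.

```python
from math import comb


def interaction_search_space(num_snps, order):
    """Number of distinct SNP combinations of the given interaction order."""
    return comb(num_snps, order)


def genotype_cells(order):
    """Joint genotype cells per combination, assuming 3 genotypes per biallelic SNP."""
    return 3 ** order


if __name__ == "__main__":
    num_snps = 100_000  # hypothetical panel size, not a figure from the paper
    for order in (1, 2, 3, 4):
        combos = interaction_search_space(num_snps, order)
        print(f"order={order}: {combos:,} combinations, "
              f"{genotype_cells(order)} genotype cells each")
```

Under these assumptions, moving from two-locus to three-locus tests multiplies the number of combinations by roughly a third of the panel size, which is why exhaustive high-order scans quickly become impractical on a single machine.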
Results
In this paper, we present a simple, fast and powerful method that uses dynamic clustering and cloud computing to detect genome-wide multi-locus epistatic interactions. We conducted systematic experiments to compare its power against several recently proposed algorithms, including TEAM, SNPRuler, EDCF and BOOST. Furthermore, we applied our method to two real GWAS datasets, the age-related macular degeneration (AMD) and rheumatoid arthritis (RA) datasets, where we found novel potential disease-related genetic factors that do not show up when detecting two-locus epistatic interactions.
Conclusions
Experimental results on simulated data demonstrate that our method is more powerful than several recently proposed methods on both two- and three-locus disease models. Our method has discovered many novel high-order associations that are significantly enriched in cases from two real GWAS datasets. Moreover, for detecting two-locus interactions, the running times of the cloud implementation of our method on the AMD and RA datasets are roughly 2 hours and 50 hours, respectively, on a cluster of forty small virtual machines. Therefore, we believe that our method is suitable and effective for the full-scale analysis of multi-locus epistatic interactions in GWAS.
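As an illustration of how a two-locus scan might be spread over forty workers, the sketch below splits the stream of SNP pairs into contiguous, near-equal slices. This is an assumed partitioning scheme for illustration only, not the work distribution used by the authors, and the toy panel size and function names are hypothetical.

```python
from itertools import combinations, islice
from math import comb


def chunk_bounds(total, num_workers):
    """Split range(total) into contiguous, near-equal [start, stop) chunks."""
    base, extra = divmod(total, num_workers)
    bounds = []
    start = 0
    for worker in range(num_workers):
        # The first `extra` workers take one additional item each.
        stop = start + base + (1 if worker < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def pairs_for_worker(num_snps, bounds):
    """Return the SNP index pairs that fall in one worker's [start, stop) slice.

    islice walks the pair stream from the beginning, which is acceptable for
    a sketch but too slow for a real genome-wide panel.
    """
    start, stop = bounds
    return list(islice(combinations(range(num_snps), 2), start, stop))


if __name__ == "__main__":
    num_snps = 12     # toy panel size (hypothetical)
    num_workers = 40  # matches the forty virtual machines in the abstract
    all_bounds = chunk_bounds(comb(num_snps, 2), num_workers)
    for worker_id, bounds in enumerate(all_bounds[:3]):
        print(worker_id, pairs_for_worker(num_snps, bounds))
```

Contiguous slices keep each worker's share within one pair of every other worker's, so no machine is left idle while another still has a large backlog, assuming each pair costs about the same to test.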
【 License 】
2014 Guo et al.; licensee BioMed Central Ltd.