BMC Bioinformatics
An efficient algorithm to perform multiple testing in epistasis screening
François Van Lishout [2], Jestinah M Mahachie John [2], Elena S Gusareva [2], Victor Urrea [5], Isabelle Cleynen [3], Emilie Théâtre [4], Benoît Charloteaux [1], Malu Luz Calle [5], Louis Wehenkel [2], Kristel Van Steen [2]
[1] Unit of Animal Genomics, GIGA-R and Faculty of Veterinary Medicine, University of Liège, 4000 Liège, Belgium
[2] Bioinformatics and Modeling, GIGA-R, University of Liège, 4000 Liège, Belgium
[3] Department of Gastroenterology, KU Leuven, 3000 Leuven, Belgium
[4] Unit of Hepato-Gastroenterology, CHU de Liège and Faculty of Medicine, University of Liège, 4000 Liège, Belgium
[5] Department of Systems Biology, University of Vic, 08500 Vic, Spain
Keywords: Crohn’s disease; GWA studies; MB-MDR; maxT; Multiple testing; Epistasis
DOI: 10.1186/1471-2105-14-138
Received 2012-05-10; accepted 2013-04-12; published 2013.
Abstract

Background

Research into the detection of epistasis, or gene-gene interaction, in human complex traits has grown over the last few years. It has been marked by promising methodological developments, improved efforts to translate statistical epistasis into biological epistasis, and attempts to integrate different omics information sources into epistasis screening to enhance power. The search for gene-gene interactions poses severe multiple-testing problems. In this context, the maxT algorithm is one technique to control the false-positive rate. However, the memory needed by this algorithm grows linearly with the number of hypothesis tests. Because a pairwise interaction study tests every pair of SNPs, it requires memory proportional to the square of the number of SNPs, so a genome-wide epistasis search would require terabytes of memory. Cache problems are then likely to occur, increasing the computation time. In this work we present a new version of maxT whose memory requirement is independent of the number of genetic effects investigated. This algorithm was implemented in C++ in our epistasis screening software MBMDR-3.0.3. We evaluate the new implementation in terms of memory efficiency and speed using simulated data, and illustrate the software on real-life data for Crohn’s disease.
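
A minimal Python sketch of one way to make maxT memory independent of the number of tests, assuming the single-step form of maxT (not necessarily the exact variant implemented in MBMDR-3.0.3). A first pass keeps only the largest statistic of each permutation, so memory is B values for B permutations. A second pass compares each observed statistic with those B maxima and reports only tests below a significance threshold. The function `maxt_single_step` and the callables `observed_stat` and `permuted_stat` are hypothetical placeholders for whatever computes the interaction statistic for a SNP pair.

```python
from bisect import bisect_left
from typing import Callable, Iterator, Tuple


def maxt_single_step(
    observed_stat: Callable[[int], float],
    permuted_stat: Callable[[int, int], float],
    n_tests: int,
    n_perm: int,
    alpha: float = 0.05,
) -> Iterator[Tuple[int, float, float]]:
    """Single-step maxT using O(n_perm) memory (hypothetical helper).

    observed_stat(i)    -> statistic of test i on the original trait
    permuted_stat(i, b) -> statistic of test i on permuted trait b
    Yields (test index, observed statistic, adjusted p-value) for each
    test whose adjusted p-value is <= alpha.
    """
    # Pass 1: keep only the maximum statistic of each permutation.
    # Memory is n_perm floats, whatever the number of tests.
    maxima = [float("-inf")] * n_perm
    for i in range(n_tests):
        for b in range(n_perm):
            s = permuted_stat(i, b)
            if s > maxima[b]:
                maxima[b] = s
    maxima.sort()

    # Pass 2: adjusted p = (1 + #{b : max_b >= T_i}) / (n_perm + 1).
    # Results are yielded one at a time, so nothing per-test is stored.
    for i in range(n_tests):
        t = observed_stat(i)
        exceed = n_perm - bisect_left(maxima, t)  # number of maxima >= t
        p_adj = (exceed + 1) / (n_perm + 1)
        if p_adj <= alpha:
            yield i, t, p_adj


if __name__ == "__main__":
    # Toy input: 2 tests x 3 permutations.
    perm = [[1.0, 2.0, 3.0], [0.5, 4.0, 1.0]]
    obs = [5.0, 3.0]
    hits = maxt_single_step(lambda i: obs[i], lambda i, b: perm[i][b], 2, 3, alpha=1.0)
    print(list(hits))  # [(0, 5.0, 0.25), (1, 3.0, 0.75)]
```

The price of constant memory is that the statistics are computed twice (once per pass), and the permuted traits must be kept or regenerated from seeds, which costs memory proportional to the number of permutations and individuals but not to the number of tests. Because every test is compared with the same permutation maxima, this single-step adjustment is in general somewhat more conservative than the step-down maxT of Westfall and Young.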

Results

For a binary (affected/unaffected) trait, the parallel workflow of MBMDR-3.0.3 analyzes all pairwise SNP-SNP interactions in a dataset of 100,000 SNPs genotyped on 1,000 individuals within 4 days and 9 hours, using 999 permutations of the trait to assess statistical significance, on a cluster of 10 blades, each containing four Quad-Core AMD Opteron(tm) 2352 processors running at 2.1 GHz. For a continuous trait, a similar run takes 9 days. On real-life Crohn’s disease (CD) data, our program found 14 SNP-SNP interactions with a multiple-testing-corrected p-value below 0.05.
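
The scale of the binary-trait run can be checked with back-of-the-envelope arithmetic. The sketch below counts SNP pairs, statistic evaluations (the observed trait plus 999 permutations), cluster cores (10 blades × 4 quad-core processors = 160) and the implied average evaluation rate over 4 days and 9 hours (105 hours). It also estimates the memory needed if one statistic per test were stored, assuming 8-byte double-precision values. That storage assumption, perfect parallel use of all cores, and the helper name `scan_budget` are ours, not details of MBMDR-3.0.3.

```python
def scan_budget(n_snps: int, n_perm: int, n_cores: int, hours: float,
                bytes_per_stat: int = 8) -> dict:
    """Rough cost of an exhaustive pairwise SNP scan (hypothetical helper)."""
    n_pairs = n_snps * (n_snps - 1) // 2      # unordered SNP pairs
    n_evals = n_pairs * (n_perm + 1)          # observed trait + every permutation
    core_seconds = n_cores * hours * 3600     # assumes all cores busy throughout
    return {
        "pairs": n_pairs,
        "evaluations": n_evals,
        "evals_per_core_second": n_evals / core_seconds,
        # Memory if one statistic per test were kept (assumed 8-byte doubles).
        "per_test_storage_gb": n_pairs * bytes_per_stat / 1e9,
    }


if __name__ == "__main__":
    b = scan_budget(n_snps=100_000, n_perm=999, n_cores=10 * 4 * 4, hours=4 * 24 + 9)
    print(f"pairs:               {b['pairs']:,}")
    print(f"evaluations:         {b['evaluations']:.2e}")
    print(f"evals / core-second: {b['evals_per_core_second']:,.0f}")
    print(f"per-test storage:    {b['per_test_storage_gb']:.1f} GB")
```

With these inputs the scan covers 4,999,950,000 pairs, about 5.0 × 10^12 statistic evaluations, roughly 8.3 × 10^4 evaluations per core-second, and about 40 GB for a single 8-byte value per pair. Applying the same formula to a hypothetical 500,000-SNP panel gives about 1 TB, in line with the terabyte scale mentioned in the Background.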

Conclusions

Our software is the first implementation of the MB-MDR methodology able to solve large-scale SNP-SNP interaction problems within a few days and with modest memory use, while adequately controlling type I error rates. A new implementation aimed at genome-wide epistasis screening is under development. In the context of Crohn’s disease, MBMDR-3.0.3 identified epistasis involving regions that are well known in the field and that can be explained from a biological point of view. This demonstrates the ability of our software to find relevant higher-order phenotype-genotype associations.

License

© 2013 Van Lishout et al.; licensee BioMed Central Ltd.

Figures

Figure 1.

Figure 2.

Figure 3.

Figure 4.

Figure 5.
