| BMC Bioinformatics | |
| MethylPCA: a toolkit to control for confounders in methylome-wide association studies | |
| Wenan Chen [1], Guimin Gao [1], Srilaxmi Nerella [4], Christina M Hultman [3], Patrik KE Magnusson [3], Patrick F Sullivan [2], Karolina A Aberg [4], Edwin JCG van den Oord [4] | |
| [1] Department of Biostatistics, School of Medicine, Virginia Commonwealth University, Richmond, VA, USA | |
| [2] Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA | |
| [3] Swedish Schizophrenia Consortium, Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden | |
| [4] Center for Biomarker Research and Personalized Medicine, School of Pharmacy, Virginia Commonwealth University, Richmond, VA, USA | |
| Keywords: MBD-seq; Association test; Eigen-decomposition; Methylome-wide association studies; Principal component analysis | |
| DOI: 10.1186/1471-2105-14-74 | |
| Received 2012-10-10, accepted 2013-02-20, published 2013 | |
【 Abstract 】
Background
In methylome-wide association studies (MWAS), cases and controls may differ in many ways (e.g. in lifestyle, diet, and medication use) that can affect the methylome and produce false positive findings. An effective approach to control for these confounders is to first capture the major sources of variation in the methylation data and then regress out these components in the association analyses. This approach is, however, computationally very challenging because of the extremely large number of methylation sites in the human genome.
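The regress-out step described above can be sketched as follows. This is a minimal illustration using NumPy, not MethylPCA's own code: the function names `residualize` and `adjusted_association` are hypothetical, and the partial correlation is used here as a simple stand-in for whatever association test a study actually applies.

```python
import numpy as np

def residualize(values, covariates):
    """Remove the least-squares fit on the covariates (plus an intercept)."""
    design = np.column_stack([np.ones(len(values)), covariates])
    coef, *_ = np.linalg.lstsq(design, values, rcond=None)
    return values - design @ coef

def adjusted_association(site, status, components):
    """Partial correlation of one site with case status, given the components.

    `components` holds the captured sources of variation (for example,
    principal component scores), one row per subject. Names and the choice
    of statistic are assumptions made for illustration.
    """
    site_res = residualize(site, components)
    status_res = residualize(status, components)
    return float(np.corrcoef(site_res, status_res)[0, 1])
```

If a component tracks a confounder that differs between cases and controls, the unadjusted correlation between a site and case status can be sizeable even when the adjusted value is close to zero.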
Results
We introduce MethylPCA, which is specifically designed to control for potential confounders in studies where the number of methylation sites is extremely large. MethylPCA offers a complete and flexible data analysis including 1) an adaptive method that performs data reduction prior to PCA by empirically combining methylation data of neighboring sites, 2) an efficient algorithm that performs a principal component analysis (PCA) on the ultra high-dimensional data matrix, and 3) association tests. To accomplish this, MethylPCA allows for parallel execution of tasks, uses C++ for CPU- and I/O-intensive calculations, and stores intermediate results to avoid computing the same statistics multiple times or keeping results in memory. Through simulations and an analysis of a real whole-methylome MBD-seq study of 1,500 subjects, we show that MethylPCA effectively controls for potential confounders.
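One common way to make PCA tractable when sites vastly outnumber subjects is to accumulate the small subject-by-subject cross-product matrix block by block and eigen-decompose that instead of the full site-level matrix. The sketch below illustrates the idea under that assumption. It is not taken from MethylPCA, the function name `gram_pca` and the block layout are hypothetical, and it does not implement the adaptive combining of neighboring sites.

```python
import numpy as np

def gram_pca(blocks, n_samples, n_components):
    """PCA scores for subjects from blocks of sites, without loading all sites.

    `blocks` is an iterable of arrays shaped (sites_in_block, n_samples).
    Only the n_samples x n_samples cross-product matrix is kept in memory.
    """
    gram = np.zeros((n_samples, n_samples))
    for block in blocks:
        # Center each site across subjects, then add its contribution.
        centered = block - block.mean(axis=1, keepdims=True)
        gram += centered.T @ centered
    eigvals, eigvecs = np.linalg.eigh(gram)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    top_vals = np.clip(eigvals[order], 0.0, None)
    # Subject scores: eigenvectors scaled by the singular values.
    scores = eigvecs[:, order] * np.sqrt(top_vals)
    return scores, top_vals
```

Because each block is processed independently, the blocks could be read from disk one at a time or handled by separate workers whose partial cross-products are summed afterwards.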
Conclusions
MethylPCA provides users with a convenient tool to perform MWAS. The software effectively handles the memory and speed challenges of tasks that would be impossible to accomplish using existing software when millions of sites are interrogated with the sample sizes required for MWAS.
【 License 】
© 2013 Chen et al.; licensee BioMed Central Ltd.