Epigenetics & Chromatin
De novo identification of differentially methylated regions in the human genome
Peter L Molloy [3], Susan J Clark [2], Reginald V Lord [4], Katherine Samaras [5], Ruth Pidsley [6], Aaron L Statham [6], Michael J Buckley [1], Timothy J Peters [1]
[1] CSIRO Digital Productivity Flagship, Riverside Life Sciences Centre, 11 Julius Avenue, North Ryde, New South Wales 2113, Australia
[2] St Vincent’s Clinical School, Faculty of Medicine, University of New South Wales, Darlinghurst, New South Wales 2010, Australia
[3] CSIRO Food and Nutrition Flagship, Riverside Life Sciences Centre, 11 Julius Avenue, Sydney, Australia
[4] School of Medicine, University of Notre Dame, Darlinghurst, New South Wales 2010, Australia
[5] St Vincent’s Hospital, Darlinghurst, New South Wales 2010, Australia
[6] Epigenetics Program, Garvan Institute of Medical Research, Sydney, Australia
Keywords: Illumina; Kernel smoothing; Differential DNA methylation
Others: 1147630; DOI: 10.1186/1756-8935-8-6
Received 2014-09-04; accepted 2014-12-17; published 2015
【 Abstract 】
Background
The identification and characterisation of differentially methylated regions (DMRs) between phenotypes in the human genome is of prime interest in epigenetics. We present a novel method, DMRcate, that fits replicated methylation measurements from the Illumina HM450K BeadChip (or 450K array) spatially across the genome using a Gaussian kernel. DMRcate identifies and ranks the most differentially methylated regions across the genome based on tunable kernel smoothing of the differential methylation (DM) signal. The method is agnostic to both genomic annotation and local change in the direction of the DM signal, removes the bias incurred from irregularly spaced methylation sites, and assigns significance to each DMR called via comparison to a null model.
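As an illustration of the smoothing idea described above, the following minimal Python sketch smooths a per-site DM statistic along one chromosome with a Gaussian kernel. It is not the DMRcate implementation: the function name `smooth_dm_signal`, the example coordinates and statistics, and the normalisation by the summed kernel weights (a simple Nadaraya-Watson average, chosen here so that densely probed stretches do not score higher merely because they contain more sites) are all assumptions made for the example.

```python
import math


def smooth_dm_signal(positions, stats, bandwidth):
    """Gaussian-kernel (Nadaraya-Watson) smoothing of per-site statistics.

    positions: genomic coordinates (bp) of the sites on one chromosome
    stats:     per-site DM statistic (hypothetical values, e.g. squared t)
    bandwidth: kernel standard deviation in bp (the tunable parameter)
    """
    smoothed = []
    for x in positions:
        # Weight every site by its Gaussian-scaled distance from x.
        weights = [math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in positions]
        weighted_sum = sum(w * s for w, s in zip(weights, stats))
        # Dividing by the total weight compensates for irregular site spacing.
        smoothed.append(weighted_sum / sum(weights))
    return smoothed


# Hypothetical data: three close sites with strong signal, two distant weak ones.
positions = [1000, 1050, 1120, 5000, 5400]
stats = [9.0, 16.0, 12.25, 0.5, 0.8]
smoothed = smooth_dm_signal(positions, stats, bandwidth=250)
```

In this sketch a larger `bandwidth` pools signal over a wider neighbourhood, whereas a smaller one keeps the smoothed value close to each site's own statistic.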
Results
We show that, for both simulated and real data, the predictive performance of DMRcate is superior to that of Bumphunter and Probe Lasso, and commensurate with that of comb-p. For the real data, we validate all array-derived DMRs from the candidate methods against a suite of DMRs called from whole-genome bisulfite sequencing of the same DNA samples, using two separate phenotype comparisons.
Conclusions
The agglomeration of genomically localised individual methylation sites into discrete DMRs is currently best served by a combination of DM-signal smoothing and subsequent threshold specification. The findings also suggest the design of the 450K array shows preference for CpG sites that are more likely to be differentially methylated, but its overall coverage does not adequately reflect the depth and complexity of methylation signatures afforded by sequencing.
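The combination of smoothing and thresholding mentioned above can be sketched as a second step that takes already-smoothed per-site scores, keeps the sites at or above a cut-off, and merges neighbouring retained sites into regions. This is a hedged illustration only: `call_regions`, the `threshold` and `max_gap` parameters, and the example values are hypothetical and are not taken from the DMRcate package.

```python
def call_regions(positions, scores, threshold, max_gap):
    """Agglomerate sites with score >= threshold into discrete regions.

    Retained sites that lie within max_gap bp of the previous retained
    site extend the current region; otherwise a new region is started.
    Returns a list of (start, end, n_sites) tuples.
    """
    regions = []
    current = None  # [start, end, n_sites] of the region being built
    for pos, score in sorted(zip(positions, scores)):
        if score < threshold:
            continue  # site does not pass the threshold
        if current is not None and pos - current[1] <= max_gap:
            current[1] = pos
            current[2] += 1
        else:
            if current is not None:
                regions.append(tuple(current))
            current = [pos, pos, 1]
    if current is not None:
        regions.append(tuple(current))
    return regions


# Hypothetical smoothed scores for seven sites on one chromosome.
positions = [1000, 1050, 1120, 1600, 5000, 5400, 5450]
scores = [12.4, 12.6, 12.3, 0.9, 0.6, 8.1, 7.7]
regions = call_regions(positions, scores, threshold=5.0, max_gap=1000)
```

With these example values the sketch yields two regions, `(1000, 1120, 3)` and `(5400, 5450, 2)`. Raising `threshold` or lowering `max_gap` produces fewer or shorter regions, which is the sense in which the threshold specification shapes the final calls.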
For the convenience of the research community we have created a user-friendly R software package called DMRcate, downloadable from Bioconductor and compatible with existing preprocessing packages, which allows others to apply the same DMR-finding method on 450K array data.
【 License 】
2015 Peters et al.; licensee BioMed Central.