BioData Mining
Uncovering correlated variability in epigenomic datasets using the Karhunen-Loeve transform
Pedro Madrigal [1], Paweł Krajewski [2]
[1] Present address: Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK
[2] Department of Biometry and Bioinformatics, Institute of Plant Genetics of the Polish Academy of Sciences, Strzeszyńska 34, Poznań 60-479, Poland
Keywords: H2A.Z; H3K9ac; H3K36me3; H3K4me3; Roadmap Epigenomics Consortium; H1; Stem cells; Functional data analysis; ChIP-seq; Histone modifications
DOI: 10.1186/s13040-015-0051-7
Received: 2014-11-10; accepted: 2015-06-17; published: 2015
Abstract

Background

Greater variation exists in epigenomes than in genomes, as a single genome shapes the identity of multiple cell types. With the advent of next-generation sequencing, one of the key problems in computational epigenomics has become the poor understanding of correlations and quantitative differences between large-scale data sets.

Results

Here we introduce to genomics functional principal component analysis, in the form of a finite Karhunen-Loève transform, and use it to explicitly decompose the variation in the coverage profiles of 27 chromatin-mark ChIP-seq datasets at transcription start sites (TSSs) in H1, one of the most widely used human embryonic stem cell lines. Using this approach, we identify positive correlations between H3K4me3 and H3K36me3, and between H3K9ac and H3K36me3, which the commonly used Pearson correlation between read-enrichment coverages has so far failed to detect. We uncover strongly negative correlations between H2A.Z, H3K4me3, and several histone acetylation marks, but these occur only between first- and second-order principal components. We also show that gene expression levels correlate significantly with the scores of components of order higher than one, indicating that transcriptional regulation by histone marks escapes simple one-to-one relationships. These correlations were more significant and larger in magnitude in protein-coding genes than in non-coding RNAs.
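The core computation can be illustrated with a minimal sketch. It assumes each mark's coverage around every TSS has already been binned onto a common grid (rows = TSSs, columns = bins) and discretises the Karhunen-Loève transform as plain PCA of that matrix, omitting the basis-function smoothing that a full functional PCA would normally add. The function names `finite_klt` and `cross_score_correlation` and the two synthetic marks are hypothetical; this is not the authors' KLTepigenome code. In the toy data, mark A's peak shifts position with a hidden variable while mark B's peak changes height, so per-TSS totals share almost no signal even though the curves co-vary.

```python
import numpy as np


def finite_klt(profiles):
    """Discretised Karhunen-Loeve transform (plain PCA) of coverage profiles.

    profiles: shape (n_regions, n_bins); each row is one coverage curve
    sampled on a shared grid of bins around a TSS (assumed preprocessing).
    Returns eigenvalues (descending), eigenvectors (columns), scores, mean curve.
    """
    X = np.asarray(profiles, dtype=float)
    mean_curve = X.mean(axis=0)
    centred = X - mean_curve                      # remove the mean profile
    cov = centred.T @ centred / (X.shape[0] - 1)  # sample covariance on the grid
    eigvals, eigvecs = np.linalg.eigh(cov)        # symmetric matrix: ascending eigenpairs
    order = np.argsort(eigvals)[::-1]             # reorder by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = centred @ eigvecs                    # component scores per region
    return eigvals, eigvecs, scores, mean_curve


def cross_score_correlation(scores_a, scores_b, k=3):
    """Pearson correlations between the first k component scores of two marks."""
    full = np.corrcoef(scores_a[:, :k].T, scores_b[:, :k].T)
    return full[:k, k:]  # entry (i, j): component i of mark A vs component j of mark B


# Hypothetical synthetic data: 500 TSSs, 80 bins on an arbitrary -4..4 window.
rng = np.random.default_rng(0)
n_tss, n_bins = 500, 80
grid = np.linspace(-4.0, 4.0, n_bins)
z = rng.normal(size=n_tss)  # hypothetical latent state shared by both marks

# Mark A: peak position shifts with z (shape change, total signal unchanged).
mark_a = np.exp(-(grid[None, :] - 0.5 * z[:, None]) ** 2)
# Mark B: peak height scales with z.
mark_b = (1.0 + 0.3 * z[:, None]) * np.exp(-grid[None, :] ** 2)
mark_a += rng.normal(scale=0.05, size=mark_a.shape)
mark_b += rng.normal(scale=0.05, size=mark_b.shape)

# Conventional summary: Pearson correlation of total coverage per TSS.
total_r = np.corrcoef(mark_a.sum(axis=1), mark_b.sum(axis=1))[0, 1]

vals_a, vecs_a, scores_a, mean_a = finite_klt(mark_a)
vals_b, vecs_b, scores_b, mean_b = finite_klt(mark_b)
score_r = cross_score_correlation(scores_a, scores_b, k=3)

print(f"total-coverage Pearson r: {total_r:.2f}")
print("component-score correlations (rows: mark A, cols: mark B):")
print(np.round(score_r, 2))
```

Under these assumptions the total-coverage r is expected to sit near zero, whereas the first component scores of the two marks should be strongly correlated; the sign is arbitrary because eigenvectors are defined only up to sign. The same score matrices could, in principle, be correlated against a per-gene expression vector to look for associations with higher-order components.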

Conclusions

In summary, we present a methodology that uses a functional eigenvalue decomposition to explore and uncover novel patterns of epigenomic variability and covariability in genomic data sets. R code is available at: http://github.com/pmb59/KLTepigenome.

License

© 2015 Madrigal and Krajewski.

Figures

Fig. 1.

Fig. 2.

Fig. 3.

Fig. 4.
