Journal article details
BMC Genomics
Regression hidden Markov modeling reveals heterogeneous gene expression regulation: a case study in mouse embryonic stem cells
Yu Zhang [2], Debashis Ghosh [1], Yeonok Lee [2]
[1]Department of Public Health Sciences, Penn State University, University Park, PA 16802, USA
[2]Department of Statistics, Penn State University, University Park, PA 16802, USA
Keywords: Mouse embryonic stem cell; Gene expression level; Histone modification; Regression hidden Markov model
DOI: 10.1186/1471-2164-15-360
Received 2013-11-28, accepted 2014-05-06, published 2014
【 Abstract 】

Background

Studies have shown a strong association between histone modification levels and gene expression levels. The detailed relationship between the two can vary substantially because of differential regulation, so a simple regression model may not be adequate. We apply a regression hidden Markov model (regHMM) to investigate the potentially multiple relationships between gene expression and histone methylation levels in mouse embryonic stem cells.
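
For orientation, one common way to write a regression hidden Markov model with K states is sketched below. The notation (y_i for the expression level of gene i, x_i for its predictor vector, s_i for its hidden state, and the parameters \beta_{k0}, \beta_k, \sigma_k, a_{jk}) and the Gaussian error term are illustrative assumptions; this record does not state the paper's exact parameterization.

```latex
% Hidden states follow a first-order Markov chain along the ordered genes
P(s_i = k \mid s_{i-1} = j) = a_{jk}, \qquad j, k \in \{1, \dots, K\}

% Each state has its own regression of expression on the predictors
y_i \mid (s_i = k, x_i) \sim \mathcal{N}\!\left(\beta_{k0} + x_i^{\top}\beta_k,\; \sigma_k^2\right)
```

Under this formulation, a two-state fit (K = 2) yields a separate intercept, coefficient vector, and error variance for each state, which is what allows the strength of association to differ between groups of genes.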

Results

Seven histone methylation marks are used in the study. Histone modification levels averaged over non-overlapping 200 bp windows spanning the transcription start site (TSS) ± 1 Kb are used as predictors, generating 70 explanatory variables in total (seven marks × ten windows). Based on the regHMM results, genes segregate into two groups, referred to as State 1 and State 2, with distinct association strengths. Expression of genes in State 1 is better explained by histone methylation levels (R² = 0.72), while genes in State 2 show a weaker association (R² = 0.38). The regression coefficients in the two states do not differ much in magnitude except for the intercept, which is 0.25 for State 1 and 1.15 for State 2. We found specific GO categories that may account for the different relationships. The GO categories more frequently observed in State 2 match those of housekeeping genes, such as cytoplasm, nucleus, and protein binding. In addition, housekeeping gene expression levels are significantly less well explained by histone methylation in mouse embryonic stem cells, which is consistent with the constitutive expression patterns that would be expected.
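
The predictor construction described above (window averages around each TSS) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names `window_means` and `gene_features`, the per-base signal arrays, and the random data are all hypothetical, and the sketch assumes each mark is available as a one-dimensional per-base signal.

```python
import numpy as np

def window_means(signal, tss, half_width=1000, window=200):
    """Average a per-base signal over non-overlapping windows in TSS ± half_width."""
    start, end = tss - half_width, tss + half_width
    if start < 0 or end > len(signal):
        raise ValueError("TSS region falls outside the signal")
    if (2 * half_width) % window != 0:
        raise ValueError("region length must be a multiple of the window size")
    region = np.asarray(signal[start:end], dtype=float)
    # One row per window, then average within each row.
    return region.reshape(-1, window).mean(axis=1)

def gene_features(mark_signals, tss):
    """Concatenate the window means of every mark into one predictor vector."""
    return np.concatenate([window_means(s, tss) for s in mark_signals])

# Hypothetical data: seven marks, each a random per-base signal of length 5000.
rng = np.random.default_rng(0)
marks = [rng.random(5000) for _ in range(7)]
features = gene_features(marks, tss=2500)
print(features.shape)  # (70,)
```

With a 2 Kb region and 200 bp windows, each mark contributes ten values, so seven marks give the 70 explanatory variables per gene.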

Conclusion

Gene expression levels are not universally affected by histone methylation levels, and the relationship between the two differs by gene function. The expression levels of genes in the GO categories most common among housekeeping genes are less strongly associated with histone methylation levels. We suspect that additional biological factors may also be strongly associated with gene expression levels in State 2. We find that the effect of the presence of a CpG island within TSS ± 1 Kb is larger in State 2.

【 License 】

2014 Lee et al.; licensee BioMed Central Ltd.
