Journal article details
BMC Bioinformatics
groHMM: a computational tool for identifying unannotated and cell type-specific transcription units from global run-on sequencing data
Minho Chae [1], Charles G. Danko [2], W. Lee Kraus [1]
[1] Division of Basic Research, Department of Obstetrics and Gynecology, University of Texas Southwestern Medical Center, Dallas, 75390, TX, USA
[2] Baker Institute for Animal Health, College of Veterinary Medicine, Cornell University, Ithaca, 14853, NY, USA
Keywords: ChIP-seq; Enhancer RNAs (eRNAs); Long non-coding RNAs (lncRNAs); Primary miRNAs; Enhancer; Cell type specificity; Peak calling; Gene regulation; Primary transcript; Transcription unit; Transcription; groHMM; GRO-seq
DOI: 10.1186/s12859-015-0656-3
Received 2014-11-18; accepted 2015-06-30; published 2015
【 Abstract 】

Background

Global run-on coupled with deep sequencing (GRO-seq) provides extensive information on the location and function of coding and non-coding transcripts, including primary microRNAs (miRNAs), long non-coding RNAs (lncRNAs), and enhancer RNAs (eRNAs), as well as yet undiscovered classes of transcripts. However, few computational tools tailored toward this new type of sequencing data are available, limiting the applicability of GRO-seq data for identifying novel transcription units.

Results

Here, we present groHMM, a computational tool in R, which defines the boundaries of transcription units de novo using a two-state hidden Markov model (HMM). A systematic comparison of groHMM with two existing peak-calling methods tuned to identify broad regions (SICER and HOMER), run on existing GRO-seq data from MCF-7 breast cancer cells, favors our approach. To demonstrate the broader utility of our approach, we have used groHMM to annotate a diverse array of transcription units (i.e., primary transcripts) from four GRO-seq data sets derived from cells representing a variety of human tissue types, including non-transformed cells (cardiomyocytes and lung fibroblasts) and transformed cells (LNCaP and MCF-7 cancer cells), as well as from non-mammalian cells (flies and worms). As an example of the utility of groHMM and its application to questions about the transcriptome, we show how groHMM can be used to analyze cell type-specific enhancers as defined by newly annotated enhancer transcripts.
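
To make the idea of segmenting a genome with a two-state HMM concrete, the following is a minimal Python sketch, not groHMM's actual implementation (groHMM is an R package, and this abstract does not describe its emission model or parameter fitting). It assumes read counts binned into fixed-size windows, Poisson emissions with hand-picked rates, and fixed self-transition probabilities; the function names `viterbi_two_state` and `call_units` and all parameter values are hypothetical and chosen only for illustration.

```python
import math


def poisson_logpmf(k, lam):
    """Log probability of observing count k under a Poisson(lam) model."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def viterbi_two_state(counts, lam=(0.5, 8.0), p_stay=(0.99, 0.99)):
    """Most likely state path: 0 = untranscribed, 1 = transcribed.

    lam and p_stay are assumed, hand-picked values; a real tool would
    estimate them from the data.
    """
    if not counts:
        return []
    log_trans = [
        [math.log(p_stay[0]), math.log(1 - p_stay[0])],
        [math.log(1 - p_stay[1]), math.log(p_stay[1])],
    ]
    # Uniform prior over the two states for the first window.
    score = [math.log(0.5) + poisson_logpmf(counts[0], lam[s]) for s in (0, 1)]
    back = []
    for k in counts[1:]:
        new_score, ptr = [], []
        for s in (0, 1):
            cands = [score[r] + log_trans[r][s] for r in (0, 1)]
            best = 0 if cands[0] >= cands[1] else 1
            ptr.append(best)
            new_score.append(cands[best] + poisson_logpmf(k, lam[s]))
        score = new_score
        back.append(ptr)
    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


def call_units(path, window=50):
    """Convert runs of state 1 into (start, end) coordinates in base pairs."""
    units, start = [], None
    for i, s in enumerate(path):
        if s == 1 and start is None:
            start = i
        elif s == 0 and start is not None:
            units.append((start * window, i * window))
            start = None
    if start is not None:
        units.append((start * window, len(path) * window))
    return units


# Toy read counts per 50-bp window on one strand (made-up numbers).
counts = [0, 0, 1, 0, 9, 7, 10, 8, 6, 9, 0, 1, 0, 0]
path = viterbi_two_state(counts)
units = call_units(path, window=50)
print(units)  # [(200, 500)]
```

Because the self-transition probabilities are high, isolated noisy windows (such as the single read in the third window) do not open a new unit, which is the property that lets an HMM call broad, contiguous regions rather than narrow peaks.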

Conclusions

Our results show that groHMM can reveal new insights into cell type-specific transcription by identifying novel transcription units, and serve as a complete and useful tool for evaluating functional genomic elements in cells.

【 License 】

   
2015 Chae et al.
