Journal article details
BMC Genomics
Improving analysis of transcription factor binding sites within ChIP-Seq data based on topological motif enrichment
Wyeth W Wasserman [3], Luis del Peso [2], Anthony Mathelier [1], Rebecca Worsley Hunt [1]
[1] Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
[2] Universidad Autónoma de Madrid, Biochemistry, Madrid 28029, Spain
[3] Centre for Molecular Medicine and Therapeutics, 950 W. 28th Avenue, Vancouver, BC V5Z 4H4, Canada
Keywords: Visualization; Transcription factor binding site; Transcription factor; Sequence analysis; Regulation; Over-representation analysis; Motif prediction; ChIP-Seq; Chromatin immunoprecipitation
Others  :  1216579
DOI  :  10.1186/1471-2164-15-472
Received 2013-12-20, accepted 2014-05-20, published 2014
【 Abstract 】

Background

Chromatin immunoprecipitation (ChIP) coupled to high-throughput sequencing (ChIP-Seq) can reveal DNA regions bound by transcription factors (TFs). Analysis of ChIP-Seq regions is now a central component of gene regulation studies. There remains a strong need for methods that improve the interpretation of ChIP-Seq data and the study of specific TF binding sites (TFBSs).

Results

We introduce a set of methods to improve the interpretation of ChIP-Seq data, including the inference of mediating TFs based on TFBS motif over-representation analysis and the subsequent study of the spatial distribution of TFBSs. TFBS over-representation analysis applied to ChIP-Seq data detects which TFBSs arise more frequently than expected by chance. Visualization of over-representation analysis results with new composition-bias plots reveals systematic bias in over-representation scores. We introduce the BiasAway background-generating software to resolve the problem. A heuristic procedure based on topological motif enrichment relative to the local maxima of ChIP-Seq peaks highlights peaks likely to be directly bound by a TF of interest. The results suggest that on average two-thirds of a ChIP-Seq dataset’s peaks are bound by the ChIP’d TF; the origin of the remaining peaks is undetermined. Additional visualization methods allow for the study of both inter-TFBS spatial relationships and motif-flanking sequence properties, as demonstrated in case studies for TBP and ZNF143/THAP11.
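
As a rough illustration of why background composition matters for over-representation analysis, the following Python sketch draws a background set whose GC content is matched, bin by bin, to the foreground (ChIP-Seq) sequences. It is a minimal sketch under stated assumptions and is not the BiasAway implementation: the function names, the ten-bin scheme and the sampling without replacement are all hypothetical choices made for the example.

```python
import random
from collections import defaultdict


def gc_fraction(seq):
    """Return the fraction of G/C bases in a DNA sequence (0.0 for empty input)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_bin(seq, n_bins=10):
    """Map a sequence to a GC-content bin index in [0, n_bins - 1]."""
    return min(int(gc_fraction(seq) * n_bins), n_bins - 1)


def sample_gc_matched_background(foreground, candidates, n_bins=10, seed=0):
    """Pick one candidate per foreground sequence from the same GC bin.

    Hypothetical helper: sampling is without replacement, and a foreground
    sequence is skipped when its bin has no candidates left.
    """
    rng = random.Random(seed)
    pool = defaultdict(list)
    for seq in candidates:
        pool[gc_bin(seq, n_bins)].append(seq)
    for seqs in pool.values():
        rng.shuffle(seqs)

    background = []
    for seq in foreground:
        bucket = pool[gc_bin(seq, n_bins)]
        if bucket:
            background.append(bucket.pop())
    return background


if __name__ == "__main__":
    fg = ["GGCC", "ATAT"]
    bg_candidates = ["CCGG", "TTAA", "ACGT"]
    print(sample_gc_matched_background(fg, bg_candidates))
```

With a background built this way, a motif that simply tracks GC content should score similarly in both sets, so differences in motif frequency are less likely to reflect composition alone.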

Conclusions

Topological properties of TFBSs within ChIP-Seq datasets can be harnessed to better interpret regulatory sequences. Using GC-content-corrected TFBS over-representation analysis, combined with visualization techniques and analysis of the topological distribution of TFBSs, we can distinguish peaks likely to be directly bound by a TF. The new methods will help researchers explore gene regulation and TF binding.
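
The idea of relating motif positions to peak maxima can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' heuristic: the abstract does not specify how the enrichment zone is derived, so the fixed `window` parameter, the data layout and both function names are assumptions made for the example.

```python
def nearest_motif_distance(peak_maxima, motif_hits):
    """Return, per peak, the smallest distance from a motif centre to the peak maximum.

    peak_maxima: dict mapping peak id -> genomic position of the peak maximum.
    motif_hits:  iterable of (peak_id, start, end) motif predictions, end exclusive.
    Peaks without any motif hit are absent from the result.
    """
    nearest = {}
    for peak_id, start, end in motif_hits:
        if peak_id not in peak_maxima:
            continue  # ignore hits that do not belong to a known peak
        centre = (start + end) / 2
        dist = abs(centre - peak_maxima[peak_id])
        if peak_id not in nearest or dist < nearest[peak_id]:
            nearest[peak_id] = dist
    return nearest


def peaks_with_motif_near_max(peak_maxima, motif_hits, window):
    """List the peaks whose closest motif lies within `window` bp of the peak maximum.

    `window` is an assumed, user-chosen threshold standing in for an
    enrichment zone; it is not a value taken from the paper.
    """
    nearest = nearest_motif_distance(peak_maxima, motif_hits)
    return sorted(p for p, d in nearest.items() if d <= window)


if __name__ == "__main__":
    maxima = {"p1": 100, "p2": 500, "p3": 900}
    hits = [("p1", 95, 105), ("p1", 300, 310), ("p2", 560, 570)]
    near = peaks_with_motif_near_max(maxima, hits, window=25)
    print(near, len(near) / len(maxima))
```

The fraction of peaks returned gives a crude estimate of how many peaks carry a motif close to the maximum; how the threshold is chosen strongly affects that estimate.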

【 License 】

   
2014 Worsley Hunt et al.; licensee BioMed Central Ltd.
