Journal article details
BMC Bioinformatics
HiChIP: a high-throughput pipeline for integrative analysis of ChIP-Seq data
Huihuang Yan [1], Jared Evans [2], Mike Kalmbach [2], Raymond Moore [2], Sumit Middha [2], Stanislav Luban [3], Liguo Wang [2], Aditya Bhagwate [2], Ying Li [2], Zhifu Sun [2], Xianfeng Chen [2], Jean-Pierre A Kocher [2]
[1] Epigenomics Translational Program, Center for Individualized Medicine, Mayo Clinic, Rochester, MN 55905, USA
[2] Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, 200 1st St SW, Rochester, MN 55905, USA
[3] Current address: Interdisciplinary Bioinformatics and Systems Biology Program, University of California at San Diego, La Jolla, CA 92093-0419, USA
Keywords: Irreproducible discovery rate; Duplicate filtering; Peak calling; Next-generation sequencing; ChIP-Seq
DOI: 10.1186/1471-2105-15-280
Received 2014-03-18, accepted 2014-08-11, published 2014
【 Abstract 】

Background

Chromatin immunoprecipitation (ChIP) followed by next-generation sequencing (ChIP-Seq) has been widely used to identify genomic loci of transcription factor (TF) binding and histone modifications. ChIP-Seq data analysis involves multiple steps from read mapping and peak calling to data integration and interpretation. It remains challenging and time-consuming to process large amounts of ChIP-Seq data derived from different antibodies or experimental designs using the same approach. To address this challenge, there is a need for a comprehensive analysis pipeline with flexible settings to accelerate the utilization of this powerful technology in epigenetics research.

Results

We have developed a highly integrative pipeline, termed HiChIP, for systematic analysis of ChIP-Seq data. HiChIP incorporates several open-source software packages selected on the basis of internal assessments and published comparisons, together with a set of tools developed in-house. The workflow supports both paired-end and single-end ChIP-Seq reads, with or without replicates, and characterizes and annotates both punctate and diffuse binding sites. The main functionality of HiChIP includes: (a) read quality checking; (b) read mapping and filtering; (c) peak calling and peak consistency analysis; and (d) result visualization. In addition, the pipeline contains modules for generating binding profiles over selected genomic features, de novo motif finding from TF binding sites, and functional annotation of peak-associated genes.
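
The order of steps (a) to (d) and the optional modules can be illustrated with a minimal sketch. This is not HiChIP's actual interface: the `RunConfig` fields and the `plan_stages` function are hypothetical names introduced here. The sketch also assumes that peak consistency analysis is only scheduled when replicates exist, and that motif finding is only scheduled for punctate (TF-like) peaks.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RunConfig:
    """Hypothetical settings; these names are not HiChIP's real options."""
    paired_end: bool = False      # paired-end vs single-end reads
    has_replicates: bool = False  # whether replicate samples are available
    peak_type: str = "punctate"   # "punctate" or "diffuse" binding sites
    find_motifs: bool = False     # request de novo motif finding


def plan_stages(cfg: RunConfig) -> List[str]:
    """Return ordered stage names for one run, following steps (a) to (d)."""
    if cfg.peak_type not in ("punctate", "diffuse"):
        raise ValueError(f"unknown peak_type: {cfg.peak_type!r}")

    read_mode = "paired-end" if cfg.paired_end else "single-end"
    stages = [
        "read quality checking",              # (a)
        f"read mapping ({read_mode})",        # (b)
        "read filtering",                     # (b)
        f"peak calling ({cfg.peak_type})",    # (c)
    ]
    # Assumption: consistency between peak sets needs at least two replicates.
    if cfg.has_replicates:
        stages.append("peak consistency analysis")  # (c)
    stages.append("result visualization")           # (d)

    # Additional modules named in the abstract.
    stages.append("binding profiles over genomic features")
    # Assumption: motif finding is skipped for diffuse peaks.
    if cfg.find_motifs and cfg.peak_type == "punctate":
        stages.append("de novo motif finding")
    stages.append("functional annotation of peak-associated genes")
    return stages


if __name__ == "__main__":
    cfg = RunConfig(paired_end=True, has_replicates=True, find_motifs=True)
    for step in plan_stages(cfg):
        print(step)
```

Keeping the plan as a plain ordered list makes it easy to see how one configuration object can describe runs for different antibodies and experimental designs without changing the stage logic.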

Conclusions

HiChIP is a comprehensive analysis pipeline that can be configured to analyze ChIP-Seq data derived from various antibodies and experimental designs. Using public ChIP-Seq data, we demonstrate that HiChIP is a fast and reliable pipeline for processing large amounts of ChIP-Seq data.

【 License 】

   
© 2014 Yan et al.; licensee BioMed Central Ltd.

【 Figures 】

Figures 1 to 6.
