| BMC Bioinformatics | |
| diffHic: a Bioconductor package to detect differential genomic interactions in Hi-C data | |
| Aaron T.L. Lun [2], Gordon K. Smyth [1] | |
| [1] Department of Mathematics and Statistics, The University of Melbourne, Parkville, VIC 3010, Melbourne, Australia | |
| [2] Department of Medical Biology, The University of Melbourne, Parkville, VIC 3010, Melbourne, Australia | |
| Keywords: Differential analysis; Genomic interaction; Hi-C | |
| DOI: 10.1186/s12859-015-0683-0 | |
| Received 2015-05-11; accepted 2015-07-22; published 2015 | |
【 Abstract 】
Background
Chromatin conformation capture with high-throughput sequencing (Hi-C) is a technique that measures the in vivo intensity of interactions between all pairs of loci in the genome. Most conventional analyses of Hi-C data focus on the detection of statistically significant interactions. However, an alternative strategy involves identifying significant changes in the interaction intensity (i.e., differential interactions) between two or more biological conditions. This differential approach is more statistically rigorous and may provide more biologically relevant results.
Results
Here, we present the diffHic software package for the detection of differential interactions from Hi-C data. diffHic provides methods for read pair alignment and processing, counting into bin pairs, filtering out low-abundance events and normalization of trended or CNV-driven biases. It uses the statistical framework of the edgeR package to model biological variability and to test for significant differences between conditions. Several options for the visualization of results are also included. The use of diffHic is demonstrated with real Hi-C data sets. Performance against existing methods is also evaluated with simulated data.
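The counting and filtering steps named above can be illustrated with a minimal Python sketch. It is not the diffHic API (diffHic is an R/Bioconductor package). The function names `count_bin_pairs` and `filter_low_abundance`, the fixed bin width, and the summed-count threshold are all assumptions made for illustration. Normalization and the edgeR-based testing are not covered.

```python
from collections import Counter


def count_bin_pairs(read_pairs, bin_width):
    """Count read pairs into unordered pairs of fixed-width genomic bins.

    read_pairs: iterable of ((chrom, pos), (chrom, pos)) tuples (hypothetical input format).
    bin_width:  bin size in base pairs.
    """
    counts = Counter()
    for (chrom_a, pos_a), (chrom_b, pos_b) in read_pairs:
        bin_a = (chrom_a, pos_a // bin_width)
        bin_b = (chrom_b, pos_b // bin_width)
        # Sort the two bins so that (a, b) and (b, a) share one key.
        key = tuple(sorted((bin_a, bin_b)))
        counts[key] += 1
    return counts


def filter_low_abundance(counts_per_library, min_total):
    """Return the bin pairs whose count summed over all libraries is >= min_total.

    A simple summed-count cut-off is assumed here purely for illustration.
    """
    totals = Counter()
    for counts in counts_per_library:
        totals.update(counts)  # Counter.update adds counts key by key
    return {key for key, total in totals.items() if total >= min_total}


if __name__ == "__main__":
    library = [
        (("chr1", 1500), ("chr1", 250000)),
        (("chr1", 250100), ("chr1", 1900)),  # same bin pair, reversed order
        (("chr2", 50), ("chr1", 10)),
    ]
    counts = count_bin_pairs(library, bin_width=1000)
    kept = filter_low_abundance([counts], min_total=2)
    print(counts)
    print(kept)
```

Filtering on the combined count across libraries, rather than on any between-condition difference, keeps the filter from depending on the quantity that is later tested.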
Conclusions
On real data, diffHic successfully detects interactions with significant differences in intensity between biological conditions. It also compares favourably to existing software tools on simulated data sets. These results suggest that diffHic is a viable approach for differential analyses of Hi-C data.
【 License 】
2015 Lun and Smyth.