BMC Bioinformatics
ClusterTAD: an unsupervised machine learning approach to detecting topologically associated domains of chromosomes from Hi-C data
Research Article
Oluwatosin Oluwadare1, Jianlin Cheng2
[1] Electrical Engineering and Computer Science Department, University of Missouri, Columbia, MO 65211, USA; [2] Electrical Engineering and Computer Science Department, University of Missouri, Columbia, MO 65211, USA; Informatics Institute, University of Missouri, Columbia, MO 65211, USA
Keywords: Clustering; Hi-C; Topologically associated domain (TAD); CTCF; Chromosome conformation capturing; Genome structure; Chromosome organization
DOI: 10.1186/s12859-017-1931-2
Received: 2017-07-14; Accepted: 2017-11-06; Published: 2017
Source: Springer
【 Abstract 】
Background: With the development of chromosome conformation capturing techniques, particularly the Hi-C technique, the study of the spatial conformation of a genome is becoming an important topic in bioinformatics and computational biology. The Hi-C technique can generate genome-wide chromosomal interaction (contact) data, which can be used to investigate the higher-level organization of chromosomes, such as topologically associated domains (TADs), i.e., locally packed chromosome regions bound together by intra-chromosomal contacts. Identifying the TADs of a genome is useful for studying gene regulation, genomic interaction, and genome function.
Results: Here, we formulate TAD identification as an unsupervised machine learning (clustering) problem and develop a new TAD identification method called ClusterTAD. We introduce a novel method to represent chromosomal contacts as features to be used by the clustering algorithm. Our results show that ClusterTAD can accurately predict the TADs in simulated Hi-C data. On the real Hi-C datasets of two mouse cell types, our method is also largely consistent with, and complementary to, existing methods. Validation with chromatin immunoprecipitation sequencing (ChIP-seq) data shows that the domain boundaries identified by ClusterTAD are highly enriched in CTCF binding sites, promoter-related marks, and enhancer-related histone modifications.
Conclusions: As ClusterTAD is based on a proven clustering approach, it opens a new avenue for applying the large array of clustering methods developed in the machine learning field to the TAD identification problem. The source code, the results, and the TADs generated for the simulated and real Hi-C datasets are available at https://github.com/BDM-Lab/ClusterTAD.
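The Results paragraph frames TAD detection as clustering genomic bins by their contact features. The sketch below illustrates that framing under loudly stated assumptions: each bin's feature vector is simply its row of the contact matrix (the paper introduces its own feature representation, which is not reproduced here), clustering is plain Lloyd's k-means with a fixed k and a deterministic initialisation, and a TAD is taken to be a run of consecutive bins that share a cluster label. The helper names (`toy_contact_matrix`, `kmeans`, `labels_to_domains`) and the toy matrix are hypothetical and are not part of the ClusterTAD code base. Only the Python standard library is used.

```python
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def toy_contact_matrix(block_sizes: List[int], inside: float = 10.0, outside: float = 1.0) -> Matrix:
    """Hypothetical Hi-C-like matrix: high counts inside diagonal blocks, low counts elsewhere."""
    block_of = [b for b, size in enumerate(block_sizes) for _ in range(size)]
    n = len(block_of)
    return [[inside if block_of[i] == block_of[j] else outside for j in range(n)] for i in range(n)]


def _sq_dist(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows: Matrix, k: int, max_iter: int = 100) -> List[int]:
    """Lloyd's k-means on the rows of `rows`; returns one cluster label per bin."""
    n = len(rows)
    # Deterministic initialisation (an assumption): k rows spaced evenly along the chromosome.
    centroids = [list(rows[(c * n) // k]) for c in range(k)]
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        # Assignment step: each bin joins its nearest centroid (ties go to the lower index).
        new_labels = [min(range(k), key=lambda c: _sq_dist(r, centroids[c])) for r in rows]
        if new_labels == labels:
            break  # converged: no bin changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean feature vector of its members.
        for c in range(k):
            members = [rows[i] for i in range(n) if labels[i] == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels if labels is not None else []


def labels_to_domains(labels: List[int], min_size: int = 2) -> List[Tuple[int, int]]:
    """Turn runs of consecutive bins with the same label into inclusive (start, end) bin intervals."""
    domains: List[Tuple[int, int]] = []
    start = 0
    for i in range(1, len(labels) + 1):
        # A run ends at the last bin or wherever the label changes.
        if i == len(labels) or labels[i] != labels[start]:
            if i - start >= min_size:  # drop runs too short to count as a domain
                domains.append((start, i - 1))
            start = i
    return domains


if __name__ == "__main__":
    matrix = toy_contact_matrix([3, 3, 3])  # three hypothetical 3-bin domains
    labels = kmeans(matrix, k=3)
    print(labels)                     # → [0, 0, 0, 1, 1, 1, 2, 2, 2]
    print(labels_to_domains(labels))  # → [(0, 2), (3, 5), (6, 8)]
```

On real data the number of domains is unknown and non-adjacent bins can fall into the same cluster, so a practical tool would likely need to try several values of k and assess the resulting domains; how ClusterTAD handles this is described in the paper, not in this sketch.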
【 License 】
CC BY
© The Author(s). 2017