Journal Article Details
BMC Bioinformatics
ClusterTAD: an unsupervised machine learning approach to detecting topologically associated domains of chromosomes from Hi-C data
Research Article
Oluwatosin Oluwadare [1], Jianlin Cheng [2]
[1] Electrical Engineering and Computer Science Department, University of Missouri, 65211, Columbia, MO, USA; Informatics Institute, University of Missouri, 65211, Columbia, MO, USA
Keywords: Clustering; Hi-C; Topologically associated domain (TAD); CTCF; Chromosome conformation capturing; Genome structure; Chromosome organization
DOI: 10.1186/s12859-017-1931-2
Received 2017-07-14, accepted 2017-11-06, published 2017
Source: Springer
【 Abstract 】

Background: With the development of chromosomal conformation capturing techniques, particularly the Hi-C technique, the study of the spatial conformation of a genome is becoming an important topic in bioinformatics and computational biology. The Hi-C technique can generate genome-wide chromosomal interaction (contact) data, which can be used to investigate the higher-level organization of chromosomes, such as Topologically Associated Domains (TADs), i.e., locally packed chromosome regions bound together by intra-chromosomal contacts. The identification of the TADs of a genome is useful for studying gene regulation, genomic interaction, and genome function.

Results: Here, we formulate the TAD identification problem as an unsupervised machine learning (clustering) problem, and develop a new TAD identification method called ClusterTAD. We introduce a novel method to represent chromosomal contacts as features to be used by the clustering algorithm. Our results show that ClusterTAD can accurately predict the TADs on simulated Hi-C data. Our method is also largely complementary to and consistent with existing methods on the real Hi-C datasets of two mouse cells. The validation with chromatin immunoprecipitation sequencing (ChIP-Seq) data shows that the domain boundaries identified by ClusterTAD have a high enrichment of CTCF binding sites, promoter-related marks, and enhancer-related histone modifications.

Conclusions: As ClusterTAD is based on a proven clustering approach, it opens a new avenue to apply a large array of clustering methods developed in the machine learning field to the TAD identification problem. The source code, the results, and the TADs generated for the simulated and real Hi-C datasets are available here: https://github.com/BDM-Lab/ClusterTAD.
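To make the clustering formulation concrete, the following is a minimal sketch and not the ClusterTAD implementation. It assumes that each genomic bin is described by its row of the contact matrix and that plain k-means is the clustering algorithm; the abstract specifies neither choice. All function names (`toy_contact_matrix`, `kmeans_labels`, `contiguous_domains`) and the toy data are hypothetical.

```python
# Illustrative sketch only: cluster bins of a toy Hi-C contact matrix and
# report runs of consecutive bins with the same cluster label as domains.

def toy_contact_matrix(domain_sizes, inside=10.0, outside=1.0):
    """Build a block-diagonal matrix: high contact inside each domain."""
    n = sum(domain_sizes)
    matrix = [[outside] * n for _ in range(n)]
    start = 0
    for size in domain_sizes:
        for i in range(start, start + size):
            for j in range(start, start + size):
                matrix[i][j] = inside
        start += size
    return matrix


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_labels(rows, k, iterations=20):
    """Plain k-means with a deterministic start (evenly spaced rows)."""
    n = len(rows)
    centroids = [list(rows[(2 * c + 1) * n // (2 * k)]) for c in range(k)]
    labels = [0] * n
    for _ in range(iterations):
        # Assignment step: each bin joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(row, centroids[c]))
            for row in rows
        ]
        # Update step: each centroid becomes the mean of its member rows.
        for c in range(k):
            members = [rows[i] for i in range(n) if labels[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def contiguous_domains(labels):
    """Turn per-bin labels into (first_bin, last_bin) intervals."""
    domains = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            domains.append((start, i - 1))
            start = i
    return domains


# Assumed feature representation: one contact-matrix row per bin.
contacts = toy_contact_matrix([4, 6, 5])
labels = kmeans_labels(contacts, k=3)
domains = contiguous_domains(labels)
print(domains)  # [(0, 3), (4, 9), (10, 14)]
```

On this noise-free toy matrix the three planted blocks are recovered exactly. Real Hi-C data would need normalization, a principled choice of the number of clusters, and handling of small or noisy segments, none of which this sketch attempts.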

【 License 】

CC BY   
© The Author(s). 2017
