Genomics
Applying MSSIM combined chaos game representation to genome sequences analysis
Hai ming Ni^1, Da wei Qi^2
[1] College of Mechanical and Electrical Engineering, Northeast Forestry University, Hexing Road 26, Harbin, Heilongjiang Province 150040, PR China
[2] College of Science, Northeast Forestry University, Hexing Road 26, Harbin, Heilongjiang Province 150040, PR China
Keywords: Chaos game representation; Structural similarity; Hierarchical clustering analysis; Genome sequences
DOI: 10.1016/j.ygeno.2017.09.010
Subject classification: Medicine (general)
Source: Academic Press
【 Abstract 】
Converting DNA sequences to images using chaos game representation (CGR) is an effective genome sequence preprocessing technique that provides a basis for further comparative analysis of different genes. In this paper, we constructed CGR images of gene sequences from 10 mammal species, 48 hepatitis E viruses (HEV), and 10 kinds of bacteria, and calculated the mean structural similarity (MSSIM) coefficient between every pair of CGR images. Our analysis indicates that the MSSIM coefficient of CGR images accurately reflects the degree of similarity between genomes. Hierarchical clustering analysis was used to determine class affiliation and construct a dendrogram. Extensive experiments showed that this method gives results comparable to those of the traditional Clustal X phylogenetic tree construction method and is significantly faster in the clustering analysis. The MSSIM combined with CGR method was also able to efficiently cluster large genome sequences that traditional multiple sequence alignment methods (e.g. Clustal X, Clustal Omega, Clustal W) cannot classify.
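The pipeline described in the abstract (sequence to CGR image, then pairwise MSSIM) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The corner assignment in `CORNERS`, the grid resolution `k`, the non-overlapping block windows of size `win`, and the constants 0.01 and 0.03 are all assumptions chosen for illustration; the standard SSIM formulation uses a sliding Gaussian window, which is replaced here by plain blocks to keep the example short.

```python
# Hypothetical corner layout for the chaos game; other conventions exist.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_image(seq, k=3):
    """Return a 2^k x 2^k grid counting how often the CGR point lands in each cell."""
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    x, y = 0.5, 0.5  # start at the centre of the unit square
    for base in seq.upper():
        if base not in CORNERS:
            continue  # simplification: ambiguous bases (e.g. N) are skipped
        cx, cy = CORNERS[base]
        # Move halfway from the current point towards the base's corner.
        x, y = (x + cx) / 2, (y + cy) / 2
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        grid[row][col] += 1
    return grid


def ssim_window(a, b, dynamic_range):
    """SSIM of two equally sized flat pixel lists (one local window)."""
    n = len(a)
    mu_a, mu_b = sum(a) / n, sum(b) / n
    var_a = sum((v - mu_a) ** 2 for v in a) / n
    var_b = sum((v - mu_b) ** 2 for v in b) / n
    cov = sum((p - mu_a) * (q - mu_b) for p, q in zip(a, b)) / n
    c1 = (0.01 * dynamic_range) ** 2  # stabilising constants (assumed values)
    c2 = (0.03 * dynamic_range) ** 2
    num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    den = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return num / den


def mssim(img_a, img_b, win=4):
    """Mean SSIM over non-overlapping win x win blocks of two square grids."""
    size = len(img_a)
    peak = max(max(map(max, img_a)), max(map(max, img_b))) or 1
    scores = []
    for r0 in range(0, size - win + 1, win):
        for c0 in range(0, size - win + 1, win):
            cells = [(r, c) for r in range(r0, r0 + win) for c in range(c0, c0 + win)]
            a = [img_a[r][c] for r, c in cells]
            b = [img_b[r][c] for r, c in cells]
            scores.append(ssim_window(a, b, peak))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    first = cgr_image("ACGTACGTTGCAACGT")
    second = cgr_image("ACGTACGATGCAACGT")
    print(mssim(first, second))
```

A pairwise MSSIM matrix built this way could then be turned into a distance matrix, for example 1 − MSSIM (an assumption, not stated in the abstract), and passed to any hierarchical clustering routine to obtain a dendrogram.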
【 License 】
CC BY