Symmetry
Clustering Mixed Data Based on Density Peaks and Stacked Denoising Autoencoders
Yi Yang [1], Baobin Duan [1], Lixin Han [1], Zhinan Gou [1], Shuangshuang Chen [2]
[1] College of Computer and Information, Hohai University, Nanjing 211100, China; [2] Jiangsu Provincial Key Constructive Laboratory for Big Data of Psychology and Cognitive Science, Yancheng Teachers University, Yancheng 224002, China
Keywords: clustering; mixed data; density peaks; stacked denoising autoencoders
DOI: 10.3390/sym11020163
Source: DOAJ
【 Abstract 】
Mixed data with both numerical and categorical attributes are ubiquitous in the real world, and a variety of clustering algorithms have been developed to discover the information hidden in such data. Most existing clustering algorithms compute distances or similarities between data objects directly on the original data, which can make the clustering results unstable in the presence of noise. In this paper, a clustering framework is proposed to explore the grouping structure of mixed data. First, the categorical attributes, transformed by one-hot encoding, and the normalized numerical attributes are fed into a stacked denoising autoencoder to learn internal feature representations. Second, based on these feature representations, the pairwise distances between data objects in the feature space are calculated, and from them the local density and relative distance of each data object are computed. Third, the density peaks clustering algorithm is improved and employed to assign the data objects to clusters. Finally, experiments on several UCI datasets demonstrate that the proposed algorithm for clustering mixed data outperforms three baseline algorithms in terms of clustering accuracy and the Rand index.
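The preprocessing and density-peaks quantities named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the stacked denoising autoencoder is omitted entirely, so the encoded rows stand in for the learned feature representations, and the local density and relative distance follow the standard cutoff-kernel formulation of density peaks clustering rather than the improved variant the paper proposes. The function names (`one_hot`, `min_max`, `encode`, `density_peaks`) and the cutoff parameter `d_c` are hypothetical names chosen for this example.

```python
import math


def one_hot(values):
    """Map a categorical column to one-hot vectors (categories in sorted order)."""
    cats = sorted(set(values))
    return [[1.0 if v == c else 0.0 for c in cats] for v in values]


def min_max(values):
    """Scale a numerical column to [0, 1]; a constant column maps to 0."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def encode(numeric_cols, categorical_cols):
    """Build one row per object: normalized numeric values, then one-hot blocks."""
    n = len(numeric_cols[0]) if numeric_cols else len(categorical_cols[0])
    rows = [[] for _ in range(n)]
    for col in numeric_cols:
        for row, v in zip(rows, min_max(col)):
            row.append(v)
    for col in categorical_cols:
        for row, vec in zip(rows, one_hot(col)):
            row.extend(vec)
    return rows


def density_peaks(points, d_c):
    """Return (rho, delta) for each point.

    rho[i]   : number of other points closer than the cutoff d_c (assumed kernel).
    delta[i] : distance to the nearest point with strictly higher rho; a point
               with no denser neighbor gets its largest distance to any point.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


# Toy mixed data: one numerical and one categorical attribute, two clear groups.
features = encode([[0.0, 0.1, 0.2, 5.0, 5.1, 5.2]],
                  [["a", "a", "a", "b", "b", "b"]])
rho, delta = density_peaks(features, d_c=0.03)
```

In the standard formulation, objects with both a large `rho` and a large `delta` are candidate cluster centers, and every remaining object joins the cluster of its nearest denser neighbor. In the paper's pipeline the `features` rows would first pass through the autoencoder before distances are computed.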
【 License 】
Unknown