期刊论文

【摘要】

The K-means clustering algorithm is well-known for its easy computational approach. In this algorithm, essential cluster-level information is captured by the K cluster centroids. However, how many such cen-troids can reveal the structure of the underlying data depends upon the choice of K . In this paper, we propose a clustering algorithm in which the number of cluster K can be learned as well as it performs the clustering. Our work revolves around two observations: i) a large-sized random sampled dataset may have a similar distribution as the original data, and ii) for the true number of clusters their centroids, generated from a sampled datasets, approximate the cluster centroids generated from the original dataset. The first observation has paved the way to provide a scalable solution, and the second one forms the key aspect of building the proposed algorithm. We have tested our method on several real and synthetic datasets. Our method can solve a few pertinent issues of clustering a dataset: 1) detection of a single cluster in the absence of any other cluster in a dataset, 2) the presence of hierarchy, 3) clustering of a high dimensional dataset, 4) robustness over dataset having cluster imbalance, and 5) robustness to noise. We have observed significant improvement in speed and quality for predicting cluster numbers as well as the composition of clusters in a large dataset. (c) 2020 Elsevier Ltd. All rights reserved.

【授权许可】

Free

【预览】

附件列表
Files	Size	Format	View
10_1016_j_patcog_2020_107625.pdf	3459KB	PDF	download

PATTERN RECOGNITION	卷:110
CNAK: Cluster number assisted K-means
Article
Saha, Jayasree¹ Mukherjee, Jayanta¹
[1] Indian Inst Technol, Dept Comp Sci & Engn, Kharagpur 721302, W Bengal, India
关键词: K-means clustering; Bipartite graph; Perfect matching; Kuhn-Munkres algorithm; Stability;
DOI : 10.1016/j.patcog.2020.107625
来源: Elsevier
PDF


	文献评价指标
	下载次数：0次	浏览次数：0次

【 摘 要 】

【 授权许可】

【 预 览 】

【摘要】

【授权许可】

【预览】