PATTERN RECOGNITION | 卷:110 |
CNAK: Cluster number assisted K-means | |
Article | |
Saha, Jayasree1  Mukherjee, Jayanta1  | |
[1] Indian Inst Technol, Dept Comp Sci & Engn, Kharagpur 721302, W Bengal, India | |
关键词: K-means clustering; Bipartite graph; Perfect matching; Kuhn-Munkres algorithm; Stability; | |
DOI : 10.1016/j.patcog.2020.107625 | |
来源: Elsevier | |
【 摘 要 】
The K-means clustering algorithm is well-known for its easy computational approach. In this algorithm, essential cluster-level information is captured by the K cluster centroids. However, how many such cen-troids can reveal the structure of the underlying data depends upon the choice of K . In this paper, we propose a clustering algorithm in which the number of cluster K can be learned as well as it performs the clustering. Our work revolves around two observations: i) a large-sized random sampled dataset may have a similar distribution as the original data, and ii) for the true number of clusters their centroids, generated from a sampled datasets, approximate the cluster centroids generated from the original dataset. The first observation has paved the way to provide a scalable solution, and the second one forms the key aspect of building the proposed algorithm. We have tested our method on several real and synthetic datasets. Our method can solve a few pertinent issues of clustering a dataset: 1) detection of a single cluster in the absence of any other cluster in a dataset, 2) the presence of hierarchy, 3) clustering of a high dimensional dataset, 4) robustness over dataset having cluster imbalance, and 5) robustness to noise. We have observed significant improvement in speed and quality for predicting cluster numbers as well as the composition of clusters in a large dataset. (c) 2020 Elsevier Ltd. All rights reserved.
【 授权许可】
Free
【 预 览 】
Files | Size | Format | View |
---|---|---|---|
10_1016_j_patcog_2020_107625.pdf | 3459KB | download |