科技报告详细信息
A Local Search Approach to K-Clustering
Zhang, Bin ; Kleyner, Gary ; Hsu, Meichun
HP Development Company
关键词: local search algorithm;    data clustering;    K-means;    clustering aggregated data;    clustering large data sets;    data compression;    vector quantization;   
RP-ID  :  HPL-1999-119
学科分类:计算机科学(综合)
美国|英语
来源: HP Labs
PDF
【 摘 要 】

Data clustering is one of the common techniques used in data mining. A popular performance function for measuring goodness of the K-clustering is the total within-cluster variance, or the total mean-square quantization error (MSE). The K-Means (KM) algorithm is a popular algorithm which attempts to find a K- clustering which minimizes MSE. In this paper, we approach the min-MSE clustering problem by way of a Local Search (LS) algorithm, and analytically derive a clustering algorithm which we call LKM. A number of analyses of LKM are given; in particular, we prove that the set of local optima that can trap KM is a superset of those that can trap LKM. The experimental results also show that LKM converges faster and better than KM. More importantly, LKM naturally extends to an aggregated version, called A-LKM, which can be applied to the problem of clustering large data sets. A-LKM is a clustering algorithm which clusters subsets of data points, or subclusters, instead of individual data points. It can be used to cluster, for example, a large data set that has been aggregated through an algorithm such as the Phase 1 of the BIRCH algorithm ([ZRL96]), with the intention of fitting the aggregated data into the main memory to enable main memory-based clustering. We prove that A-LKM, as applied to the problem of clustering subclusters, preserve the monotone convergence property. Experimental results also show that A-LKM performs better than A-KM, in clustering aggregated data. 20 Pages

【 预 览 】
附件列表
Files Size Format View
RO201804100001860LZ 146KB PDF download
  文献评价指标  
  下载次数:43次 浏览次数:31次