Technical Report Details
Tech Report: HPL-1999-124: K-Harmonic Means - A
Zhang, Bin ; Hsu, Meichun ; Dayal, Umeshwar
HP Development Company
Keywords: clustering; K-Means; K-Harmonic Means; data mining
RP-ID: HPL-1999-124
Subject classification: Computer Science (General)
United States | English
Source: HP Labs
【 Abstract 】

Data clustering is one of the common techniques used in data mining. A popular performance function for measuring the goodness of a data clustering is the total within-cluster variance, or the total mean-square quantization error (MSE). The K-Means (KM) algorithm is a popular center-based clustering algorithm that attempts to find a K-clustering which minimizes MSE. A major problem with K-Means is the dependency of its performance on the initialization of the centers; a similar issue exists for an alternative algorithm, Expectation Maximization (EM), although to a lesser extent. In this paper, we propose a new clustering method called the K-Harmonic Means algorithm (KHM). KHM is a center-based clustering algorithm which uses the harmonic averages of the distances from each data point to the centers as components of its performance function. It is demonstrated that K-Harmonic Means is essentially insensitive to the initialization of the centers. In certain cases, K-Harmonic Means significantly improves the quality of clustering results compared with both K-Means and EM, which are the two most popular clustering algorithms used in data exploration and data compression. A unified view of the three performance functions, those of K-Means, K-Harmonic Means, and EM, is given for comparison. Experimental results comparing KHM with KM on high-dimensional data, and visualizations animating the convergence of all three algorithms on 2-dimensional data, are given. 25 Pages
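
To make the contrast between the two performance functions concrete, the following is a minimal sketch rather than the report's own code. It assumes squared Euclidean distance and the plain harmonic average over the K centers; the abstract does not spell out the exact form, and the function names `kmeans_perf` and `khm_perf` and the `eps` guard are hypothetical choices made for illustration.

```python
from typing import Sequence

Point = Sequence[float]


def sq_dist(x: Point, m: Point) -> float:
    """Squared Euclidean distance between a data point and a center."""
    return sum((a - b) ** 2 for a, b in zip(x, m))


def kmeans_perf(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """K-Means style objective: each point contributes its squared
    distance to the nearest center only."""
    return sum(min(sq_dist(x, m) for m in centers) for x in points)


def khm_perf(points: Sequence[Point], centers: Sequence[Point],
             eps: float = 1e-12) -> float:
    """Harmonic-average objective (assumed form): each point contributes
    the harmonic average of its squared distances to all K centers.
    eps is a hypothetical guard against division by zero when a point
    coincides with a center."""
    k = len(centers)
    total = 0.0
    for x in points:
        inv_sum = sum(1.0 / max(sq_dist(x, m), eps) for m in centers)
        total += k / inv_sum
    return total


if __name__ == "__main__":
    pts = [(1.0, 0.0)]
    ctrs = [(0.0, 0.0), (3.0, 0.0)]
    print(kmeans_perf(pts, ctrs), khm_perf(pts, ctrs))
```

Under these assumptions every center influences every point's term, whereas the minimum in the K-Means objective lets only the nearest center matter. For any point, the harmonic average lies between the smallest squared distance and K times that smallest distance, so the two objectives track each other within a factor of K.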
