Technical Report Details
Dependence of Clustering Algorithm Performance on Clustered-ness of Data
Zhang, Bin
HP Development Company
Keywords: clustering; K-Means; K-Harmonic Means; EM; Data Mining
RP-ID: HPL-2001-91
Subject classification: Computer Science (General)
United States | English
Source: HP Labs
【 Abstract 】

Intuitively, clustering algorithms should work better on datasets that have well-separated clusters. We found the contrary for center-based clustering algorithms, including K-Means, K-Harmonic Means and EM. We generated 1200 synthetic datasets with a varying ratio of inter-cluster variance to within-cluster variance, which we call the clustered-ness of the dataset. We ran K-Means, K-Harmonic Means and EM on these datasets and found that the ratio of the performance over the global optimum grows with increasing clustered-ness. The dependence of clustering algorithm performance on other parameters (quality of initialization and dimensionality of data) is also demonstrated. (12 pages)
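
To make the notion of clustered-ness concrete, here is a minimal sketch of how such an experiment could be set up. It is not the report's actual procedure: the abstract gives neither the data generator nor the exact variance formula, so the Gaussian-blob generator, the between/within sum-of-squares ratio, and all function names (`make_dataset`, `clusteredness`, `kmeans`) are assumptions chosen for illustration. Only K-Means (plain Lloyd iterations with random initialization) is shown; K-Harmonic Means and EM are omitted.

```python
import random

def make_dataset(k, n_per_cluster, dim, center_spread, within_std, seed):
    """Assumed generator: k Gaussian blobs. A larger center_spread
    relative to within_std should give a more 'clustered' dataset."""
    rng = random.Random(seed)
    centers = [[rng.gauss(0.0, center_spread) for _ in range(dim)]
               for _ in range(k)]
    points, labels = [], []
    for j, c in enumerate(centers):
        for _ in range(n_per_cluster):
            points.append([x + rng.gauss(0.0, within_std) for x in c])
            labels.append(j)
    return points, labels

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def clusteredness(points, labels):
    """Assumed definition: between-cluster sum of squares divided by
    within-cluster sum of squares, using the generating labels."""
    k = max(labels) + 1
    groups = [[p for p, l in zip(points, labels) if l == j] for j in range(k)]
    centroids = [mean(g) for g in groups]
    overall = mean(points)
    between = sum(len(g) * sq_dist(c, overall)
                  for g, c in zip(groups, centroids))
    within = sum(sq_dist(p, centroids[l]) for p, l in zip(points, labels))
    return between / within

def kmeans(points, k, seed, iters=50):
    """Plain Lloyd's K-Means; initial centers are k random data points."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        # Assignment step: index of the nearest center for each point.
        assign = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = mean(members)
    cost = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, cost

# Sweep the spread of the cluster centers and report clustered-ness
# alongside the K-Means cost reached from one random initialization.
for spread in (1.0, 3.0, 10.0):
    pts, labels = make_dataset(k=3, n_per_cluster=50, dim=2,
                               center_spread=spread, within_std=1.0, seed=0)
    _, cost = kmeans(pts, k=3, seed=1)
    print(spread, round(clusteredness(pts, labels), 2), round(cost, 1))
```

To approximate the performance ratio described in the abstract, one could divide the K-Means cost by a reference cost such as the best result over many restarts. That reference is only a stand-in for the true global optimum, and the abstract does not say how the report obtains it.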
