Journal Article

【Abstract】

Problem statement: The k-means method is one of the most widely used clustering techniques across many applications. However, k-means often converges to a local optimum, and the result depends on the initial seeds. An inappropriate choice of initial seeds may yield poor results. k-means++ is a way of initializing k-means by choosing initial seeds with specific probabilities. Because of the random selection of the first seed and the minimum probable distance, k-means++ also yields different clusters in different runs, with different numbers of iterations. Approach: In this study we propose a method called the Single Pass Seed Selection (SPSS) algorithm, a modification of k-means++ that initializes the first seed and the probable distance for k-means++ based on the point that is close to the largest number of other points in the data set. Result: We evaluated its performance by applying it to various data sets and comparing it with k-means++. The SPSS algorithm is a single-pass algorithm that yields a unique solution in fewer iterations than k-means++. Experimental results on real data sets (4-60 dimensions, 27-10945 objects and 2-10 clusters) from UCI demonstrate the effectiveness of SPSS in producing consistent clustering results. Conclusion: SPSS performed well on high-dimensional data sets. Its efficiency increased with the number of features in the data set; we suggest the proposed method particularly when the number of features is greater than 10.
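
The seeding idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' SPSS implementation: it shows standard k-means++ style seeding (each further seed drawn with probability proportional to its squared distance from the nearest seed already chosen), plus one possible reading of "the point close to the largest number of other points", taken here as the point with the smallest total distance to all others. That reading, the function names `medoid_index` and `seed_centers`, and the `first` and `rng` parameters are all assumptions made for illustration. The abstract does not specify how the "minimum probable distance" is fixed, so that step is not reproduced.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def medoid_index(points):
    """Index of the point with the smallest total distance to all others.

    Assumption: this stands in for "the point close to the largest
    number of other points"; the abstract does not give the exact rule.
    """
    totals = [sum(math.sqrt(sq_dist(p, q)) for q in points) for p in points]
    return totals.index(min(totals))


def seed_centers(points, k, first=None, rng=None):
    """k-means++ style seeding.

    first: index of the first seed; drawn uniformly at random if None
           (the k-means++ default). Passing medoid_index(points) fixes it.
    rng:   a random.Random instance, so runs can be made repeatable.
    """
    rng = rng or random.Random()
    if first is None:
        first = rng.randrange(len(points))
    chosen = [first]
    # Squared distance from every point to its nearest chosen seed.
    d2 = [sq_dist(p, points[first]) for p in points]
    while len(chosen) < k:
        total = sum(d2)
        if total == 0:
            break  # every remaining point coincides with a seed
        r = rng.random() * total
        # Fallback guards against floating-point rounding in the scan below.
        pick = max(range(len(points)), key=lambda i: d2[i])
        acc = 0.0
        for i, w in enumerate(d2):
            acc += w
            if w > 0 and acc >= r:
                pick = i
                break
        chosen.append(pick)
        d2 = [min(d, sq_dist(p, points[pick])) for d, p in zip(d2, points)]
    return [points[i] for i in chosen]


pts = [(0,), (1,), (2,), (3,), (100,)]
seeds = seed_centers(pts, 3, first=medoid_index(pts), rng=random.Random(0))
```

With `first` fixed, only the later draws remain random; removing that remaining randomness is what the abstract attributes to SPSS, and this sketch does not attempt it.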

【License】

Unknown

【Preview】

Attachments
Files	Size	Format	View
RO201911300066817ZK.pdf	62KB	PDF	download

Journal of Computer Science
Single Pass Seed Selection Algorithm for k-Means | Science Publications

G. R. Sridhar¹ A. V.D. Rao¹ K. K. Pavan¹ Allam A. Rao¹
Keywords: Clustering; k-means; k-means++; local optimum; minimum probable distance; SPSS
DOI : 10.3844/jcssp.2010.60.66
Subject classification: Computer Science (General)
Source: Science Publications
