Journal article

【Abstract】

An ensemble of clustering solutions or partitions may be generated for a number of reasons. If the data set is very large, clustering may be done on disjoint subsets of tractable size. The data may be distributed across different sites, in which case a distributed clustering solution with a final merging of partitions is a natural fit. In this paper, two new approaches to combining partitions, represented by sets of cluster centers, are introduced. The advantage of these approaches is that they provide a final partition of the data that is comparable to those of the best existing approaches, yet scale to extremely large data sets. They can be 100,000 times faster while using much less memory. The new algorithms are compared against the best existing cluster ensemble merging approaches, clustering all the data at once, and a clustering algorithm designed for very large data sets. The comparison is done for fuzzy and hard-k-means based clustering algorithms. It is shown that the centroid-based ensemble merging algorithms presented here generate partitions of quality comparable to the best label vector approach or clustering all the data at once, while providing very large speedups. © 2008 Elsevier Ltd. All rights reserved.
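
The general idea of merging partitions that are represented only by cluster centers can be illustrated with a small sketch. This is not the paper's algorithm: the abstract does not describe the two merging methods in detail, so the code below assumes a simple stand-in, namely running hard k-means on each disjoint subset and then running a size-weighted k-means over the pooled centroids. The function names `kmeans` and `merge_ensemble`, the farthest-first initialization, and the synthetic data are all illustrative assumptions.

```python
import random


def sqdist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, weights=None, iters=20, seed=0):
    """Weighted hard k-means (Lloyd iterations) on tuples of floats.

    Returns (centers, mass), where mass[j] is the total weight assigned
    to center j in the last assignment step.
    """
    rng = random.Random(seed)
    if weights is None:
        weights = [1.0] * len(points)
    # Farthest-first initialization (an assumption made for this sketch).
    centers = [rng.choice(points)]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sqdist(p, c) for c in centers)))
    dim = len(points[0])
    mass = [0.0] * k
    for _ in range(iters):
        sums = [[0.0] * dim for _ in range(k)]
        mass = [0.0] * k
        for p, w in zip(points, weights):
            j = min(range(k), key=lambda c: sqdist(p, centers[c]))
            mass[j] += w
            for d in range(dim):
                sums[j][d] += w * p[d]
        # Move each center to the weighted mean; keep empty clusters in place.
        centers = [
            tuple(s / mass[j] for s in sums[j]) if mass[j] > 0 else centers[j]
            for j in range(k)
        ]
    return centers, mass


def merge_ensemble(subsets, k):
    """Cluster each disjoint subset, then cluster the pooled centroids.

    Only k centroids and k counts per subset are kept, so the merge step
    never touches the raw data again.
    """
    pooled, pooled_w = [], []
    for i, subset in enumerate(subsets):
        centers, mass = kmeans(subset, k, seed=i)
        for c, m in zip(centers, mass):
            if m > 0:
                pooled.append(c)
                pooled_w.append(m)
    final, _ = kmeans(pooled, k, weights=pooled_w)
    return final


# Synthetic example: two well-separated blobs split across four subsets.
rng = random.Random(1)


def blob(cx, cy, n):
    return [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)) for _ in range(n)]


subsets = [blob(0, 0, 50) + blob(10, 10, 50) for _ in range(4)]
final = sorted(merge_ensemble(subsets, k=2))
```

In this sketch the merge step works on 8 weighted centroids rather than 400 points, which hints at why centroid-based merging can scale to large data sets; the fuzzy-k-means variant mentioned in the abstract is not covered here.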

【License】

Free

【Preview】

Attachments
Files	Size	Format	View
10_1016_j_patcog_2008_09_027.pdf	557KB	PDF	download

PATTERN RECOGNITION	Volume: 42
A scalable framework for cluster ensembles
Article
Hore, Prodip¹ Hall, Lawrence O.¹ Goldgof, Dmitry B.¹
[1] Univ S Florida, Dept Comp Sci & Engn, ENB 118, Tampa, FL 33620 USA
Keywords: Clustering; Hard/fuzzy-k-means; Large data sets; Ensemble; Scalability; Single pass algorithm
DOI : 10.1016/j.patcog.2008.09.027
Source: Elsevier
