Conference Paper Details
The 2003 SIAM (Society for Industrial and Applied Mathematics) International Conference on Data Mining
Scalable, Balanced Model-based Clustering
Shi Zhong ; Joydeep Ghosh
Others  :  http://www.siam.org/proceedings/datamining/2003/dm03_07ZhongS.pdf
PID  :  19054
Source: CEUR
【 Abstract 】

This paper presents a general framework for adapting any generative (model-based) clustering algorithm to provide balanced solutions, i.e., clusters of comparable sizes. Partitional, model-based clustering algorithms are viewed as an iterative two-step optimization process: model re-estimation followed by sample re-assignment. Instead of a maximum-likelihood (ML) assignment, a balance-constrained approach is used for the sample assignment step. An efficient iterative bipartitioning heuristic is developed to reduce the computational complexity of this step and make the balanced sample assignment algorithm scalable to large datasets. We demonstrate the superiority of this approach to regular ML clustering on arbitrary-shape 2-D spatial data, high-dimensional text documents, and EEG time series.
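
The contrast between ML assignment and balance-constrained assignment can be illustrated with a small sketch. This is not the paper's iterative bipartitioning heuristic, which the abstract does not specify; it substitutes a simple greedy rule that caps every cluster at ceil(n/k) samples. The function names (`ml_assign`, `balanced_assign`, `fit_balanced`), the 1-D unit-variance Gaussian model, and the toy data are all assumptions made for illustration.

```python
import math


def ml_assign(log_lik):
    """Unconstrained ML assignment: each sample takes its most likely cluster."""
    return [max(range(len(row)), key=row.__getitem__) for row in log_lik]


def balanced_assign(log_lik, k):
    """Greedy balance-constrained assignment (illustrative stand-in only).

    log_lik[i][j] is the log-likelihood of sample i under cluster model j.
    Each cluster may hold at most ceil(n / k) samples.
    """
    n = len(log_lik)
    capacity = math.ceil(n / k)
    # Visit (sample, cluster) pairs from most to least likely.
    pairs = sorted(
        ((log_lik[i][j], i, j) for i in range(n) for j in range(k)),
        key=lambda t: -t[0],
    )
    labels = [None] * n
    sizes = [0] * k
    for _, i, j in pairs:
        if labels[i] is None and sizes[j] < capacity:
            labels[i] = j
            sizes[j] += 1
    return labels


def fit_balanced(points, means, n_iter=10):
    """Alternate balanced assignment and mean re-estimation.

    Assumes 1-D data and unit-variance Gaussian cluster models, so the
    log-likelihood is -0.5 * (x - mean)^2 up to an additive constant.
    """
    means = list(means)  # do not modify the caller's list
    k = len(means)
    labels = []
    for _ in range(n_iter):
        log_lik = [[-0.5 * (x - m) ** 2 for m in means] for x in points]
        labels = balanced_assign(log_lik, k)
        for j in range(k):
            members = [x for x, lab in zip(points, labels) if lab == j]
            if members:  # keep the old mean if a cluster ends up empty
                means[j] = sum(members) / len(members)
    return labels, means


# Toy data (assumed): five points near 0 and a single point near 10.
points = [0.0, 1.0, 2.0, 3.0, 4.0, 9.0]
init_means = [0.0, 10.0]
log_lik = [[-0.5 * (x - m) ** 2 for m in init_means] for x in points]

print(ml_assign(log_lik))           # skewed cluster sizes (5 and 1)
print(balanced_assign(log_lik, 2))  # equal cluster sizes (3 and 3)
print(fit_balanced(points, init_means))
```

On this toy data the ML rule places five points in one cluster and one in the other, while the capped rule yields two clusters of three. The greedy pass sorts all n·k sample-cluster pairs, so it is only meant to show the idea of a balance constraint, not the scalability the paper targets.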

【 Preview 】
Attachment list
Files Size Format View
Scalable, Balanced Model-based Clustering 917KB PDF download