The 2003 SIAM (Society for Industrial and Applied Mathematics) International Conference on Data Mining
Scalable, Balanced Model-based Clustering
Shi Zhong; Joydeep Ghosh
Others: http://www.siam.org/proceedings/datamining/2003/dm03_07ZhongS.pdf PID: 19054
Source: CEUR
【 Abstract 】
This paper presents a general framework for adapting any generative (model-based) clustering algorithm to provide balanced solutions, i.e., clusters of comparable sizes. Partitional, model-based clustering algorithms are viewed as an iterative two-step optimization process: iterative model re-estimation and sample re-assignment. Instead of a maximum-likelihood (ML) assignment, a balance-constrained approach is used for the sample assignment step. An efficient iterative bipartitioning heuristic is developed to reduce the computational complexity of this step and make the balanced sample assignment algorithm scalable to large datasets. We demonstrate the superiority of this approach over regular ML clustering on arbitrary-shape 2-D spatial data, high-dimensional text documents, and EEG time series.
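The two-step loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the model is assumed to be a unit-variance spherical Gaussian per cluster, and the balance-constrained assignment uses a simple greedy pass with a per-cluster capacity of ceil(n/k) as a stand-in for the paper's iterative bipartitioning heuristic. All function names (`log_likelihood`, `balanced_assign`, `reestimate`, `balanced_cluster`) are hypothetical.

```python
import math


def log_likelihood(x, mean):
    # Assumed model: unit-variance spherical Gaussian, additive constant dropped.
    return -0.5 * sum((a - b) ** 2 for a, b in zip(x, mean))


def balanced_assign(points, means):
    # Greedy balance-constrained assignment (a stand-in, not the paper's
    # bipartitioning heuristic): visit (point, cluster) pairs from most to
    # least likely, and accept a pair only if the point is still unassigned
    # and the cluster holds fewer than ceil(n / k) points.
    n, k = len(points), len(means)
    capacity = math.ceil(n / k)
    pairs = sorted(
        ((log_likelihood(x, m), i, j)
         for i, x in enumerate(points)
         for j, m in enumerate(means)),
        key=lambda t: -t[0],
    )
    labels = [None] * n
    sizes = [0] * k
    for _, i, j in pairs:
        if labels[i] is None and sizes[j] < capacity:
            labels[i] = j
            sizes[j] += 1
    return labels


def reestimate(points, labels, means):
    # Model re-estimation: each mean becomes the average of its members.
    # An empty cluster keeps its previous mean.
    new_means = []
    for j, old in enumerate(means):
        members = [x for x, lab in zip(points, labels) if lab == j]
        if not members:
            new_means.append(old)
            continue
        new_means.append(tuple(
            sum(x[d] for x in members) / len(members) for d in range(len(old))
        ))
    return new_means


def balanced_cluster(points, init_means, iters=20):
    # Alternate the two steps until the assignment stops changing.
    means, labels = list(init_means), None
    for _ in range(iters):
        new_labels = balanced_assign(points, means)
        if new_labels == labels:
            break
        labels = new_labels
        means = reestimate(points, labels, means)
    return labels, means


# Six points near 0 and two near 10: a plain ML assignment would give
# cluster sizes 6 and 2, while the capacity constraint forces 4 and 4.
points = [(0.0,), (0.1,), (0.2,), (0.3,), (0.4,), (0.5,), (10.0,), (10.1,)]
labels, means = balanced_cluster(points, [(0.0,), (10.0,)])
sizes = [labels.count(j) for j in range(2)]
print(sizes)  # [4, 4]
```

The greedy pass sorts all n·k point-cluster pairs, so it costs O(nk log(nk)) per iteration; it is meant only to make the balance constraint concrete, not to reproduce the scalability the abstract claims for the bipartitioning heuristic.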
【 Preview 】
Files | Size | Format | View |
---|---|---|---|
Scalable, Balanced Model-based Clustering | 917KB | | download |