Journal of Computer Science
A Cluster Feature-Based Incremental Clustering Approach to Mixed Data | Science Publications
A. M. Sowjanya, M. Shashi
Keywords: Data mining; cluster feature; centroid; farthest neighbor points; mixed attributes; numerical attributes; categorical attributes; incremental clustering; k-means
DOI: 10.3844/jcssp.2011.1875.1880
Subject classification: Computer Science (General)
Source: Science Publications
[Abstract]
Problem statement: The main objective of this study is to develop an incremental clustering algorithm that can handle numerical as well as categorical attributes in a given dataset. The authors have previously reported a cluster feature-based algorithm, CFICA, that can handle only numerical data. Approach: Since many real-life data mining applications work with datasets that contain both numeric and categorical attributes, the earlier algorithm needs to be modified to handle such mixed datasets. The core idea is to propose a new distance measure based on an automatically generated weightage and to apply it to incremental clustering. The data are handled in two phases. In the first phase, the k-means clustering algorithm is employed for the initial clustering of the static database. In the second phase, the designed distance measure is used to find the appropriate cluster for each incremental data point. The combination of the two has proved to be more effective in handling mixed datasets. Clustering accuracy, clustering error and computational time of the proposed approach have been evaluated with different k values and thresholds. Varying the threshold values gave better accuracy on the different datasets. Results: The clustering error of this approach was reduced considerably across the different k values and thresholds. Conclusion: The results demonstrate the efficiency of the proposed approach in handling real mixed datasets composed of numerical and categorical attributes only.
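The second phase described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' CFICA extension: the abstract does not give the distance formula or the weighting rule, so everything here is an assumption made for illustration. The names `ClusterSummary`, `mixed_distance` and `assign_incremental` are hypothetical, the numeric part uses Euclidean distance to the centroid, the categorical part uses the fraction of mismatches against the cluster's most frequent categories, the weight is simply the share of numeric attributes, and a point farther than `threshold` from every cluster starts a new cluster. The initial clusters are assumed to come from a phase-one clustering such as k-means.

```python
import math
from collections import Counter


class ClusterSummary:
    """Hypothetical running summary of one cluster (not the paper's cluster feature)."""

    def __init__(self, num, cat):
        self.n = 1                                   # number of points absorbed
        self.centroid = list(num)                    # mean of numeric attributes
        self.cat_counts = [Counter([c]) for c in cat]  # value counts per categorical attribute

    def modes(self):
        # Most frequent value of each categorical attribute.
        return [c.most_common(1)[0][0] for c in self.cat_counts]

    def add(self, num, cat):
        # Incrementally update the mean and the category counts.
        self.n += 1
        self.centroid = [m + (x - m) / self.n for m, x in zip(self.centroid, num)]
        for counter, value in zip(self.cat_counts, cat):
            counter[value] += 1


def mixed_distance(cluster, num, cat):
    # Assumed weighting: share of numeric attributes among all attributes.
    n_num, n_cat = len(num), len(cat)
    w = n_num / (n_num + n_cat)
    d_num = math.dist(cluster.centroid, num)         # Euclidean part
    mismatches = sum(a != b for a, b in zip(cluster.modes(), cat))
    d_cat = mismatches / n_cat if n_cat else 0.0     # mismatch fraction
    return w * d_num + (1 - w) * d_cat


def assign_incremental(clusters, num, cat, threshold):
    """Place one new point; return the index of the cluster it ends up in."""
    if clusters:
        dists = [mixed_distance(c, num, cat) for c in clusters]
        best = min(range(len(clusters)), key=dists.__getitem__)
        if dists[best] <= threshold:
            clusters[best].add(num, cat)
            return best
    # Assumed behaviour: a point too far from every cluster opens a new one.
    clusters.append(ClusterSummary(num, cat))
    return len(clusters) - 1
```

In this sketch a smaller `threshold` produces more, tighter clusters, which is one plausible reading of why the abstract reports accuracy changing with the threshold. In practice the numeric attributes would probably need scaling so that the two distance parts are comparable.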
[License]
Unknown