Journal of Computer Science
Predicting Missing Attribute Values Using k-Means Clustering | Science Publications
Keppana G. Thanushkodi, Nambiraj Suguna
Keywords: Bees Colony Optimization (BCO); K-Nearest Neighbor (KNN); missing attributes; Most Common Attribute Value (MCAV); Event-Covering Method (EC); genetic algorithm; k-means clustering; clustering algorithm; onlooker bee; Artificial Bee Colony (ABC)
DOI: 10.3844/jcssp.2011.216.224
Subject classification: Computer Science (General)
Source: Science Publications
【 Abstract 】
Problem statement: Predicting values for missing attributes is an important data preprocessing problem in data mining and knowledge discovery tasks. Several methods have been proposed to treat missing data, and the most frequently used one is deleting instances that contain at least one missing feature value. When a dataset has only a small number of missing attribute values, those instances can be neglected. But when the number is high, deleting those instances may discard essential information. Some methods, such as assigning an average value or the most common value to the missing attribute, make good use of all the available data. However, the assigned value may not come from the information from which the data were originally derived, and thus noise is introduced into the data. Approach: In this study, k-means clustering is proposed for predicting missing attribute values. The performance of the proposed approach is compared with nine different methods. The overall analysis shows that k-means clustering can predict the missing attribute values better than the other methods. After the missing attributes are assigned, feature selection is performed with Bees Colony Optimization (BCO) and the improved Genetic KNN is applied to measure classification performance, as discussed in our previous study. Results: The performance is analyzed with four different medical datasets: Dermatology, Cleveland Heart, Lung Cancer and Wisconsin. For these datasets, the proposed k-means based missing attribute prediction achieves higher accuracies of 94.60%, 90.45%, 87.51% and 95.70%, respectively. Conclusion: The greater classification accuracy shows the superior performance of the k-means based missing attribute value prediction.
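The abstract does not spell out the exact imputation procedure, so the following is only a minimal sketch of one common way to predict missing values with k-means, not the authors' implementation. It assumes numeric attributes with missing entries encoded as NaN. The function name `kmeans_impute`, its parameters, the evenly spaced centroid initialization and the toy data are all illustrative assumptions. Missing entries start at the column mean, rows are assigned to the nearest centroid using observed attributes only, and each missing entry is then replaced by the corresponding coordinate of its cluster centroid.

```python
import numpy as np

def kmeans_impute(X, k=2, n_iter=20):
    """Fill NaN entries of X with the matching coordinate of the row's centroid.

    Hypothetical helper for illustration; returns (filled_array, labels).
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Start from column means so every row is a complete vector.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)

    # Assumption: deterministic initialization from evenly spaced rows.
    idx = np.linspace(0, len(filled) - 1, k).astype(int)
    centroids = filled[idx].copy()

    for _ in range(n_iter):
        # Squared distance to each centroid, ignoring missing attributes.
        diff = filled[:, None, :] - centroids[None, :, :]
        diff[np.broadcast_to(missing[:, None, :], diff.shape)] = 0.0
        labels = (diff ** 2).sum(axis=2).argmin(axis=1)

        # Recompute each non-empty cluster's centroid from the filled data.
        for j in range(k):
            members = filled[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)

        # Replace only the originally missing entries with centroid values.
        filled = np.where(missing, centroids[labels], X)

    return filled, labels

# Toy data (made up): two well-separated groups, one missing value.
X_demo = np.array([
    [1.0, 1.0],
    [1.2, 0.8],
    [0.9, np.nan],
    [8.0, 8.0],
    [8.2, 7.9],
    [7.9, 8.1],
])
filled_demo, labels_demo = kmeans_impute(X_demo, k=2)
print(filled_demo[2], labels_demo)
```

In this toy case the missing entry is pulled toward the values of its own cluster rather than the global column mean, which is the motivation the abstract gives for preferring a clustering-based estimate over average-value assignment.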
【 License 】
Unknown