2018 3rd International Conference on Insulating Materials, Material Application and Electrical Engineering | |
Research on Efficient K_Means Parallel Algorithm Based on Hadoop Distributed Architecture | |
Materials science; Radio electronics; Electrical engineering | |
Qian, Lin^1 ; Wang, Lin^1 ; Mei, Zhu^1 ; Yu, Jun^1 ; Zhu, Guangxin^1 ; Song, Debing^1 ; Xu, Mingjie^1 | |
State Grid Electric Power Research Institute (SGEPRI), Nanjing, China^1 | |
Keywords: Clustering accuracy; Distributed architecture; Improve performance; Mapreduce frameworks; Replacement policy; Sample pretreatment; Sampling efficiency; Slow convergences | |
Others: https://iopscience.iop.org/article/10.1088/1757-899X/452/4/042066/pdf DOI: 10.1088/1757-899X/452/4/042066 |
Subject classification: Materials science (general) | |
Source: IOP | |
【 Abstract 】
To address the high time complexity, slow convergence, low clustering accuracy, and slow running speed of the K-means algorithm, an efficient parallel K-means algorithm based on the Hadoop system and the MapReduce framework is proposed. First, the algorithm uses a K selective sorting algorithm to improve sampling efficiency. Second, the iterative center is updated using a weight replacement policy. Finally, the initial center point is obtained with a sample pretreatment strategy. Experimental results show that the proposed algorithm achieves good convergence, accuracy, and speedup, and improves overall performance.
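For orientation, the sketch below shows how one iteration of plain K-means splits into a map step (assign each sample to its nearest center) and a reduce step (average the samples of each cluster). It is a single-process illustration in pure Python, not the authors' algorithm: it does not reproduce the K selective sorting, the weight replacement policy, or the sample pretreatment strategy named in the abstract, and it does not use the Hadoop API. The function names `nearest`, `map_phase`, `reduce_phase`, and `kmeans`, as well as the `max_iter` and `tol` parameters, are assumptions made for this example.

```python
from collections import defaultdict


def nearest(point, centers):
    """Return the index of the center closest to point (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centers[i])),
    )


def map_phase(points, centers):
    """Map step: emit (cluster index, (point, 1)) for every sample."""
    for point in points:
        yield nearest(point, centers), (point, 1)


def reduce_phase(pairs, centers):
    """Reduce step: sum the points of each cluster and divide by the count."""
    sums = {}
    counts = defaultdict(int)
    for idx, (point, n) in pairs:
        if idx not in sums:
            sums[idx] = list(point)
        else:
            sums[idx] = [s + p for s, p in zip(sums[idx], point)]
        counts[idx] += n
    # A cluster that received no samples keeps its previous center.
    return [
        tuple(s / counts[i] for s in sums[i]) if i in sums else centers[i]
        for i in range(len(centers))
    ]


def kmeans(points, centers, max_iter=100, tol=1e-9):
    """Repeat map and reduce until no center moves by more than tol (squared distance)."""
    for _ in range(max_iter):
        new_centers = reduce_phase(map_phase(points, centers), centers)
        shift = max(
            sum((a - b) ** 2 for a, b in zip(old, new))
            for old, new in zip(centers, new_centers)
        )
        centers = new_centers
        if shift <= tol:
            break
    return centers
```

In a real MapReduce job the map step would run in parallel over data splits and the reduce step would receive the pairs grouped by cluster index; here both run sequentially in one process to keep the example self-contained.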