Conference paper

【Abstract】

MapReduce, a distributed programming model, is widely used to process large-scale, complex datasets. When data are unevenly distributed, MapReduce's default hash partitioning function often causes data skew. To address this partitioning imbalance, we propose a strategy that changes the partition index of the remaining data once skew is detected. In the Map phase, we count the amount of data destined for each reducer. The JobTracker then monitors this global partitioning information and dynamically modifies the original partitioning function according to a data skew model. In subsequent partitioning, the Partitioner can therefore reassign partitions that would cause skew to other reducers with lighter loads, eventually balancing the load across nodes. Finally, we experimentally compare our method with existing methods on both synthetic and real datasets. The results show that, for non-connectivity MapReduce tasks, our strategy can solve the data skew problem with better stability and efficiency than the Hash and Sampling methods.
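The abstract describes the partition-remapping idea only in prose. The Python sketch below illustrates the general mechanism under stated assumptions; it is not the authors' implementation. The names `hash_partition`, `AdaptivePartitioner`, `tolerance`, and `check_every`, and the overload rule (a reducer counts as skewed when its load exceeds the mean by more than `tolerance`), are hypothetical. A single in-process load vector stands in for the map-side counters and the JobTracker's global view. The sketch also assumes, as seems implied for non-connectivity jobs, that records sharing a key may be split across reducers.

```python
import zlib
from typing import Dict, Iterable, List


def hash_partition(key: str, num_reducers: int) -> int:
    """Deterministic stand-in for a hash partitioner: hash(key) mod R.

    crc32 is used instead of Java's hashCode() only so results are
    reproducible (Python's built-in str hash is randomized per process).
    """
    return zlib.crc32(key.encode("utf-8")) % num_reducers


class AdaptivePartitioner:
    """Hypothetical sketch of skew-driven partition-index remapping.

    One shared load vector approximates the global view; tolerance and
    check_every are illustrative parameters, not values from the paper.
    """

    def __init__(self, num_reducers: int, tolerance: float = 0.2, check_every: int = 100):
        self.num_reducers = num_reducers
        self.tolerance = tolerance
        self.check_every = check_every
        self.loads: List[int] = [0] * num_reducers  # records sent to each reducer so far
        self.remap: Dict[int, int] = {}             # original partition index -> current reducer
        self.seen = 0

    def _rebalance(self) -> None:
        avg = sum(self.loads) / self.num_reducers
        limit = avg * (1 + self.tolerance)
        lightest = min(range(self.num_reducers), key=lambda r: self.loads[r])
        for index in range(self.num_reducers):
            # Any original index whose current target is overloaded sends its
            # remaining records to the currently least-loaded reducer.
            if self.loads[self.remap.get(index, index)] > limit:
                self.remap[index] = lightest

    def partition(self, key: str) -> int:
        index = hash_partition(key, self.num_reducers)
        reducer = self.remap.get(index, index)
        self.loads[reducer] += 1
        self.seen += 1
        if self.seen % self.check_every == 0:
            self._rebalance()
        return reducer


def hash_loads(keys: Iterable[str], num_reducers: int) -> List[int]:
    """Per-reducer record counts under plain hash partitioning."""
    loads = [0] * num_reducers
    for key in keys:
        loads[hash_partition(key, num_reducers)] += 1
    return loads


def adaptive_loads(keys: Iterable[str], num_reducers: int) -> List[int]:
    """Per-reducer record counts under the adaptive remapping sketch."""
    partitioner = AdaptivePartitioner(num_reducers)
    for key in keys:
        partitioner.partition(key)
    return partitioner.loads


# Synthetic skewed input: 8,000 of the 10,000 records share one "hot" key.
records = ["hot" if i % 5 else f"k{i % 97}" for i in range(10_000)]
print("hash    :", hash_loads(records, 4))
print("adaptive:", adaptive_loads(records, 4))
```

With this input, plain hashing sends every "hot" record to a single reducer. The adaptive version keeps rerouting that partition index to whichever reducer is currently lightest, so no reducer ends far above the mean. Checking only every `check_every` records trades balance precision for fewer global synchronizations. This is one plausible reading of "monitoring global partitioning information", not a detail taken from the paper.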

【Preview】

Attachments
File	Size	Format
A strategy to load balancing for non-connectivity MapReduce job	376 KB	PDF

2017 2nd International Seminar on Advances in Materials Science and Engineering
A strategy to load balancing for non-connectivity MapReduce job

Zhou, Huaping¹; Liu, Guangzong¹; Gui, Haixia¹
¹ College of Computer Science and Engineering, Anhui University of Science and Technology, Huainan, Anhui 232000, China
Keywords: Complex datasets; Data distribution; Data partitioning; Data skew; Distributed programming model; Map-reduce; Real data sets; Sampling method
Other source: https://iopscience.iop.org/article/10.1088/1757-899X/231/1/012038/pdf
DOI: 10.1088/1757-899X/231/1/012038

Source: IOP

