Conference Paper Details
2017 2nd International Seminar on Advances in Materials Science and Engineering
A strategy to load balancing for non-connectivity MapReduce job
Zhou, Huaping^1 ; Liu, Guangzong^1 ; Gui, Haixia^1
College of Computer Science and Engineering, Anhui University of Science and Technology, Huainan, Anhui 232000, China^1
Keywords: Complex datasets; Data distribution; Data partitioning; Data skew; Distributed programming model; Map-reduce; Real data sets; Sampling method
Others  :  https://iopscience.iop.org/article/10.1088/1757-899X/231/1/012038/pdf
DOI  :  10.1088/1757-899X/231/1/012038
Source: IOP
【 Abstract 】

MapReduce is a distributed programming model widely used to process large-scale, complex datasets. Its default Hash partitioning function often causes data skew when the data distribution is uneven. To address this partitioning imbalance, we propose a strategy that changes the partition index of the remaining data once skew is detected. In the Map phase, we count the amount of data that will be distributed to each reducer; the JobTracker then monitors the global partitioning information and dynamically modifies the original partitioning function according to the data skew model. In the next partitioning pass, the Partitioner can thus redirect partitions that would cause data skew to other reducers with less load, eventually balancing the load across nodes. Finally, we experimentally compare our method with existing methods on both synthetic and real datasets. The results show that, for non-connectivity MapReduce tasks, our strategy resolves data skew with better stability and efficiency than the Hash method and the Sampling method.
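
The redirect idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the data skew model, so the threshold test used here (`tolerance` times the mean load) is an assumption, as are the names `SkewAwarePartitioner`, `partition`, and `loads`. The JobTracker's global monitoring is collapsed into a single in-process counter, and the sketch assumes that records sharing a key do not have to reach the same reducer.

```python
import zlib


class SkewAwarePartitioner:
    """Hash partitioner that redirects records away from overloaded reducers.

    Hypothetical illustration only; the skew test below is an assumed
    stand-in for the paper's data skew model.
    """

    def __init__(self, num_reducers, tolerance=1.5):
        self.num_reducers = num_reducers
        self.tolerance = tolerance        # assumed skew threshold (>= 1)
        self.loads = [0] * num_reducers   # records assigned so far, per reducer

    def _hash_index(self, key):
        # Stable hash: Python's built-in hash() of str is salted per process.
        return zlib.crc32(str(key).encode("utf-8")) % self.num_reducers

    def partition(self, key):
        index = self._hash_index(key)
        mean_load = sum(self.loads) / self.num_reducers
        # Assumed skew test: would the hashed reducer exceed tolerance x mean?
        if self.loads[index] + 1 > self.tolerance * (mean_load + 1):
            # Redirect this record to the currently least-loaded reducer.
            index = min(range(self.num_reducers), key=self.loads.__getitem__)
        self.loads[index] += 1
        return index
```

With plain hash partitioning, a stream consisting of one repeated key would land entirely on a single reducer. In this sketch, each reducer's load stays within `tolerance * (mean + 1)` because a record is redirected whenever its hashed reducer would exceed that bound.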
