Conference Paper Details
7th Workshop on Large-Scale Distributed Systems for Information Retrieval
The Curse of Zipf and Limits to Parallelization: A Look at the Stragglers Problem in MapReduce
Jimmy Lin
URL  :  http://CEUR-WS.org/Vol-480/paper3.pdf
PID  :  11475
Source: CEUR
【 Abstract 】

This paper explores the problem of "stragglers" in MapReduce: a common phenomenon where a small number of mappers or reducers take significantly longer than the others to complete. The effects of these stragglers include unnecessarily long wall-clock running times and sub-optimal cluster utilization. In many cases, this problem cannot simply be attributed to hardware idiosyncrasies, but is rather caused by the Zipfian distribution of input or intermediate data. I present a simple theoretical model that shows how such distributions impose a fundamental limit on the amount of parallelism that can be extracted from a large class of algorithms where all occurrences of the same element must be processed together. A case study in parallel ad hoc query evaluation highlights the severity of the stragglers problem. Fortunately, a simple modification of the input data cuts end-to-end running time in half. This example illustrates some of the issues associated with designing efficient MapReduce algorithms for real-world datasets.
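The limit the abstract describes can be made concrete with a minimal sketch (not taken from the paper; the function names and the Zipf parameters are illustrative): if every occurrence of a key must be handled by a single worker, the worker assigned the most frequent key needs time proportional to that key's count, so speedup over a single worker can never exceed the ratio of total items to the largest key's count, no matter how many workers are added.

```python
# Hedged illustration of the parallelism bound under a Zipfian key
# distribution. Assumption: processing time per worker is proportional
# to the number of occurrences it receives, and a key is indivisible.

def zipf_counts(num_keys, total, s=1.0):
    """Split `total` items across `num_keys` ranks with Zipfian weights 1/rank^s."""
    weights = [1.0 / (rank ** s) for rank in range(1, num_keys + 1)]
    norm = sum(weights)
    return [max(1, round(total * w / norm)) for w in weights]

def speedup_bound(counts):
    """Upper bound on parallel speedup when each key must stay on one worker."""
    return sum(counts) / max(counts)

counts = zipf_counts(num_keys=10_000, total=1_000_000)
print(f"most frequent key: {max(counts)} of {sum(counts)} occurrences")
print(f"speedup bound: {speedup_bound(counts):.1f}x, regardless of worker count")
```

With a classic Zipf exponent of 1, the head key alone holds roughly a tenth of all occurrences here, so adding workers beyond about ten buys nothing; this mirrors why the paper's fix targets the input data rather than the cluster size.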
