Thesis details
Mitigating Spark straggler tasks for iterative applications by data re-partitioning
Teng, Bo; Campbell, Roy H.
Keywords: Straggler; Machine learning; Apache Spark; Iterative application
Others  :  https://www.ideals.illinois.edu/bitstream/handle/2142/97707/TENG-THESIS-2017.pdf?sequence=1&isAllowed=y
United States | English
Source: The Illinois Digital Environment for Access to Learning and Scholarship
【 Abstract 】

Many data science applications today feature large datasets and short tasks that run for many iterations. When these applications run on a parallel processing framework such as Apache Spark, one problem that affects running time is the straggler: a disproportionately long-running task that slows down the entire cluster. In this work we present a straggler mitigation technique tailored to applications that run small tasks for many iterations over a large dataset, and we implement the algorithm in Apache Spark. We monitor the resources available on each Spark node and dynamically re-partition the dataset in proportion to the estimated resources available. We show that our algorithm incurs negligible overhead for resource monitoring and can significantly improve the performance of a Spark cluster when stragglers are present.
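
The abstract does not spell out how partition sizes are derived from the resource estimates, so the following is only a minimal sketch of one way to split a dataset in proportion to per-node capacity. The function name `proportional_partition_sizes`, the node names, and the capacity scores are all hypothetical, and the largest-remainder rounding is an assumption made for illustration rather than the thesis's method. It uses plain Python and does not call any Spark API.

```python
from typing import Dict


def proportional_partition_sizes(
    total_records: int, capacity: Dict[str, float]
) -> Dict[str, int]:
    """Split total_records across nodes in proportion to estimated capacity.

    `capacity` maps a (hypothetical) node name to a non-negative score
    such as an estimate of currently available compute.
    """
    if total_records < 0:
        raise ValueError("total_records must be non-negative")
    if any(c < 0 for c in capacity.values()):
        raise ValueError("capacities must be non-negative")
    total_capacity = sum(capacity.values())
    if total_capacity <= 0:
        raise ValueError("at least one node must have positive capacity")

    # Ideal (fractional) share of records for each node.
    ideal = {n: total_records * c / total_capacity for n, c in capacity.items()}

    # Start from the rounded-down share, then hand out the leftover records
    # one at a time to the nodes with the largest fractional remainders,
    # so the sizes always add up to total_records.
    sizes = {n: int(share) for n, share in ideal.items()}
    leftover = total_records - sum(sizes.values())
    by_remainder = sorted(capacity, key=lambda n: ideal[n] - sizes[n], reverse=True)
    for node in by_remainder[:leftover]:
        sizes[node] += 1
    return sizes


if __name__ == "__main__":
    # Assumed scenario: node-c is estimated to have half the capacity of the others.
    estimates = {"node-a": 1.0, "node-b": 1.0, "node-c": 0.5}
    print(proportional_partition_sizes(1000, estimates))
    # prints {'node-a': 400, 'node-b': 400, 'node-c': 200}
```

In an iterative job, sizes like these could be recomputed between iterations as the estimates change, so that a slowed-down node receives less data in the next round.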

【 Preview 】
Attachment list
Files Size Format View
Mitigating Spark straggler tasks for iterative applications by data re-partitioning 474KB PDF download