Many data science applications today feature large datasets and short tasks that run for many iterations. When running these applications on a parallel processing framework like Apache Spark, one problem that affects the running time is the straggler, where a disproportionately long-running task slows down the entire cluster. In this work we present a straggler mitigation technique tailored for applications that run small tasks for many iterations over a large dataset, and we implement the algorithm in Apache Spark. We monitor the resources available on each Spark node and dynamically re-partition the dataset in proportion to the estimated available resources. We show that our algorithm has negligible resource-monitoring overhead and can significantly improve the performance of a Spark cluster when stragglers are present.
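The core idea of the re-partitioning step can be illustrated with a small, dependency-free sketch: given a resource estimate (weight) per node, allocate a fixed number of partitions across nodes in proportion to those weights. All names here are hypothetical illustrations, not the paper's actual implementation, and the largest-remainder rounding rule is an assumption.

```python
def proportional_partitions(weights, total_partitions):
    """Hypothetical sketch: split total_partitions across nodes in
    proportion to each node's estimated resource weight.

    weights: dict mapping node id -> estimated available resources
    Returns: dict mapping node id -> number of partitions to assign
    """
    total_w = sum(weights.values())
    alloc = {}
    remainders = []
    assigned = 0
    for node, w in weights.items():
        exact = total_partitions * w / total_w
        base = int(exact)  # floor of the exact proportional share
        alloc[node] = base
        assigned += base
        remainders.append((exact - base, node))
    # Hand out the leftover partitions to the largest fractional remainders
    for _, node in sorted(remainders, reverse=True)[: total_partitions - assigned]:
        alloc[node] += 1
    return alloc


# A node with twice the free resources receives twice the partitions
print(proportional_partitions({"node-a": 2.0, "node-b": 1.0, "node-c": 1.0}, 8))
# → {'node-a': 4, 'node-b': 2, 'node-c': 2}
```

In an actual Spark deployment, the resulting per-node counts would drive a custom partitioner (e.g. via `RDD.partitionBy`), so that a node predicted to straggle receives proportionally less data on the next iteration.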
Mitigating Spark straggler tasks for iterative applications by data re-partitioning