Many data science applications today feature large datasets and short tasks that run for many iterations. When running these applications on a parallel processing framework like Apache Spark, one problem that affects the running time is the straggler, where a disproportionately long-running task slows down the entire cluster. In this work we present a straggler mitigation technique tailored for applications that run small tasks for many iterations over a large dataset, and we implement the algorithm in Apache Spark. We monitor the resources available on each Spark node and dynamically re-partition the dataset in proportion to the estimated available resources. We show that our algorithm has negligible resource-monitoring overhead and can significantly improve the performance of a Spark cluster when stragglers are present.
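The core idea of the re-partitioning step can be illustrated with a small, dependency-free sketch: given a resource estimate (weight) per node, allocate a fixed number of partitions across nodes in proportion to those weights. All names here are hypothetical illustrations, not the paper's actual implementation, and the largest-remainder rounding rule is an assumption.

```python
def proportional_partitions(weights, total_partitions):
    """Hypothetical sketch: split total_partitions across nodes in
    proportion to each node's estimated resource weight.

    weights: dict mapping node id -> estimated available resources
    Returns: dict mapping node id -> number of partitions to assign
    """
    total_w = sum(weights.values())
    alloc = {}
    remainders = []
    assigned = 0
    for node, w in weights.items():
        exact = total_partitions * w / total_w
        base = int(exact)  # floor of the exact proportional share
        alloc[node] = base
        assigned += base
        remainders.append((exact - base, node))
    # Hand out the leftover partitions to the largest fractional remainders
    for _, node in sorted(remainders, reverse=True)[: total_partitions - assigned]:
        alloc[node] += 1
    return alloc


# A node with twice the free resources receives twice the partitions
print(proportional_partitions({"node-a": 2.0, "node-b": 1.0, "node-c": 1.0}, 8))
# → {'node-a': 4, 'node-b': 2, 'node-c': 2}
```

In an actual Spark deployment, the resulting per-node counts would drive a custom partitioner (e.g. via `RDD.partitionBy`), so that a node predicted to straggle receives proportionally less data on the next iteration.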
Mitigating Spark straggler tasks for iterative applications by data re-partitioning