Journal Article Details
IEEE Access
Scheduling Spark Tasks With Data Skew and Deadline Constraints
Haihua Gu [1], Zhipeng Lu [2], Xiaoping Li [2]
[1] School of Artificial Intelligence, Nanjing Vocational College of Information Technology, Nanjing, China; School of Computer Science and Engineering, Southeast University, Nanjing, China
Keywords: data skew; Spark; scheduling optimization; cloud computing
DOI: 10.1109/ACCESS.2020.3040719
Source: DOAJ
[Abstract]

Data skew has a substantial impact on the performance of big data processing. This paper considers Spark task scheduling with data skew and deadline constraints, with the objective of minimizing the total rental cost. A modified scheduling architecture is developed based on the unique characteristics of the problem. A mathematical model is constructed, and a Spark task scheduling algorithm is proposed that accounts for both data skew and deadline constraints. The algorithm consists of three components: stage sequencing, task scheduling, and scheduling adjustment. Strategies are presented for each component. The parameters and components of the proposed algorithm are calibrated over many random instances. The calibrated algorithm is then compared with two existing algorithms for similar problems on classical scientific workflow applications. Experimental results show that the proposed algorithm statistically outperforms the compared algorithms.

[License]

Unknown   
