Journal Article Details
EURASIP Journal on Wireless Communications and Networking
A combined priority scheduling method for distributed machine learning
Research
GongYi Xiao [1], Wen Li [1], ChuanFu Zhang [1], Jing Chen [1], Hao Sun [1], TianTian Du [1], YuDong Geng [1]
[1] Shandong Provincial Key Laboratory of Computer Networks, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
Keywords: Cloud computing; Distributed machine learning; Resource scheduling; Prioritization
DOI: 10.1186/s13638-023-02253-4
Received 2022-11-30; accepted 2023-05-17; published 2023
Source: Springer
【 Abstract 】

Algorithms and frameworks for distributed machine learning are widely used in artificial intelligence engineering applications. Cloud platforms offer such applications large amounts of resources at lower cost and with greater convenience. With the rapid development of containerization, cloud-native stacks based on Docker and Kubernetes have provided effective resource support for distributed machine learning. However, native Kubernetes does not provide efficient priority or fair resource scheduling strategies for the computationally intensive and time-consuming jobs of distributed machine learning, which easily leads to resource deadlock, resource waste, and low job execution efficiency. Therefore, to exploit the execution order among multiple distributed machine learning jobs and the dependencies among the tasks of a single job, a combined priority scheduling method that considers both intra- and inter-group scheduling priorities is proposed for distributed machine learning based on Kubernetes and Volcano. Considering user priority, task priority, longest wait time, task parallelism, and the affinity and anti-affinity between the parameter server and worker nodes, a combined priority scheduling model of inter- and intra-job priority is proposed and mapped into a scheduling strategy of inter- and intra-group pod priorities, enabling efficient scheduling and training of distributed machine learning. The experimental results show that the proposed method achieves preferential resource allocation for urgent, highly parallel, and high-priority jobs from high-priority users and improves job execution efficiency. The affinity and anti-affinity settings among pods reduce, to a certain extent, the time spent on information exchange between the parameter server and worker nodes, thereby improving job completion efficiency. This group scheduling strategy alleviates the resource deadlock and waste caused by insufficient resources in cloud computing.
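
To make the idea of a combined inter-job priority with group (all-or-nothing) admission more concrete, the following is a minimal Python sketch. It is not the authors' model: the abstract names the factors (user priority, task priority, wait time, parallelism) but not how they are combined, so the weighted sum, the weights, the normalization constants, and every name below (`Job`, `combined_priority`, `gang_schedule`) are assumptions made purely for illustration. It also does not call Kubernetes or Volcano; CPU counts stand in for cluster resources.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    user_priority: int    # hypothetical 1..10 scale, higher = more important user
    task_priority: int    # hypothetical 1..10 scale, higher = more urgent job
    wait_seconds: float   # time the job has spent in the queue
    parallelism: int      # number of worker pods
    pods_needed: int      # parameter server + worker pods that must start together
    cpu_per_pod: int


# Assumed weights and normalization bounds; the abstract does not specify them.
WEIGHTS = {"user": 0.3, "task": 0.3, "wait": 0.2, "parallel": 0.2}
MAX_WAIT_SECONDS = 3600.0
MAX_PARALLELISM = 16


def combined_priority(job: Job) -> float:
    """Weighted sum of the four factors, each normalized to [0, 1]."""
    wait_term = min(job.wait_seconds / MAX_WAIT_SECONDS, 1.0)
    parallel_term = min(job.parallelism / MAX_PARALLELISM, 1.0)
    return (
        WEIGHTS["user"] * job.user_priority / 10
        + WEIGHTS["task"] * job.task_priority / 10
        + WEIGHTS["wait"] * wait_term
        + WEIGHTS["parallel"] * parallel_term
    )


def gang_schedule(jobs: list[Job], free_cpus: int) -> tuple[list[str], int]:
    """Admit jobs in descending priority, but only if ALL their pods fit.

    A job that does not fit is skipped entirely, so it never holds a
    partial allocation that could block other jobs.
    """
    admitted = []
    for job in sorted(jobs, key=combined_priority, reverse=True):
        demand = job.pods_needed * job.cpu_per_pod
        if demand <= free_cpus:
            free_cpus -= demand
            admitted.append(job.name)
    return admitted, free_cpus


jobs = [
    Job("A", user_priority=9, task_priority=8, wait_seconds=600,
        parallelism=8, pods_needed=9, cpu_per_pod=2),
    Job("B", user_priority=3, task_priority=2, wait_seconds=3000,
        parallelism=2, pods_needed=3, cpu_per_pod=2),
    Job("C", user_priority=5, task_priority=5, wait_seconds=100,
        parallelism=4, pods_needed=5, cpu_per_pod=2),
]
admitted, remaining = gang_schedule(jobs, free_cpus=24)
print(admitted, remaining)  # ['A', 'B'] 0
```

In this toy run, job A ranks first and takes 18 of the 24 CPUs. Job C ranks second but needs 10 CPUs, so it is skipped as a whole rather than started with only some of its pods, and the lower-ranked job B fills the remaining 6 CPUs. Letting a smaller job backfill in this way is a design choice of the sketch, not something the abstract states.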

【 License 】

CC BY   
© The Author(s) 2023

【 Preview 】
Attachment list
Files Size Format View
RO202308158148491ZK.pdf 2671KB PDF download