EURASIP Journal on Wireless Communications and Networking
A combined priority scheduling method for distributed machine learning
Research
GongYi Xiao1  Wen Li1  ChuanFu Zhang1  Jing Chen1  Hao Sun1  TianTian Du1  YuDong Geng1
[1] Shandong Provincial Key Laboratory of Computer Networks, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
Keywords: Cloud computing; Distributed machine learning; Resource scheduling; Prioritization
DOI: 10.1186/s13638-023-02253-4
Received 2022-11-30, accepted 2023-05-17, published 2023
Source: Springer
【 Abstract 】
Algorithms and frameworks for distributed machine learning are widely used in artificial intelligence engineering applications. Cloud platforms offer large pools of resources at lower cost and are a convenient way to run such applications. With the rapid development of containerization, cloud-native stacks based on Docker and Kubernetes provide effective resource support for distributed machine learning. However, native Kubernetes lacks efficient priority and fair resource scheduling strategies for the computationally intensive, long-running jobs typical of distributed machine learning, which easily leads to resource deadlock, resource waste, and low job execution efficiency. To exploit both the execution order among multiple jobs and the dependencies among the tasks of a single job, this paper proposes a combined priority scheduling method for distributed machine learning, based on Kubernetes and Volcano, that accounts for intra- and inter-group scheduling priorities. The method builds a combined inter- and intra-job priority model from the user priority, task priority, longest wait time, task parallelism, and the affinity and anti-affinity between the parameter server and worker nodes, and maps this model onto a scheduling strategy of inter- and intra-group pod priorities, enabling efficient scheduling and training of distributed machine learning. Experimental results show that the proposed method preferentially allocates resources to urgent, highly parallel, high-priority jobs submitted by high-priority users and improves job execution efficiency. The affinity and anti-affinity settings among pods reduce, to a certain extent, the time spent on information exchange between the parameter server and worker nodes, which further improves job completion efficiency. The group scheduling strategy alleviates the resource deadlock and waste caused by insufficient resources in cloud computing.
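The inter-job part of the idea can be illustrated with a minimal sketch. It is not the authors' model and does not use the Volcano or Kubernetes APIs: the `Job` fields, the 0-10 priority scale, the `WEIGHTS` values, and the linear scoring formula are all assumptions made for illustration, since the abstract gives no coefficients. The sketch ranks jobs by a weighted combination of user priority, task priority, wait time, and parallelism, then admits a job only if every one of its pods fits (all-or-nothing group admission), which is the property that avoids partially placed jobs holding resources.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    user_priority: int   # assumed 0-10 scale, higher means a more important user
    task_priority: int   # assumed 0-10 scale, higher means a more urgent job
    wait_seconds: float  # time the job has already spent in the queue
    parallelism: int     # pods (parameter servers + workers) that must start together
    cpu_per_pod: int     # CPU cores requested by each pod


# Hypothetical weights; the abstract does not state the actual coefficients.
WEIGHTS = {"user": 0.3, "task": 0.3, "wait": 0.2, "parallelism": 0.2}


def combined_priority(job, max_wait, max_parallelism):
    """Weighted sum of the four factors, each normalized to [0, 1]."""
    wait = job.wait_seconds / max_wait if max_wait > 0 else 0.0
    par = job.parallelism / max_parallelism if max_parallelism > 0 else 0.0
    return (WEIGHTS["user"] * job.user_priority / 10
            + WEIGHTS["task"] * job.task_priority / 10
            + WEIGHTS["wait"] * wait
            + WEIGHTS["parallelism"] * par)


def rank_jobs(jobs):
    """Order jobs from highest to lowest combined priority."""
    max_wait = max((j.wait_seconds for j in jobs), default=0.0)
    max_par = max((j.parallelism for j in jobs), default=0)
    return sorted(jobs,
                  key=lambda j: combined_priority(j, max_wait, max_par),
                  reverse=True)


def gang_schedule(jobs, free_cpu):
    """Admit a job only if all of its pods fit; otherwise leave it waiting."""
    admitted, waiting = [], []
    for job in rank_jobs(jobs):
        demand = job.parallelism * job.cpu_per_pod
        if demand <= free_cpu:
            free_cpu -= demand
            admitted.append(job.name)
        else:
            # No partial placement: a job that cannot start in full holds no CPU.
            waiting.append(job.name)
    return admitted, waiting
```

In this sketch a large job that does not fit is skipped rather than partially started, so a smaller, lower-ranked job can still use the remaining capacity. The intra-job ordering of pods and the affinity and anti-affinity placement described in the abstract are deliberately left out.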
【 License 】
CC BY
© The Author(s) 2023