| 20th International Conference on Computing in High Energy and Nuclear Physics | |
| Minimizing draining waste through extending the lifetime of pilot jobs in Grid environments | |
| Physics; Computer Science | |
| Sfiligoi, I.^1 ; Martin, T.^1 ; Bockelman, B.P.^2 ; Bradley, D.C.^3 ; Würthwein, F.^1 | |
| University of California San Diego, 9500 Gilman Dr, San Diego, CA 92093, United States^1 | |
| University of Nebraska-Lincoln, 118 Schorr Ctr, Lincoln, NE 68588, United States^2 | |
| University of Wisconsin-Madison, 1150 University Ave, Madison, WI 53706, United States^3 | |
| Keywords: Adverse effect; Compute resources; Gradual transition; Grid ecosystems; Grid environments; Grid resource provider; Many-core computing; Prototype system | |
| Others: https://iopscience.iop.org/article/10.1088/1742-6596/513/3/032089/pdf | DOI: 10.1088/1742-6596/513/3/032089 |
| Subject classification: Computer Science (General) | |
| Source: IOP | |
【 Abstract 】
The computing landscape is moving at an accelerated pace toward many-core computing. Nowadays, it is not unusual to get 32 cores on a single physical node. As a consequence, there is increasing pressure in the pilot systems domain to move beyond purely single-core scheduling and support multi-core jobs as well. To allow a gradual transition from single-core to multi-core user jobs, it is envisioned that pilot jobs will have to handle both kinds of user jobs at the same time, by requesting several cores at a time from Grid providers and then partitioning them among the user jobs at runtime. Unfortunately, the current Grid ecosystem allows only relatively short pilot-job lifetimes, which forces frequent draining: because user jobs have varying lifetimes, cores whose jobs finish early sit idle until the last job on the pilot ends, wasting compute resources. Significantly extending the lifetime of pilot jobs is thus highly desirable, but it must come without any adverse effects for the Grid resource providers. In this paper we present a mechanism, based on communication between the pilot jobs and the Grid provider, that allows pilot jobs to run for extended periods of time when resources are available, while still allowing the Grid provider to reclaim the resources on short notice when needed. We also report on our experience running a prototype system based on this mechanism at a few US-based Grid sites.
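To make the draining argument concrete, the following is a minimal sketch, not the authors' implementation. It assumes a simplified model in which all cores of a pilot are busy until draining starts, and each core then idles from the end of its user job until the longest remaining job finishes and the whole pilot is released. The function names `drain_waste`, `waste_fraction` and `can_start`, and the idea of the provider publishing a single "reclaim deadline", are hypothetical illustrations; the paper's actual pilot/provider protocol may differ. The runtimes used below are made-up example numbers, not results from the paper.

```python
from typing import Optional


def drain_waste(remaining: list[float]) -> float:
    """Idle core-hours spent during one drain.

    remaining[i] is how long (hours) the user job on core i still runs when
    draining starts. No new jobs are started, so core i idles from the end of
    its job until the longest job ends and the pilot releases all its cores.
    """
    if not remaining:
        return 0.0
    longest = max(remaining)
    return sum(longest - r for r in remaining)


def waste_fraction(remaining: list[float], lifetime: float) -> float:
    """Fraction of the pilot's allocated core-hours lost to draining.

    Simplifying assumption: every core was busy for the first `lifetime`
    hours, and draining starts exactly at `lifetime`.
    """
    cores = len(remaining)
    allocated = cores * (lifetime + max(remaining))
    return drain_waste(remaining) / allocated


def can_start(job_runtime: float, now: float, reclaim_at: Optional[float]) -> bool:
    """Hypothetical pilot-side check before starting a new user job.

    reclaim_at is the deadline (hours) by which the provider wants the cores
    back, or None if the provider has not asked for them. With no request the
    pilot keeps accepting work indefinitely; otherwise it only starts jobs
    that can finish before the deadline, so the drain stays short.
    """
    if reclaim_at is None:
        return True
    return now + job_runtime <= reclaim_at


if __name__ == "__main__":
    # Example: 4-core pilot whose user jobs have these remaining runtimes.
    remaining = [0.5, 2.0, 6.0, 11.5]
    print(drain_waste(remaining))           # idle core-hours per drain
    print(waste_fraction(remaining, 24.0))  # 1-day pilot lifetime
    print(waste_fraction(remaining, 168.0)) # 1-week pilot lifetime
```

In this toy example each drain wastes 26 core-hours regardless of lifetime, so the lost fraction drops from roughly 18% for a one-day pilot to roughly 3.6% for a one-week pilot. This is the motivation for long-lived pilots, while a check like `can_start` illustrates how a provider-supplied deadline could still bound the time needed to hand the cores back.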