21st International Conference on Computing in High Energy and Nuclear Physics | |
Jobs masonry in LHCb with elastic Grid Jobs | |
Physics; Computer Science | |
Stagni, F.^1 ; Charpentier, Ph^1 | |
PH Department, Geneva | |
CH-1211-23, Switzerland^1 | |
Keywords: Batch systems; Computing capability; Computing infrastructures; Computing resource; Effective solution; Just in time; Low priorities; Multi-jobs | |
Others : https://iopscience.iop.org/article/10.1088/1742-6596/664/6/062060/pdf DOI : 10.1088/1742-6596/664/6/062060 |
Subject classification: Computer Science (General) | |
Source: IOP | |
【 Abstract 】
In any distributed computing infrastructure, a job is normally not allowed to run indefinitely. This limit is enforced with different technologies, the most common being the CPU time limit imposed by batch queues. It is therefore important to have a good estimate of how much CPU work a job will require: otherwise, it might be killed by the batch system, or by whatever system is controlling the job's execution. In many modern interwares, jobs are actually executed by pilot jobs, which can use the whole available time by running multiple consecutive jobs. If at some point the time left in a pilot is too short to execute any waiting job, the pilot has to be released, even though the remaining time could have been used efficiently by a shorter job. Within LHCbDIRAC, the LHCb extension of the DIRAC interware, we developed a simple way to fully exploit the computing capabilities available to a pilot, even for resources with limited time capabilities, by adding elasticity to production Monte Carlo (MC) simulation jobs. With our approach, independently of the time available, LHCbDIRAC will always be able to execute an MC job, whose length is adapted to the available time: the same job, running on different computing resources with different time limits, will therefore produce different numbers of events. The decision on the number of events to be produced is made just in time at the start of the job, when the capabilities of the resource are known. To determine how many events an MC job will be instructed to produce, LHCbDIRAC requires only three values: the CPU work per event for that type of job, the power of the machine it is running on, and the time left before the job would be killed. Knowing these values, we can estimate the number of events the job will be able to simulate within the available CPU time.
This paper will demonstrate that, using this simple but effective solution, LHCb makes more efficient use of the available resources and can easily use new types of resources. One example is resources provided by batch queues, where low-priority MC jobs can be used as "masonry" jobs in multi-job pilots. A second example is opportunistic resources with limited available time.
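The estimate described in the abstract can be illustrated with a minimal Python sketch. It is not the LHCbDIRAC implementation: the function name `estimate_events`, the `safety_factor` and `max_events` parameters, and the choice of units (CPU work per event in normalized power-seconds, machine power in the same normalized unit, time left in seconds) are all assumptions made for illustration. Only the three inputs themselves come from the abstract.

```python
import math


def estimate_events(cpu_work_per_event, cpu_power, time_left,
                    safety_factor=0.8, max_events=None):
    """Estimate how many events an elastic MC job could simulate.

    All names and units here are illustrative assumptions:
      cpu_work_per_event -- CPU work per event, in (power unit) * seconds
      cpu_power          -- power of the machine, in the same power unit
      time_left          -- seconds left before the job would be killed
      safety_factor      -- hypothetical margin (0 < f <= 1) kept in reserve
      max_events         -- hypothetical optional cap on the event count
    """
    if cpu_work_per_event <= 0:
        raise ValueError("cpu_work_per_event must be positive")
    if not 0 < safety_factor <= 1:
        raise ValueError("safety_factor must be in (0, 1]")
    if cpu_power <= 0 or time_left <= 0:
        # No usable capacity: the job cannot simulate any event.
        return 0

    # Total CPU work the resource can deliver in the remaining time,
    # reduced by the (assumed) safety margin.
    usable_work = cpu_power * time_left * safety_factor

    # Only whole events can be produced, so round down.
    n_events = math.floor(usable_work / cpu_work_per_event)

    if max_events is not None:
        n_events = min(n_events, max_events)
    return n_events


# Example: 100 work units per event, a machine of power 10,
# one hour left, half of the capacity kept as a margin.
print(estimate_events(100.0, 10.0, 3600.0, safety_factor=0.5))  # prints 180
```

Because the result depends on `cpu_power` and `time_left`, the same job description yields different event counts on different resources, which is the elasticity the abstract describes. The safety margin is a design choice added in this sketch to reduce the risk of hitting the time limit if the per-event estimate is optimistic.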