Conference paper details
21st International Conference on Computing in High Energy and Nuclear Physics
Active Job Monitoring in Pilots
Physics; Computer Science
Kuehn, Eileen^1 ; Fischer, Max^1 ; Giffels, Manuel^1 ; Jung, Christopher^1 ; Petzold, Andreas^1
Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany^1
Keywords: Active monitoring; Computing clusters; Critical component; Current monitoring; Data federation; Monitoring tools; Network-aware scheduling; Scheduling process
Others: https://iopscience.iop.org/article/10.1088/1742-6596/664/5/052019/pdf
DOI: 10.1088/1742-6596/664/5/052019
Subject classification: Computer Science (general)
Source: IOP
【 Abstract 】

Recent developments in high energy physics (HEP), including multi-core jobs and multi-core pilots, require data centres to gain a deep understanding of their systems in order to monitor, design, and upgrade computing clusters. Networking is a critical component. In particular, the increased usage of data federations, for example in diskless computing centres or as a fallback solution, relies on WAN connectivity and availability. The specific demands of different experiments and communities, as well as the need to identify misbehaving batch jobs, require active monitoring. Existing monitoring tools cannot measure fine-grained information at batch job level, which complicates network-aware scheduling and optimisation. In addition, pilots add another layer of abstraction: they behave like batch systems themselves, managing and executing job payloads internally. The number of real jobs being executed is unknown, because the underlying batch system has no access to information about the scheduling process inside the pilots. As a result, jobs and pilots cannot be reliably compared when predicting run-time behaviour or network performance, so identifying the actual payload is important. At the GridKa Tier 1 centre, a dedicated tool is in use that monitors network traffic at batch job level. This contribution presents the current monitoring approach and discusses recent efforts to identify pilots and their substructures inside the batch system, and why this identification matters. It also shows how to determine monitoring data for specific jobs from identified pilots. Finally, the approach is evaluated.
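
The idea of attributing monitoring data to payloads inside a pilot can be illustrated with a minimal Python sketch. It is not the GridKa tool and makes several assumptions that the abstract does not state: that per-process traffic counters are already available, that a pilot is identified by its process ID, and that each direct child process of the pilot is the root of one payload. The names `ProcessRecord`, `traffic_bytes`, and `traffic_per_payload` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessRecord:
    pid: int
    ppid: int
    traffic_bytes: int  # hypothetical per-process network byte count


def traffic_per_payload(records, pilot_pid):
    """Sum traffic over each process subtree rooted at a direct child of the pilot.

    Assumption: every direct child of the pilot process is one payload.
    Returns a dict mapping payload root PID to total bytes in its subtree.
    """
    by_pid = {}
    children = defaultdict(list)
    for rec in records:
        by_pid[rec.pid] = rec
        children[rec.ppid].append(rec.pid)

    totals = {}
    for payload_root in children.get(pilot_pid, []):
        total = 0
        stack = [payload_root]
        # Iterative depth-first walk over the payload's process subtree.
        while stack:
            pid = stack.pop()
            total += by_pid[pid].traffic_bytes
            stack.extend(children.get(pid, []))
        totals[payload_root] = total
    return totals


# Made-up example data: one pilot (PID 100) with two payloads.
example_records = [
    ProcessRecord(pid=100, ppid=1, traffic_bytes=10),     # the pilot itself
    ProcessRecord(pid=101, ppid=100, traffic_bytes=500),  # payload A
    ProcessRecord(pid=102, ppid=101, traffic_bytes=250),  # child of payload A
    ProcessRecord(pid=103, ppid=100, traffic_bytes=40),   # payload B
    ProcessRecord(pid=200, ppid=1, traffic_bytes=999),    # unrelated batch job
]

example_totals = traffic_per_payload(example_records, pilot_pid=100)
```

With the made-up records above, `example_totals` maps payload root 101 to 750 bytes and payload root 103 to 40 bytes. The pilot's own traffic and the unrelated job are excluded.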
