Conference paper details
21st International Conference on Computing in High Energy and Nuclear Physics
Job monitoring on DIRAC for Belle II distributed computing
Physics; Computer Science
Kato, Yuji^1 ; Hayasaka, Kiyoshi^1 ; Hara, Takanori^2 ; Miyake, Hideki^2 ; Ueda, Ikuo^2,3
Kobayashi-Maskawa Institute for the Origin of Particles and the Universe, Nagoya University, Chikusa-ku Furo-cho, Nagoya, Japan^1
High Energy Accelerator Research Organization, 1-1, Oho, Tsukuba, Japan^2
International Center for Elementary Particle Physics, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan^3
Keywords: Job monitoring; Log analysis; Monitoring system; Passive methods; Passive monitoring; Workload management
Others  :  https://iopscience.iop.org/article/10.1088/1742-6596/664/6/062023/pdf
DOI  :  10.1088/1742-6596/664/6/062023
Subject classification: Computer Science (General)
Source: IOP
【 Abstract 】

We developed a monitoring system for Belle II distributed computing that combines active and passive methods. This paper describes the passive monitoring system, which processes and visualizes information stored in the DIRAC database. We divide the DIRAC workload management flow into steps and store characteristic variables that indicate issues. These variables are chosen based on our experience and then visualized, which allows us to detect issues effectively. Finally, we discuss future development toward automating log analysis, issue notification, and the disabling of problematic sites.
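
The passive approach described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation and does not use any DIRAC API: the job records, field names (`site`, `step`, `status`), step names, and the 0.6 threshold are all hypothetical, chosen only to show how a per-site, per-step failure fraction could serve as one "characteristic variable" computed from stored job information.

```python
from collections import Counter

# Hypothetical job records standing in for rows read from a job database.
# Field names, step names, and status values are illustrative assumptions.
JOBS = [
    {"site": "SiteA", "step": "input_download", "status": "Done"},
    {"site": "SiteA", "step": "execution", "status": "Done"},
    {"site": "SiteA", "step": "execution", "status": "Failed"},
    {"site": "SiteB", "step": "input_download", "status": "Done"},
    {"site": "SiteB", "step": "execution", "status": "Failed"},
    {"site": "SiteB", "step": "execution", "status": "Failed"},
]


def failure_rates(jobs):
    """Return {(site, step): fraction of failed jobs} for each site/step pair."""
    totals = Counter()
    failures = Counter()
    for job in jobs:
        key = (job["site"], job["step"])
        totals[key] += 1
        if job["status"] == "Failed":
            failures[key] += 1
    return {key: failures[key] / totals[key] for key in totals}


def flag_issues(rates, threshold=0.6):
    """Return the sorted site/step pairs whose failure fraction meets the threshold."""
    return sorted(key for key, rate in rates.items() if rate >= threshold)


if __name__ == "__main__":
    for site, step in flag_issues(failure_rates(JOBS)):
        print(f"possible issue: site={site} step={step}")
```

Grouping by step as well as by site is the point of the sketch: a site whose jobs fail only at one step suggests a different problem than a site where failures are spread across the whole flow.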
