21st International Conference on Computing in High Energy and Nuclear Physics | |
Job monitoring on DIRAC for Belle II distributed computing | |
Physics; Computer Science | |
Kato, Yuji^1 ; Hayasaka, Kiyoshi^1 ; Hara, Takanori^2 ; Miyake, Hideki^2 ; Ueda, Ikuo^2,3 | |
Kobayashi-Maskawa Institute for the Origin of Particles and the Universe, Nagoya University, Chikusa-ku Furo-cho, Nagoya, Japan^1 | |
High Energy Accelerator Research Organization, 1-1, Oho, Tsukuba, Japan^2 | |
International Center for Elementary Particle Physics, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan^3 | |
Keywords: Job monitoring; Log analysis; Monitoring system; Passive methods; Passive monitoring; Workload management | |
Others : https://iopscience.iop.org/article/10.1088/1742-6596/664/6/062023/pdf DOI : 10.1088/1742-6596/664/6/062023 |
Subject classification: Computer Science (General) | |
Source: IOP | |
【 Abstract 】
We developed a monitoring system for Belle II distributed computing that combines active and passive methods. In this paper we describe the passive monitoring system, in which information stored in the DIRAC database is processed and visualized. We divide the DIRAC workload management flow into steps and store characteristic variables that indicate issues. These variables are chosen carefully based on our experience and then visualized. As a result, we are able to detect issues effectively. Finally, we discuss future development toward automating log analysis, issue notification, and the disabling of problematic sites.
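The idea of splitting the workload flow into steps and tracking a per-step indicator can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the record layout, the step names (`submission`, `execution`, `upload`), the site names, and the 0.6 threshold are hypothetical and are not taken from the paper or from the DIRAC schema. The sketch computes a failure rate per (site, step) pair and lists the pairs that exceed the threshold.

```python
from collections import Counter

# Hypothetical job records: (site, workflow step, status).
# Site names, step names, and statuses are illustrative only.
JOBS = [
    ("SiteA", "submission", "ok"),
    ("SiteA", "execution", "ok"),
    ("SiteA", "execution", "failed"),
    ("SiteB", "submission", "ok"),
    ("SiteB", "execution", "failed"),
    ("SiteB", "execution", "failed"),
    ("SiteB", "upload", "ok"),
]


def failure_rates(jobs):
    """Return {(site, step): fraction of records that did not end in 'ok'}."""
    total = Counter()
    failed = Counter()
    for site, step, status in jobs:
        total[(site, step)] += 1
        if status != "ok":
            failed[(site, step)] += 1
    return {key: failed[key] / total[key] for key in total}


def flag_issues(rates, threshold=0.6):
    """Return the sorted (site, step) pairs whose failure rate meets the threshold."""
    return sorted(key for key, rate in rates.items() if rate >= threshold)


if __name__ == "__main__":
    for site, step in flag_issues(failure_rates(JOBS)):
        print(f"possible issue: {site} at step '{step}'")
```

In a real deployment the records would come from the workload management database rather than a hard-coded list, and the indicator per step would likely be richer than a single failure rate.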