| 17th International Workshop on Advanced Computing and Analysis Techniques in Physics Research | |
| A scalable architecture for online anomaly detection of WLCG batch jobs | |
| Physics; Computer Science | |
| Kuehn, E.^1 ; Fischer, M.^1 ; Giffels, M.^1 ; Jung, C.^1 ; Petzold, A.^1 | |
| Karlsruhe Institute of Technology, Steinbuch Centre for Computing, Hermann-von-Helmholtz-Platz 1, Eggenstein-Leopoldshafen 76344, Germany^1 | |
| Keywords: Anomaly detection; Computational costs; Local information; Misconfigurations; Network communications; Online anomaly detection; Scalable architectures; Superpeer networks | |
| Others: https://iopscience.iop.org/article/10.1088/1742-6596/762/1/012002/pdf DOI: 10.1088/1742-6596/762/1/012002 | |
| Subject classification: Computer Science (General) | |
| Source: IOP | |
【 Abstract 】
For data centres it is increasingly important to monitor network usage and to learn from network usage patterns. In particular, configuration issues or misbehaving batch jobs that prevent smooth operation need to be detected as early as possible. At the GridKa data and computing centre we therefore operate a tool, BPNetMon, for monitoring traffic data and characteristics of WLCG batch jobs and pilots locally on different worker nodes. On the one hand, local information alone is not sufficient to detect anomalies, for several reasons: for example, the underlying job distribution on a single worker node might change, or there might be a local misconfiguration. On the other hand, a centralised anomaly detection approach does not scale with respect to either network communication or computational cost. We therefore propose a scalable architecture based on concepts of a super-peer network.