| 20th International Conference on Computing in High Energy and Nuclear Physics | |
| Testnodes: a Lightweight Node-Testing Infrastructure | |
| Physics; Computer Science | |
| Fay, R.^1; Bland, J.^1 | |
| ^1 Department of Physics, University of Liverpool, Liverpool L69 7ZE, United Kingdom | |
| Keywords: Batch systems; Client-side tests; Design and implementations; Development plans; Server sides; Testing infrastructure; Worker nodes | |
| Full text (PDF): https://iopscience.iop.org/article/10.1088/1742-6596/513/6/062013/pdf | DOI: 10.1088/1742-6596/513/6/062013 |
| Subject classification: Computer Science (General) | |
| Source: IOP | |
【 Abstract 】
A key aspect of ensuring optimum cluster reliability and productivity lies in keeping worker nodes in a healthy state. Testnodes is a lightweight node-testing solution developed at Liverpool. While Nagios has been used locally for general monitoring of hosts and services, Testnodes is optimised to answer one question: is there any reason this node should not be accepting jobs? This tight focus lets Testnodes inspect nodes frequently with minimal impact and run a comprehensive, easily extended check at each inspection. On the server side, Testnodes, implemented in Python, interoperates with the Torque batch server to control each node's production status. Testnodes executes client-side test scripts remotely and in parallel, processes their return codes and output, and adjusts each node's online/offline status accordingly to preserve the integrity of the overall batch system. Testnodes reports via log, email and Nagios, giving a quick overview of node status so that specific node issues can be identified and resolved quickly. This presentation will cover the design and implementation of Testnodes, the results of its use in production at Liverpool, and future development plans.
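The server-side loop described in the abstract (run client-side test scripts in parallel, read return codes and output, set each node online or offline in Torque, report a summary to Nagios) can be sketched in Python, the language Testnodes is written in. This is a minimal illustration under stated assumptions, not the Testnodes implementation. The helper names (`run_remote`, `check_all`, `pbsnodes_command`, `nagios_summary`), the use of `ssh` to reach each node, the convention that exit code 0 means "healthy", and the choice to report any failure as Nagios CRITICAL are all hypothetical. `pbsnodes -o` (mark offline), `-c` (clear offline) and `-N` (attach a note) are standard Torque options, but the abstract does not say how Testnodes actually drives Torque.

```python
"""Hypothetical sketch of a Testnodes-style check loop (not the real tool)."""
import subprocess
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class CheckResult:
    node: str
    returncode: int  # assumption: 0 = healthy, anything else = problem
    output: str      # first line of the script's stdout, used as the reason


def run_remote(node: str, script: str, timeout: int = 60) -> CheckResult:
    """Run a client-side test script on `node` over ssh (assumed transport)."""
    try:
        proc = subprocess.run(["ssh", node, script], capture_output=True,
                              text=True, timeout=timeout)
        lines = proc.stdout.strip().splitlines()
        return CheckResult(node, proc.returncode, lines[0] if lines else "")
    except subprocess.TimeoutExpired:
        return CheckResult(node, 255, "check timed out")
    except OSError as exc:  # e.g. the ssh binary is not installed
        return CheckResult(node, 255, str(exc))


# Any callable taking (node, script) and returning a CheckResult can be used.
Runner = Callable[[str, str], CheckResult]


def check_all(nodes: Sequence[str], script: str,
              runner: Runner = run_remote, workers: int = 16) -> List[CheckResult]:
    """Run the test script on every node in parallel; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda n: runner(n, script), nodes))


def pbsnodes_command(result: CheckResult) -> List[str]:
    """Build (but do not run) the Torque command matching one check result."""
    if result.returncode == 0:
        return ["pbsnodes", "-c", result.node]  # clear the offline state
    reason = result.output or "testnodes check failed"
    return ["pbsnodes", "-o", "-N", reason, result.node]  # offline, with a note


def nagios_summary(results: Sequence[CheckResult]) -> Tuple[int, str]:
    """Collapse per-node results into a Nagios plugin (exit code, status line)."""
    failed = sorted(r.node for r in results if r.returncode != 0)
    if not failed:
        return 0, f"OK - all {len(results)} nodes passed"
    return 2, (f"CRITICAL - {len(failed)}/{len(results)} nodes failed: "
               f"{', '.join(failed)}")
```

Passing a fake `runner` in place of `run_remote` exercises the decision logic without a cluster. Building the `pbsnodes` argument list separately from running it keeps the policy testable. Note that unconditionally running `pbsnodes -c` on a passing node would also clear an offline flag set by an administrator, so a real tool would need to remember which nodes it took offline itself.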