Conference Paper Details
20th International Conference on Computing in High Energy and Nuclear Physics
Testnodes: a Lightweight Node-Testing Infrastructure
Physics; Computer Science
Fay, R.^1 ; Bland, J.^1
^1 Department of Physics, University of Liverpool, Liverpool L69 7ZE, United Kingdom
Keywords: Batch systems; Client-side tests; Design and implementations; Development plans; Server sides; Testing infrastructure; Worker nodes
Others: https://iopscience.iop.org/article/10.1088/1742-6596/513/6/062013/pdf
DOI: 10.1088/1742-6596/513/6/062013
Subject classification: Computer Science (General)
Source: IOP
【 Abstract 】

A key aspect of ensuring optimum cluster reliability and productivity lies in keeping worker nodes in a healthy state. Testnodes is a lightweight node-testing solution developed at Liverpool. While Nagios has been used locally for general monitoring of hosts and services, Testnodes is optimised to answer one question: is there any reason this node should not be accepting jobs? This tight focus enables Testnodes to inspect nodes frequently with minimal impact and to provide a comprehensive and easily extended check with each inspection. On the server side, Testnodes, implemented in Python, interoperates with the Torque batch server to control the nodes' production status. Testnodes executes client-side test scripts remotely and in parallel, then processes their return codes and output, adjusting each node's online/offline status accordingly to preserve the integrity of the overall batch system. Testnodes reports via log, email and Nagios, so that overall node status can be reviewed at a glance and specific node issues can be identified and resolved quickly. This presentation will cover the Testnodes design and implementation, together with the results of its use in production at Liverpool, and future development plans.
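
The server-side loop described in the abstract (run a test on every node in parallel, then map each return code to an online/offline decision) might be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the function names, the test script path `/usr/local/bin/node_test.sh`, the use of `ssh` as the transport, and the rule "return code 0 means healthy" are all assumptions. The `pbsnodes -o`, `-c` and `-N` options are the standard Torque flags for marking a node offline, clearing the offline state, and attaching a note. The sketch only builds the commands and does not run them.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_remote_test(node, script="/usr/local/bin/node_test.sh", timeout=60):
    """Run a (hypothetical) client-side test script on `node` over ssh.

    Returns (returncode, output). Transport and script path are assumptions.
    """
    try:
        proc = subprocess.run(
            ["ssh", "-o", "BatchMode=yes", node, script],
            capture_output=True, text=True, timeout=timeout,
        )
        return proc.returncode, proc.stdout.strip()
    except subprocess.TimeoutExpired:
        # An unresponsive node is treated as failing its test.
        return 1, "test timed out"
    except OSError as exc:
        return 1, f"could not launch ssh: {exc}"


def pbsnodes_command(node, returncode, output):
    """Translate one test result into a Torque pbsnodes command line."""
    if returncode == 0:
        # Assumed convention: 0 means healthy, so clear the offline flag.
        return ["pbsnodes", "-c", node]
    # Any other code: mark offline and record the reason as the node note.
    note = output or "node test failed"
    return ["pbsnodes", "-o", "-N", note, node]


def check_nodes(nodes, runner=run_remote_test, max_workers=16):
    """Test all nodes in parallel and return {node: pbsnodes command}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(runner, nodes))
    return {
        node: pbsnodes_command(node, rc, out)
        for node, (rc, out) in zip(nodes, results)
    }
```

Passing the test runner in as a parameter keeps the decision logic testable without a live cluster. A production tool would also need to avoid bringing back online a node that an administrator had deliberately taken offline, which this sketch does not attempt.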
