Technical Report Details
Meaningful statistical analysis of large computational clusters.
Gentile, Ann C. ; Marzouk, Youssef M. ; Brandt, James M. ; Pebay, Philippe Pierre
Sandia National Laboratories
Keywords: Errors; 99 General And Miscellaneous//Mathematics, Computing, And Information Science; Monitoring; Statistics; Computer Calculations
DOI: 10.2172/958384
RP-ID: SAND2005-4558
RP-ID: AC04-94AL85000
RP-ID: 958384
United States | English
Source: UNT Digital Library
Abstract

Effective monitoring of large computational clusters demands the analysis of a vast amount of raw data from a large number of machines. The fundamental interactions of the system are not, however, well defined, making it difficult to draw meaningful conclusions from this data, even if one were able to handle and process it efficiently. In this paper we show that computational clusters, because they are composed of a large number of identical machines, behave in a statistically meaningful fashion. We can therefore employ normal statistical methods to derive information about individual systems and their environment and to detect problems sooner than with traditional mechanisms. We discuss design details necessary to use these methods on a large system in a timely and low-impact fashion.
