Meaningful statistical analysis of large computational clusters.
Gentile, Ann C.; Marzouk, Youssef M.; Brandt, James M.; Pebay, Philippe Pierre
Sandia National Laboratories
Keywords: Errors; 99 General And Miscellaneous//Mathematics, Computing, And Information Science; Monitoring; Statistics; Computer Calculations
DOI: 10.2172/958384; RP-ID: SAND2005-4558; RP-ID: AC04-94AL85000; RP-ID: 958384
United States | English
Source: UNT Digital Library
[Abstract]
Effective monitoring of large computational clusters demands the analysis of a vast amount of raw data from a large number of machines. The fundamental interactions of the system are not, however, well-defined, making it difficult to draw meaningful conclusions from this data, even if one were able to efficiently handle and process it. In this paper we show that computational clusters, because they are comprised of a large number of identical machines, behave in a statistically meaningful fashion. We therefore can employ normal statistical methods to derive information about individual systems and their environment and to detect problems sooner than with traditional mechanisms. We discuss design details necessary to use these methods on a large system in a timely and low-impact fashion.
[Preview]
Files | Size | Format | View |
---|---|---|---|
958384.pdf | 330KB | PDF | download |