Technical Report Details
Evaluating operating system vulnerability to memory errors.
Ferreira, Kurt Brian ; Bridges, Patrick G. (University of New Mexico) ; Pedretti, Kevin Thomas Tauke ; Mueller, Frank (North Carolina State University) ; Fiala, David (North Carolina State University) ; Brightwell, Ronald Brian
Sandia National Laboratories
Keywords: 99 General And Miscellaneous//Mathematics, Computing, And Information Science; Vulnerability; Kernels; Targets; Reliability
DOI: 10.2172/1044952
RP-ID: SAND2012-4060
RP-ID: AC04-94AL85000
RP-ID: 1044952
United States | English
Source: UNT Digital Library
[Abstract]
Reliability is of great concern to the scalability of extreme-scale systems. Of particular concern are soft errors in main memory, which are a leading cause of failures on current systems and are predicted to be the leading cause on future systems. While great effort has gone into designing algorithms and applications that can continue to make progress in the presence of these errors without restarting, the most critical software running on a node, the operating system (OS), is currently left relatively unprotected. OS resiliency is of particular importance because, though this software typically represents a small footprint of a compute node's physical memory, recent studies show more memory errors in this region of memory than the remainder of the system. In this paper, we investigate the soft error vulnerability of two operating systems used in current and future high-performance computing systems: Kitten, the lightweight kernel developed at Sandia National Laboratories, and CLE, a high-performance Linux-based operating system developed by Cray. For each of these platforms, we outline major structures and subsystems that are vulnerable to soft errors and describe methods that could be used to reconstruct damaged state. Our results show the Kitten lightweight operating system may be an easier target to harden against memory errors due to its smaller memory footprint, largely deterministic state, and simpler system structure.