Thesis details
Fault tolerance core: a framework for application-aware reliability
Sidea, Valentin ; Kalbarczyk, Zbigniew T
Keywords: Fault Tolerance; Reliability
Others  :  https://www.ideals.illinois.edu/bitstream/handle/2142/78280/SIDEA-THESIS-2015.pdf?sequence=1&isAllowed=y
United States | English
Source: The Illinois Digital Environment for Access to Learning and Scholarship
【 Abstract 】

As processor manufacturers keep pushing the limits of the transistor, the reliability of computer systems has become an increasing concern. Various fault tolerance techniques have been developed in an effort to provide reliable computing in the presence of faults. These approaches suffer from either high resource cost or high performance overhead. This thesis presents a design for a Fault Tolerance Core (FTC) that uses configurable application-aware hardware modules to improve reliability. Application-aware fault tolerance is achieved by detecting perturbations in application execution through the monitoring of processor pipeline signals. This approach leverages hardware resources more efficiently than replication. The FTC achieves low overhead by placing the fault tolerance hardware separately from the processing core, minimizing the data collection hardware inside the processor, and performing fault detection in the background.

This thesis presents the work completed so far toward an FTC. This work includes a hardware-assisted incremental checkpoint, an application hang detector, and a preliminary FTC framework for integrating these modules into a Leon3 microprocessor. All modules have been implemented and tested on a Leon3 synthesized on a Stratix III FPGA running a Linux environment. A hardware fault injector capable of modifying 9 distinct processor pipeline signals has been implemented for performing validation experiments on the modules.
