Thesis Details
Delivering Affordable Fault-tolerance to Commodity Computer Systems.
Feng, Shuguang; Wenisch, Thomas F.
University of Michigan
Keywords: Fault Tolerant Computing; Computer Architecture; Compiler Analysis; Computer Science; Electrical Engineering; Engineering; Computer Science & Engineering
Others: https://deepblue.lib.umich.edu/bitstream/handle/2027.42/86483/shoe_1.pdf?sequence=1&isAllowed=y
Switzerland | English
Source: The Illinois Digital Environment for Access to Learning and Scholarship
【 Abstract 】
To meet an insatiable consumer demand for greater performance at less power, silicon technology has scaled to unprecedented dimensions. This aggressive scaling has provided designers with an ever-increasing budget of cheaper and faster transistors. Unfortunately, this trend has also been accompanied by a decline in individual device reliability as transistors have become increasingly susceptible to a host of threats.
With each new technology generation, the challenges associated with process variation, wearout, and transient faults gain greater prominence. We are quickly approaching a new era where fault-tolerance is becoming a first-order design constraint, no longer a luxury reserved exclusively for high-reliability, mission-critical domains. Even commodity microprocessors used in mainstream computing will require protection.
However, just as the reliability needs of NASA and Apple differ dramatically, so does their ability to absorb the costs necessary to ensure fault-tolerance. Viable solutions targeting commodity systems must not only recognize this fact, but must embrace it. Simply stripping down techniques developed for enterprise servers may not result in the most appropriate designs for your laptop or cellphone. The best solutions will exploit the relaxed reliability constraints of commodity systems, judiciously sacrificing a small degree of fault tolerance to achieve far greater reductions in overhead costs.
This thesis proposes a collection of works that can be selectively mixed and matched to assemble reliability solutions tailor-fit for the commodity systems community. Although the works presented address a variety of different issues, from wearout to transient faults and prevention to detection, they were all motivated by the same observation: that much of the overhead cost associated with conventional fault tolerance mechanisms is spent in pursuit of the last few “nines” of reliability. This conclusion gave rise to the philosophy permeating the chapters of this work, that summarily dismissing techniques that cannot supply mission-critical fault tolerance is no longer acceptable. In presenting concrete solutions to a few of the more interesting challenges (proactive wear-leveling orchestrated through intelligent job scheduling, and software-only transient fault detection and recovery that exploits intrinsic computational patterns within applications), we establish fundamental principles that can be applied more broadly to formulate a comprehensive reliability strategy.
【 Preview 】
Attachments
Files Size Format View
Delivering Affordable Fault-tolerance to Commodity Computer Systems. 5865KB PDF download
Document Metrics
Downloads: 15    Views: 27