Conference paper details
6th Symposium on Operating Systems Design & Implementation
Microreboot: A Technique for Cheap Recovery
George Candea; Shinichi Kawamoto; Yuichi Fujiki; Greg Friedman; Armando Fox
PID: 75314
Source: CEUR
【 Abstract 】

A significant fraction of software failures in large-scale Internet systems are cured by rebooting, even when the exact failure causes are unknown. However, rebooting can be expensive, causing nontrivial service disruption or downtime even when clusters and failover are employed. In this work we separate process recovery from data recovery to enable microrebooting, a fine-grain technique for surgically recovering faulty application components without disturbing the rest of the application. We evaluate microrebooting in an Internet auction system running on an application server. Microreboots recover most of the same failures as full reboots, but do so an order of magnitude faster and result in an order of magnitude savings in lost work. This cheap form of recovery engenders a new approach to high availability: microreboots can be employed at the slightest hint of failure, prior to node failover in multi-node clusters, even when mistakes in failure detection are likely; failure and recovery can be masked from end users through transparent call-level retries; and systems can be rejuvenated by parts, without
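To make the two mechanisms named in the abstract concrete (restarting one component while durable state lives elsewhere, and hiding the restart behind a call-level retry), here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: every name (`SessionStore`, `BidComponent`, `Container`, `TransientFault`) is hypothetical, the "store" is an in-memory dict standing in for a database or session store, and a fault is simulated by a flag.

```python
from typing import Callable, Dict


class TransientFault(Exception):
    """Hypothetical error raised by a component whose in-memory state went bad."""


class SessionStore:
    """Hypothetical external store: data here survives component restarts.

    This models separating data recovery from process recovery; a real
    system would keep this state in a database or dedicated session store.
    """

    def __init__(self) -> None:
        self._data: Dict[str, int] = {}

    def get(self, key: str) -> int:
        return self._data.get(key, 0)

    def put(self, key: str, value: int) -> None:
        self._data[key] = value


class BidComponent:
    """Hypothetical auction component; all durable state lives in the store."""

    def __init__(self, store: SessionStore) -> None:
        self.store = store
        self.corrupted = False  # volatile state that a microreboot discards

    def place_bid(self, item: str, amount: int) -> int:
        if self.corrupted:
            # Fail before any side effect, so a retry is safe.
            raise TransientFault("component in bad state")
        best = max(self.store.get(item), amount)
        self.store.put(item, best)
        return best


class Container:
    """Hypothetical container that can restart components one at a time."""

    def __init__(self, store: SessionStore) -> None:
        self.store = store
        self.factories: Dict[str, Callable[[SessionStore], object]] = {}
        self.instances: Dict[str, object] = {}
        self.microreboots = 0

    def deploy(self, name: str, factory: Callable[[SessionStore], object]) -> None:
        self.factories[name] = factory
        self.instances[name] = factory(self.store)

    def microreboot(self, name: str) -> None:
        # Replace only this component's instance; the store and all other
        # components are left untouched.
        self.instances[name] = self.factories[name](self.store)
        self.microreboots += 1

    def call(self, name: str, method: str, *args, retries: int = 1):
        """Invoke a component method; on a fault, microreboot it and retry."""
        for attempt in range(retries + 1):
            try:
                return getattr(self.instances[name], method)(*args)
            except TransientFault:
                if attempt == retries:
                    raise  # retries exhausted: surface the failure
                self.microreboot(name)
```

For example, after `c = Container(SessionStore())` and `c.deploy("bids", BidComponent)`, a call to `c.call("bids", "place_bid", "lamp", 10)` returns 10. If the component is then marked corrupted, the next call restarts only that component and retries. The caller still gets a normal answer, because the highest bid was kept in the store rather than in the discarded instance. The sketch assumes a failing call raises before it has any side effects. Retrying calls that fail partway through would need idempotent operations or some equivalent safeguard.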
