A significant fraction of software failures in large-scale Internet systems are cured by rebooting, even when the exact failure causes are unknown. However, rebooting can be expensive, causing nontrivial service disruption or downtime even when clusters and failover are employed. In this work we use separation of process recovery from data recovery to enable microrebooting, a fine-grain technique for surgically recovering faulty application components, without disturbing the rest of the application. We evaluate microrebooting in an Internet auction system running on an application server. Microreboots recover most of the same failures as full reboots, but do so an order of magnitude faster and result in an order of magnitude savings in lost work. This cheap form of recovery engenders a new approach to high availability: microreboots can be employed at the slightest hint of failure, prior to node failover in multi-node clusters, even when mistakes in failure detection are likely; failure and recovery can be masked from end users through transparent call-level retries; and systems can be rejuvenated by parts, without