Rising process, voltage, and temperature (PVT) variations at advanced process nodes make it increasingly difficult to meet aggressive performance targets under strict power budgets. Traditional adaptive techniques that compensate for PVT variations need safety margins and cannot respond to rapid environmental changes. We present a novel voltage management technique, called Razor, which eliminates worst-case safety margins through in situ detection and correction of variation-induced delay errors. In Razor, we use a delay-error-tolerant flip-flop on critical paths to scale the supply voltage to the point of first failure of a die for a given frequency. Thus, all margins due to global and local PVT variations are eliminated, resulting in significant energy savings. In addition, the supply voltage can be scaled even lower than the point of first failure into the sub-critical region, deliberately tolerating a targeted error rate and thereby providing additional energy savings. Thus, in the context of Razor, a timing error is not a catastrophic system failure but a trade-off between the overhead of error correction and the additional energy savings due to sub-critical operation. In Razor, the error rate is monitored and the supply voltage is tuned to achieve a targeted error rate. We developed two techniques, called RazorI and RazorII, for implementing Razor-based voltage tuning in microprocessors. The RazorI approach achieves error detection by double-sampling the critical-path output at different points in time and comparing both samples. A global recovery signal overwrites the earlier, speculative sample with the later sample and restores the pipeline to its correct state. We implemented RazorI error detection and correction in a 64-bit processor in 0.18 μm technology and obtained 50% energy savings over the worst case at 120 MHz.
However, the efficacy of the RazorI technique for high-performance processors is undermined by its reliance on a metastability detector and a potentially timing-critical pipeline recovery path. The RazorII approach addresses this issue by recovering from delay errors through a conventional architectural-replay mechanism. Error detection in RazorII occurs by flagging spurious transitions at critical-path endpoints. Furthermore, RazorII also detects soft errors in logic and registers. We implemented a RazorII-enabled 64-bit processor in 0.13 μm technology and obtained 33% power savings over the worst case. Soft-error tolerance was demonstrated with radiation experiments.
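The error-rate-driven voltage tuning described above can be illustrated with a minimal sketch. It is not the controller used in the RazorI or RazorII processors: the exponential error-rate model, the 1.8 V nominal supply, the 1.5 V point of first failure, the controller gain, and the names `error_rate` and `tune_voltage` are all assumptions chosen for illustration. The sketch lowers the supply while the measured error rate is below the target and raises it when the target is exceeded.

```python
import math

def error_rate(vdd, v_first_failure=1.50, steepness=40.0):
    """Toy error-rate model (assumption, not measured data).

    No timing errors at or above the point of first failure; below it the
    error rate rises exponentially and is capped at 1.0.
    """
    if vdd >= v_first_failure:
        return 0.0
    return min(1.0, 1e-6 * math.exp(steepness * (v_first_failure - vdd)))

def tune_voltage(target=1e-3, vdd=1.80, v_min=1.00, v_max=1.80,
                 gain=0.01, steps=500):
    """Proportional controller acting on the log of the error rate.

    All parameter values are hypothetical. Each step compares the measured
    error rate with the target and nudges the supply voltage accordingly.
    """
    for _ in range(steps):
        measured = error_rate(vdd)
        # Floor the measurement so that log10 is defined when no errors occur.
        diff = math.log10(target) - math.log10(max(measured, 1e-9))
        # Measured rate below target gives diff > 0, so the supply is lowered;
        # measured rate above target gives diff < 0, so the supply is raised.
        vdd -= gain * diff
        vdd = min(v_max, max(v_min, vdd))
    return vdd

if __name__ == "__main__":
    v = tune_voltage()
    print(f"settled supply: {v:.3f} V, error rate: {error_rate(v):.2e}")
```

Under this toy model the supply settles below the point of first failure, in the sub-critical region, where the modeled error rate matches the target. A real controller would also have to weigh the recovery overhead of each error against the energy saved, which this sketch does not model.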
Razor: A Variability-Tolerant Design Methodology for Low-Power and Robust Computing.