Design of robust ultra-low power platform for in-silicon machine learning

The rapid development of machine learning plays a key role in enabling next-generation computing systems with enhanced intelligence. Present-day machine learning systems adopt an "intelligence in the cloud" paradigm, resulting in a heavy energy cost despite state-of-the-art performance. It is therefore of great interest to design embedded ultra-low power (ULP) platforms with in-silicon machine learning capability. A self-contained ULP platform consists of energy delivery, sensing, and information processing subsystems. This dissertation proposes techniques to design and optimize ULP platforms for in-silicon machine learning by exploring a trade-off between energy efficiency and robustness. This trade-off arises when information processing is integrated into the energy delivery or sensing subsystems, or implemented on emerging stochastic fabrics (e.g., CMOS operating at near-threshold voltage or under voltage overscaling, and beyond-CMOS devices).

This dissertation presents the Compute VRM (C-VRM) to embed information processing into the energy delivery subsystem. The C-VRM employs multiple voltage domain stacking and core swapping to achieve high total system energy efficiency in the near/sub-threshold region. A prototype IC of the C-VRM is implemented in a 1.2 V, 130 nm CMOS process. Measured results indicate that the C-VRM achieves up to 44.8% savings in system-level energy per operation compared to the conventional system, and an efficiency ranging from 79% to 83% over an output voltage range of 0.52 V to 0.6 V.

This dissertation further proposes the Compute Sensor approach to embed information processing into the sensing subsystem. The Compute Sensor eliminates both the traditional sensor-processor interface and the high-SNR/high-energy digital processing by moving feature extraction and classification into the analog domain. Simulation results in 65 nm CMOS show that the proposed Compute Sensor achieves a detection accuracy greater than 94.7% on the Caltech101 dataset, within 0.5% of that achieved by an ideal digital implementation, while consuming 7x to 17x lower energy than the conventional architecture at the same level of accuracy.

To further explore the energy-efficiency vs. robustness trade-off, this dissertation investigates the use of highly energy-efficient but unreliable stochastic fabrics to implement in-silicon machine learning kernels. To perform reliable computation on these fabrics, this dissertation employs statistical error compensation (SEC) and contributes to the SEC portfolio by proposing embedded algorithmic noise tolerance (E-ANT) for low-overhead error compensation. E-ANT reuses part of the main block as an estimator, thereby embedding the estimator into the main block (a behavioral sketch of this decision rule is given below). System-level simulation results in a commercial 45 nm CMOS process show that E-ANT achieves up to 38% error tolerance and up to 51% energy savings compared with an uncompensated system.

This dissertation also contributes to the theoretical understanding of stochastic fabrics by proposing a class of probabilistic error models that accurately capture hardware errors on such fabrics. The models are validated in a commercial 45 nm CMOS process and employed to evaluate the performance of machine learning kernels in the presence of hardware errors.
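To make the E-ANT decision rule concrete, the following is a minimal behavioral sketch in Python. The reduced-precision dot product is a software stand-in for the estimate E-ANT obtains by reusing the most-significant portion of the main block's datapath; the injected error magnitude, the number of truncated bits, and the decision threshold are illustrative assumptions, not values from the dissertation.

```python
import numpy as np

def main_block(x, w, error_rate=0.05, rng=None):
    """Full-precision dot product on an unreliable (e.g., voltage-overscaled)
    fabric. Timing errors are emulated by flipping a high-order bit of the
    result with probability `error_rate` (an assumption of this sketch)."""
    rng = rng or np.random.default_rng()
    y = int(np.dot(x, w))
    if rng.random() < error_rate:
        y ^= 1 << 18  # large-magnitude error, typical of timing violations
    return y

def embedded_estimator(x, w, drop_bits=2):
    """Reduced-precision estimate standing in for E-ANT's reuse of the
    most-significant portion of the main block (no separate estimator)."""
    return int(np.dot(x >> drop_bits, w >> drop_bits)) << (2 * drop_bits)

def e_ant_output(x, w, threshold=1 << 15, rng=None):
    """ANT decision rule: accept the main block output unless it deviates
    from the embedded estimate by more than `threshold`."""
    y_a = main_block(x, w, rng=rng)
    y_e = embedded_estimator(x, w)
    return y_a if abs(y_a - y_e) <= threshold else y_e

rng = np.random.default_rng(0)
x = rng.integers(0, 256, size=16)
w = rng.integers(0, 256, size=16)
print("compensated:", e_ant_output(x, w, rng=rng), "exact:", int(np.dot(x, w)))
```

Because the hardware error in this sketch has a much larger magnitude than the estimation error of the truncated datapath, the comparison detects and corrects most large errors while adding little overhead beyond the threshold check.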
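How such probabilistic models can stand in for slow HDL simulation in architecture-level studies can be illustrated with a small Monte Carlo sketch. The additive-Gaussian corruption, the error rate p_eta, and the toy linear SVM below are assumptions chosen for illustration only; the dissertation's models are fitted to and validated against circuit-level behavior in 45 nm CMOS.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy linear SVM; the weights, bias, and data are placeholders for illustration.
w = rng.normal(size=8)
b = 0.1
X = rng.normal(size=(10_000, 8))
labels = (X @ w + b > 0).astype(int)      # ground truth from the error-free model

def detect_with_errors(X, p_eta=0.3, err_scale=4.0):
    """SVM decision under an additive probabilistic error model: with
    probability p_eta the dot-product output is corrupted by a
    large-magnitude additive term (a stand-in for a timing error)."""
    y = X @ w + b
    hit = rng.random(len(y)) < p_eta
    return (y + hit * rng.normal(scale=err_scale, size=len(y)) > 0).astype(int)

pred = detect_with_errors(X)
p_det = np.mean(pred[labels == 1] == 1)   # probability of detection on true positives
print(f"P_det at p_eta = 0.3: {p_det:.3f}")
```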
Performance prediction of a support vector machine (SVM) based classifier using these models indicates that the probability of detection P_{det} estimated with the proposed models is within 3% of HDL simulation results for timing errors due to voltage overscaling when the error rate p_η ≤ 80%, within 5% for timing errors due to process variation in the near-threshold-voltage (NTV) region (0.3 V to 0.7 V), and within 2% for defect errors when the defect rate p_{saf} is between 10^{-3} and 20%.

Employing the proposed error models and evaluation methodology, this dissertation explores the use of distributed machine learning architectures, namely classifier ensembles, to enhance the robustness of in-silicon machine learning kernels. A comparative study of distributed architectures (i.e., random forest (RF)) and centralized architectures (i.e., SVM) is performed in a commercial 45 nm CMOS process. Using datasets from the UCI machine learning repository, it is determined that RF-based architectures are significantly more robust than SVM-based architectures in the presence of timing errors in the NTV region (0.3 V to 0.7 V). Additionally, an error-weighted voting technique that incorporates the timing error statistics of the NTV circuit fabric is proposed to further enhance the robustness of RF architectures. Simulation results confirm that the error-weighted voting technique achieves a P_{det} that varies by only 1.4%, a variation 12x smaller than that of the centralized architecture.
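A minimal sketch of such an error-weighted vote is given below. Each tree's binary decision is scaled by a log-odds reliability weight derived from the probability that NTV timing errors flip that tree's output; the log-odds weighting rule and the numbers used here are assumptions for illustration rather than the dissertation's exact formulation.

```python
import numpy as np

def error_weighted_vote(tree_preds, tree_error_rates):
    """Weighted majority vote over binary tree decisions (0/1).

    Tree i is weighted by log((1 - p_i) / p_i), where p_i is the probability
    that timing errors in the NTV fabric flip that tree's decision. Reliable
    trees dominate; a tree with p_i near 0.5 contributes almost nothing."""
    tree_preds = np.asarray(tree_preds, dtype=float)           # shape: (n_trees, n_samples)
    p = np.clip(np.asarray(tree_error_rates), 1e-6, 0.5 - 1e-6)
    w = np.log((1.0 - p) / p)                                  # per-tree reliability weights
    score = w @ (2 * tree_preds - 1)                           # signed, weighted sum per sample
    return (score > 0).astype(int)

# Example: three trees, the third running on a marginal NTV corner.
preds = [[1, 0, 1], [1, 1, 0], [0, 0, 0]]
print(error_weighted_vote(preds, tree_error_rates=[0.05, 0.10, 0.45]))
```

Discounting the votes of trees that are nearly random under timing errors is what keeps the ensemble decision, and hence P_{det}, stable as the fabric's error statistics degrade.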