The major objective of this research is to make processing-in-memory (PIM) based deep learning accelerators more practical and more computationally efficient. The work focuses on novel architectures built on emerging non-volatile memory (NVM) and leverages software-hardware co-optimization to achieve optimal computing efficiency without compromising accuracy. On the device side, this research mainly explores resistive RAM (ReRAM) and the ferroelectric FET (FeFET). A dedicated recurrent neural network (RNN) accelerator is proposed that uses ReRAM as the basic computation cell for vector-matrix multiplication (VMM), with an execution pipeline specifically optimized for efficient RNN computation. To address the challenges stemming from ReRAM, this research also explores FeFET as a replacement for ReRAM as the basic memory cell in the PIM architecture. A dedicated data communication network, the hierarchical network-on-chip (H-NoC), is presented to improve data-transmission efficiency. To eliminate the power- and area-hungry analog-to-digital and digital-to-analog converters (ADCs and DACs) found in existing PIM architectures and further improve efficiency, this research proposes an all-digital, flexible-precision PIM design in which computation is performed with dynamic bit-precision. Beyond circuit and architecture optimization, algorithms are developed to fully exploit the hardware's potential. This research proposes a genetic algorithm (GA) based evolutionary method for layer-wise DNN quantization: DNN models can be dynamically quantized and deployed on the developed hardware platforms, which support flexible bit-precision, to achieve the best computing efficiency without compromising accuracy. To mitigate the accuracy drop caused by device variation (in ReRAM and FeFET, for example), this research proposes a hardware-noise-aware training algorithm, yielding a reliable PIM engine built from unreliable devices.
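The principle behind ReRAM-based VMM can be illustrated with a small numerical sketch: each crossbar cell's conductance encodes a weight, input voltages drive the rows, and the summed column currents realize the vector-matrix product in a single analog step (by Ohm's and Kirchhoff's laws). The matrix sizes and values below are illustrative only, and the analog behavior is modeled digitally.

```python
import numpy as np

# Hypothetical 4x3 crossbar: G[i, j] is the conductance of the cell at
# row i, column j, encoding one weight of the matrix (arbitrary units).
G = np.array([[1.0, 0.5, 0.2],
              [0.3, 0.8, 0.1],
              [0.6, 0.4, 0.9],
              [0.2, 0.7, 0.5]])

# Input voltages applied to the rows (one entry of the input vector each).
v = np.array([0.5, 1.0, 0.0, 1.0])

# Each column current is the sum of (voltage * conductance) down that
# column, so the current vector equals the vector-matrix product v @ G.
i_out = v @ G
```

In a physical crossbar the whole product is produced in one read cycle; it is the conversion of `i_out` back to digital values (the ADC stage) that motivates the all-digital design described above.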
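The GA-based layer-wise quantization search can be sketched as a small evolutionary loop over per-layer bit-widths. This is a minimal toy, not the thesis's actual method: the layer count, candidate bit-widths, and the `fitness` function (a stand-in for measuring validation accuracy of the quantized model) are all assumptions made for illustration.

```python
import random

random.seed(0)

LAYERS = 4            # hypothetical network depth
CHOICES = [2, 4, 8]   # candidate bit-widths per layer

# Placeholder fitness: rewards low average bit-width, but heavily
# penalizes under-quantizing layers assumed to be accuracy-sensitive.
# A real implementation would evaluate the quantized DNN's accuracy.
SENSITIVE = {0: 4, 3: 4}  # assumed: first/last layers need >= 4 bits

def fitness(bits):
    penalty = sum(10 for i, b in SENSITIVE.items() if bits[i] < b)
    return -sum(bits) / len(bits) - penalty  # higher is better

def crossover(a, b):
    cut = random.randrange(1, LAYERS)        # single-point crossover
    return a[:cut] + b[cut:]

def mutate(bits, rate=0.2):
    return [random.choice(CHOICES) if random.random() < rate else b
            for b in bits]

def evolve(pop_size=20, generations=30):
    pop = [[random.choice(CHOICES) for _ in range(LAYERS)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]       # elitist selection
        children = [mutate(crossover(random.choice(parents),
                                     random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()  # per-layer bit-width assignment found by the GA
```

The result is a bit-width vector that a flexible-precision PIM engine could consume directly; swapping in a real accuracy measurement for `fitness` is the essential change for practical use.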
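Noise-aware training can be sketched by injecting multiplicative weight noise, mimicking conductance variation, into each forward pass so the learned model tolerates it at inference. The toy below trains a linear model with SGD; the noise level `sigma`, the data, and the model are illustrative assumptions, not the thesis's device model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression task standing in for a DNN layer's training data.
X = rng.normal(size=(256, 8))
true_w = rng.normal(size=8)
y = X @ true_w

w = np.zeros(8)
sigma, lr = 0.1, 0.05   # assumed device-variation level and step size
for _ in range(200):
    # Fresh per-weight multiplicative noise each step, emulating
    # ReRAM/FeFET conductance variation during the forward pass.
    noise = rng.normal(1.0, sigma, size=w.shape)
    w_noisy = w * noise
    err = X @ w_noisy - y
    # Gradient of the squared error w.r.t. the *stored* weights w
    # (the noise scales each column's contribution).
    grad = (X * noise).T @ err / len(X)
    w -= lr * grad
```

Because the model is optimized under the same perturbation it will see on-device, the stored weights remain accurate even when each read is noisy, which is the intuition behind a reliable PIM engine built from unreliable devices.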
Energy efficient processing in memory architecture for deep learning computing acceleration