Targeting convolutional neural networks (CNNs), we adopt the high-level synthesis (HLS) design methodology and explore various optimization and synthesis techniques to optimize the design on an FPGA. Our motivation is to target embedded devices that operate as edge devices. As machine learning algorithms have become more practical, there has been much effort to deploy them on devices used in daily life. However, unlike servers, edge devices are small and therefore have far more limited resources and performance, so controlling resource usage and applying optimizations play an important role when implementing machine learning algorithms on an edge device. The key idea explored in this thesis is backward pipeline scheduling, which optimizes the pipeline between CNN layers. This optimization technique is especially useful for utilizing the limited on-chip memory when classifying an image on an edge device. We achieved a latency of 175.7 μs for classifying one image in the MNIST data set using LeNet and 653.5 μs for classifying one image in the CIFAR-10 data set using CifarNet, while maintaining high accuracy: 97.6% on MNIST with LeNet and 83.4% on CIFAR-10 with CifarNet. Compared with the NVIDIA Jetson TX1, we achieved the best single-image latency: 5.2x faster for LeNet and 1.95x faster for CifarNet.
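
To illustrate the general idea of pipelining between CNN layers in HLS, the following is a minimal sketch of a two-stage dataflow pipeline in Vivado HLS C++. It is not the thesis's backward pipeline scheduling algorithm; the layer bodies, loop bounds, fixed-point format, and FIFO depth are illustrative assumptions.

// Minimal sketch of inter-layer pipelining in Vivado HLS.
// All sizes and the fixed-point format below are assumptions for illustration.
#include <hls_stream.h>
#include <ap_fixed.h>

typedef ap_fixed<16, 6> data_t;  // assumed 16-bit fixed-point format

// Producer layer: emits each result into a stream as soon as it is ready,
// so the next layer can start before this one has finished its whole output.
void conv_layer(hls::stream<data_t> &in, hls::stream<data_t> &out) {
    for (int i = 0; i < 784; i++) {   // 28x28 input, placeholder bound
#pragma HLS PIPELINE II=1
        data_t px = in.read();
        out.write(px * (data_t)0.5);  // placeholder for the real MAC work
    }
}

// Consumer layer: drains the stream produced by conv_layer.
void pool_layer(hls::stream<data_t> &in, hls::stream<data_t> &out) {
    for (int i = 0; i < 392; i++) {
#pragma HLS PIPELINE II=1
        data_t a = in.read();
        data_t b = in.read();
        out.write(a > b ? a : b);     // 1x2 max-pooling placeholder
    }
}

// Top level: DATAFLOW lets both layers run concurrently, connected by a
// shallow FIFO instead of a full on-chip feature-map buffer.
void cnn_top(hls::stream<data_t> &in, hls::stream<data_t> &out) {
#pragma HLS DATAFLOW
    hls::stream<data_t> inter("inter");
#pragma HLS STREAM variable=inter depth=16
    conv_layer(in, inter);
    pool_layer(inter, out);
}

Streaming the intermediate feature map through a shallow FIFO, rather than buffering an entire layer output on chip, is what lets consecutive layers overlap in time while keeping on-chip memory usage low, which is the resource constraint the abstract highlights for edge devices.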