Thesis Details
On-Device Speech Recognition Using Residual Simple Gated ConvNet
Speech recognition; Sequence modeling; RNN; CNN; Embedded devices; 621.3
College of Engineering, Department of Electrical and Computer Engineering
University: Seoul National University Graduate School
Others: http://s-space.snu.ac.kr/bitstream/10371/161058/1/000000156245.pdf
United States | English
Source: Seoul National University Open Repository
PDF
【 Abstract 】

Nowadays, many embedded devices, such as smartphones and Amazon Alexa, use automatic speech recognition (ASR) technology for hands-free interfaces. Neural network-based algorithms in particular are widely employed in ASR because of their high accuracy and resilience in noisy environments.

Neural network-based algorithms require a large amount of computation for real-time operation. As a result, most of today's ASR systems adopt server-based processing. However, privacy concerns and the need for low latency are increasing the demand for on-device ASR. For on-device ASR, power consumption should be minimized to extend the operating time.

Many neural network models have been developed for high-performance ASR. Among them, recurrent neural network (RNN) based algorithms are most commonly used for speech recognition; the long short-term memory (LSTM) RNN is especially well known. However, executing the LSTM algorithm on an embedded device consumes much power because the cache is too small to accommodate all the network parameters. Frequent DRAM accesses due to cache misses not only slow execution but also incur substantial power consumption. One possible way to mitigate this problem is to compute multiple output samples at a time, called multi-time-step parallelization, which reduces the number of parameter fetches. However, the complex feedback structure of the LSTM RNN does not allow multi-time-step parallel processing.

This thesis presents a Residual Simple Gated Convolutional Network (Residual Simple Gated ConvNet) model with only about 1M parameters. Many of today's CPUs can hold a neural network with 1M parameters in cache memory, so this model can run ASR very fast and efficiently without consuming much power. Because the developed model is based on a convolutional neural network, multi-time-step processing can easily be applied.
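As a rough illustration of the parallelization issue described above, the sketch below (NumPy, with toy sizes not taken from the thesis) contrasts the strict step-by-step dependency of an LSTM-style recurrence with a convolution, whose outputs at several time steps can reuse a single fetch of the weights:

```python
import numpy as np

# Toy sizes for illustration only; the thesis targets a ~1M-parameter model.
T, D = 8, 4                          # time steps, feature dimension
np.random.seed(0)
x = np.random.randn(T, D)

# LSTM-style recurrence: h[t] depends on h[t-1], so the T outputs
# must be computed one step at a time (no multi-time-step batching).
W, U = np.random.randn(D, D), np.random.randn(D, D)
h = np.zeros(D)
for t in range(T):
    h = np.tanh(x[t] @ W + h @ U)    # strict sequential dependency

# Convolutional model: each output depends only on a fixed input
# window, so many time steps can be computed per fetch of the weights.
K = 3                                # kernel width
w = np.random.randn(K, D)
pad = np.vstack([np.zeros((K - 1, D)), x])   # causal zero-padding
y = np.stack([np.sum(pad[t:t + K] * w) for t in range(T)])
```

In the convolutional case, the weights `w` can stay in cache while all `T` outputs are produced, which is the DRAM-traffic saving the abstract attributes to multi-time-step processing.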
To achieve high accuracy with a small number of parameters, the model employs one-dimensional depthwise convolution, which helps capture temporal patterns of the speech signal. We also considered inception residual connections to reduce the required number of layers, but this approach needs further improvement. The developed Residual Simple Gated ConvNet showed fairly high accuracy even with 1M parameters when trained on the WSJ speech corpus. This model demands less than 10% of CPU time when running on ARM-based CPUs for embedded devices.
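The depthwise convolution and gating mentioned above can be sketched as follows; the function names, kernel size, and gating form are illustrative assumptions, not the thesis's exact architecture:

```python
import numpy as np

def depthwise_conv1d(x, w):
    """Depthwise 1-D convolution: x is (T, C), w is (K, C) with one
    filter per channel, so channels are never mixed (few parameters)."""
    T, C = x.shape
    K = w.shape[0]
    pad = np.vstack([np.zeros((K - 1, C)), x])       # causal zero-padding
    return np.stack([(pad[t:t + K] * w).sum(axis=0) for t in range(T)])

def gated_block(x, w_lin, w_gate):
    """Simple gated unit: a linear path modulated elementwise by a
    sigmoid gate, in the spirit of gated ConvNets."""
    a = depthwise_conv1d(x, w_lin)
    g = 1.0 / (1.0 + np.exp(-depthwise_conv1d(x, w_gate)))
    return x + a * g                                 # residual connection

np.random.seed(0)
x = np.random.randn(20, 8)                           # 20 frames, 8 channels
w_lin, w_gate = np.random.randn(5, 8), np.random.randn(5, 8)
y = gated_block(x, w_lin, w_gate)
```

A depthwise kernel of width K over C channels costs only K*C weights per layer, compared with K*C*C for a full 1-D convolution, which is how this style of block keeps the total parameter count near 1M.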

【 Preview 】
Attachment List
Files Size Format View
On-Device Speech Recognition Using Residual Simple Gated ConvNet 3889KB PDF download
Document metrics
Downloads: 2  Views: 11