Thesis Details
Lipreading with convolutional and recurrent neural network models
Zhu, Tianyilin; Hasegawa-Johnson, Mark
Keywords: Lipreading; Convolutional neural network
Others: https://www.ideals.illinois.edu/bitstream/handle/2142/97763/ZHU-THESIS-2017.pdf?sequence=1&isAllowed=y
United States | English
Source: The Illinois Digital Environment for Access to Learning and Scholarship
【 Abstract 】

Lip reading is speech recognition from visual information alone. This thesis has two goals: to classify silence versus speech, and to recognize the triphone spoken by a talking head, using neural network classifiers that receive only the video. Two neural network architectures are developed and tested on the AVICAR dataset: a convolutional neural network (CNN) model with a fully connected classification layer, and a recurrent neural network (RNN) model with a convolutional layer and one long short-term memory (LSTM) layer that classifies a sequence of inputs. In both models, the convolutional layers serve as feature extractors. The performance of each model is evaluated experimentally, and the detailed network structure and preprocessing pipeline are described.
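
The second architecture in the abstract (convolutional feature extraction per frame, followed by one LSTM layer and a classifier over the whole sequence) can be illustrated with a minimal NumPy sketch. This is not the thesis implementation: the frame size, number of kernels, hidden size, and every function and parameter name below are hypothetical, the weights are random and untrained, and the two output classes merely stand in for silence versus speech.

```python
import numpy as np


def conv_features(frame, kernels):
    """Valid 2-D cross-correlation of one grayscale frame, then ReLU, flattened."""
    n_k, k, _ = kernels.shape
    h, w = frame.shape
    out = np.empty((n_k, h - k + 1, w - k + 1))
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            patch = frame[i:i + k, j:j + k]
            # One response per kernel at this spatial position.
            out[:, i, j] = np.tensordot(kernels, patch, axes=([1, 2], [0, 1]))
    return np.maximum(out, 0.0).ravel()


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x, h, c, W, b):
    """One LSTM time step; W maps [x, h] to the four stacked gate pre-activations."""
    z = W @ np.concatenate([x, h]) + b
    i, f, g, o = np.split(z, 4)  # input, forget, candidate, output
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def init_params(rng, frame_shape=(16, 16), n_kernels=4, k=3, hidden=8, n_classes=2):
    """Random, untrained weights; all sizes are illustrative assumptions."""
    feat = n_kernels * (frame_shape[0] - k + 1) * (frame_shape[1] - k + 1)
    return {
        "kernels": rng.normal(0.0, 0.1, (n_kernels, k, k)),
        "W_lstm": rng.normal(0.0, 0.1, (4 * hidden, feat + hidden)),
        "b_lstm": np.zeros(4 * hidden),
        "W_out": rng.normal(0.0, 0.1, (n_classes, hidden)),
        "b_out": np.zeros(n_classes),
    }


def classify_sequence(frames, params):
    """Run conv features -> LSTM over all frames, classify from the final hidden state."""
    hidden = params["b_lstm"].size // 4
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for frame in frames:
        x = conv_features(frame, params["kernels"])
        h, c = lstm_step(x, h, c, params["W_lstm"], params["b_lstm"])
    return softmax(params["W_out"] @ h + params["b_out"])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = init_params(rng)
    frames = rng.random((5, 16, 16))  # 5 synthetic 16x16 "mouth region" frames
    print(classify_sequence(frames, params))  # class probabilities, summing to 1
```

Because the weights are untrained, the printed probabilities carry no meaning; the sketch only shows how per-frame convolutional features feed a recurrent layer whose final state drives the classification.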

【 Preview 】
Attachments
Files Size Format View
Lipreading with convolutional and recurrent neural network models 1267KB PDF download
Document metrics
Downloads: 11  Views: 19