IEEE Access
Deep Learning-Based Automated Lip-Reading: A Survey
Daqing Chen¹, Souheil Fenghour¹, Kun Guo², Bo Li³, Perry Xiao⁴
[1] School of Engineering, London South Bank University, London, U.K.; Xi'an VANXUM Electronics Technology Company Ltd., Xi'an, China
Keywords: Visual speech recognition; lip-reading; deep learning; feature extraction; classification; computer vision
DOI: 10.1109/ACCESS.2021.3107946
Source: DOAJ
【 Abstract 】
This paper presents a survey of automated lip-reading approaches, with the main focus on deep learning-related methodologies, which have proven to be more fruitful for both feature extraction and classification. The survey also compares the different components that make up automated lip-reading systems, including the audio-visual databases, feature extraction techniques, classification networks and classification schemas. The main contributions and unique insights of this survey are: 1) a comparison of Convolutional Neural Networks with other neural network architectures for feature extraction; 2) a critical review of the advantages of Attention-Transformers and Temporal Convolutional Networks over Recurrent Neural Networks for classification; 3) a comparison of the different classification schemas used for lip-reading, including ASCII characters, phonemes and visemes; and 4) a review of the most recent lip-reading systems, up to early 2021.
【 License 】
Unknown