| CAAI Transactions on Intelligence Technology | |
| Developing phoneme-based lip-reading sentences system for silent speech recognition | |
| article | |
| Randa El-Bialy¹, Daqing Chen¹, Souheil Fenghour¹, Walid Hussein², Perry Xiao¹, Omar H. Karam², Bo Li³ | |
| ¹ School of Engineering, London South Bank University; ² Faculty of Informatics and Computer Science, British University in Egypt; ³ School of Electronics and Informatics, Northwestern Polytechnical University | |
| Keywords: deep learning; deep neural networks; lip-reading; phoneme-based lip-reading; spatial-temporal convolution; transformers | |
| DOI: 10.1049/cit2.12131 | |
| Subject classification: Mathematics (General) | |
| Source: Wiley | |
【 Abstract 】
Lip-reading is the process of interpreting speech by visually analysing lip movements. Recent research in this area has shifted from recognising isolated words to lip-reading sentences in the wild. This paper explores phonemes as an alternative classification schema for lip-reading sentences, with the aim of improving system performance. Several classification schemas have been investigated, including character-based and viseme-based schemas. The visual front-end of the system consists of a spatial-temporal (3D) convolution followed by a 2D ResNet. The phoneme recognition model is built on Transformers, which use multi-head attention, and a recurrent neural network (RNN) serves as the language model. The system was evaluated on the BBC Lip Reading Sentences 2 (LRS2) benchmark dataset. Compared with state-of-the-art approaches to lip-reading sentences, the proposed system achieves a word error rate that is 10% lower on average under varying illumination ratios.
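The abstract reports its headline result as a word error rate (WER). As a hedged illustration that is not taken from the paper, the sketch below computes WER in the standard way: the word-level Levenshtein distance, i.e. substitutions S plus deletions D plus insertions I, divided by the number of reference words N, so WER = (S + D + I) / N. The function name `word_error_rate` is hypothetical, this is not the authors' evaluation code, and it assumes plain whitespace tokenisation with no case or punctuation normalisation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: WER = (S + D + I) / N via word-level edit distance.

    Assumes whitespace tokenisation and no text normalisation.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words (one DP row kept at a time).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            )
        prev = curr

    # Total edits divided by the number of reference words.
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the): 2 edits / 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions are counted against a fixed reference length, WER can exceed 1.0 when the hypothesis contains many extra words; a relative reduction such as the 10% quoted above compares two such ratios.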
【 License 】
CC BY|CC BY-ND|CC BY-NC|CC BY-NC-ND