Journal Article Details
CAAI Transactions on Intelligence Technology
Developing phoneme-based lip-reading sentences system for silent speech recognition
article
Randa El-Bialy [1], Daqing Chen [1], Souheil Fenghour [1], Walid Hussein [2], Perry Xiao [1], Omar H. Karam [2], Bo Li [3]
[1] School of Engineering, London South Bank University
[2] Faculty of Informatics and Computer Science, British University in Egypt
[3] School of Electronics and Informatics, Northwestern Polytechnical University
Keywords: deep learning; deep neural networks; lip-reading; phoneme-based lip-reading; spatial-temporal convolution; transformers
DOI: 10.1049/cit2.12131
Subject classification: Mathematics (general)
Source: Wiley
【 Abstract 】

Lip-reading is the process of interpreting speech by visually analysing lip movements. Recent research in this area has shifted from simple word recognition to lip-reading sentences in the wild. This paper uses phonemes as the classification schema for lip-reading sentences, both to explore an alternative schema and to enhance system performance. Different classification schemas have been investigated, including character-based and viseme-based schemas. The visual front-end of the system consists of a spatial-temporal (3D) convolution followed by a 2D ResNet. The phoneme recognition models are Transformers, which use multi-head attention, and the language model is a Recurrent Neural Network. The proposed system has been evaluated on the BBC Lip Reading Sentences 2 (LRS2) benchmark dataset. Compared with state-of-the-art approaches to lip-reading sentences, the proposed system achieves a word error rate that is on average 10% lower under varying illumination ratios.
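
The abstract reports its result as a word error rate (WER). As a reminder of what that metric measures, the following is a minimal sketch of the standard WER definition: the word-level edit distance between a reference transcript and a predicted one, divided by the number of reference words. The function name `word_error_rate` and whitespace tokenisation are assumptions made for illustration; the abstract does not specify how the authors tokenise or aggregate WER over LRS2.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words.

    Illustrative helper, not taken from the paper. Words are split on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, with the made-up reference "the cat sat on the mat" and prediction "the cat sit on mat", there is one substitution and one deletion against six reference words, so the function returns 2/6. WER can exceed 1.0 when the prediction contains many inserted words.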

【 License 】

CC BY|CC BY-ND|CC BY-NC|CC BY-NC-ND   
