Journal Article

【Abstract】

Although automatic speech recognition (ASR) technology is mature, some problems remain unsolved, such as how to accurately recognize what a speaker is saying in a noisy environment. Lipreading is a visual speech recognition technology that recognizes speech content from the movement of the speaker's lips, without using the audio signal. Lipreading can therefore recover what a speaker says in a noisy environment, or even when no audio signal is available at all. This article surveys the main lipreading research, from traditional methods to deep learning methods. Traditional lipreading methods are discussed in terms of three aspects: lip detection and extraction, lip feature extraction, and classification. Traditional feature extraction relies on hand-crafted features, which are not very reliable under unconstrained conditions. In recent years, traditional lipreading methods have gradually been replaced by deep learning methods, whose advantage is that they can learn effective features directly from large databases. This article analyzes typical deep learning methods in detail according to their structural characteristics, and lists existing lipreading databases together with their detailed information and the methods that have been applied to them. Finally, the problems and challenges of current lipreading methods are discussed, and future research directions are outlined.
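
The abstract names three stages of a traditional lipreading pipeline: lip detection and extraction, lip feature extraction, and classification. The Python sketch below shows one way those stages can fit together; it is an illustration under stated assumptions, not the survey's method. Assumptions: the mouth bounding box is known in advance (standing in for a real lip detector), the hand-crafted features are low-frequency 2-D DCT coefficients (one common pixel-based choice), per-frame features are simply averaged over a clip, and a nearest-centroid classifier is used in place of the more elaborate classifiers a real system would need. All function names and the toy frames are hypothetical.

```python
import math

Frame = list[list[float]]  # one grayscale frame as rows of pixel intensities


def crop_mouth(frame: Frame, box: tuple[int, int, int, int]) -> Frame:
    """Stage 1 (simplified lip extraction): cut out the mouth region.

    ASSUMPTION: box = (top, left, height, width) is already known; a real
    system would obtain it from a face or lip detector.
    """
    top, left, h, w = box
    return [row[left:left + w] for row in frame[top:top + h]]


def dct2(block: Frame) -> Frame:
    """Unnormalized 2-D DCT-II, computed directly from its definition."""
    n, m = len(block), len(block[0])
    return [[sum(block[x][y]
                 * math.cos(math.pi * (2 * x + 1) * u / (2 * n))
                 * math.cos(math.pi * (2 * y + 1) * v / (2 * m))
                 for x in range(n) for y in range(m))
             for v in range(m)]
            for u in range(n)]


def frame_features(roi: Frame, k: int = 3) -> list[float]:
    """Stage 2 (hand-crafted features): keep the k x k lowest-frequency DCT coefficients."""
    coeffs = dct2(roi)
    return [coeffs[u][v] for u in range(k) for v in range(k)]


def clip_features(frames: list[Frame], box, k: int = 3) -> list[float]:
    """Average per-frame features over a clip (a simplification that drops temporal order)."""
    feats = [frame_features(crop_mouth(f, box), k) for f in frames]
    return [sum(col) / len(feats) for col in zip(*feats)]


class NearestCentroid:
    """Stage 3 (classification): predict the label whose mean feature vector is closest."""

    def fit(self, X: list[list[float]], y: list[str]) -> "NearestCentroid":
        grouped: dict[str, list[list[float]]] = {}
        for x, label in zip(X, y):
            grouped.setdefault(label, []).append(x)
        self.centroids = {label: [sum(col) / len(rows) for col in zip(*rows)]
                          for label, rows in grouped.items()}
        return self

    def predict(self, x: list[float]) -> str:
        def sq_dist(c: list[float]) -> float:
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(self.centroids, key=lambda label: sq_dist(self.centroids[label]))


def toy_frame(bright_rows: bool, level: float = 255.0) -> Frame:
    """Hypothetical 8x8 frame whose 4x4 mouth region is half bright (top rows or left columns)."""
    frame = [[50.0] * 8 for _ in range(8)]
    for r in range(4):
        for c in range(4):
            bright = r < 2 if bright_rows else c < 2
            frame[2 + r][2 + c] = level if bright else 0.0
    return frame


if __name__ == "__main__":
    box = (2, 2, 4, 4)  # hypothetical mouth bounding box
    train = [clip_features([toy_frame(True)] * 2, box),
             clip_features([toy_frame(False)] * 2, box)]
    model = NearestCentroid().fit(train, ["A", "B"])
    test_clip = clip_features([toy_frame(True, level=200.0)] * 2, box)
    print(model.predict(test_clip))  # prints "A"
```

Running the script trains on one toy clip per class and classifies a dimmer version of class A's mouth pattern as `A`. Averaging frame features keeps the sketch short but discards how the lips move over time, which is the temporal information that sequence models are designed to capture.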

【License】

Unknown

IEEE Access
A Survey of Research on Lipreading Technology

Mingfeng Hao¹ Alimjan Aysa² Nurbiya Yadikar³ Kurban Ubul⁴ Mutallip Mamut⁴
[1] School of Information Science and Engineering, Xinjiang University, Ürümqi, China; The Library of Xinjiang University, Ürümqi, China
Keywords: Visual speech recognition; lipreading; deep learning; feature extraction
DOI: 10.1109/ACCESS.2020.3036865
Source: DOAJ
