Journal article details
Bulletin of the Polish Academy of Sciences. Technical Sciences
Characteristics of the use of coupled hidden Markov models for audio-visual Polish speech recognition
M. Kubanek [1], J. Bobulski [1], L. Adrjanowicz [1]
[1] Institute of Computer and Information Sciences, Czestochowa University of Technology, 73 Dąbrowskiego St., 42-200 Częstochowa, Poland
Keywords: coupled hidden Markov models; audio-visual speech recognition; lip reading
DOI: 10.2478/v10175-012-0041-6
Subject classification: Engineering and technology (general)
Source: Polska Akademia Nauk * Centrum Upowszechniania Nauki / Polish Academy of Sciences, Center for the Advancement of Science
【 Abstract 】

This paper addresses the combination of audio and visual signals for Polish speech recognition when the audio speech signal is highly disturbed. Audio-visual speech recognition was based on coupled hidden Markov models (CHMM). The described methods were developed for single isolated commands; nevertheless, their effectiveness indicates that they would work similarly in continuous audio-visual speech recognition. Visual speech analysis is difficult and computationally demanding, mostly because of the very large amount of data that must be processed. Therefore, audio-visual speech recognition is used only when the audio speech signal is exposed to a considerable level of distortion. The paper proposes the authors' own methods for lip edge detection and visual feature extraction. Moreover, a method of fusing audio and visual speech features was proposed and tested. A significant increase in recognition effectiveness and processing speed was noted during tests for properly selected CHMM parameters, an adequate codebook size, and an appropriate fusion of audio-visual features. The experimental results were very promising and close to those achieved by leading researchers in the field of audio-visual speech recognition.
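
The abstract does not say how the audio and visual features are fused or how the codebook is built. As a rough illustration only, the sketch below shows one common scheme: early (concatenative) fusion of time-aligned feature vectors, followed by nearest-codeword vector quantization into discrete symbols. The function names, the toy feature values, and the two-entry codebook are all assumptions made for this example and are not taken from the paper; a CHMM would more typically keep the two streams as separate coupled chains.

```python
from math import dist  # Euclidean distance (Python 3.8+)


def fuse_early(audio_frames, visual_frames):
    """Concatenate time-aligned audio and visual vectors (hypothetical fusion scheme)."""
    if len(audio_frames) != len(visual_frames):
        raise ValueError("streams must be time-aligned")
    return [tuple(a) + tuple(v) for a, v in zip(audio_frames, visual_frames)]


def quantize(frames, codebook):
    """Map each feature vector to the index of its nearest codeword."""
    return [
        min(range(len(codebook)), key=lambda k: dist(frame, codebook[k]))
        for frame in frames
    ]


# Toy, made-up data: 2-D audio features and 1-D visual features per frame.
audio = [(0.1, 0.2), (0.9, 0.8), (0.2, 0.1)]
visual = [(1.0,), (0.0,), (1.0,)]

# Hypothetical codebook of size 2 over the fused 3-D feature space.
codebook = [(0.0, 0.0, 1.0), (1.0, 1.0, 0.0)]

fused = fuse_early(audio, visual)
symbols = quantize(fused, codebook)  # discrete observations for a discrete HMM
print(symbols)
```

In a sketch like this, a larger codebook gives a finer description of each frame but increases the number of emission parameters a discrete model has to estimate, which is presumably why the abstract singles out codebook size as a parameter to tune.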

【 License 】

Unknown   
