Journal Article Details
International Journal of Advanced Robotic Systems
Audio-Visual Tibetan Speech Recognition Based on a Deep Dynamic Bayesian Network for Natural Human Robot Interaction
Keywords: Audio-visual speech recognition; Deep Dynamic Bayesian Network; unsupervised feature learning; Tibetan speech recognition
DOI: 10.5772/54000
Subject classification: Automation Engineering
Source: InTech
【Abstract】

Audio-visual speech recognition is a natural and robust approach to improving human-robot interaction in noisy environments. Although multi-stream Dynamic Bayesian Networks and coupled HMMs are widely used for audio-visual speech recognition, they fail to learn the features shared between modalities and ignore the dependency among feature frames within each discrete state. In this paper, we propose a Deep Dynamic Bayesian Network (DDBN) that performs unsupervised extraction of spatio-temporal multimodal features from Tibetan audio-visual speech data and builds an accurate audio-visual speech recognition model without assuming frame independence. Experimental results on Tibetan speech data from real-world environments show that the proposed DDBN outperforms state-of-the-art methods in word recognition accuracy.

【License】

CC BY
