Thesis

【Abstract】

Gaussian Mixture Model-Hidden Markov Models (GMM-HMMs) are the state of the art for acoustic modeling in speech recognition. HMMs model the sequential structure and temporal variability of speech signals, while GMMs model the local spectral variability of the sound wave at each HMM state. Attempts to use Artificial Neural Networks (ANNs) in place of GMMs in HMM-based acoustic models led to dismal results for many years. ANNs could not significantly outperform GMMs because of their shallow architectures, and it was difficult to train networks with many hidden layers on large amounts of data using the back-propagation learning algorithm.

In recent years, with the establishment of deep learning techniques, ANNs with many hidden layers have been reintroduced as an alternative to GMMs in acoustic modeling and have shown successful results. The deep learning technique consists of a two-phase procedure. First, the ANN is generatively pre-trained using an unsupervised learning algorithm. Then, it is discriminatively fine-tuned using the back-propagation learning algorithm. The generative pre-training is intended to initialize the weights of the network for better generalization performance during the discriminative phase. Combining Deep Neural Networks (DNNs) and HMMs within a single hybrid architecture for acoustic modeling has shown promising results in many speech recognition tasks.

This thesis aims to empirically confirm the capability of DNNs to outperform GMMs in acoustic modeling. It also provides a systematic procedure for implementing DNN-HMM acoustic models for phoneme recognition, including the implementation of a GMM-HMM baseline system. The thesis starts with a thorough overview of the fundamentals and background of speech recognition. It then discusses DNN architecture and learning techniques, as well as the problems of GMMs and the advantages of DNNs in acoustic modeling. Finally, DNN-HMM hybrid acoustic models for phoneme recognition are implemented. The deployed DNN is generatively pre-trained and fine-tuned to produce a posterior distribution over the states of monophone HMMs. The developed DNN-HMM phoneme recognition system outperforms the GMM-HMM baseline on the TIMIT core test set. An in-depth investigation into the major factors behind the success of DNNs is carried out.

【Preview】

Attachments
Files	Size	Format	View
Implementation of DNN-HMM Acoustic Models for Phoneme Recognition	961KB	PDF	download


Implementation of DNN-HMM Acoustic Models for Phoneme Recognition
Romdhani, Sihem
University of Waterloo
Keywords: Deep Neural Network; Acoustic Model; Automatic Speech Recognition; Phoneme Recognition; GMM-HMM; DNN; ASR; Electrical and Computer Engineering
Others : https://uwspace.uwaterloo.ca/bitstream/10012/9061/1/Romdhani_Sihem.pdf
Switzerland | English
Source: UWSPACE Waterloo Institutional Repository


	Document metrics
	Downloads: 27	Views: 36
