Technical Report Details
Bridging the Gap between Speech Production and Speech Recognition.
Hogden, J. E. ; Valdez, P. F.
Technical Information Center Oak Ridge Tennessee
Keywords: Speech; Signal processing; World models; Production; Speech synthesizers
RP-ID  :  DE2001763401
Subject classification: Engineering and Technology (General)
United States | English
Source: National Technical Reports Library
[Abstract]

Although stochastic models of speech signals (e.g., hidden Markov models, trigrams, etc.) have led to impressive improvements in speech recognition accuracy, it has been noted that these models have little relationship to speech production (Lee, 1989), and their recognition performance on some important tasks is far from perfect. However, there have been recent attempts to bridge the gap between speech production and speech recognition using models that are stochastic and yet make more reasonable assumptions about the mechanisms underlying speech production (Bakis, 1991; Deng, 1998; Hogden, 1998; Picone et al., 1999). One of these models, Multiple Observable, Maximum Likelihood Continuity Mapping (MO-MALCOM), is described in this paper. There are theoretical and experimental reasons to believe that MO-MALCOM learns an insertable stochastic mapping between articulator positions and speech acoustics. Furthermore, MO-MALCOM can be combined with standard speech recognition algorithms to create a speech recognition model based on a stochastic production model. Results of using MO-MALCOM speech recognition on data derived from the Switchboard corpus will be discussed.

... (Jelinek, 1997). A nice feature of HMMs is that maximum likelihood techniques allow the model parameters to be automatically determined from training data. The automatic parameter estimation and the stochastic nature of the HMMs are presumably the features that allow them to cope with the amazing amount of variability in speech.

[Preview]
Attachments
Files Size Format View
DE2001763401.pdf 411KB PDF download