期刊论文详细信息
ETRI Journal
Feature Extraction Based on Speech Attractors in the Reconstructed Phase Space for Automatic Speech Recognition Systems
关键词: speech recognition;    feature extraction;    phoneme attractor;    Reconstructed phase space;   
Others  :  1196796
DOI  :  10.4218/etrij.13.0112.0074
PDF
【 摘 要 】

In this paper, a feature extraction (FE) method is proposed that is comparable to the traditional FE methods used in automatic speech recognition systems. Unlike the conventional spectral-based FE methods, the proposed method evaluates the similarities between an embedded speech signal and a set of predefined speech attractor models in the reconstructed phase space (RPS) domain. In the first step, a set of Gaussian mixture models is trained to represent the speech attractors in the RPS. Next, for a new input speech frame, a posterior-probability-based feature vector is evaluated, which represents the similarity between the embedded frame and the learned speech attractors. We conduct experiments for a speech recognition task utilizing a toolkit based on hidden Markov models, over FARSDAT, a well-known Persian speech corpus. Through the proposed FE method, we gain 3.11% absolute phoneme error rate improvement in comparison to the baseline system, which exploits the mel-frequency cepstral coefficient FE method.

【 授权许可】

   

【 预 览 】
附件列表
Files Size Format View
20150521125652369.pdf 507KB PDF download
【 参考文献 】
  • [1]X. Liu, Discriminative Complexity Control and Linear Projections for Large Vocabulary Speech Recognition, doctoral dissertation, Cambridge University Engineering Department, Cambridge, England, UK, 2005.
  • [2]Y. Tang and R. Rose, "A Study of Using Locality Preserving Projections for Feature Extraction in Speech Recognition," Proc. ICASSP, 2008, pp. 1569-1572.
  • [3]H. Hermansky, "Perceptual Linear Predictive (PLP) Analysis of Speech," J. Acoustical Soc. America, vol. 87, no. 4, 1990, pp. 1738-1752.
  • [4]A. Errity, J. McKenna, and B. Kirkpatrick, "Dimensionality Reduction Methods Applied to Both Magnitude and Phase Derived Features," Proc. Interspeech, 2007, pp. 1957-1960.
  • [5]I. Kokkinos and P. Maragos, "Nonlinear Speech Analysis Using Models for Chaotic Systems," IEEE Trans. Speech Audio Process., vol. 13, no. 6, 2005, pp. 1098-1109.
  • [6]J.J. Jiang, Y. Zhang, and C. McGilligan, "Chaos in Voice, from Modeling to Measurement," J. Voice, vol. 20, 2006, pp. 2-17.
  • [7]H. Whitney, "Differentiable Manifolds," Annals Math., 2nd series, vol. 37, 1936, pp. 645-680.
  • [8]F. Takens, "Detecting Strange Attractors in Turbulence," Proc. Dynamical Syst. Turbulence, 1980, pp. 366-381.
  • [9]H. Kantz and T. Schreiber, Nonlinear Time Series Analysis, Cambridge, England, UK: Cambridge University Press, 1997.
  • [10]A. Ezeiza et al., "Combining Mel Frequency Cepstral Coefficients and Fractal Dimensions for Automatic Speech Recognition," Proc. NOLISP, 2011, pp. 183-189.
  • [11]V. Pitsikalis, I. Kokkinos, and P. Maragos, "Nonlinear Analysis of Speech Signals: Generalized Dimensions and Lyapunov Exponents," Proc. Eurospeech, 2003.
  • [12]S. Prasad et al., "Nonlinear Dynamical Invariants for Speech Recognition," Proc. Int. Conf. Spoken Language Process., 2006, pp. 2518-2521.
  • [13]S. Yu, D. Zheng, and X. Feng, "A New Time-Domain Feature Parameter for Phoneme Classification," Proc. WESPAC IX, 2006.
  • [14]M.T. Johnson et al., "Time-Domain Isolated Phoneme Classification Using Reconstructed Phase Spaces," IEEE Trans. Speech Audio Process., vol. 13, no. 4, 2005, pp. 458-466.
  • [15]R.J. Povinelli et al., "Statistical Models of Reconstructed Phase Spaces for Signal Classification," IEEE Trans. Signal Process., vol. 54, no. 6, 2006, pp. 2178-2186.
  • [16]A. Jafari, F. Almasganj, and M. NabiBidhendi, "Statistical Modeling of Speech Poincaré Sections in Combination of Frequency Analysis to Improve Speech Recognition Performance," Chaos, vol. 20, 2010, pp. 033106:1-11.
  • [17]J. Sun, N. Zheng, and X. Wang, "Enhancement of Chinese Speech Based on Nonlinear Dynamics," Signal Process., vol. 87, no. 1, 2007, pp. 2431-2445.
  • [18]Y. Shekofteh and F. Almasganj, "Using Phase Space Based Processing to Extract Proper Features for ASR Systems," Proc. 5th Int. Symp. Telecommun., 2010, pp. 596-599.
  • [19]A.C. Lindgren, M.T. Johnson, and R.J. Povinelli, "Speech Recognition Using Reconstructed Phase Space Features," Proc. IEEE Int. Conf. Acoustics Speech Signal Process., 2003, pp. 61-63.
  • [20]A.C. Lindgren, M.T. Johnson, and R.J. Povinelli, "Joint Frequency Domain and Reconstructed Phase Space Features for Speech Recognition," Proc. IEEE Int. Conf. Acoustics, Speech, Signal Process., 2004, pp. 533-536.
  • [21]J. Ye, M.T. Johnson, and R.J. Povinelli, "Phoneme Classification over Reconstructed Phase Space Using Principal Component Analysis," Proc. NOLISP, 2003, pp. 11-16.
  • [22]FARSDAT (Farsi Speech Database). Available: http://catalog.elra.info/product_info.php?products_id=18
  • [23]S. Young et al., The HTK Book, Version 3.4, Cambridge University Engineering Department, Cambridge, England, UK, 2006. Available: http://htk.eng.cam.ac.uk
  • [24]Y. Shekofteh, F. Almasganj, and M.M. Goodarzi, "Comparison of Linear Based Feature Transformations to Improve Speech Recognition Performance," Proc. ICEE, 2011, pp. 1-4.
  • [25]C.C. Chang and C.J. Lin, "LIBSVM: A Library for Support Vector Machines," ACM Trans. Intell. Syst. Technol., vol. 2, no. 3, Apr. 2011, article 27.
  • [26]C.W. Hsu and C.J. Lin, "A Comparison of Methods for Multiclass Support Vector Machines," IEEE Trans. Neural Netw., vol. 13, no. 2, 2002, pp. 415-425.
  • [27]F. Grezl and M. Karafiat, "Integrating Recent MLP Feature Extraction Techniques into TRAP Architecture," Proc. Interspeech, 2011, pp. 1229-1232.
  文献评价指标  
  下载次数:7次 浏览次数:13次