| ETRI Journal | |
| Feature Extraction Based on Speech Attractors in the Reconstructed Phase Space for Automatic Speech Recognition Systems | |
| Keywords: speech recognition; feature extraction; phoneme attractor; reconstructed phase space | |
| DOI: 10.4218/etrij.13.0112.0074 | |
【 Abstract 】
In this paper, a feature extraction (FE) method is proposed that is comparable to the traditional FE methods used in automatic speech recognition systems. Unlike conventional spectral-based FE methods, the proposed method evaluates the similarities between an embedded speech signal and a set of predefined speech attractor models in the reconstructed phase space (RPS) domain. In the first step, a set of Gaussian mixture models is trained to represent the speech attractors in the RPS. Next, for each new input speech frame, a posterior-probability-based feature vector is computed, which represents the similarity between the embedded frame and the learned speech attractors. We evaluate the method on a speech recognition task using a toolkit based on hidden Markov models and FARSDAT, a well-known Persian speech corpus. The proposed FE method yields a 3.11% absolute improvement in phoneme error rate over the baseline system, which uses the mel-frequency cepstral coefficient (MFCC) FE method.
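The two steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the embedding dimension and lag, the diagonal-covariance Gaussian mixtures, the per-point averaging of log-likelihoods, and the equal model priors are all assumptions made for illustration, and the function names (`embed`, `posterior_features`, and so on) are hypothetical. Training of the mixture models is omitted; their parameters are assumed to be given.

```python
import math


def embed(frame, dim=3, lag=2):
    """Time-delay embedding: map a 1-D frame to points in a dim-dimensional RPS.

    dim and lag are illustrative defaults, not values taken from the paper.
    """
    n_points = len(frame) - (dim - 1) * lag
    if n_points <= 0:
        raise ValueError("frame too short for this dim and lag")
    return [tuple(frame[n + d * lag] for d in range(dim)) for n in range(n_points)]


def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance at point x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def gmm_avg_log_likelihood(points, gmm):
    """Average per-point log-likelihood of RPS points under one attractor model.

    gmm is a list of (weight, mean, variance) components (assumed pre-trained).
    """
    total = 0.0
    for x in points:
        total += logsumexp(
            [math.log(w) + log_gaussian_diag(x, mu, var) for w, mu, var in gmm]
        )
    return total / len(points)


def posterior_features(frame, models, dim=3, lag=2):
    """One posterior per attractor model, assuming equal model priors."""
    points = embed(frame, dim, lag)
    scores = [gmm_avg_log_likelihood(points, gmm) for gmm in models]
    norm = logsumexp(scores)
    return [math.exp(s - norm) for s in scores]


# Toy usage: two single-component "attractor" models and a constant frame.
toy_models = [
    [(1.0, (0.0, 0.0, 0.0), (1.0, 1.0, 1.0))],  # model centred at the origin
    [(1.0, (1.0, 1.0, 1.0), (1.0, 1.0, 1.0))],  # model centred at (1, 1, 1)
]
toy_features = posterior_features([1.0] * 10, toy_models)
```

In this toy case the embedded frame lies on the mean of the second model, so the second entry of `toy_features` is the larger one. A real system would train one mixture per speech attractor on embedded training data and would likely use many more components per model.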