Journal article

【Abstract】

In this paper, a feature extraction (FE) method is proposed that is comparable to the traditional FE methods used in automatic speech recognition systems. Unlike conventional spectral-based FE methods, the proposed method evaluates the similarities between an embedded speech signal and a set of predefined speech attractor models in the reconstructed phase space (RPS) domain. In the first step, a set of Gaussian mixture models is trained to represent the speech attractors in the RPS. Next, for each new input speech frame, a posterior-probability-based feature vector is computed, which represents the similarity between the embedded frame and the learned speech attractors. We conduct speech recognition experiments on FARSDAT, a well-known Persian speech corpus, using a toolkit based on hidden Markov models. With the proposed FE method, we obtain a 3.11% absolute improvement in phoneme error rate over the baseline system, which uses the mel-frequency cepstral coefficient FE method.
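
The two steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the embedding dimension and lag, the diagonal-covariance Gaussians, the equal priors over attractor models, and the per-point averaging of log-likelihoods are all assumptions made for illustration, and the function names (`delay_embed`, `posterior_features`) are hypothetical. The attractor models are assumed to be already trained and are passed in as lists of `(weight, mean, variance)` components.

```python
import math


def delay_embed(signal, dim, lag):
    """Time-delay embedding: point n is (x[n], x[n+lag], ..., x[n+(dim-1)*lag])."""
    n_points = len(signal) - (dim - 1) * lag
    if n_points <= 0:
        raise ValueError("frame too short for this dim/lag")
    return [tuple(signal[n + k * lag] for k in range(dim)) for n in range(n_points)]


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(points, gmm):
    """Summed log-likelihood of the embedded points under one GMM.

    gmm is a list of (weight, mean, var) components (assumed pre-trained).
    """
    total = 0.0
    for x in points:
        logs = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
        mx = max(logs)  # log-sum-exp for numerical stability
        total += mx + math.log(sum(math.exp(l - mx) for l in logs))
    return total


def posterior_features(frame, attractor_gmms, dim=3, lag=1):
    """One posterior-like value per attractor model for a single frame.

    Assumptions: equal priors over models, and log-likelihoods averaged
    over the embedded points before normalization.
    """
    points = delay_embed(frame, dim, lag)
    scores = [gmm_log_likelihood(points, g) / len(points) for g in attractor_gmms]
    mx = max(scores)
    exps = [math.exp(s - mx) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


if __name__ == "__main__":
    # Two toy single-component "attractor" models in a 3-dimensional RPS.
    gmm_a = [(1.0, (0.0, 0.0, 0.0), (1.0, 1.0, 1.0))]
    gmm_b = [(1.0, (5.0, 5.0, 5.0), (1.0, 1.0, 1.0))]
    frame = [0.1, -0.2, 0.0, 0.3, -0.1, 0.2]
    print(posterior_features(frame, [gmm_a, gmm_b]))
```

The resulting vector has one entry per attractor model and sums to one, so a frame whose embedded trajectory lies close to a given attractor receives most of the probability mass for that model.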



ETRI Journal
Feature Extraction Based on Speech Attractors in the Reconstructed Phase Space for Automatic Speech Recognition Systems


Keywords: speech recognition; feature extraction; phoneme attractor; reconstructed phase space
Others: 1196796; DOI: 10.4218/etrij.13.0112.0074
