ETRI Journal
Statistical Model-Based Noise Reduction Approach for Car Interior Applications to Speech Recognition
Keywords: speech recognition; Gaussian mixture model; clean spectrum reconstruction; two-stage mel-warped Wiener filter; ETSI standard Aurora advanced front-end; speech enhancement
Others: 1185903; DOI: 10.4218/etrij.10.1510.0024
【 Abstract 】
This paper presents a statistical model-based noise suppression approach for voice recognition in a car environment. To alleviate the spectral whitening and signal distortion problems of the traditional decision-directed Wiener filter, we combine a decision-directed method with an original spectrum reconstruction method and develop a new two-stage noise reduction filter estimation scheme. When the tradeoff between performance and computational efficiency on resource-constrained automotive devices is considered, the ETSI standard advanced distributed speech recognition front-end (ETSI-AFE) can be an effective solution, and ETSI-AFE is also based on the decision-directed Wiener filter. Thus, a series of voice recognition and computational complexity tests is conducted to compare the proposed approach with ETSI-AFE. The experimental results show that the proposed approach is superior to the conventional method in terms of speech recognition accuracy, while the computational cost and frame latency are significantly reduced.
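For readers unfamiliar with the decision-directed Wiener filter that the abstract refers to, the following is a minimal sketch of the generic textbook form for a single frequency bin. It is not the paper's two-stage scheme and not the ETSI-AFE implementation. The function name, the smoothing factor `alpha`, the gain floor, and the assumption of a known, constant noise power are all illustrative choices that do not come from the source.

```python
def decision_directed_wiener(noisy_power, noise_power, alpha=0.98, gain_floor=0.1):
    """Per-frame Wiener gains for one frequency bin (generic sketch).

    noisy_power: list of noisy power values |Y_t|^2, one per frame.
    noise_power: noise power for this bin, assumed known and constant.
    alpha, gain_floor: illustrative values, not taken from the paper.
    """
    gains = []
    prev_clean_power = 0.0  # previous clean-speech power estimate, zero at start
    for y_pow in noisy_power:
        # A posteriori SNR: noisy power relative to noise power.
        post_snr = y_pow / noise_power
        # Instantaneous SNR estimate, clipped at zero.
        inst_snr = max(post_snr - 1.0, 0.0)
        # Decision-directed a priori SNR: blend the previous frame's
        # clean-speech estimate with the current instantaneous estimate.
        prior_snr = alpha * prev_clean_power / noise_power + (1.0 - alpha) * inst_snr
        # Wiener gain, floored to limit over-suppression.
        gain = max(prior_snr / (1.0 + prior_snr), gain_floor)
        gains.append(gain)
        # Clean-speech power estimate carried into the next frame.
        prev_clean_power = (gain ** 2) * y_pow
    return gains


if __name__ == "__main__":
    # Ten frames well above the noise level: the gain rises toward 1.
    print(decision_directed_wiener([100.0] * 10, noise_power=1.0))
```

In this sketch the gain depends heavily on the previous frame's estimate, which smooths the gain over time; the abstract attributes spectral whitening and signal distortion problems to the traditional filter of this kind, which is what the proposed two-stage scheme is meant to alleviate.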