ETRI Journal
Statistical Model-Based Voice Activity Detection Based on Second-Order Conditional MAP with Soft Decision
Keywords: likelihood ratio test; soft decision; second-order conditional MAP; voice activity detection
Others: 1186400
DOI: 10.4218/etrij.12.0111.0344
【 Abstract 】
In this paper, we propose a novel approach to statistical model-based voice activity detection (VAD) that incorporates a second-order conditional maximum a posteriori (CMAP) criterion. As an improvement on the first-order CMAP criterion in [1], we condition on both the current observation and the voice activity decisions in the previous two frames to take fuller account of the interframe correlation of voice activity. Whereas the previous approach [1] conditions on the decision in the previous single frame only, the second-order CMAP uses the decisions in the previous two frames, which yields four thresholds and thus an additional degree of freedom. In addition, a soft-decision scheme is incorporated, resulting in time-varying thresholds for further performance improvement. Experimental results show that the proposed algorithm outperforms the conventional CMAP-based VAD technique under various experimental conditions.
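The idea of four history-dependent thresholds can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `second_order_cmap_vad`, the per-frame log-likelihood-ratio inputs, the threshold values, and the all-silence initial state are assumptions chosen for illustration, and the soft-decision scheme that makes the thresholds time-varying is left out because the abstract does not specify it.

```python
from typing import Dict, List, Sequence, Tuple


def second_order_cmap_vad(
    log_lrs: Sequence[float],
    thresholds: Dict[Tuple[int, int], float],
    init: Tuple[int, int] = (0, 0),
) -> List[int]:
    """Hard-decision VAD whose threshold depends on the previous two decisions.

    log_lrs:    per-frame log-likelihood ratios (assumed to be computed elsewhere).
    thresholds: one threshold per history (d[n-2], d[n-1]), four in total.
    init:       assumed decisions for the two frames before the first one.
    """
    prev2, prev1 = init
    decisions: List[int] = []
    for llr in log_lrs:
        # Select the threshold that matches the decision history.
        eta = thresholds[(prev2, prev1)]
        d = 1 if llr > eta else 0  # 1 = speech, 0 = non-speech
        decisions.append(d)
        # Shift the history window by one frame.
        prev2, prev1 = prev1, d
    return decisions


# Hypothetical thresholds: lower after recent speech, higher after silence.
thresholds = {(0, 0): 1.0, (0, 1): 0.2, (1, 0): 0.6, (1, 1): -0.5}
decisions = second_order_cmap_vad([0.1, 1.5, 0.3, -0.2, -0.8, 0.7], thresholds)
print(decisions)
```

A first-order scheme would index the threshold by the previous decision alone, giving two thresholds; keying on the pair of previous decisions is what yields four.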
【 References 】
- [1] J.W. Shin et al., "Voice Activity Detection Based on Conditional MAP Criterion," IEEE Signal Process. Lett., vol. 15, Feb. 2008, pp. 257-260.
- [2] L.R. Rabiner and M.R. Sambur, "Voiced-Unvoiced-Silence Detection Using the Itakura LPC Distance Measure," Proc. IEEE Int. Conf. Acoustics, Speech, Signal Process., May 1977, pp. 323-326.
- [3] J.A. Haigh and J.S. Mason, "Robust Voice Activity Detection Using Cepstral Features," Proc. IEEE TENCON, vol. 3, Oct. 1993, pp. 321-324.
- [4] K. Srinivasan and A. Gersho, "Voice Activity Detection for Cellular Networks," Proc. IEEE Workshop Speech Coding Telecommun., Oct. 1993, pp. 85-86.
- [5] Y. Ephraim and D. Malah, "Speech Enhancement Using a Minimum Mean-Square Error Short-Time Spectral Amplitude Estimator," IEEE Trans. Acoustics, Speech, Signal Process., vol. ASSP-32, no. 6, Dec. 1984, pp. 1109-1121.
- [6] Y.D. Cho, K. Al-Naimi, and A. Kondoz, "Improved Voice Activity Detection Based on a Smoothed Statistical Likelihood Ratio," Proc. IEEE Int. Conf. Acoustics, Speech, Signal Process., vol. 2, May 2001, pp. 737-740.
- [7] J. Sohn, N.S. Kim, and W. Sung, "A Statistical Model-Based Voice Activity Detection," IEEE Signal Process. Lett., vol. 6, no. 1, Jan. 1999, pp. 1-3.
- [8] J.-H. Chang, N.S. Kim, and S.K. Mitra, "Voice Activity Detection Based on Multiple Statistical Models," IEEE Trans. Signal Process., vol. 54, no. 6, June 2006, pp. 1965-1976.
- [9] J. Ramirez et al., "Statistical Voice Activity Detection Using a Multiple Observation Likelihood Ratio Test," IEEE Signal Process. Lett., vol. 12, no. 10, Oct. 2005, pp. 689-692.
- [10] J.-H. Chang, J.W. Shin, and N.S. Kim, "Likelihood Ratio Test with Complex Laplacian Model for Voice Activity Detection," Proc. Eurospeech, Aug. 2003, pp. 1065-1068.
- [11] J.-H. Chang et al., "Global Soft Decision Employing Support Vector Machine for Speech Enhancement," IEEE Signal Process. Lett., vol. 16, no. 1, Jan. 2009, pp. 57-60.
- [12] P.C. Loizou, Speech Enhancement: Theory and Practice, CRC Press, 2007.
- [13] ITU-T, "A Silence Compression Scheme for G.729 Optimised for Terminals Conforming to Recommendation V.70," ITU-T Rec. G.729, Annex B, 1996.