Thesis Details
Multisensor Segmentation-based Noise Suppression for Intelligibility Improvement in MELP Coders
Demiroglu, Cenk ; Electrical and Computer Engineering
University: Georgia Institute of Technology
Department: Electrical and Computer Engineering
Keywords: Speech intelligibility; Speech quality; GEMS; Multi-sensor; Automatic speech recognition; Speech enhancement; Segmentation-based enhancement; Noise-robust automatic segmentation; Comb filter; Data marginalization; Data fusion; Missing data
Others: https://smartech.gatech.edu/bitstream/1853/10455/1/demiroglu_cenk_200605_phd.pdf
United States | English
Source: SMARTech Repository
【 Abstract 】

This thesis investigates the use of an auxiliary sensor, the GEMS device, for improving the quality of noisy speech and for designing noise preprocessors for MELP speech coders. The use of auxiliary sensors in noise-robust ASR applications is also investigated, with the aim of developing speech enhancement algorithms that exploit the acoustic-phonetic properties of the speech signal. A Bayesian risk minimization framework is developed that can incorporate the acoustic-phonetic properties of speech sounds and knowledge of human auditory perception into the speech enhancement framework. Two noise suppression systems are presented using the ideas developed in the mathematical framework. In the first system, an aharmonic comb filter is proposed for voiced speech, in which low-energy frequencies are severely suppressed while high-energy frequencies are suppressed mildly. The proposed system outperformed an MMSE estimator in subjective listening tests and in a DRT intelligibility test for MELP-coded noisy speech. The effect of aharmonic comb filtering on the linear predictive coding (LPC) parameters is analyzed using a missing data approach. Suppressing the low-energy frequencies without any modification of the high-energy frequencies is shown to improve the LPC spectrum, as measured by the Itakura-Saito distance. The second system combines the aharmonic comb filter with the acoustic-phonetic properties of speech to improve the intelligibility of the MELP-coded noisy speech. The noisy speech signal is segmented into broad sound classes using a multi-sensor automatic segmentation/classification tool, and each sound class is enhanced differently based on its acoustic-phonetic properties.
The proposed system is shown to outperform both the MELPe noise preprocessor and the aharmonic comb filter in intelligibility tests when used in concatenation with the MELP coder. Since the second noise suppression system uses an automatic segmentation/classification algorithm, exploiting the GEMS signal in an automatic segmentation/classification task is also addressed using an ASR approach. Current ASR engines can segment and classify speech utterances in a single pass; however, they are sensitive to ambient noise. Features that are extracted from the GEMS signal can be fused with the noisy MFCC features to improve the noise robustness of the ASR system. In the first phase, a voicing feature is extracted from the clean speech signal and fused with the MFCC features. The actual GEMS signal could not be used in this phase because there was insufficient sensor data to train the ASR system. Tests are done using the Aurora2 noisy digits database. The speech-based voicing feature is found to be effective at around 10 dB but, below 10 dB, its effectiveness drops rapidly with decreasing SNR because of the severe distortions in the speech-based features at these SNRs. Hence, a novel system is proposed that treats the MFCC features in a speech frame as missing data if the global SNR is below 10 dB and the speech frame is unvoiced. If the global SNR is above 10 dB or the speech frame is voiced, both the MFCC features and the voicing feature are used. The proposed system is shown to outperform some of the popular noise-robust techniques at all SNRs. In the second phase, a new isolated monosyllable database is prepared that contains both speech and GEMS data. ASR experiments conducted for clean speech showed that the GEMS-based feature, when fused with the MFCC features, degrades performance. The reason for this unexpected result is found to be partly related to some of the GEMS data being severely noisy. Non-acoustic sensor noise exists in all GEMS data, but severe noise occurs rarely.
A missing data technique is proposed to alleviate the effects of severely noisy sensor data. The GEMS-based feature is treated as missing data when it is detected to be severely noisy. The combined features are shown to outperform the MFCC features for clean speech when the missing data technique is applied.
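
The frame-level missing data rule described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the thesis. The 10 dB threshold and the "low global SNR and unvoiced" condition come from the abstract; everything else is an assumption made for illustration: the function names, the fused feature layout (MFCCs followed by one voicing value), the diagonal-covariance Gaussian model, the choice to always keep the voicing feature, and the treatment of exactly 10 dB as "not below" the threshold. For a diagonal Gaussian, marginalizing a missing dimension amounts to leaving its term out of the log-likelihood sum.

```python
import math

SNR_THRESHOLD_DB = 10.0  # global-SNR threshold quoted in the abstract


def mfcc_is_missing(global_snr_db, frame_is_voiced):
    """True when the MFCC stream of a frame should be treated as missing.

    Rule from the abstract: global SNR below 10 dB AND frame unvoiced.
    Assumption: exactly 10 dB counts as "not below" the threshold.
    """
    return global_snr_db < SNR_THRESHOLD_DB and not frame_is_voiced


def marginal_log_likelihood(features, means, variances, present):
    """Diagonal-Gaussian log-likelihood with missing dimensions marginalized.

    Only dimensions flagged True in `present` contribute to the sum.
    """
    total = 0.0
    for x, mu, var, keep in zip(features, means, variances, present):
        if keep:
            total += -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)
    return total


def frame_log_likelihood(mfcc, voicing, means, variances,
                         global_snr_db, frame_is_voiced):
    """Score one frame of fused [MFCC..., voicing] features (hypothetical layout)."""
    keep_mfcc = not mfcc_is_missing(global_snr_db, frame_is_voiced)
    # Assumption: the voicing feature is always kept, even when MFCCs are masked.
    present = [keep_mfcc] * len(mfcc) + [True]
    return marginal_log_likelihood(list(mfcc) + [voicing], means, variances, present)
```

The same masking function could in principle be applied to the GEMS-based feature when it is flagged as severely noisy; how that detection is performed is not described in the abstract, so it is left out of the sketch.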

【 Preview 】
Attachments
Files | Size | Format | View
Multisensor Segmentation-based Noise Suppression for Intelligibility Improvement in MELP Coders | 1145KB | PDF | download
Document metrics
Downloads: 14 | Views: 73