The presence of acoustic cues and their importance in speech perception have long remained debatable topics. In spite of several studies that exist in this field, very little is known about what exactly humans perceive in speech. This research takes a novel approach towards understanding speech perception. A new method, named three-dimensional deep search (3DDS), was developed to explore the perceptual cues of 16 consonant-vowel (CV) syllables, namely /pa/, /ta/, /ka/, /ba/, /da/, /ga/, /fa/, /Ta/, /sa/, /Sa/, /va/, /Da/, /za/, /Za/, from naturally produced speech. A verification experiment was then conducted to further verify the findings of the 3DDS method. For this purpose, the time-frequency coordinate that defines each CV was filtered out using the short-time Fourier transform (STFT), and perceptual tests were then conducted. A comparison between unmodified speech sounds and those without the acoustic cues was made. In most of the cases, the scores dropped from 100% to chance levels even at 12 dB SNR. This clearly emphasizes the importance of features in identifying each CV. The results confirm earlier findings that stops are characterized by a short-duration burst preceding the vowel by 10 cs in the unvoiced case, and appearing almost coincident with the vowel in the voiced case. As has been previously hypothesized, we confirmed that the F2 transition plays no significant role in consonant identification. 3DDS analysis labels the /sa/ and /za/ perceptual features as an intense frication noise around 4 kHz, preceding the vowel by 15–20 cs, with the /za/ feature being around 5 cs shorter in duration than that of /sa/; the /Sa/ and /Za/ events are found to be frication energy near 2 kHz, preceding the vowel by 17–20 cs. /fa/ has a relatively weak burst and frication energy over a wide band including 2–6 kHz, while /va/ has a cue in the 1.5 kHz mid-frequency region preceding the vowel by 7–10 cs. New information is established regarding /Da/ and /Ta/, especially with regard to the nature of their significant confusions.
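The verification step described above, removing a time-frequency region with the STFT and then resynthesizing the sound, can be illustrated with a minimal sketch. This is not the procedure used in the study: the function name `remove_tf_region`, the rectangular region shape, the window length, and the synthetic test signal are all assumptions made for illustration, using SciPy's `stft` and `istft`.

```python
import numpy as np
from scipy.signal import stft, istft


def remove_tf_region(x, fs, t_range, f_range, nperseg=256):
    """Zero the STFT coefficients inside a time-frequency rectangle, then resynthesize.

    t_range is (start, stop) in seconds and f_range is (low, high) in Hz.
    The rectangular region is an assumption of this sketch.
    """
    freqs, times, Z = stft(x, fs=fs, nperseg=nperseg)
    f_mask = (freqs >= f_range[0]) & (freqs <= f_range[1])
    t_mask = (times >= t_range[0]) & (times <= t_range[1])
    Z[np.ix_(f_mask, t_mask)] = 0.0  # remove the candidate cue region
    _, y = istft(Z, fs=fs, nperseg=nperseg)
    return y[: len(x)]  # trim any padding added by the transform


# Hypothetical stand-in for a CV syllable: a 4 kHz burst followed by a 500 Hz "vowel".
fs = 16000
t = np.arange(int(0.4 * fs)) / fs
burst = np.where((t >= 0.10) & (t < 0.15), np.sin(2 * np.pi * 4000 * t), 0.0)
vowel = np.where(t >= 0.20, np.sin(2 * np.pi * 500 * t), 0.0)
x = burst + vowel

# Remove the region around the burst; the vowel should be left essentially intact.
y = remove_tf_region(x, fs, t_range=(0.05, 0.20), f_range=(3000.0, 5000.0))
```

In a perceptual test, the modified signal `y` would be played alongside the unmodified `x`, with noise added to reach the desired SNR; that part is outside the scope of this sketch.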
Verification of feature regions for stops and fricatives in natural speech