| Acoustical Science and Technology | |
| Contributions of temporal cue on the perception of speaker individuality and vocal emotion for noise-vocoded speech | |
| Zhi Zhu [1], Ryota Miyauchi [1], Masashi Unoki [1], Yukiko Araki [2] | |
| [1] Japan Advanced Institute of Science and Technology; [2] Kanazawa University | |
| Keywords: Temporal cue; Speaker individuality; Vocal emotion; Noise-vocoded speech; Speech perception | |
| DOI: 10.1250/ast.39.234 | |
| Subject classification: Acoustics and ultrasonics | |
| Source: Acoustical Society of Japan | |
【 Abstract 】
This paper investigates the importance of temporal cues in the perception of speaker individuality and vocal emotion. Speaker-recognition and vocal-emotion-recognition experiments were carried out using noise-vocoded speech (NVS) generated by an analysis/synthesis method. The temporal resolution of the NVS was controlled by varying the upper limit of the modulation frequency (0, 0.5, 1, 2, 4, 8, 16, 32, and 64 Hz). The role of temporal cues under different spectral-resolution conditions was also investigated by varying the number of channels (4, 8, and 16). The results demonstrated that temporal resolution contributes to the recognition of both speaker and vocal emotion; temporal cues are therefore important for the perception of not only linguistic information but also speaker individuality and vocal emotion. In contrast, speaker-recognition performance was less sensitive to spectral resolution, at least for the limited set of stimuli used in the present study. For vocal-emotion recognition, spectral resolution was important for recognizing neutral, joy, and cold anger, but not sadness or hot anger. The results also suggest that the modulation-frequency band important for perceiving nonlinguistic information is higher than that for linguistic information.
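The two manipulations described in the abstract (an upper limit on the modulation frequency and a variable number of channels) can be illustrated with a minimal noise-vocoder sketch. This is not the authors' implementation. The band edges (log-spaced between the hypothetical `f_lo` and `f_hi`), the ideal FFT brick-wall filters, the Hilbert envelope, and the treatment of the 0 Hz condition as a constant (mean) envelope are all assumptions made for illustration, as are the function names.

```python
import numpy as np

def fft_bandlimit(x, fs, lo, hi):
    """Zero every FFT bin outside [lo, hi] Hz (ideal brick-wall filter)."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spec[(freqs < lo) | (freqs > hi)] = 0.0
    return np.fft.irfft(spec, n=len(x))

def hilbert_envelope(x):
    """Temporal envelope: magnitude of the analytic signal, via the FFT."""
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep DC once
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep Nyquist once
        weights[1:n // 2] = 2.0           # double positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * weights))

def noise_vocode(x, fs, n_channels=8, mod_cutoff_hz=16.0,
                 f_lo=100.0, f_hi=6000.0, seed=0):
    """Sketch of noise-vocoded speech (NVS).

    n_channels    : spectral resolution (4, 8, or 16 in the abstract)
    mod_cutoff_hz : upper limit of modulation frequency (0 to 64 Hz in the
                    abstract); 0 keeps only the mean of each envelope
    f_lo, f_hi    : hypothetical analysis range, not taken from the paper
    """
    rng = np.random.default_rng(seed)
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)   # assumed log spacing
    out = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = fft_bandlimit(x, fs, lo, hi)             # analysis band
        env = hilbert_envelope(band)                    # temporal envelope
        env = fft_bandlimit(env, fs, 0.0, mod_cutoff_hz)  # limit modulation
        env = np.maximum(env, 0.0)                      # envelope stays >= 0
        carrier = fft_bandlimit(rng.standard_normal(len(x)), fs, lo, hi)
        out += fft_bandlimit(env * carrier, fs, lo, hi)  # band-limited noise
    return out
```

Under these assumptions, the stimulus grid in the abstract corresponds to calling `noise_vocode` with each of the 9 modulation cutoffs and each of the 3 channel counts, giving 27 conditions per utterance. Brick-wall filters keep the sketch short; a practical implementation would more likely use smooth filters with an auditory-motivated band spacing.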
【 License 】
Unknown
【 Preview 】
| Files | Size | Format | View |
|---|---|---|---|
| RO201910189233823ZK.pdf | 492KB | PDF | |