Applied Sciences
A Novel Heterogeneous Parallel Convolution Bi-LSTM for Speech Emotion Recognition
Henry Han [1]; Huiyun Zhang [2]; Heming Huang [2]
[1] Department of Computer Science, School of Engineering and Computer Science, Baylor University, One Bear Place 97141, Waco, TX 76798, USA; [2] School of Computer Science, Qinghai Normal University, Xining 810008, China
Keywords: speech emotion recognition; feature extraction; heterogeneous parallel network; spectral features; prosodic features; multi-feature fusion
DOI: 10.3390/app11219897
Source: DOAJ
【Abstract】
Speech emotion recognition is a substantial component of natural language processing (NLP). It places strict demands on the effectiveness of both feature extraction and the acoustic model. To address these challenges, a Heterogeneous Parallel Convolution Bi-LSTM model is proposed. It consists of two heterogeneous branches: the left branch contains two dense layers and a Bi-LSTM layer, while the right branch contains a dense layer, a convolution layer, and a Bi-LSTM layer. The model exploits spatiotemporal information more effectively and achieves unweighted average recalls of 84.65%, 79.67%, and 56.50% on the benchmark databases EMODB, CASIA, and SAVEE, respectively. Compared with previously reported results, the proposed model consistently achieves better performance.
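The reported metric, unweighted average recall (UAR), is the mean of the per-class recalls, so each emotion class counts equally regardless of how many utterances it has. The following is a minimal sketch of that definition in plain Python. It is not the authors' evaluation code; the function name and the toy labels are hypothetical and serve only to illustrate how UAR can differ from plain accuracy on an imbalanced set.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls (each class weighted equally)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1

    # Recall of class c = correct[c] / total[c]; average over the true classes.
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical toy labels (not taken from EMODB, CASIA, or SAVEE).
y_true = ["angry", "angry", "angry", "angry", "happy", "sad"]
y_pred = ["angry", "angry", "angry", "angry", "sad", "sad"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.4f}, accuracy = {accuracy:.4f}")
```

In this toy case the majority class is recognized perfectly while the single "happy" utterance is missed, so accuracy (5/6) is higher than UAR (2/3). This gap is why UAR is often preferred for emotion corpora with uneven class sizes.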
【License】
Unknown