Journal Article Details
Applied Sciences
A Novel Heterogeneous Parallel Convolution Bi-LSTM for Speech Emotion Recognition
Henry Han [1], Huiyun Zhang [2], Heming Huang [2]
[1] Department of Computer Science, School of Engineering and Computer Science, Baylor University, One Bear Place 97141, Waco, TX 76798, USA
[2] School of Computer Science, Qinghai Normal University, Xining 810008, China
Keywords: speech emotion recognition; feature extraction; heterogeneous parallel network; spectral features; prosodic features; multi-feature fusion
DOI: 10.3390/app11219897
Source: DOAJ
[Abstract]

Speech emotion recognition is a substantial component of natural language processing (NLP). It places strict demands on the effectiveness of both feature extraction and the acoustic model. To address these challenges, a Heterogeneous Parallel Convolution Bi-LSTM model is proposed. It consists of two heterogeneous branches: the left branch contains two dense layers and a Bi-LSTM layer, while the right branch contains a dense layer, a convolution layer, and a Bi-LSTM layer. The model exploits spatiotemporal information more effectively and achieves unweighted average recalls of 84.65%, 79.67%, and 56.50% on the benchmark databases EMODB, CASIA, and SAVEE, respectively. Compared with previously reported results, the proposed model achieves consistently better performance.
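
The results above are reported as unweighted average recall (UAR), the mean of the per-class recalls, in which every emotion class counts equally regardless of how many utterances it has. The following is a minimal sketch of that metric in plain Python. The function name `unweighted_average_recall` and the toy labels are illustrative assumptions, not taken from the paper, and the paper's own evaluation code may differ.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls (each class weighted equally)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1

    # Recall of class c = correct[c] / total[c]; UAR averages over classes.
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Toy, invented labels: an imbalanced set dominated by "angry".
y_true = ["angry"] * 6 + ["happy"] * 2 + ["sad"] * 2
y_pred = ["angry"] * 6 + ["happy", "sad"] + ["sad", "happy"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.4f}, accuracy = {accuracy:.4f}")
```

In this toy example the majority class is always recognized, so plain accuracy comes out higher than UAR. This is why UAR is a common choice for emotion corpora with uneven class sizes.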

[License]

Unknown   
