EURASIP Journal on Audio, Speech, and Music Processing
Attention mechanism combined with residual recurrent neural network for sound event detection and localization
Empirical Research
Lei Zhang¹, Yulan Han², Yuanyuan Zhang², Chaofeng Lan², Chao Sun², Lirong Fu³, Meng Zhang⁴
¹ Beidahuang Industry Group General Hospital, 150088, Harbin, People's Republic of China
² School of Measurement and Communication Engineering, Harbin University of Science and Technology, 150080, Harbin, People's Republic of China
³ Mechanical and Electrical Engineering College, Hainan University, 570228, Haikou, People's Republic of China
⁴ School of Electronics and Communication Engineering, Guangzhou University, 510006, Guangzhou, People's Republic of China
Keywords: Sound event; Detection and localization; Convolutional recurrent neural network; Multi-scale feature fusion; Spatial and channel squeeze-excitation module
DOI: 10.1186/s13636-022-00263-6
Received: 2022-03-19; accepted: 2022-11-16; published: 2022
Source: Springer
【 Abstract 】
In sound event detection and localization (SEDL) in complex environments, the acoustic signals of different events are usually superimposed nonlinearly, which degrades both detection and localization performance. To address this, this paper builds on the Residual spatial and channel Squeeze-Excitation (Res-scSE) module and combines it with the Multiple-scale Convolutional Recurrent Neural Network (M-CRNN) to propose the Res-scSE-CRNN model. First, because a single-size convolution kernel extracts time-frequency features insufficiently, multi-scale feature fusion is performed over the feature hierarchy of the convolutional neural network to improve detection accuracy. Second, because localization accuracy for overlapping audio events is low, Res-scSE modules replace the ordinary convolution modules: residual connections strengthen feature extraction, and an attention mechanism enhances the channel and spatial relationships within the network. This improves the network's ability to extract directional features and thus to localize overlapping audio events. Experiments are carried out on the public DCASE2019 dataset, and standard evaluation metrics are used to compare the proposed model with the baseline in detecting and localizing audio events. Compared with the M-CRNN model, the Res-scSE-CRNN model reduces the detection error rate by 4%, increases the F1-score by 3.4%, reduces the localization error by 22.8°, and increases the frame recall by 3%.
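The abstract names the Res-scSE module but does not give its equations. As a minimal sketch, assuming the standard concurrent spatial and channel squeeze-excitation (scSE) formulation, the block below recalibrates a feature map `x[channel][freq][time]` in two ways: the channel branch (cSE) averages each channel over the time-frequency plane and passes the result through a C → C/r → C bottleneck with a sigmoid gate, and the spatial branch (sSE) collapses channels with a 1×1 convolution and applies a sigmoid gate per time-frequency bin. The function names, weight shapes, reduction ratio `r`, combining the two branches by element-wise addition, and omitting the block's convolution layers (so the shortcut is simply `x + scSE(x)`) are all assumptions for illustration, not the authors' implementation.

```python
import math
import random
from typing import List

# Feature map indexed as x[channel][freq_bin][time_frame] (assumed layout).
Tensor = List[List[List[float]]]
Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def channel_se(x: Tensor, w1: Matrix, w2: Matrix) -> Tensor:
    """cSE: squeeze each channel to its mean over the time-frequency plane,
    excite through a C -> C/r (ReLU) -> C (sigmoid) bottleneck, rescale channels.
    w1 has shape (C/r) x C and w2 has shape C x (C/r); both are hypothetical."""
    c = len(x)
    z = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]
    hidden = [max(0.0, sum(w1[j][i] * z[i] for i in range(c))) for j in range(len(w1))]
    gate = [sigmoid(sum(w2[k][j] * hidden[j] for j in range(len(hidden)))) for k in range(c)]
    return [[[v * gate[k] for v in row] for row in x[k]] for k in range(c)]


def spatial_se(x: Tensor, w_s: List[float]) -> Tensor:
    """sSE: a bias-free 1x1 convolution (weights w_s, one per channel) collapses
    the channels into one map; its sigmoid rescales every time-frequency bin."""
    c, f, t = len(x), len(x[0]), len(x[0][0])
    gate = [[sigmoid(sum(w_s[k] * x[k][i][j] for k in range(c))) for j in range(t)]
            for i in range(f)]
    return [[[x[k][i][j] * gate[i][j] for j in range(t)] for i in range(f)] for k in range(c)]


def res_scse(x: Tensor, w1: Matrix, w2: Matrix, w_s: List[float]) -> Tensor:
    """Add the cSE and sSE outputs element-wise, then add the identity shortcut."""
    cse, sse = channel_se(x, w1, w2), spatial_se(x, w_s)
    return [[[x[k][i][j] + cse[k][i][j] + sse[k][i][j]
              for j in range(len(x[0][0]))]
             for i in range(len(x[0]))]
            for k in range(len(x))]


if __name__ == "__main__":
    random.seed(0)
    C, F, T, r = 4, 3, 5, 2  # channels, frequency bins, frames, reduction ratio
    x = [[[random.gauss(0, 1) for _ in range(T)] for _ in range(F)] for _ in range(C)]
    w1 = [[random.gauss(0, 0.5) for _ in range(C)] for _ in range(C // r)]
    w2 = [[random.gauss(0, 0.5) for _ in range(C // r)] for _ in range(C)]
    w_s = [random.gauss(0, 0.5) for _ in range(C)]
    y = res_scse(x, w1, w2, w_s)
    print(len(y), len(y[0]), len(y[0][0]))  # same shape as x: 4 3 5
```

The block preserves the input shape, so it can stand in for a convolution module inside a CRNN without changing the layers around it; in a trained network the weights would be learned rather than drawn at random.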
【 License 】
CC BY
© The Author(s) 2022
【 Preview 】
Files | Size | Format | View |
---|---|---|---|
RO202305065303548ZK.pdf | 1828KB | PDF | download |