Journal Article Details
IEEE Access
MTF-CRNN: Multiscale Time-Frequency Convolutional Recurrent Neural Network for Sound Event Detection
Liang He1  Keming Zhang2  Yuan Ren2  Ruida Ye2  Yuanwen Cai2 
[1]Department of Electronic Engineering, Tsinghua University, Beijing, China
[2]Space Engineering University, Beijing, China
Keywords: pattern recognition; sound event detection; multiscale learning; time-frequency transform; convolutional recurrent neural network
DOI  :  10.1109/ACCESS.2020.3015047
Source: DOAJ
Abstract
To reduce neural network parameter counts and improve sound event detection performance, we propose a multiscale time-frequency convolutional recurrent neural network (MTF-CRNN) for sound event detection. Our goal is to recognize target sound events of variable duration against different audio backgrounds while keeping the parameter count low. We exploit four groups of parallel and serial convolutional kernels to learn high-level shift-invariant features from the time and frequency domains of acoustic samples. A two-layer bidirectional gated recurrent unit captures the temporal context of the extracted high-level features. The proposed method is evaluated on two sound event datasets. Compared with the baseline and other methods, our single model, with a low parameter count and no pretraining, achieves greatly improved performance. On the TUT Rare Sound Events 2017 evaluation dataset, our method achieved an error rate (ER) of 0.09±0.01, an 83% improvement over the baseline. On the TAU Spatial Sound Events 2019 evaluation dataset, our system achieved an ER of 0.11±0.01, a 61% relative improvement over the baseline, with F1 and ER values better than those on the development dataset. Compared with state-of-the-art methods, our proposed network achieves competitive detection performance with only one-fifth the network parameters.
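The error rate (ER) figures quoted in the abstract follow the standard segment-based metric used in sound event detection evaluations: ER = (S + D + I) / N, where substitutions, deletions, and insertions are counted per segment from false negatives and false positives, and N is the total number of active reference events. A minimal sketch of that computation, assuming per-segment binary class-activity vectors (the function name and example data are illustrative, not from the paper):

```python
def segment_based_error_rate(reference, estimated):
    """Segment-based ER = (S + D + I) / N.

    Per segment: S = min(FN, FP), D = max(0, FN - FP),
    I = max(0, FP - FN); N counts active reference events.
    reference/estimated: lists of binary activity vectors,
    one vector per time segment, one entry per event class.
    """
    assert len(reference) == len(estimated)
    S = D = I = N = 0
    for ref_seg, est_seg in zip(reference, estimated):
        # False negatives / false positives within this segment.
        fn = sum(1 for r, e in zip(ref_seg, est_seg) if r and not e)
        fp = sum(1 for r, e in zip(ref_seg, est_seg) if e and not r)
        S += min(fn, fp)          # paired miss + false alarm -> substitution
        D += max(0, fn - fp)      # unpaired misses -> deletions
        I += max(0, fp - fn)      # unpaired false alarms -> insertions
        N += sum(ref_seg)
    return (S + D + I) / N

# Two segments, three event classes each.
ref = [[1, 0, 1], [1, 1, 0]]
est = [[1, 1, 0], [1, 0, 0]]
print(segment_based_error_rate(ref, est))  # 0.5 (1 substitution + 1 deletion over 4 events)
```

An ER of 0 indicates perfect detection; values above 1 are possible when insertions dominate, which is why ER is reported alongside F1.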
License

Unknown
