Journal Article Details
IEEE Access
MTF-CRNN: Multiscale Time-Frequency Convolutional Recurrent Neural Network for Sound Event Detection
Liang He1  Keming Zhang2  Yuan Ren2  Ruida Ye2  Yuanwen Cai2 
[1]Department of Electronic Engineering, Tsinghua University, Beijing, China
[2]Space Engineering University, Beijing, China
Keywords: pattern recognition; sound event detection; multiscale learning; time-frequency transform; convolutional recurrent neural network
DOI  :  10.1109/ACCESS.2020.3015047
Source: DOAJ
Abstract
To reduce neural network parameter counts and improve sound event detection performance, we propose a multiscale time-frequency convolutional recurrent neural network (MTF-CRNN) for sound event detection. Our goal is to recognize target sound events of variable duration against different audio backgrounds while keeping the parameter count low. We exploit four groups of parallel and serial convolutional kernels to learn high-level shift-invariant features from the time and frequency domains of acoustic samples. A two-layer bidirectional gated recurrent unit captures the temporal context of the extracted high-level features. The proposed method is evaluated on two sound event datasets. Compared with the baseline and other methods, our single model, with a low parameter count and no pretraining, achieves greatly improved performance. On the TUT Rare Sound Events 2017 evaluation dataset, our method achieved an error rate (ER) of 0.09±0.01, an 83% improvement over the baseline. On the TAU Spatial Sound Events 2019 evaluation dataset, our system achieved an ER of 0.11±0.01, a 61% relative improvement over the baseline, with F1 and ER values better than those on the development dataset. Compared with state-of-the-art methods, our proposed network achieves competitive detection performance with only one-fifth the network parameters.
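The error rate (ER) figures quoted in the abstract follow the standard segment-based metric used in sound event detection evaluations: ER = (S + D + I) / N, where substitutions, deletions, and insertions are counted per segment from false negatives and false positives, and N is the total number of active reference events. A minimal sketch of that computation, assuming per-segment binary class-activity vectors (the function name and example data are illustrative, not from the paper):

```python
def segment_based_error_rate(reference, estimated):
    """Segment-based ER = (S + D + I) / N.

    Per segment: S = min(FN, FP), D = max(0, FN - FP),
    I = max(0, FP - FN); N counts active reference events.
    reference/estimated: lists of binary activity vectors,
    one vector per time segment, one entry per event class.
    """
    assert len(reference) == len(estimated)
    S = D = I = N = 0
    for ref_seg, est_seg in zip(reference, estimated):
        # False negatives / false positives within this segment.
        fn = sum(1 for r, e in zip(ref_seg, est_seg) if r and not e)
        fp = sum(1 for r, e in zip(ref_seg, est_seg) if e and not r)
        S += min(fn, fp)          # paired miss + false alarm -> substitution
        D += max(0, fn - fp)      # unpaired misses -> deletions
        I += max(0, fp - fn)      # unpaired false alarms -> insertions
        N += sum(ref_seg)
    return (S + D + I) / N

# Two segments, three event classes each.
ref = [[1, 0, 1], [1, 1, 0]]
est = [[1, 1, 0], [1, 0, 0]]
print(segment_based_error_rate(ref, est))  # 0.5 (1 substitution + 1 deletion over 4 events)
```

An ER of 0 indicates perfect detection; values above 1 are possible when insertions dominate, which is why ER is reported alongside F1.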
License

Unknown
