Journal Article Details
EURASIP Journal on Audio, Speech, and Music Processing
Frequency-dependent auto-pooling function for weakly supervised sound event detection
Yin Cao1  Jun Yang2  Sichen Liu2  Feiran Yang3 
[1] Centre for Vision, Speech and Signal Processing (CVSSP), University of Surrey, Guildford, UK;Key Laboratory of Noise and Vibration Research, Institute of Acoustics, Chinese Academy of Sciences, No. 21 North 4th Ring Road, Beijing, China;University of Chinese Academy of Sciences, No.19(A) Yuquan Road, Beijing, China;University of Chinese Academy of Sciences, No.19(A) Yuquan Road, Beijing, China;State Key Laboratory of Acoustics, Institute of Acoustics, Chinese Academy of Sciences, No. 21 North 4th Ring Road, Beijing, China;
Keywords: Sound event detection; Weakly supervised; Auto-pooling function; Depthwise separable convolution
DOI: 10.1186/s13636-021-00206-7
Source: Springer
【 Abstract 】

Sound event detection (SED), which is typically treated as a supervised problem, aims to detect the types of sound events and their corresponding temporal information. It requires estimating onset and offset annotations for sound events at each frame. Many available sound event datasets, however, contain only audio tags without precise temporal information; such datasets are referred to as weakly labeled. In this paper, we propose a novel source separation-based method trained on weakly labeled data to solve SED problems. We build a dilated depthwise separable convolution block (DDC-block) to estimate time-frequency (T-F) masks of each sound event from a T-F representation of an audio clip. The DDC-block is shown experimentally to be more effective and computationally lighter than a “VGG-like” block. To fully utilize the frequency characteristics of sound events, we then propose a frequency-dependent auto-pooling (FAP) function to obtain the clip-level presence probability of each sound event class. The combination of the two schemes, named the DDC-FAP method, is evaluated on the DCASE 2018 Task 2, DCASE 2020 Task 4, and DCASE 2017 Task 4 datasets. The results show that DDC-FAP performs better than the state-of-the-art source separation-based method on the SED task.
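
The abstract does not give the FAP formula, so the following is only an illustrative sketch of the general idea and not the authors' implementation. It assumes the usual auto-pooling form (a softmax-weighted average whose sharpness is set by a parameter `alpha`), gives each frequency bin its own `alpha`, and then averages the per-bin results over frequency. The function names, the per-bin `alphas` list, and the final frequency averaging are all hypothetical choices made for illustration.

```python
import math


def auto_pool(values, alpha):
    """Softmax-weighted average of frame-level values.

    alpha = 0 reduces to mean pooling; a large positive alpha
    approaches max pooling.
    """
    # Subtract the largest exponent before exponentiating for numerical stability.
    shift = max(alpha * v for v in values)
    weights = [math.exp(alpha * v - shift) for v in values]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total


def frequency_dependent_auto_pool(mask, alphas):
    """Pool a T-F mask for one event class into a clip-level score.

    mask   -- list of frequency rows, each a list of per-frame values in [0, 1]
    alphas -- one pooling parameter per frequency bin (assumed, hypothetical)

    Each frequency bin is pooled over time with its own alpha; the
    per-bin results are then averaged over frequency (an assumption).
    """
    per_bin = [auto_pool(row, a) for row, a in zip(mask, alphas)]
    return sum(per_bin) / len(per_bin)


if __name__ == "__main__":
    toy_mask = [[0.1, 0.9, 0.2], [0.3, 0.3, 0.3]]
    print(frequency_dependent_auto_pool(toy_mask, alphas=[5.0, 0.0]))
```

In a trained model the `alpha` values would be learned jointly with the network; here they are fixed numbers so that the behaviour between mean pooling and max pooling is easy to inspect.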

【 License 】

CC BY   
