EURASIP Journal on Audio, Speech, and Music Processing
Exploring convolutional, recurrent, and hybrid deep neural networks for speech and music detection in a large audio dataset
Diego de Benito-Gorron1, Joaquin Gonzalez-Rodriguez1, Alicia Lozano-Diez1, Doroteo T. Toledano1
[1] AUDIAS (Audio, Data Intelligence and Speech) - Universidad Autonoma de Madrid
Keywords: Acoustic event detection; Speech activity detection; Music activity detection; Neural networks; Convolutional networks; LSTM
DOI: 10.1186/s13636-019-0152-1
Source: DOAJ
Abstract
Audio signals represent a wide diversity of acoustic events, from background environmental noise to spoken communication. Machine learning models such as neural networks have already been proposed for audio signal modeling, where recurrent structures can take advantage of temporal dependencies. This work studies the implementation of several neural network-based systems for speech and music event detection over a collection of 77,937 10-second audio segments (216 h), selected from the Google AudioSet dataset. These segments belong to YouTube videos and have been represented as mel-spectrograms. We propose and compare two approaches. The first is to train two different neural networks, one for speech detection and another for music detection. The second consists of training a single neural network to tackle both tasks at the same time. The studied architectures include fully connected, convolutional, and LSTM (long short-term memory) recurrent networks. Comparative results are provided in terms of classification performance and model complexity. We would like to highlight the performance of convolutional architectures, especially in combination with an LSTM stage. The hybrid convolutional-LSTM models achieve the best overall results (85% accuracy) in the three proposed tasks. Furthermore, a distractor analysis of the results has been carried out in order to identify which events in the ontology are the most harmful to the performance of the models, showing some difficult scenarios for the detection of music and speech.
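The abstract states that each 10-second segment is represented as a mel-spectrogram before being fed to the networks, but the paper's metadata does not include the feature-extraction settings. As a rough, self-contained sketch of how such a front end can be computed (NumPy only; the sample rate of 16 kHz, 512-point FFT, 256-sample hop, and 64 mel bands are illustrative assumptions, not the authors' configuration):

```python
import numpy as np

def hz_to_mel(f):
    # Standard mel-scale mapping (O'Shaughnessy formula).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters with centers evenly spaced on the mel scale.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):
            if center > left:
                fb[i, k] = (k - left) / (center - left)
        for k in range(center, right):
            if right > center:
                fb[i, k] = (right - k) / (right - center)
    return fb

def mel_spectrogram(signal, sr=16000, n_fft=512, hop=256, n_mels=64):
    # Frame the signal, apply a Hann window, take the power spectrum.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    fb = mel_filterbank(sr, n_fft, n_mels)
    # Log compression; result shape: (n_mels, n_frames).
    return np.log(power @ fb.T + 1e-10).T
```

With these assumed parameters, a 10-second clip at 16 kHz (160,000 samples) yields a 64 x 624 log-mel image, the kind of time-frequency representation that convolutional layers consume directly and that an LSTM stage can then read frame by frame along the time axis.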
License
Unknown