Thesis details
Automatic classification of electronic music and speech/music audio content
Chen, Austin; Hasegawa-Johnson, Mark A.
Keywords: speech/music discrimination; genre classification; Music information retrieval; Gaussian mixture model; Audio content analysis; Audio classification
Others  :  https://www.ideals.illinois.edu/bitstream/handle/2142/49569/Austin_Chen.pdf?sequence=1&isAllowed=y
United States | English
Source: The Illinois Digital Environment for Access to Learning and Scholarship
【 Abstract 】

Automatic audio categorization has great potential for application in the maintenance and usage of large and constantly growing media databases; accordingly, much research has been done to demonstrate the feasibility of such methods. A popular topic is automatic genre classification, accomplished by training machine learning algorithms. However, “electronic” or “techno” music is often misrepresented in prior work, especially given the recent rapid evolution of the genre and its subsequent splintering into distinctive subgenres. To address this, features are extracted from electronic music samples in an experiment to categorize song samples into three subgenres: deep house, dubstep, and progressive house. An overall classification accuracy of 80.67% is achieved, comparable to prior work.
Similarly, many past studies have been conducted on speech/music discrimination due to the potential applications for broadcast and other media, but it remains possible to expand the experimental scope to include samples of speech with varying amounts of background music. Two measures of the ratio between speech energy and music energy are developed and evaluated: a reference measure called speech-to-music ratio (SMR) and a feature called estimated voice-to-music ratio (eVMR), which is an imprecise estimate of SMR. eVMR is an objective signal measure computed by taking advantage of broadcast mixing techniques in which vocals, unlike most instruments, are typically placed at stereo center. Conversely, SMR is a hidden variable defined by the relationship between the powers of portions of audio attributed to speech and music. It is shown that eVMR is predictive of SMR and can be combined with state-of-the-art features in order to improve performance. For evaluation, this new metric is applied in speech/music (binary) classification, speech/music/mixed (trinary) classification, and a new speech-to-music ratio estimation problem. Promising results are achieved, including 93.06% accuracy for trinary classification and an RMSE of 3.86 dB in estimating the SMR.
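The abstract does not give formulas for SMR or eVMR, so the following is only a minimal Python sketch of the general idea, under stated assumptions. It assumes SMR is expressed as a power ratio in dB between separately known speech and music signals, and it uses a hypothetical center/side power ratio as a stand-in for an eVMR-style feature (the function names `smr_db`, `center_side_ratio_db`, and `rmse` are illustrative, not taken from the thesis). The toy mix places speech exactly at stereo center and music entirely off-center, an idealized case in which the proxy equals the SMR; real broadcast mixes would make it only an imprecise estimate.

```python
import math

def power(x):
    """Mean squared amplitude of a sequence of samples."""
    return sum(s * s for s in x) / len(x)

def smr_db(speech, music):
    """Assumed SMR form: ratio of speech power to music power, in dB."""
    return 10.0 * math.log10(power(speech) / power(music))

def center_side_ratio_db(left, right, eps=1e-12):
    """Hypothetical eVMR-style proxy: stereo-center (mid) power over
    side power, in dB. Not the thesis's actual definition."""
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return 10.0 * math.log10((power(mid) + eps) / (power(side) + eps))

def rmse(pred, ref):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))

# Toy signals (assumptions): one second at 8 kHz, pure sine tones.
sr = 8000
t = [n / sr for n in range(sr)]
speech = [1.0 * math.sin(2 * math.pi * 220 * x) for x in t]  # stands in for speech
music = [0.5 * math.sin(2 * math.pi * 440 * x) for x in t]   # stands in for music

# Idealized mix: speech at stereo center, music with opposite polarity per channel.
left = [s + m for s, m in zip(speech, music)]
right = [s - m for s, m in zip(speech, music)]

true_smr = smr_db(speech, music)              # about 6.02 dB (power ratio of 4)
estimate = center_side_ratio_db(left, right)  # matches true_smr in this ideal case
print(round(true_smr, 2), round(estimate, 2))
```

Over a set of clips, `rmse` applied to estimated and reference SMR values (both in dB) would give an error figure of the same kind as the RMSE reported in the abstract.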

【 Preview 】
Attachments
Files | Size | Format | View
Automatic classification of electronic music and speech/music audio content | 648KB | PDF | download