Thesis details
Audio compression via nonlinear transform coding and stochastic binary activation
Yan, Yuanheng ; Smaragdis, Paris
Keywords: Audio compression; Neural network; Convolutional neural network (CNN); Stochastic binary activation
Others  :  https://www.ideals.illinois.edu/bitstream/handle/2142/105709/YAN-THESIS-2019.pdf?sequence=1&isAllowed=y
United States | English
Source: The Illinois Digital Environment for Access to Learning and Scholarship
【 Abstract 】

Engineers have pushed the boundaries of audio compression and designed numerous lossy audio compression codecs, such as AAC, WMA, and others, that have surpassed the longstanding MP3 coding format. However, most of these methods are laboriously engineered using psychoacoustic modeling, and some of them are proprietary and see only limited use. This thesis, inspired by recent major breakthroughs in lossy image compression via machine learning methods, explores the possibilities of a neural network trained for lossy audio compression. Currently there are few if any audio compression methods that utilize machine learning. This thesis presents a brief introduction to lossy transform compression and compares it to similar machine learning concepts, then systematically presents a convolutional autoencoder network with a stochastic binary activation for a sparse representation of the code space to achieve compression. A similar network is employed for encoding the residual of the main network. Our network achieves average compression rates of roughly 5 to 2 and introduces few if any audible artifacts, presenting a promising opening to audio compression using machine learning.
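
The abstract does not spell out how the stochastic binary activation is defined. As a minimal sketch, assuming the formulation common in learned image compression (an activation x in [-1, 1] is mapped to +1 with probability (1 + x) / 2 and to -1 otherwise), the idea can be illustrated as follows. The function name `stochastic_binarize` and the sample values are hypothetical and are not taken from the thesis.

```python
import math
import random


def stochastic_binarize(x, rng):
    """Map an activation in [-1, 1] to -1.0 or +1.0 at random.

    Assumed rule (not confirmed by the abstract): +1 is drawn with
    probability (1 + x) / 2, so the expected output equals x.
    """
    x = max(-1.0, min(1.0, x))   # clamp to the valid range
    p_plus = (1.0 + x) / 2.0     # probability of emitting +1
    return 1.0 if rng.random() < p_plus else -1.0


rng = random.Random(0)

# Hypothetical encoder outputs, squashed into [-1, 1] with tanh.
activations = [math.tanh(v) for v in (-2.0, -0.3, 0.0, 0.3, 2.0)]

# One binary code per activation: this is what would be stored or transmitted.
codes = [stochastic_binarize(a, rng) for a in activations]

# Averaging many draws recovers the original activation in expectation.
trials = 20000
mean_codes = [
    sum(stochastic_binarize(a, rng) for _ in range(trials)) / trials
    for a in activations
]
```

Under this assumption each code symbol costs one bit, and the randomness acts as quantization noise whose mean is the underlying activation. How gradients are passed through the binarization during training is a separate design choice that this sketch does not cover.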

【 Preview 】
Attachments
Files | Size | Format | View
Audio compression via nonlinear transform coding and stochastic binary activation | 10234KB | PDF | download