Dissertation Details
Algorithms and analysis for non-convex optimization problems in machine learning
Author: Xie, Bo; Advisor: Song, Le; Committee: Vempala, Santosh; Zha, Hongyuan; Boots, Byron; Anandkumar, Animashree
University:Georgia Institute of Technology
Department:Computational Science and Engineering
Keywords: Machine learning; Non-convex optimization; Spectral algorithms; Neural networks; Deep learning
Full text: https://smartech.gatech.edu/bitstream/1853/58642/1/XIE-DISSERTATION-2017.pdf
United States | English
Source: SMARTech Repository
【 Abstract 】

In this thesis, we propose efficient algorithms and provide theoretical analysis, through the lens of spectral methods, for several important non-convex optimization problems in machine learning. Specifically, we focus on two types of non-convex problems: learning the parameters of latent variable models and learning in deep neural networks.

Learning latent variable models is traditionally framed as a non-convex optimization problem through Maximum Likelihood Estimation (MLE). For certain models, such as multi-view models, we can bypass the non-convexity by exploiting the special model structure and converting the problem into a spectral decomposition via the Method of Moments (MM) estimator. In this thesis, we propose a novel algorithm that flexibly learns multi-view models in a nonparametric fashion. To scale these nonparametric spectral methods to large datasets, we propose doubly stochastic gradient descent, an algorithm that uses sampling to approximate two expectations in the problem and achieves a better balance between computation and statistics by adaptively growing the model as more data arrive.

Learning with neural networks is a difficult non-convex problem, yet simple gradient-based methods achieve great success in practice. In this part of the thesis, we aim to understand the optimization landscape of learning one-hidden-layer networks with Rectified Linear Unit (ReLU) activation functions. By directly analyzing the structure of the gradient, we show that neural networks with diverse weights have no spurious local optima. This partly explains the empirical success of gradient descent: under diversity conditions on the neural weights, every stationary point is a global optimum.
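For context on how the multi-view structure lets one sidestep the non-convex MLE problem, the low-order cross moments of the views factorize over the latent state, and recovering the parameters becomes a spectral (tensor) decomposition. The following is a minimal sketch in generic method-of-moments notation; the symbols K, pi_k, and mu_{v,k} are placeholder notation for this illustration, not the thesis's own:

```latex
% Multi-view model: a discrete latent variable h \in \{1,\dots,K\} with
% conditionally independent views x_1, x_2, x_3.
% Writing \pi_k = P(h = k) and \mu_{v,k} = \mathbb{E}[x_v \mid h = k],
% the cross moments factorize as
\begin{align}
  M_2 &= \mathbb{E}[x_1 \otimes x_2]
       = \sum_{k=1}^{K} \pi_k \, \mu_{1,k} \otimes \mu_{2,k}, \\
  M_3 &= \mathbb{E}[x_1 \otimes x_2 \otimes x_3]
       = \sum_{k=1}^{K} \pi_k \, \mu_{1,k} \otimes \mu_{2,k} \otimes \mu_{3,k}.
\end{align}
% Recovering the \pi_k and \mu_{v,k} from M_2 and M_3 is a spectral/tensor
% decomposition problem rather than a non-convex likelihood maximization.
```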
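The abstract's description of doubly stochastic gradient descent (sampling to approximate two expectations, with a model that grows as data arrive) can be illustrated with a small sketch for kernel regression with random Fourier features. This is an illustrative assumption-laden sketch, not the thesis's implementation: the RBF kernel, squared loss, step-size schedule, and names such as `doubly_sgd` are invented for the example.

```python
import numpy as np

def doubly_sgd(X, y, sigma=1.0, reg=1e-4, n_iters=1000, step0=1.0, seed=0):
    """Sketch of doubly stochastic gradient descent for kernel regression.

    Each iteration samples (i) a data point and (ii) a random Fourier feature
    of the RBF kernel, so both the data expectation and the kernel (feature)
    expectation are approximated by sampling.  The model grows by one random
    feature per iteration, matching the "adaptively growing model" idea.
    """
    rng = np.random.RandomState(seed)
    n, d = X.shape
    omegas, biases, alphas = [], [], []          # the growing model

    def predict(x):
        # f(x) = sum_t alpha_t * sqrt(2) * cos(omega_t . x + b_t)
        if not alphas:
            return 0.0
        W = np.array(omegas)                     # (t, d)
        b = np.array(biases)                     # (t,)
        phi = np.sqrt(2.0) * np.cos(W @ x + b)
        return float(np.array(alphas) @ phi)

    for t in range(1, n_iters + 1):
        i = rng.randint(n)                       # sample a data point
        omega = rng.randn(d) / sigma             # sample a random RBF feature
        b = rng.uniform(0.0, 2.0 * np.pi)

        step = step0 / np.sqrt(t)
        err = predict(X[i]) - y[i]               # squared-loss residual

        # Shrink old coefficients (regularization), then add the new feature.
        alphas = [a * (1.0 - step * reg) for a in alphas]
        phi_new = np.sqrt(2.0) * np.cos(omega @ X[i] + b)
        alphas.append(-step * err * phi_new)
        omegas.append(omega)
        biases.append(b)

    return omegas, biases, alphas, predict

# Toy usage: fit y = sin(3x) from noisy samples.
if __name__ == "__main__":
    rng = np.random.RandomState(1)
    X = rng.uniform(-1, 1, size=(200, 1))
    y = np.sin(3 * X[:, 0]) + 0.1 * rng.randn(200)
    *_, predict = doubly_sgd(X, y, sigma=0.3, n_iters=2000)
    print(predict(np.array([0.5])), np.sin(1.5))
```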
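For the neural-network part, the object being analyzed is the gradient of the empirical loss of a one-hidden-layer ReLU network. A minimal sketch of the setup in generic notation follows; the squared loss, scaling, and symbols here are illustrative rather than the thesis's exact statement:

```latex
% One-hidden-layer ReLU network with hidden weights w_1,\dots,w_K and
% output weights v_1,\dots,v_K:
\[
  f(x) = \sum_{k=1}^{K} v_k \,\sigma(w_k^\top x),
  \qquad \sigma(z) = \max(0, z).
\]
% For the empirical squared loss
% \ell(W) = \tfrac{1}{2n}\sum_{i=1}^{n} \bigl(f(x_i) - y_i\bigr)^2,
% the gradient with respect to a hidden weight vector is
\[
  \nabla_{w_k} \ell(W)
  = \frac{1}{n} \sum_{i=1}^{n} \bigl(f(x_i) - y_i\bigr)\, v_k\,
    \mathbb{1}\{ w_k^\top x_i > 0 \}\, x_i .
\]
% The gradient's structure is governed by the ReLU activation patterns
% \mathbb{1}\{ w_k^\top x_i > 0 \}; diversity conditions on the weights are
% what rule out spurious stationary points.
```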

【 Preview 】
Attachments
File: Algorithms and analysis for non-convex optimization problems in machine learning (7187 KB, PDF)
Document Metrics
Downloads: 6    Views: 18