Journal Article Details
Electronic Transactions on Numerical Analysis
On the regularization effect of stochastic gradient descent applied to least-squares
article
Stefan Steinerberger1 
[1] Department of Mathematics, University of Washington
Keywords: stochastic gradient descent; Kaczmarz method; least-squares; regularization
DOI: 10.1553/etna_vol54s610
Subject classification: Mathematics (General)
Source: Kent State University, Institute of Computational Mathematics
【 Abstract 】

We study the behavior of the stochastic gradient descent method applied to $\|Ax - b\|_2^2 \rightarrow \min$ for invertible matrices $A \in \mathbb{R}^{n \times n}$, where $x = A^{-1}b$ denotes the exact solution. We show that there is an explicit constant $c_{A}$ depending (mildly) on $A$ such that $$ \mathbb{E}\,\left\| Ax_{k+1}-b\right\|^2_{2} \leq \left(1 + \frac{c_{A}}{\|A\|_F^2}\right) \left\|A x_k - b \right\|^2_{2} - \frac{2}{\|A\|_F^2} \left\|A^T A (x_k - x)\right\|^2_{2}.$$ This is a curious inequality, since the last term involves one additional matrix multiplication applied to the error $x_k - x$ compared to the remaining terms: if the projection of $x_k - x$ onto the subspace of singular vectors corresponding to large singular values is large, then the stochastic gradient descent method regularizes quickly. For symmetric matrices, this inequality extends to higher-order Sobolev spaces. This explains a (known) regularization phenomenon: an energy cascade from large singular values to small singular values acts as a regularizer.
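For concreteness, here is a minimal sketch of the setting described above (not the article's own code): SGD on $\|Ax - b\|_2^2$ where row $i$ is sampled with probability $\|a_i\|_2^2 / \|A\|_F^2$ and the step size is $1/\|a_i\|_2^2$, which reduces to the randomized Kaczmarz method named in the keywords. The function name, test matrix, and step counts below are illustrative assumptions.

```python
import numpy as np

def randomized_kaczmarz(A, b, x0, num_steps, rng=None):
    """SGD on ||Ax - b||_2^2: sample row i with probability
    ||a_i||^2 / ||A||_F^2 and use step size 1/||a_i||^2, which is
    the randomized Kaczmarz update (projection onto <a_i, x> = b_i)."""
    rng = np.random.default_rng() if rng is None else rng
    row_norms_sq = np.einsum('ij,ij->i', A, A)   # ||a_i||^2 for each row
    probs = row_norms_sq / row_norms_sq.sum()    # = ||a_i||^2 / ||A||_F^2
    x = np.array(x0, dtype=float)
    for _ in range(num_steps):
        i = rng.choice(len(b), p=probs)
        x += (b[i] - A[i] @ x) / row_norms_sq[i] * A[i]
    return x

# Tiny experiment on a random invertible system, tracking the residual
# ||A x_k - b||_2 that appears on both sides of the inequality above.
rng = np.random.default_rng(0)
n = 50
A = rng.standard_normal((n, n))
x_true = rng.standard_normal(n)
b = A @ x_true

x = np.zeros(n)
for step in range(1, 6):
    x = randomized_kaczmarz(A, b, x, num_steps=200, rng=rng)
    print(f"after {200 * step:4d} steps: ||Ax - b||_2 = {np.linalg.norm(A @ x - b):.3e}")
```

With this row-norm sampling, the expected per-step progress is governed by $1/\|A\|_F^2$, matching the $\|A\|_F^2$ factors in the inequality above.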

【 License 】

Unknown   

【 Preview 】

Attachment list
File: RO202307010000574ZK.pdf (804 KB, PDF)