Electronic Transactions on Numerical Analysis

On the regularization effect of stochastic gradient descent applied to least-squares

Article

Stefan Steinerberger [1]

[1] Department of Mathematics, University of Washington

Keywords: stochastic gradient descent; Kaczmarz method; least-squares; regularization
DOI: 10.1553/etna_vol54s610
Subject classification: Mathematics (General)
Source: Kent State University * Institute of Computational Mathematics
【 Abstract 】
We study the behavior of the stochastic gradient descent method applied to $\|Ax - b\|_2^2 \rightarrow \min$ for invertible matrices $A \in \mathbb{R}^{n \times n}$. We show that there is an explicit constant $c_{A}$ depending (mildly) on $A$ such that $$ \mathbb{E}\left\| Ax_{k+1}-b\right\|^2_{2} \leq \left(1 + \frac{c_{A}}{\|A\|_F^2}\right) \left\|A x_k -b \right\|^2_{2} - \frac{2}{\|A\|_F^2} \left\|A^T A (x_k - x)\right\|^2_{2}.$$ This is a curious inequality since the last term involves one additional matrix multiplication applied to the error $x_k - x$ compared to the remaining terms: if the projection of $x_k - x$ onto the subspace of singular vectors corresponding to large singular values is large, then the stochastic gradient descent method leads to fast regularization. For symmetric matrices, this inequality has an extension to higher-order Sobolev spaces. This explains a (known) regularization phenomenon: an energy cascade from large singular values to small singular values acts as a regularizer.
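The SGD iteration analyzed in the abstract is, for least-squares, the randomized Kaczmarz method: a row $a_i$ of $A$ is sampled with probability $\|a_i\|_2^2 / \|A\|_F^2$ and the iterate is projected onto the hyperplane $\langle a_i, x\rangle = b_i$. The following is a minimal sketch of that iteration (not code from the paper); the test matrix, seed, and step count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
# Well-conditioned invertible test matrix (illustrative assumption).
A = rng.standard_normal((n, n)) + n * np.eye(n)
x_true = rng.standard_normal(n)
b = A @ x_true

row_norms_sq = np.sum(A**2, axis=1)
probs = row_norms_sq / row_norms_sq.sum()  # ||a_i||^2 / ||A||_F^2

x = np.zeros(n)
residuals = [np.linalg.norm(A @ x - b)]
for _ in range(2000):
    i = rng.choice(n, p=probs)            # sample a row with Frobenius weighting
    a_i = A[i]
    # Kaczmarz projection step = SGD step on ||Ax - b||^2 with this sampling.
    x = x + (b[i] - a_i @ x) / row_norms_sq[i] * a_i
    residuals.append(np.linalg.norm(A @ x - b))

print(residuals[0], residuals[-1])
```

Tracking $\|Ax_k - b\|_2$ along the run makes the expected one-step contraction in the inequality above observable in practice: the residual decays geometrically for this well-conditioned $A$.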
【 License 】
Unknown
【 Preview 】
| Files | Size | Format | View |
|---|---|---|---|
| RO202307010000574ZK.pdf | 804KB | PDF | |