Electronic Transactions on Numerical Analysis

On the regularization effect of stochastic gradient descent applied to least-squares

Article

Stefan Steinerberger [1]

[1] Department of Mathematics, University of Washington

Keywords: stochastic gradient descent; Kaczmarz method; least-squares; regularization
DOI: 10.1553/etna_vol54s610
Subject classification: Mathematics (General)
Source: Kent State University * Institute of Computational Mathematics
【 Abstract 】
We study the behavior of the stochastic gradient descent method applied to $\|Ax - b\|_2^2 \rightarrow \min$ for invertible matrices $A \in \mathbb{R}^{n \times n}$. We show that there is an explicit constant $c_{A}$ depending (mildly) on $A$ such that $$ \mathbb{E}\left\| Ax_{k+1}-b\right\|^2_{2} \leq \left(1 + \frac{c_{A}}{\|A\|_F^2}\right) \left\|A x_k -b \right\|^2_{2} - \frac{2}{\|A\|_F^2} \left\|A^T A (x_k - x)\right\|^2_{2}.$$ This is a curious inequality since the last term involves one additional matrix multiplication applied to the error $x_k - x$ compared to the remaining terms: if the projection of $x_k - x$ onto the subspace of singular vectors corresponding to large singular values is large, then the stochastic gradient descent method leads to fast regularization. For symmetric matrices, this inequality has an extension to higher-order Sobolev spaces. This explains a (known) regularization phenomenon: an energy cascade from large singular values to small singular values acts as a regularizer.
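The SGD iteration analyzed in the abstract is, for least-squares, the randomized Kaczmarz method: a row $a_i$ of $A$ is sampled with probability $\|a_i\|_2^2 / \|A\|_F^2$ and the iterate is projected onto the hyperplane $\langle a_i, x\rangle = b_i$. The following is a minimal sketch of that iteration (not code from the paper); the test matrix, seed, and step count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
# Well-conditioned invertible test matrix (illustrative assumption).
A = rng.standard_normal((n, n)) + n * np.eye(n)
x_true = rng.standard_normal(n)
b = A @ x_true

row_norms_sq = np.sum(A**2, axis=1)
probs = row_norms_sq / row_norms_sq.sum()  # ||a_i||^2 / ||A||_F^2

x = np.zeros(n)
residuals = [np.linalg.norm(A @ x - b)]
for _ in range(2000):
    i = rng.choice(n, p=probs)            # sample a row with Frobenius weighting
    a_i = A[i]
    # Kaczmarz projection step = SGD step on ||Ax - b||^2 with this sampling.
    x = x + (b[i] - a_i @ x) / row_norms_sq[i] * a_i
    residuals.append(np.linalg.norm(A @ x - b))

print(residuals[0], residuals[-1])
```

Tracking $\|Ax_k - b\|_2$ along the run makes the expected one-step contraction in the inequality above observable in practice: the residual decays geometrically for this well-conditioned $A$.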
【 License 】
Unknown
【 Preview 】
| Files | Size | Format | View |
|---|---|---|---|
| RO202307010000574ZK.pdf | 804KB | PDF | |