JOURNAL OF COMPUTATIONAL PHYSICS | Volume: 432 |
An adaptive Hessian approximated stochastic gradient MCMC method | |
Article | |
Wang, Yating [1]; Deng, Wei [1]; Lin, Guang [2] | |
[1] Purdue Univ, Dept Math, W Lafayette, IN 47907 USA | |
[2] Purdue Univ, Dept Math, Sch Mech Engn, Dept Stat Courtesy,Dept Earth Atmospher & Planeta, W Lafayette, IN 47907 USA | |
Keywords: Adaptive Bayesian method; Deep learning; Hessian approximated stochastic gradient; MCMC; Stochastic approximation; Limited memory BFGS; Highly correlated density | |
DOI : 10.1016/j.jcp.2021.110150 | |
Source: Elsevier | |
【 Abstract 】
Bayesian approaches have been successfully integrated into training deep neural networks. One popular family is stochastic gradient Markov chain Monte Carlo (SG-MCMC) methods, which have gained increasing interest due to their ability to handle large datasets and their potential to avoid overfitting. Although standard SG-MCMC methods have shown great performance in a variety of problems, they may be inefficient when the random variables in the target posterior densities have scale differences or are highly correlated. In this work, we present an adaptive Hessian approximated stochastic gradient MCMC method to incorporate local geometric information while sampling from the posterior. The idea is to apply stochastic approximation (SA) to sequentially update a preconditioning matrix at each iteration. The preconditioner possesses second-order information and can guide the random walk of a sampler efficiently. Instead of computing and saving the full Hessian of the log posterior, we use limited memory of the samples and their stochastic gradients to approximate the inverse Hessian-vector multiplication in the updating formula. Moreover, by smoothly optimizing the preconditioning matrix via SA, our proposed algorithm can asymptotically converge to the target distribution with a controllable bias under mild conditions. To reduce the training and testing computational burden, we adopt a magnitude-based weight pruning method to enforce the sparsity of the network. Our method is user-friendly and demonstrates better learning results compared to standard SG-MCMC updating rules. Approximating the inverse Hessian alleviates storage and computational complexities for high-dimensional models. Numerical experiments are performed on several problems, including sampling from a 2D correlated distribution, synthetic regression problems, and learning the numerical solutions of a heterogeneous elliptic PDE. The numerical results demonstrate great improvement in both the convergence rate and accuracy. © 2021 Elsevier Inc. All rights reserved.
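The limited-memory inverse Hessian-vector multiplication mentioned in the abstract can be illustrated with the textbook L-BFGS two-loop recursion. The sketch below is an assumption-laden illustration, not the authors' updating formula: the function name `lbfgs_inverse_hvp`, the initial scaling, and the toy 2D matrix `A` are all hypothetical choices, and exact curvature pairs stand in for the differences of samples and stochastic gradients that the paper would store.

```python
import numpy as np


def lbfgs_inverse_hvp(v, s_list, y_list):
    """Approximate H^{-1} v with the standard L-BFGS two-loop recursion.

    s_list holds differences of consecutive samples, y_list the matching
    differences of gradients (oldest first). Hypothetical helper, not the
    paper's API; assumes every pair satisfies y.dot(s) > 0.
    """
    q = np.array(v, dtype=float)
    history = []
    # First loop: newest pair to oldest pair.
    for s, y in reversed(list(zip(s_list, y_list))):
        rho = 1.0 / y.dot(s)
        alpha = rho * s.dot(q)
        q -= alpha * y
        history.append((alpha, rho))
    # Initial inverse-Hessian guess: scaled identity from the newest pair.
    if s_list:
        gamma = s_list[-1].dot(y_list[-1]) / y_list[-1].dot(y_list[-1])
    else:
        gamma = 1.0
    r = gamma * q
    # Second loop: oldest pair to newest pair.
    for (s, y), (alpha, rho) in zip(zip(s_list, y_list), reversed(history)):
        beta = rho * y.dot(r)
        r += (alpha - beta) * s
    return r


# Toy check on a strongly correlated 2D quadratic (assumed example).
A = np.array([[1.0, 0.9], [0.9, 1.0]])  # stands in for the true Hessian
rng = np.random.default_rng(0)
S = [rng.normal(size=2) for _ in range(3)]
Y = [A @ s for s in S]  # exact curvature pairs; real ones would be noisy
print(lbfgs_inverse_hvp(Y[-1], S, Y), S[-1])
```

Only the stored pairs are touched, so the cost is linear in the dimension times the memory size and no full Hessian is formed. The two printed vectors should agree because a BFGS update satisfies the secant condition for the most recent pair.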
【 License 】
Free