Scalable Stochastic Gradient Riemannian Langevin Dynamics in Non-Diagonal Metrics

Bayesian neural network inference is often carried out using stochastic
gradient sampling methods. For best performance the methods should use a
Riemannian metric that improves posterior exploration by accounting for the
local curvature, but the existing methods resort to simple diagonal metrics to
remain computationally efficient. This loses some of the gains. We propose two
non-diagonal metrics that can be used in stochastic samplers to improve
convergence and exploration but that have only a minor computational overhead
over diagonal metrics. We show that for neural networks with complex
posteriors, caused e.g. by use of sparsity-inducing priors, using these metrics
provides clear improvements. For some other choices the posterior is
sufficiently easy that even the simpler metrics suffice.
Paper link: http://arxiv.org/pdf/2303.05101v1
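To make the idea concrete, here is a minimal sketch (not the paper's proposed samplers) of Langevin dynamics preconditioned by a fixed diagonal metric, the simple baseline the abstract contrasts against. The target, step size, and metric below are illustrative choices; a full Riemannian sampler would also include a curvature-correction drift term, omitted here because the metric is held constant.

```python
import numpy as np

def sgld_step(theta, grad_log_post, eps, g_inv_diag, rng):
    """One Langevin step preconditioned by a fixed diagonal metric.

    theta         : current parameter vector
    grad_log_post : gradient of the log posterior at theta
    eps           : step size
    g_inv_diag    : diagonal of the inverse metric G^{-1}
    """
    # Drift scaled by G^{-1}, noise with covariance eps * G^{-1}
    noise = rng.normal(size=theta.shape) * np.sqrt(eps * g_inv_diag)
    return theta + 0.5 * eps * g_inv_diag * grad_log_post + noise

# Toy target: zero-mean Gaussian posterior with badly scaled dimensions
sigma2 = np.array([1.0, 100.0])       # per-dimension variances
grad = lambda th: -th / sigma2        # gradient of log p(theta)

rng = np.random.default_rng(0)
theta = np.zeros(2)
samples = []
for _ in range(20000):
    # Metric set to the inverse posterior covariance, so both
    # dimensions mix at the same rate despite the 100x scale gap
    theta = sgld_step(theta, grad(theta), 0.1, sigma2, rng)
    samples.append(theta.copy())
samples = np.array(samples[2000:])    # discard burn-in

print(samples.var(axis=0))  # roughly recovers the scales [1, 100]
```

With the identity metric instead of `sigma2`, the step size would have to shrink to keep the narrow dimension stable, and the wide dimension would mix far more slowly; this is the gap that richer (non-diagonal) metrics aim to close at low cost.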

