Stein Variational Gradient Descent

Liu, Q., & Wang, D.
(2016).
Stein variational gradient descent: A general purpose bayesian inference algorithm.
Advances in neural information processing systems, 29.

Posted Aug 3, 2024

By jayarnim

2 min read

SVGD

SVGD(Stein Variational Gradient Descent) : Stein Discrepancy 를 활용하여 제안 분포 $W \sim Q$ 를 목표 분포 $W \mid \mathcal{D} \sim P$ 에 근사하는 비모수 방법론(Non-Parametric Method)

Vector Optimization vs. Distribution Optimization through Gradient Descent

	Euclidean Gradient	Wasserstein Gradient
Space	Euclidean	Measure
Distance	Euclidean	Wasserstein
Target	$x$	$q(x)$
Loss Function	$\mathcal{L}(x)$	$D(q \parallel p)$
Gradient	$\partial \mathcal{L}(x) / \partial x$	$\delta D(q \parallel p) / \delta q$
Update	$x \to x^{*}$	$q \to p$

Distribution Optimization $\to$ Particle Optimization

Stein Variational Gradient Descent 는 파라미터의 분포를 직접 모델링하지 않고 개별 입자들을 모델링함. 즉, 제안 분포와 목표 분포 간 차이를 줄임으로써 분포 전체를 직접 최적화하는 방식이 아니라(변분 미분), 제안 분포에서 샘플링된 개별 입자들의 위치가 목표 분포를 따르도록 조정하는 방식으로 학습이 진행됨(편미분).
입자가 이동해야 하는 최적 방향을 제공하는 지표는 분포 간 차이를 나타내는 $D_{\text{STEIN}}(q \parallel p)$ 에 개별 입자의 위치를 편미분한 값 $\phi(x_{i})=\displaystyle\frac{\partial}{\partial x_{i}}D_{\text{STEIN}}(q \parallel p)$ 으로, 이는 RKHS에서 유도된 최적의 함수와 동일함.
Stein Gradient:
\[\begin{aligned} \phi(w_{i}) &= \mathbb{E}_{w_{j} \sim q}[\mathcal{K}(w_{i}, w_{j}) \cdot \nabla_{w_{j}}\log{p(w_{j} \mid \mathcal{D})}+\nabla_{w_{j}}\mathcal{K}(w_{i}, w_{j})]\\ &= \frac{1}{M}\sum_{j=1}^{M}{\underbrace{\mathcal{K}(w_{i}, w_{j})}_{\text{kernel}} \cdot \underbrace{\nabla_{w_{j}}\log{p(w_{j} \mid \mathcal{D})}}_{\text{score function}}+\underbrace{\nabla_{w_{j}}\mathcal{K}(w_{i}, w_{j})}_{\text{repulsion effect}}} \end{aligned}\]
- 스코어 함수(Score Function) : 입자 $w_{i} \sim q$ 가 확률 분포 $p$ 를 따라 움직이도록 유도함
  \[\begin{aligned} \underbrace{\log{p(w_{j} \mid \mathcal{D})}}_{\text{posterior}} &= \underbrace{\log{p(\mathcal{D} \mid w_{j})}}_{\text{likelihood}} + \underbrace{\log{p(w_{j})}}_{\text{prior}} - \underbrace{\log{p(\mathcal{D})}}_{\text{evidence}}\\ \therefore \underbrace{\nabla_{w_{j}}\log{p(w_{j} \mid \mathcal{D})}}_{\text{score function}} &= \underbrace{\nabla_{w_{j}}\log{p(\mathcal{D} \mid w_{j})}}_{\text{likelihood gradient}} + \underbrace{\nabla_{w_{j}}\log{p(w_{j})}}_{\text{prior gradient}} \end{aligned}\]
- 반발 효과(Repulsion Effect) : 입자들이 서로 퍼지도록 함으로써 입자 간 분포를 유지함
Update:
\[\begin{aligned} w_{i} \gets w_{i} + \epsilon \cdot \phi(w_{i}) \end{aligned}\]

BAYES, 2.posterior approx.

This post is licensed under CC BY 4.0 by the author.

SVGD

Trending Tags