EM Algorithm
E-M Algorithm
- E-M Algorithm (Expectation-Maximization): in probabilistic models involving latent factors, when the likelihood is hard to maximize directly, the algorithm performs maximum likelihood estimation indirectly by iteratively updating the parameters that maximize the complete-data likelihood as a surrogate objective.
- model assumption:
\[\begin{aligned} \log{p(y,z\mid\theta)} &=\log{p(y\mid z,\theta)}+\log{p(z\mid\theta)} \end{aligned}\]
- likelihood:
\[\begin{aligned} \log{p(y\mid\theta)} &=\log{\sum_{z}{p(y,z\mid\theta)}} \end{aligned}\]
- The model assumes that $Y$ is generated from latent factors $Z$. However, only $Y$ is actually observed; $Z$ cannot be observed. This incomplete-data likelihood takes the form of an integral or summation over the latent factors, which makes it difficult to maximize directly.
\[\begin{aligned} \max_{\theta}{\log{p(y\mid\theta)}} &=\max_{\theta}{\log{\sum_{z}{p(y,z\mid\theta)}}} \end{aligned}\]
- The EM algorithm searches for parameters that maximize the expected complete-data log-likelihood as a proxy for the log-likelihood. The complete-data likelihood is the joint likelihood $p(y,z\mid\theta)$ under the assumption that the latent variable $Z$ is observed (see the sketch after this list).
\[\max_{\theta}{\log{p(y,z\mid\theta)}}\]
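To make the two objectives concrete, here is a minimal sketch assuming a two-component 1-D Gaussian mixture (the example and all names such as `pi`, `mu`, `sigma` are illustrative, not from the text). It computes the incomplete-data log-likelihood, where the sum over $z$ sits inside the log, and the complete-data log-likelihood, which would decompose into $\log{p(y\mid z,\theta)}+\log{p(z\mid\theta)}$ if $z$ were observed:

```python
# Assumed example: two-component 1-D Gaussian mixture with latent label z in {0, 1}.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# hypothetical parameters theta = (mixing weights, means, stds)
pi = np.array([0.4, 0.6])      # p(z | theta)
mu = np.array([-2.0, 3.0])     # component means
sigma = np.array([1.0, 1.5])   # component standard deviations

# observed data y; the labels z are generated here but are NOT observable in practice
z_true = rng.choice(2, size=500, p=pi)
y = rng.normal(mu[z_true], sigma[z_true])

# incomplete-data log-likelihood: log p(y | theta) = log sum_z p(y, z | theta);
# the sum over z sits inside the log, so the objective does not decompose
joint = pi[None, :] * norm.pdf(y[:, None], mu[None, :], sigma[None, :])  # p(y, z | theta)
incomplete_ll = np.log(joint.sum(axis=1)).sum()

# complete-data log-likelihood: log p(y, z | theta), pretending z were observed;
# it splits into log p(y | z, theta) + log p(z | theta) and is easy to maximize
complete_ll = (np.log(pi[z_true]) + norm.logpdf(y, mu[z_true], sigma[z_true])).sum()

print(f"incomplete-data log-likelihood: {incomplete_ll:.2f}")
print(f"complete-data log-likelihood:   {complete_ll:.2f}")
```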
Step
- expectation step: estimates the posterior distribution of the latent factors, computed from the parameters $\theta^{(k)}$ estimated in the previous step.
\[\begin{aligned} q(z) &=p(z\mid y,\theta^{(k)})\\ &=\frac{p(y,z\mid\theta^{(k)})}{p(y\mid\theta^{(k)})}\\ &=\frac{p(y\mid z,\theta^{(k)})p(z\mid\theta^{(k)})}{\sum_{z^{\prime}}{p(y\mid z^{\prime},\theta^{(k)})p(z^{\prime}\mid\theta^{(k)})}} \end{aligned}\]
- maximization step: re-estimates the parameters $\theta^{(k+1)}$ that maximize the expected joint log-likelihood $\log{p(y,z\mid\theta)}$, using the posterior distribution of the latent factor $Z$ estimated from the parameters $\theta^{(k)}$ of the previous step.
\[\begin{aligned} \theta^{(k+1)} &=\mathrm{arg}\max_{\theta}{Q(\theta\mid\theta^{(k)})} \end{aligned}\]
- Q function: the objective function of the EM algorithm, i.e. the expected complete-data log-likelihood, defined using the posterior distribution of the latent factor $Z$ estimated from the parameters $\theta^{(k)}$ of the previous step (a sketch of both steps follows this list).
\[\begin{aligned} Q(\theta\mid\theta^{(k)}) &=\underbrace{\mathbb{E}_{q(z)}\left[\log{p(y,z\mid\theta)}\right]}_{\text{expected complete-data likelihood}}\\ &=\underbrace{\mathbb{E}_{q(z)}\left[\log{p(y\mid z,\theta)}\right]}_{\text{expected log likelihood}}+\underbrace{\mathbb{E}_{q(z)}\left[\log{p(z\mid\theta)}\right]}_{\text{expected log prior}} \end{aligned}\]
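One EM iteration for the assumed Gaussian-mixture example above, as a minimal sketch (the helper names `e_step` and `m_step` are hypothetical): the E-step computes $q(z)=p(z\mid y,\theta^{(k)})$, and the M-step maximizes $Q(\theta\mid\theta^{(k)})$, which for a Gaussian mixture has a closed-form solution:

```python
import numpy as np
from scipy.stats import norm

def e_step(y, pi, mu, sigma):
    """E-step: q(z) = p(z | y, theta^(k)), the posterior responsibility of each component."""
    joint = pi[None, :] * norm.pdf(y[:, None], mu[None, :], sigma[None, :])  # p(y, z | theta^(k))
    return joint / joint.sum(axis=1, keepdims=True)                          # normalize over z

def m_step(y, q):
    """M-step: theta^(k+1) = argmax_theta Q(theta | theta^(k)); closed form for a Gaussian mixture."""
    n_k = q.sum(axis=0)                                      # expected counts per component
    pi = n_k / len(y)                                        # maximizes the expected log prior
    mu = (q * y[:, None]).sum(axis=0) / n_k                  # responsibility-weighted means
    sigma = np.sqrt((q * (y[:, None] - mu[None, :]) ** 2).sum(axis=0) / n_k)
    return pi, mu, sigma

# usage (with y, pi, mu, sigma from the previous sketch): alternate the two steps
# for _ in range(100):
#     q = e_step(y, pi, mu, sigma)        # expectation step
#     pi, mu, sigma = m_step(y, q)        # maximization step
```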
Jensen's Inequality
- Jensen's inequality:
\[\begin{aligned} f(\mathbb{E}[X])\ge \mathbb{E}[f(X)] \end{aligned}\]
For a concave function, the function value at the mean of the inputs is always greater than or equal to the mean of the function values at the individual inputs.
- definitions:
  - likelihood: $p(y\mid\theta)=\sum_{z}{p(y,z\mid\theta)}$
  - posterior of latent factor: $q(z)=p(z\mid y,\theta^{(k)})$
  - concave function: $f(x)=\log{x}$
- likelihood transformation:
\[\begin{aligned} \log{p(y\mid\theta)} &=\log{\sum_{z}{p(y,z\mid\theta)}}\\ &=\log{\sum_{z}{q(z)\cdot\frac{p(y,z\mid\theta)}{q(z)}}}\\ &=\log{\mathbb{E}_{q(z)}\left[\frac{p(y,z\mid\theta)}{q(z)}\right]} \end{aligned}\]
- $\because$ Jensen's inequality:
\[\begin{aligned} \log{\mathbb{E}_{q(z)}\left[\frac{p(y,z\mid\theta)}{q(z)}\right]} &\ge\mathbb{E}_{q(z)}\left[\log{\frac{p(y,z\mid\theta)}{q(z)}}\right] \end{aligned}\]
- right-hand side:
\[\begin{aligned} \mathbb{E}_{q(z)}\left[\log{\frac{p(y,z\mid\theta)}{q(z)}}\right] &=\sum_{z}{q(z)\cdot\log{\frac{p(y,z\mid\theta)}{q(z)}}}\\ &=\sum_{z}{q(z)\cdot\left[\log{p(y,z\mid\theta)}-\log{q(z)}\right]}\\ &=\sum_{z}{q(z)\cdot\log{p(y,z\mid\theta)}}+\underbrace{\left(-\sum_{z}{q(z)\cdot\log{q(z)}}\right)}_{\text{entropy is const. in }\theta}\\ &=\underbrace{\mathbb{E}_{q(z)}\left[\log{p(y,z\mid\theta)}\right]}_{=:Q(\theta\mid\theta^{(k)})}+\underbrace{H(q)}_{\ge 0} \end{aligned}\]
- Therefore, the parameter that maximizes the Q function also maximizes this lower bound of the log-likelihood (a numeric check follows this list).
\[\begin{aligned} \log{p(y\mid\theta)}\ge Q(\theta\mid\theta^{(k)})+H(q) \end{aligned}\]
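This bound can be checked numerically. The sketch below reuses the assumed Gaussian-mixture example (the names `log_joint`, `incomplete_ll`, and `lower_bound` are hypothetical): at $\theta=\theta^{(k)}$ the bound is tight, and at any other $\theta$ it stays at or below the log-likelihood:

```python
import numpy as np
from scipy.stats import norm
from scipy.special import logsumexp

def log_joint(y, pi, mu, sigma):
    """log p(y, z | theta) for every (data point, component) pair."""
    return np.log(pi[None, :]) + norm.logpdf(y[:, None], mu[None, :], sigma[None, :])

def incomplete_ll(y, pi, mu, sigma):
    """log p(y | theta) = sum_n log sum_z p(y_n, z | theta)."""
    return logsumexp(log_joint(y, pi, mu, sigma), axis=1).sum()

def lower_bound(y, q, pi, mu, sigma):
    """Q(theta | theta^(k)) + H(q), where q = p(z | y, theta^(k)) is held fixed."""
    Q = (q * log_joint(y, pi, mu, sigma)).sum()          # expected complete-data log-likelihood
    H = -(q * np.log(np.clip(q, 1e-12, None))).sum()     # entropy of q, constant in theta
    return Q + H

# usage (with y, pi, mu, sigma and e_step from the previous sketches):
# q = e_step(y, pi, mu, sigma)                                       # q(z) at theta^(k)
# assert np.isclose(incomplete_ll(y, pi, mu, sigma),
#                   lower_bound(y, q, pi, mu, sigma))                # tight at theta = theta^(k)
# assert incomplete_ll(y, pi, mu + 0.5, sigma) >= \
#        lower_bound(y, q, pi, mu + 0.5, sigma)                      # bound holds at other theta
```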