BPMF

Title: Bayesian Probabilistic Matrix Factorization using Markov Chain Monte Carlo
Published: 2008
Data Set: Netflix Prize

Posted Jun 19, 2024

By jayarnim

Previous Research

MF

Matrix Factorization (MF) : The prototypical latent factor model. It factorizes the user-item interaction matrix $\mathbf{R} \in \mathbb{R}^{M \times N}$ so that users and items are jointly mapped into a $K$-dimensional latent factor space, as user-latent factors $\mathbf{U} \in \mathbb{R}^{M \times K}$ and item-latent factors $\mathbf{V} \in \mathbb{R}^{N \times K}$
Model
\[\begin{aligned} \mathbf{R} &\approx \mathbf{U}\mathbf{V}^{T} \end{aligned}\]
- $\mathbf{R}$ : User-Item Interaction Matrix
Hyper-Params
- $K$ : Dimension of Latent Factor
Objective Function Optimization
\[\begin{aligned} \hat{\mathbf{U}}, \hat{\mathbf{V}} &= \text{arg} \min_{\mathbf{U},\mathbf{V}}{\mathcal{J}\left(\mathbf{U}\mathbf{V}^{T}, \mathbf{R}\right)} \end{aligned}\]
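- $\hat{\mathbf{U}}, \hat{\mathbf{V}}$ : OLS (Ordinary Least Squares) or MLE (Maximum Likelihood Estimation) Estimator
  - $\mathbf{U}$ : User-Latent Factor Matrix
  - $\mathbf{V}$ : Item-Latent Factor Matrix
- $\mathcal{J}$ : Loss Function between the reconstruction $\mathbf{U}\mathbf{V}^{T}$ and $\mathbf{R}$ (e.g., squared error)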
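As a rough illustration of the optimization above, the following is a minimal sketch that takes $\mathcal{J}$ to be the squared error over observed entries and minimizes it by full-batch gradient descent. The function names (`factorize`, `masked_sse`), the `mask` argument marking observed entries, and the learning rate, regularization, and epoch defaults are all assumptions made for this example, not part of the original post.

```python
import numpy as np

def factorize(R, mask, K=2, lr=0.01, reg=0.0, epochs=2000, seed=0):
    """Fit R ~ U V^T on observed entries (mask == 1) by full-batch gradient descent.

    Hypothetical helper: the step size, epoch count and L2 weight `reg`
    are illustrative defaults, not values from the post.
    """
    rng = np.random.default_rng(seed)
    M, N = R.shape
    U = 0.1 * rng.standard_normal((M, K))  # small random initialization
    V = 0.1 * rng.standard_normal((N, K))
    for _ in range(epochs):
        E = mask * (R - U @ V.T)           # residuals on observed entries only
        grad_U = -E @ V + reg * U          # gradient of 0.5 * squared error w.r.t. U
        grad_V = -E.T @ U + reg * V        # gradient of 0.5 * squared error w.r.t. V
        U -= lr * grad_U
        V -= lr * grad_V
    return U, V

def masked_sse(R, mask, U, V):
    """Sum of squared errors over observed entries."""
    return float(np.sum((mask * (R - U @ V.T)) ** 2))
```

Setting `reg` above zero adds an L2 penalty on both factor matrices, which is the form the PMF objective in the next subsection takes under zero-mean Gaussian priors.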

PMF

Probabilistic Matrix Factorization (PMF) : To reflect uncertainty in the data, it treats $\mathbf{U},\mathbf{V}$ as random variables rather than constants, and searches for their Maximum a Posteriori (MAP) estimates using Stochastic Gradient Descent (SGD)
Model
\[\mathbf{R} \mid \mathbf{U},\mathbf{V} \sim \mathcal{N}\left(\mathbf{U}\mathbf{V}^{T}, \sigma^{2}\right)\]
- $\mathbf{R} \mid \mathbf{U},\mathbf{V}$ : Likelihood Distribution of User-Item Interaction Matrix
Hyper-Params
- $K$ : Dimension of Latent Factor
- Prior Distribution Parameters of $\mathbf{U} \sim \mathcal{N}\left(\overrightarrow{\mu}_{U}, \lambda_{U}^{-1}\cdot\mathbf{I}\right)$
  - $\overrightarrow{\mu}_{U}$ : Mean of User-Latent Factor
  - $\lambda_{U}^{-1}$ : Variance of User-Latent Factor
- Prior Distribution Parameters of $\mathbf{V} \sim \mathcal{N}\left(\overrightarrow{\mu}_{V}, \lambda_{V}^{-1}\cdot\mathbf{I}\right)$
  - $\overrightarrow{\mu}_{V}$ : Mean of Item-Latent Factor
  - $\lambda_{V}^{-1}$ : Variance of Item-Latent Factor
- $\sigma^{2}$ : Noise Variance of User-Item Interaction
Objective Function Optimization
\[\begin{aligned} \hat{\mathbf{U}}, \hat{\mathbf{V}} &= \text{arg} \max_{\mathbf{U},\mathbf{V}}{\log{P\left(\mathbf{U},\mathbf{V}\mid\mathbf{R}\right)}} \end{aligned}\]
- $\hat{\mathbf{U}}, \hat{\mathbf{V}}$ : MAP(Maximum a Posteriori) Estimator
  - $\mathbf{U}$ : User-Latent Factor Matrix
  - $\mathbf{V}$ : Item-Latent Factor Matrix
- $\log{P\left(\mathbf{U},\mathbf{V}\mid\mathbf{R}\right)}$ : Log Joint Posterior Probability of Params, equal to the sum below up to an additive constant
  \[\log{P\left(\mathbf{U},\mathbf{V}\mid\mathbf{R}\right)} = \log{P\left(\mathbf{R}\mid\mathbf{U},\mathbf{V}\right)} + \log{P\left(\mathbf{U}\right)} + \log{P\left(\mathbf{V}\right)} + \text{const}\]
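Under the Gaussian likelihood and priors listed above, the log posterior can be expanded term by term. The sketch below relies on notation that is not in the original post and is assumed here for illustration: $\Omega$ is the set of observed $(i,j)$ pairs, $\mathbf{u}_{i}$ and $\mathbf{v}_{j}$ are the rows of $\mathbf{U}$ and $\mathbf{V}$, and each row is assumed to be drawn independently from its prior.

\[\begin{aligned} -\log{P\left(\mathbf{U},\mathbf{V}\mid\mathbf{R}\right)} &= \frac{1}{2\sigma^{2}}\sum_{(i,j)\in\Omega}{\left(r_{ij} - \mathbf{u}_{i}^{T}\mathbf{v}_{j}\right)^{2}} + \frac{\lambda_{U}}{2}\sum_{i=1}^{M}{\left\Vert \mathbf{u}_{i} - \overrightarrow{\mu}_{U} \right\Vert^{2}} + \frac{\lambda_{V}}{2}\sum_{j=1}^{N}{\left\Vert \mathbf{v}_{j} - \overrightarrow{\mu}_{V} \right\Vert^{2}} + \text{const} \end{aligned}\]

With zero prior means, multiplying through by $\sigma^{2}$ shows that the MAP estimate minimizes a squared error with L2 penalties weighted by $\sigma^{2}\lambda_{U}$ and $\sigma^{2}\lambda_{V}$, so the prior precisions act as regularization strengths.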

BPMF

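Bayesian Probabilistic Matrix Factorization (BPMF) : A method that extends the Bayesian treatment, which PMF applies only in a limited way, to the whole model
Limitations of PMF
- It does not infer the posterior distribution of the learnable parameters and then take a Bayes action; it only finds the MAP point estimate via SGD
- By fixing the learnable parameters at their MAP estimates, it ignores the uncertainty in those parameters
Solutions of BPMF
- It uses MCMC (Markov Chain Monte Carlo) to infer the posterior distribution of the learnable parameters
- Rather than fixing the learnable parameters at a single value, it uses many estimates sampled from the posterior distribution, which reflects the uncertainty in the parameters
- Rather than having the researcher fix the prior distributions of $\mathbf{U}, \mathbf{V}$ through constant parameters, it estimates those parameters through probabilistic modeling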
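The idea of using many posterior samples instead of one point estimate can be sketched as a Monte Carlo average: given $T$ draws $(\mathbf{U}^{(t)}, \mathbf{V}^{(t)})$ from the posterior, the prediction is the mean of $\mathbf{U}^{(t)}\mathbf{V}^{(t)T}$ over the draws, and the spread across draws gives a rough picture of the uncertainty. The function names below are hypothetical, and the snippet assumes the samples have already been produced by some MCMC sampler.

```python
import numpy as np

def predictive_mean(samples):
    """Average U V^T over posterior samples [(U_1, V_1), ..., (U_T, V_T)]."""
    return np.mean([U @ V.T for U, V in samples], axis=0)

def predictive_spread(samples):
    """Entry-wise standard deviation of U V^T across the posterior samples."""
    return np.std([U @ V.T for U, V in samples], axis=0)
```

A MAP-based model would correspond to a list containing a single sample, in which case the spread is zero everywhere; that is the sense in which a point estimate discards parameter uncertainty.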

How to Model

Model
\[\mathbf{R} \mid \mathbf{U},\mathbf{V} \sim \mathcal{N}\left(\mathbf{U}\mathbf{V}^{T}, \sigma^{2}\right)\]
- $\mathbf{R} \mid \mathbf{U},\mathbf{V}$ : Likelihood Distribution of User-Item Interaction Matrix
Hyper-Params
- $K$ : Dimension of Latent Factor
- Prior Distribution Parameters of $\overrightarrow{\mu} ; \Lambda \sim \mathcal{N}\left(\mu_{0}, (\beta_{0} \cdot \Lambda)^{-1}\right)$
  - $\mu_{0}=0$ : Mean of User(Item)-Latent Factor Mean
  - $\beta_{0}>0$ : Scale Factor of $\Lambda$ (must be strictly positive; $\beta_{0}=0$ would leave $(\beta_{0} \cdot \Lambda)^{-1}$ undefined)
- Prior Distribution Parameters of $\Lambda \sim \mathcal{W}\left(\mathbf{W}_{0}, \nu_{0}\right)$
  - $\mathbf{W}_{0}=\mathbf{I}_{K \times K}$ : Scale Matrix
  - $\nu_{0}=K+1$ : Degrees of Freedom
- $\sigma^{2}$ : Noise of User-Item Interaction
Objective Function
\[\begin{aligned} &\log{P\left(\mathbf{U}, \mathbf{V}, \overrightarrow{\mu}_{U}, \Lambda_{U}, \overrightarrow{\mu}_{V}, \Lambda_{V} \mid \mathbf{R}\right)}\\ &= \log{P\left(\mathbf{R} \mid \mathbf{U},\mathbf{V}\right)}\\ &+ \log{P\left(\mathbf{U} \mid \overrightarrow{\mu}_{U},\Lambda_{U}\right)} + \log{P\left(\overrightarrow{\mu}_{U} ; \Lambda_{U}\right)} + \log{P\left(\Lambda_{U}\right)}\\ &+ \log{P\left(\mathbf{V} \mid \overrightarrow{\mu}_{V},\Lambda_{V}\right)} + \log{P\left(\overrightarrow{\mu}_{V} ; \Lambda_{V}\right)} + \log{P\left(\Lambda_{V}\right)} + \text{const} \end{aligned}\]
- $\mathbf{U}, \mathbf{V}, \overrightarrow{\mu}_{U}, \Lambda_{U}, \overrightarrow{\mu}_{V}, \Lambda_{V} \mid \mathbf{R}$ : Joint Posterior Probability Distribution of Params
- $\mathbf{R} \mid \mathbf{U},\mathbf{V}$ : Likelihood Distribution of User-Item Interaction Matrix
- $\mathbf{U} \mid \overrightarrow{\mu}_{U},\Lambda_{U}$ : Conditional Prior Distribution of User-Latent Factor Matrix
- $\mathbf{V} \mid \overrightarrow{\mu}_{V},\Lambda_{V}$ : Conditional Prior Distribution of Item-Latent Factor Matrix
- $\overrightarrow{\mu}, \Lambda$ : Gaussian-Wishart Prior Distribution of the Params of $\mathbf{U}, \mathbf{V} \sim \mathcal{N}\left(\overrightarrow{\mu}, \Lambda^{-1}\right)$, where $\Lambda$ is a precision matrix
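Because every conditional in this model is conjugate, one common MCMC scheme for it is Gibbs sampling: draw $(\overrightarrow{\mu}, \Lambda)$ for each side from its Gaussian-Wishart conditional, then draw each row of $\mathbf{U}$ and $\mathbf{V}$ from its Gaussian conditional. The following is a minimal sketch under stated assumptions, not a reference implementation. All function names are hypothetical, `alpha` stands for the noise precision $1/\sigma^{2}$, `mask` marks observed entries, the hyperparameter update is the standard Normal-Wishart conjugate update with a centered scatter matrix, and the Wishart draw assumes an integer degree of freedom.

```python
import numpy as np

def sample_wishart(W, nu, rng):
    """Draw from Wishart(W, nu) for integer nu >= K as a sum of nu outer products."""
    K = W.shape[0]
    X = rng.multivariate_normal(np.zeros(K), W, size=nu)  # nu draws from N(0, W)
    return X.T @ X

def sample_hyperparams(F, mu0, beta0, W0, nu0, rng):
    """Draw (mu, Lambda) from the Gaussian-Wishart conditional given factor rows F."""
    n, _ = F.shape
    mean = F.mean(axis=0)
    centered = F - mean
    scatter = centered.T @ centered                 # centered scatter matrix
    beta_n, nu_n = beta0 + n, nu0 + n
    mu_n = (beta0 * mu0 + n * mean) / beta_n
    diff = (mean - mu0)[:, None]
    W_n_inv = np.linalg.inv(W0) + scatter + (beta0 * n / beta_n) * (diff @ diff.T)
    W_n = np.linalg.inv(W_n_inv)
    W_n = (W_n + W_n.T) / 2                         # symmetrize against round-off
    Lam = sample_wishart(W_n, nu_n, rng)
    mu = rng.multivariate_normal(mu_n, np.linalg.inv(beta_n * Lam))
    return mu, Lam

def sample_factors(R, mask, other, mu, Lam, alpha, rng):
    """Draw each row factor from its Gaussian conditional given the other side's factors."""
    n, K = R.shape[0], other.shape[1]
    out = np.empty((n, K))
    for i in range(n):
        idx = mask[i] > 0                           # entries observed for row i
        Vi = other[idx]
        prec = Lam + alpha * Vi.T @ Vi              # conditional precision
        cov = np.linalg.inv(prec)
        cov = (cov + cov.T) / 2
        mean = cov @ (alpha * Vi.T @ R[i, idx] + Lam @ mu)
        out[i] = rng.multivariate_normal(mean, cov)
    return out

def gibbs_sweep(R, mask, U, V, alpha, mu0, beta0, W0, nu0, rng):
    """One full Gibbs sweep: hyperparameters first, then user and item factors."""
    mu_U, Lam_U = sample_hyperparams(U, mu0, beta0, W0, nu0, rng)
    mu_V, Lam_V = sample_hyperparams(V, mu0, beta0, W0, nu0, rng)
    U = sample_factors(R, mask, V, mu_U, Lam_U, alpha, rng)
    V = sample_factors(R.T, mask.T, U, mu_V, Lam_V, alpha, rng)
    return U, V
```

In use, one would call `gibbs_sweep` repeatedly, discard an initial burn-in, and average `U @ V.T` over the remaining sweeps. The values passed for `alpha` and `beta0` are choices for the caller; any strictly positive `beta0` keeps the prior on $\overrightarrow{\mu}$ well defined.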

This post is licensed under CC BY 4.0 by the author.
