Hidden Markov Model
Markov Chain
-
The Markov Property states that all past information is summarized in the current state: conditionally on the current state, the future state is independent of the past states.
\[\begin{gathered} X(t+1)\perp X(0),X(1),\cdots,X(t-1)\mid X(t) \end{gathered}\] -
A sequence of random variables \(\{X(t):t\in T\}\) defined on a discrete time space \(t\in T\) and a discrete state space \(S=\{k\}_{k=1}^{K}\) is defined as a Markov Chain if it is a stochastic process satisfying the Markov Property.
\[\begin{gathered} P[X(t+1)\mid X(0),X(1),\cdots,X(t-1),X(t)] =P[X(t+1)\mid X(t)] \end{gathered}\] -
The discrete state probability is the probability that at time $t$ the random variable $X(t)$ takes a specific state $i$.
\[\pi_{i}(t)=P[X(t)=i]\] -
The transition probability is the probability that the state at time $t+1$ is $j$ given that the state at time $t$ is $i$.
\[\begin{aligned} \alpha_{i,j} =P[X(t+1)=j\mid X(t)=i] \end{aligned}\] -
The discrete state distribution vector \(\Pi(t)\) is updated by right-multiplication with the transition probability matrix \(\mathbf{A}\).
\[\begin{aligned} \Pi(t+1) &=\Pi(t)\mathbf{A} \end{aligned}\] -
The long-run probability that a Markov Chain occupies a specific state is called the Steady-State Probability.
\[\begin{aligned} \Pi(t)\xrightarrow{t\to\infty}\Pi, \quad\Pi=\Pi\mathbf{A} \end{aligned}\]
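The update \(\Pi(t+1)=\Pi(t)\mathbf{A}\) can be iterated to reach the steady state numerically. A minimal sketch in Python; the two-state transition matrix below is an illustrative assumption, not taken from the text:

```python
import numpy as np

# Illustrative 2-state transition matrix A (each row sums to 1).
A = np.array([[0.9, 0.1],
              [0.5, 0.5]])

pi = np.array([1.0, 0.0])   # initial state distribution Pi(1)
for _ in range(200):        # iterate Pi(t+1) = Pi(t) A until convergence
    pi = pi @ A

# At the fixed point, pi satisfies pi = pi A (the steady-state condition).
print(pi)                   # converges to [5/6, 1/6] for this A
```

Power iteration converges for any regular chain; in practice one can also solve the linear system \(\Pi=\Pi\mathbf{A}\), \(\sum_k\pi_k=1\) directly.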
Hidden Markov Model
-
Let the observation $Y$ occur conditionally on the latent factor $X$, and let this latent factor $X$ be a Markov chain. This model is called a Hidden Markov Model.
\[\begin{gathered} P(X,Y)=P(Y\mid X)P(X) \quad\mathrm{for}\quad X(t+1)\perp X(1),\cdots,X(t-1)\mid X(t) \end{gathered}\] -
Optimization of the Hidden Markov Model is performed using the EM Algorithm.
\[\Theta^{(s+1)} =\mathrm{arg}\max_{\Theta}{Q(\Theta\mid\Theta^{(s)})}\] -
The parameters are the initial state probability $\pi_{k}$, the transition probability $\alpha_{i,j}$ (time-homogeneous, i.e., the same for all $t$), and the emission probability $\beta_{k}(t)$.
\[\Theta :=\left\{\pi_{k}\in\Pi,\alpha_{i,j}\in\mathbf{A},\beta_{k}(t)\in\mathbf{B}\right\}\] -
Specifically, given the observations $Y$, the latent-variable posteriors are (1) the probability that the hidden state at each time is $X(t)=k$ and (2) the probability that the transition event at each time is $i\to j$.
\[\begin{aligned} \mu_{k}(t) &=P[X(t)=k\mid Y(1),\cdots,Y(T)]\\ \gamma_{i,j}(t) &=P[X(t+1)=j,X(t)=i\mid Y(1),\cdots,Y(T)] \end{aligned}\]
definition
-
Hidden State:
\[\begin{aligned} X(1),X(2),\cdots,X(T) \end{aligned}\] -
Observed State:
\[\begin{aligned} Y(1),Y(2),\cdots,Y(T) \end{aligned}\] -
Initial State:
\[\begin{aligned} \pi_{i}(1):=P[X(1)=i] \end{aligned}\] -
The transition probability represents the probability that the hidden state transitions from $i$ to $j$.
\[\begin{aligned} \alpha_{i,j} :=P[X(t+1)=j\mid X(t)=i] \end{aligned}\] -
The emission probability represents the probability that the observed state $Y(t)$ is realized at time $t$ given that the hidden state at that time is $X(t)=j$.
\[\begin{aligned} \beta_{j}(t) :=P[Y(t)\mid X(t)=j] \end{aligned}\]
forward probability
-
The forward probability is the joint probability that the observations from the beginning up to time $t$ and the hidden state at time $t$ are realized in a specific form.
\[\begin{aligned} \phi_{j}(t) :=P[\underbrace{Y(1),\cdots,Y(t)}_{\text{obs}};\underbrace{X(t)=j}_{\text{latent factor}}] \end{aligned}\] -
initial expression ($t=1$):
\[\begin{aligned} \phi_{j}(1) &=P[Y(1);X(1)=j]\\ &=P[X(1)=j]\cdot P[Y(1)\mid X(1)=j]\\ &=\pi_{j}(1)\beta_{j}(1) \end{aligned}\] -
factorization:
\[\begin{aligned} \phi_{i}(t-1) &=P[Y(1),\cdots,Y(t-1);X(t-1)=i]\\ \alpha_{i,j} &=P[X(t)=j\mid X(t-1)=i]\\ \beta_{j}(t) &=P[Y(t)\mid X(t)=j] \end{aligned}\] -
recursion ($t-1\to t$):
\[\begin{aligned} \phi_{j}(t) &=P[Y(1),\cdots,Y(t);X(t)=j]\\ &=\sum_{i=1}^{K}{P[Y(1),\cdots,Y(t);X(t)=j,X(t-1)=i]}\\ &=\beta_{j}(t)\cdot \sum_{i=1}^{K}{\phi_{i}(t-1)\cdot\alpha_{i,j}} \end{aligned}\] -
final expression ($t=1,\cdots,T$):
\[\begin{aligned} P[Y(1),\cdots,Y(T)] &=\sum_{j=1}^{K}{P[Y(1),\cdots,Y(T);X(T)=j]}\\ &=\sum_{j=1}^{K}{\phi_{j}(T)} \end{aligned}\]
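The recursion above can be sketched numerically. A minimal Python example with a two-state, two-symbol HMM; the parameters `pi`, `A`, `B` and the sequence `Y` are illustrative assumptions:

```python
import numpy as np

# Illustrative toy HMM (discrete emissions).
pi = np.array([0.6, 0.4])            # initial state probabilities pi_j(1)
A  = np.array([[0.7, 0.3],           # transition probabilities alpha_{i,j}
               [0.4, 0.6]])
B  = np.array([[0.9, 0.1],           # emission probabilities P[Y(t)=y | X(t)=j]
               [0.2, 0.8]])
Y  = [0, 1, 0]                       # observed sequence

T, K = len(Y), len(pi)
phi = np.zeros((T, K))
phi[0] = pi * B[:, Y[0]]             # phi_j(1) = pi_j(1) * beta_j(1)
for t in range(1, T):                # phi_j(t) = beta_j(t) * sum_i phi_i(t-1) alpha_{i,j}
    phi[t] = B[:, Y[t]] * (phi[t - 1] @ A)

likelihood = phi[-1].sum()           # P[Y(1),...,Y(T)] = sum_j phi_j(T)
print(likelihood)
```

The point of the recursion is cost: the forward pass is $O(TK^2)$, whereas summing over all $K^T$ hidden paths directly is exponential in $T$.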
backward probability
-
The backward probability is the conditional probability that the observations from time $t+1$ onwards are realized in a specific form, given that the hidden state at time $t$ is realized in a specific form.
\[\begin{aligned} \psi_{j}(t) :=P[\underbrace{Y(t+1),\cdots,Y(T)}_{\text{obs}}\mid\underbrace{X(t)=j}_{\text{latent factor}}] \end{aligned}\] -
final expression ($t=T$): there are no observations after time $T$, so the backward probability is defined as
\[\begin{aligned} \psi_{j}(T) :=1 \end{aligned}\] -
factorization:
\[\begin{aligned} \alpha_{i,j} &=P[X(t+1)=j\mid X(t)=i]\\ \beta_{j}(t+1) &=P[Y(t+1)\mid X(t+1)=j]\\ \psi_{j}(t+1) &=P[Y(t+2),\cdots,Y(T)\mid X(t+1)=j] \end{aligned}\] -
recursion ($t+1\to t$):
\[\begin{aligned} \psi_{i}(t) &=P[Y(t+1),\cdots,Y(T)\mid X(t)=i]\\ &=\sum_{j}{P[Y(t+1),\cdots,Y(T);X(t+1)=j\mid X(t)=i]}\\ &=\sum_{j}{\alpha_{i,j}\cdot\beta_{j}(t+1)\cdot\psi_{j}(t+1)} \end{aligned}\]
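The backward recursion can be sketched the same way; a minimal Python example with an illustrative two-state HMM (all numeric values are assumptions). Note that the full-sequence likelihood can also be read off at $t=1$ via $P[Y]=\sum_j \pi_j(1)\,\beta_j(1)\,\psi_j(1)$:

```python
import numpy as np

# Illustrative toy HMM (discrete emissions).
pi = np.array([0.6, 0.4])
A  = np.array([[0.7, 0.3],
               [0.4, 0.6]])
B  = np.array([[0.9, 0.1],
               [0.2, 0.8]])
Y  = [0, 1, 0]

T, K = len(Y), len(pi)
psi = np.ones((T, K))                 # psi_j(T) := 1 (empty future observation set)
for t in range(T - 2, -1, -1):        # psi_i(t) = sum_j alpha_{i,j} beta_j(t+1) psi_j(t+1)
    psi[t] = A @ (B[:, Y[t + 1]] * psi[t + 1])

# The likelihood P[Y(1),...,Y(T)], read off at t=1.
likelihood = np.sum(pi * B[:, Y[0]] * psi[0])
print(likelihood)
```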
latent factor posterior
state posterior
-
single-time state posterior:
\[\begin{aligned} \mu_{j}(t) &=P[X(t)=j\mid Y(1),\cdots,Y(T)] \end{aligned}\] -
factorization:
\[\begin{aligned} \phi_{j}(t) &=P[Y(1),\cdots,Y(t);X(t)=j]\\ \psi_{j}(t) &=P[Y(t+1),\cdots,Y(T)\mid X(t)=j]\\ \sum_{k}{\phi_{k}(T)} &=P[Y(1),\cdots,Y(T)] \end{aligned}\] -
transformation:
\[\begin{aligned} \mu_{j}(t) &=P[X(t)=j\mid Y(1),\cdots,Y(T)]\\ &=\frac{P[Y(1),\cdots,Y(T);X(t)=j]}{P[Y(1),\cdots,Y(T)]}\\ &=\frac{\phi_{j}(t)\cdot\psi_{j}(t)}{\sum_{k}{\phi_{k}(T)}} \end{aligned}\]
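The transformation above combines the forward and backward passes. A minimal sketch with an illustrative two-state HMM (all numeric values are assumptions):

```python
import numpy as np

# Illustrative toy HMM (discrete emissions).
pi = np.array([0.6, 0.4])
A  = np.array([[0.7, 0.3], [0.4, 0.6]])
B  = np.array([[0.9, 0.1], [0.2, 0.8]])
Y  = [0, 1, 0]

T, K = len(Y), len(pi)
phi = np.zeros((T, K)); psi = np.ones((T, K))
phi[0] = pi * B[:, Y[0]]                       # forward pass
for t in range(1, T):
    phi[t] = B[:, Y[t]] * (phi[t - 1] @ A)
for t in range(T - 2, -1, -1):                 # backward pass
    psi[t] = A @ (B[:, Y[t + 1]] * psi[t + 1])

# mu_j(t) = phi_j(t) psi_j(t) / sum_k phi_k(T)
mu = phi * psi / phi[-1].sum()
print(mu)
```

Since $\mu_{j}(t)$ is a posterior distribution over states, each row of `mu` sums to 1.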
transition event posterior
-
$i\to j$ transition event posterior:
\[\begin{aligned} \gamma_{i,j}(t) &=P[X(t+1)=j,X(t)=i\mid Y(1),\cdots,Y(T)] \end{aligned}\] -
factorization:
\[\begin{aligned} \phi_{i}(t) &=P[Y(1),\cdots,Y(t);X(t)=i]\\ \alpha_{i,j} &=P[X(t+1)=j\mid X(t)=i]\\ \beta_{j}(t+1) &=P[Y(t+1)\mid X(t+1)=j]\\ \psi_{j}(t+1) &=P[Y(t+2),\cdots,Y(T)\mid X(t+1)=j]\\ \sum_{k}{\phi_{k}(T)} &=P[Y(1),\cdots,Y(T)] \end{aligned}\] -
transformation:
\[\begin{aligned} \gamma_{i,j}(t) &=P[X(t+1)=j,X(t)=i\mid Y(1),\cdots,Y(T)]\\ &=\frac{P[Y(1),\cdots,Y(T);X(t+1)=j,X(t)=i]}{P[Y(1),\cdots,Y(T)]}\\ &=\frac{\phi_{i}(t)\cdot\alpha_{i,j}\cdot\beta_{j}(t+1)\cdot\psi_{j}(t+1)}{\sum_{k}{\phi_{k}(T)}} \end{aligned}\]
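The same forward and backward quantities give the transition-event posterior. A minimal sketch with an illustrative two-state HMM (all numeric values are assumptions):

```python
import numpy as np

# Illustrative toy HMM (discrete emissions).
pi = np.array([0.6, 0.4])
A  = np.array([[0.7, 0.3], [0.4, 0.6]])
B  = np.array([[0.9, 0.1], [0.2, 0.8]])
Y  = [0, 1, 0]

T, K = len(Y), len(pi)
phi = np.zeros((T, K)); psi = np.ones((T, K))
phi[0] = pi * B[:, Y[0]]                       # forward pass
for t in range(1, T):
    phi[t] = B[:, Y[t]] * (phi[t - 1] @ A)
for t in range(T - 2, -1, -1):                 # backward pass
    psi[t] = A @ (B[:, Y[t + 1]] * psi[t + 1])
Z = phi[-1].sum()                              # P[Y(1),...,Y(T)]

# gamma_{i,j}(t) = phi_i(t) alpha_{i,j} beta_j(t+1) psi_j(t+1) / Z
gamma = np.zeros((T - 1, K, K))
for t in range(T - 1):
    gamma[t] = phi[t][:, None] * A * (B[:, Y[t + 1]] * psi[t + 1])[None, :] / Z
print(gamma)
```

Marginalizing $\gamma_{i,j}(t)$ over $j$ recovers the state posterior $\mu_{i}(t)$, which is a useful consistency check.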
EM Algorithm
-
parameters:
\[\Theta :=\left\{\pi_{k}\in\Pi,\alpha_{i,j}\in\mathbf{A},\beta_{k}(t)\in\mathbf{B}\right\}\] -
complete-data expected log-likelihood:
\[\begin{aligned} \mathbb{E}_{q(X)}[\log{P(Y,X\mid\theta)}] &=\underbrace{\mathbb{E}_{q(X)}[\log{P(Y\mid X,\theta)}]}_{\text{log emission prob.}}+\underbrace{\mathbb{E}_{q(X)}[\log{P(X\mid\theta)}]}_{\text{log transition prob.}} \end{aligned}\] -
emission prob.
\[\begin{aligned} \mathbb{E}_{q(X)}[\log{P(Y\mid X,\theta)}] &=\underbrace{q(X)}_{\text{state prob.}}\cdot\underbrace{\log{P[Y(t)\mid X(t)=k]}}_{\text{log emission prob.}}\\ &=\mu_{k}(t)\cdot\log{\beta_{k}(t)} \end{aligned}\] -
transition prob.
\[\begin{aligned} \mathbb{E}_{q(X)}[\log{P(X\mid\theta)}] &=\underbrace{q(X)}_{\text{event prob.}}\cdot\underbrace{\log{P[X(t+1)=j\mid X(t)=i]}}_{\text{log transition prob.}}\\ &=\gamma_{i,j}(t)\cdot\log{\alpha_{i,j}} \end{aligned}\] -
init prob.
\[\begin{aligned} \mathbb{E}_{q(X)}(\log{P[X(1)=k]}) &=\underbrace{q(X)}_{\text{state prob.}}\cdot\underbrace{\log{P[X(1)=k]}}_{\text{log init prob.}}\\ &=\mu_{k}(1)\cdot\log{\pi_{k}} \end{aligned}\] -
Q function:
\[\begin{aligned} Q(\Theta\mid\Theta^{(s)}) &=\sum_{t=1}^{T}\sum_{k=1}^{K}{\mu_{k}(t)\cdot\log{\beta_{k}(t)}}\\ &\quad +\sum_{t=1}^{T-1}\sum_{i=1}^{K}\sum_{j=1}^{K}{\gamma_{i,j}(t)\cdot\log{\alpha_{i,j}}}+\sum_{k=1}^{K}{\mu_{k}(1)\cdot\log{\pi_{k}}} \end{aligned}\]
MLE parameter update:
\[\Theta^{(s+1)} =\mathrm{arg}\max_{\Theta}{Q(\Theta\mid\Theta^{(s)})}\]
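Maximizing $Q$ under the simplex constraints yields closed-form M-step updates (standard Baum-Welch results, not derived in the text above): $\pi_k\leftarrow\mu_k(1)$, $\alpha_{i,j}\leftarrow\sum_t\gamma_{i,j}(t)/\sum_t\mu_i(t)$, and for discrete emissions $b_{j,y}\leftarrow\sum_{t:Y(t)=y}\mu_j(t)/\sum_t\mu_j(t)$. A minimal one-iteration sketch for a toy discrete-emission HMM; all numeric values are illustrative assumptions:

```python
import numpy as np

def forward_backward(pi, A, B, Y):
    """Forward (phi) and backward (psi) passes; Z = P[Y(1),...,Y(T)]."""
    T, K = len(Y), len(pi)
    phi = np.zeros((T, K)); psi = np.ones((T, K))
    phi[0] = pi * B[:, Y[0]]
    for t in range(1, T):
        phi[t] = B[:, Y[t]] * (phi[t - 1] @ A)
    for t in range(T - 2, -1, -1):
        psi[t] = A @ (B[:, Y[t + 1]] * psi[t + 1])
    return phi, psi, phi[-1].sum()

def em_step(pi, A, B, Y):
    """One Baum-Welch iteration: E-step posteriors, then closed-form M-step."""
    phi, psi, Z = forward_backward(pi, A, B, Y)
    T, K = len(Y), len(pi)
    mu = phi * psi / Z                                    # state posteriors mu_k(t)
    gamma = np.zeros((T - 1, K, K))                       # transition posteriors gamma_{i,j}(t)
    for t in range(T - 1):
        gamma[t] = phi[t][:, None] * A * (B[:, Y[t + 1]] * psi[t + 1])[None, :] / Z
    new_pi = mu[0]                                        # pi_k <- mu_k(1)
    new_A = gamma.sum(0) / mu[:-1].sum(0)[:, None]        # alpha_{i,j} <- sum_t gamma / sum_t mu_i
    new_B = np.zeros_like(B)                              # b_{j,y} <- sum_{t:Y(t)=y} mu_j(t) / sum_t mu_j(t)
    for y in range(B.shape[1]):
        new_B[:, y] = mu[np.array(Y) == y].sum(0)
    new_B /= mu.sum(0)[:, None]
    return new_pi, new_A, new_B

# Illustrative starting parameters and observations.
pi = np.array([0.6, 0.4])
A  = np.array([[0.7, 0.3], [0.4, 0.6]])
B  = np.array([[0.9, 0.1], [0.2, 0.8]])
Y  = [0, 1, 0, 0, 1]
pi2, A2, B2 = em_step(pi, A, B, Y)
```

Each iteration is guaranteed not to decrease the observed-data likelihood; in practice the recursions are run in log space (or with per-step normalization) to avoid underflow on long sequences.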