Recurrent Neural Networks

Based on the following lectures:
(1) “Intro. to Deep Learning (2023-2)” by Prof. Seong Man An, Dept. of Data Science, The Grad. School, Kookmin Univ.
(2) “Text Analytics (2024-1)” by Prof. Je Hyuk Lee, Dept. of Data Science, The Grad. School, Kookmin Univ.

Why Recurrent Nets?


  • Time series data is data whose features are ordered in a sequence:

    (figure 01)

  • Fully connected layers treat every input position equally,
    so they do not structurally capture the order information between features:

    (figure 02)

  • Recurrent Neural Networks (RNNs) process the input step by step,
    carrying a hidden state forward so that the sequence information is preserved:

    (figure 03)

Vanilla RNN


(figure 04)

  • update hidden state $\overrightarrow{\mathbf{z}}_{t}$:

    \[\begin{aligned} \overrightarrow{\mathbf{z}}_{t} &= \text{tanh}(\mathbf{U}\cdot\overrightarrow{\mathbf{x}}_{t}+\mathbf{W}\cdot\overrightarrow{\mathbf{z}}_{t-1}+\overrightarrow{\mathbf{b}}_{h}) \end{aligned}\]
    • $\text{tanh}$ : activation function
    • $\overrightarrow{\mathbf{x}}_{t}$ : input value @ $t$
    • $\mathbf{U}$ : weight matrix of input value @ $t$
    • $\overrightarrow{\mathbf{z}}_{t-1}$ : hidden state @ $t-1$
    • $\mathbf{W}$ : weight matrix of hidden state @ $t-1$
    • $\overrightarrow{\mathbf{b}}_{h}$ : bias
  • compute output $\overrightarrow{\mathbf{y}}_{t}$ (a code sketch of the full recurrence follows this list):

    \[\begin{aligned} \overrightarrow{\mathbf{y}}_{t} &= \text{softmax}(\mathbf{V}\cdot\overrightarrow{\mathbf{z}}_{t}+\overrightarrow{\mathbf{b}}_{o}) \end{aligned}\]
    • $\text{softmax}$ : activation function
    • $\overrightarrow{\mathbf{z}}_{t}$ : hidden state @ $t$
    • $\mathbf{V}$ : weight matrix of hidden state @ $t$
    • $\overrightarrow{\mathbf{b}}_{o}$ : bias
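
Putting the two update rules together, here is a minimal NumPy sketch of one forward pass through the recurrence. It assumes a single unbatched sequence, and the function and variable names (`vanilla_rnn_forward`, `z0`, the dimension comments) are illustrative assumptions rather than code from the lectures:

```python
import numpy as np

def softmax(v):
    """Numerically stable softmax over a vector."""
    e = np.exp(v - v.max())
    return e / e.sum()

def vanilla_rnn_forward(xs, U, W, V, b_h, b_o, z0):
    """Run the vanilla RNN recurrence over a sequence of input vectors xs.

    xs  : iterable of input vectors x_t  (each of shape [input_dim])
    U   : input-to-hidden weights        (shape [hidden_dim, input_dim])
    W   : hidden-to-hidden weights       (shape [hidden_dim, hidden_dim])
    V   : hidden-to-output weights       (shape [output_dim, hidden_dim])
    b_h : hidden bias, b_o : output bias
    z0  : initial hidden state           (shape [hidden_dim])
    """
    z_prev, ys = z0, []
    for x_t in xs:
        z_t = np.tanh(U @ x_t + W @ z_prev + b_h)  # z_t = tanh(U x_t + W z_{t-1} + b_h)
        y_t = softmax(V @ z_t + b_o)               # y_t = softmax(V z_t + b_o)
        ys.append(y_t)
        z_prev = z_t                               # hidden state carries the sequence info forward
    return ys, z_prev
```

The same hidden state $\overrightarrow{\mathbf{z}}_{t}$ is both the basis of the output at step $t$ and the memory passed to step $t+1$, which is how the order of the inputs influences the result.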

LSTM


  • Vanilla RNNs suffer from the long-term dependency problem:

    (figure 05)

    • The long-term dependency problem means that, as the sequence gets longer, information from the early time steps is no longer preserved because the gradient vanishes during backpropagation through time.
  • LSTM (Long Short-Term Memory) is a technique that alleviates the vanishing gradient by regulating information flow through gates (the standard gate equations are sketched after this list):

    (figure 07)

    • forget gate: decides how much of the previous cell state to discard
    • input gate: decides how much of the new candidate information to write into the cell state
    • cell state: carries the long-term memory by combining what is kept and what is newly written
    • output gate: decides how much of the cell state is exposed as the hidden state and output
  • forget gate:

    (figure 08)

  • input gate:

    (figure 09)

  • cell state:

    (figure 10)

  • output gate:

    (figure 11)
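
The gate figures above are not reproduced here. For reference, the equations below are the standard LSTM cell equations, written in the same notation as the vanilla RNN section; the per-gate weight and bias names ($\mathbf{U}_{f}, \mathbf{W}_{f}, \overrightarrow{\mathbf{b}}_{f}$, and so on) are my own labeling, not taken from the lecture slides:

\[\begin{aligned}
\overrightarrow{\mathbf{f}}_{t} &= \sigma(\mathbf{U}_{f}\cdot\overrightarrow{\mathbf{x}}_{t}+\mathbf{W}_{f}\cdot\overrightarrow{\mathbf{z}}_{t-1}+\overrightarrow{\mathbf{b}}_{f}) && \text{forget gate} \\
\overrightarrow{\mathbf{i}}_{t} &= \sigma(\mathbf{U}_{i}\cdot\overrightarrow{\mathbf{x}}_{t}+\mathbf{W}_{i}\cdot\overrightarrow{\mathbf{z}}_{t-1}+\overrightarrow{\mathbf{b}}_{i}) && \text{input gate} \\
\tilde{\mathbf{c}}_{t} &= \text{tanh}(\mathbf{U}_{c}\cdot\overrightarrow{\mathbf{x}}_{t}+\mathbf{W}_{c}\cdot\overrightarrow{\mathbf{z}}_{t-1}+\overrightarrow{\mathbf{b}}_{c}) && \text{candidate cell state} \\
\overrightarrow{\mathbf{c}}_{t} &= \overrightarrow{\mathbf{f}}_{t}\odot\overrightarrow{\mathbf{c}}_{t-1}+\overrightarrow{\mathbf{i}}_{t}\odot\tilde{\mathbf{c}}_{t} && \text{cell state update} \\
\overrightarrow{\mathbf{o}}_{t} &= \sigma(\mathbf{U}_{o}\cdot\overrightarrow{\mathbf{x}}_{t}+\mathbf{W}_{o}\cdot\overrightarrow{\mathbf{z}}_{t-1}+\overrightarrow{\mathbf{b}}_{o}) && \text{output gate} \\
\overrightarrow{\mathbf{z}}_{t} &= \overrightarrow{\mathbf{o}}_{t}\odot\text{tanh}(\overrightarrow{\mathbf{c}}_{t}) && \text{hidden state}
\end{aligned}\]

  • $\sigma$ : sigmoid activation function, so each gate value lies between 0 and 1
  • $\odot$ : element-wise product
  • $\overrightarrow{\mathbf{f}}_{t}, \overrightarrow{\mathbf{i}}_{t}, \overrightarrow{\mathbf{o}}_{t}$ : forget, input, and output gate values @ $t$
  • $\tilde{\mathbf{c}}_{t}$ : candidate cell state @ $t$
  • $\overrightarrow{\mathbf{c}}_{t}$ : cell state @ $t$

Because the cell state is updated additively rather than being squashed through an activation at every step, gradients can flow across many time steps, which is what alleviates the vanishing-gradient problem described above.

As with the vanilla RNN, the following is a minimal NumPy sketch of a single LSTM step under the same assumptions (unbatched vectors, illustrative names such as `lstm_step` and the parameter dictionary `p`):

```python
import numpy as np

def sigmoid(v):
    """Element-wise logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-v))

def lstm_step(x_t, z_prev, c_prev, p):
    """One LSTM step. p holds one weight/bias set per gate:
    U_*, W_*, b_* for * in {f, i, c, o}, shaped as in the vanilla RNN sketch."""
    f_t = sigmoid(p["U_f"] @ x_t + p["W_f"] @ z_prev + p["b_f"])      # forget gate
    i_t = sigmoid(p["U_i"] @ x_t + p["W_i"] @ z_prev + p["b_i"])      # input gate
    c_tilde = np.tanh(p["U_c"] @ x_t + p["W_c"] @ z_prev + p["b_c"])  # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde   # cell state: keep part of the old, write part of the new
    o_t = sigmoid(p["U_o"] @ x_t + p["W_o"] @ z_prev + p["b_o"])      # output gate
    z_t = o_t * np.tanh(c_t)             # hidden state exposed to the next step and to the output
    return z_t, c_t
```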


Source

  • https://dgkim5360.tistory.com/entry/understanding-long-short-term-memory-lstm-kr
This post is licensed under CC BY 4.0 by the author.