Recurrent Neural Networks
Based on the following lectures
(1) “Intro. to Deep Learning (2023-2)” by Prof. Seong Man An, Dept. of Data Science, The Grad. School, Kookmin Univ.
(2) “Text Analytics (2024-1)” by Prof. Je Hyuk Lee, Dept. of Data Science, The Grad. School, Kookmin Univ.
Why Recurrent Nets?
- Time series data is data where there is a sequence between features.
- Fully connected layers treat the positions of input features equally, so they do not structurally reflect the order information between features.
- RNNs (**R**ecurrent **N**eural **N**etworks) involve operations that preserve sequence information.
Vanilla RNN
- update hidden state $\overrightarrow{\mathbf{z}}_{t}$:
\[\begin{aligned} \overrightarrow{\mathbf{z}}_{t} &= \text{tanh}(\mathbf{U}\cdot\overrightarrow{\mathbf{x}}_{t}+\mathbf{W}\cdot\overrightarrow{\mathbf{z}}_{t-1}+\overrightarrow{\mathbf{b}}_{h}) \end{aligned}\]
- $\text{tanh}$ : activation function
- $\overrightarrow{\mathbf{x}}_{t}$ : input value @ $t$
- $\mathbf{U}$ : weight matrix of input value @ $t$
- $\overrightarrow{\mathbf{z}}_{t-1}$ : hidden state @ $t-1$
- $\mathbf{W}$ : weight matrix of hidden state @ $t-1$
- $\overrightarrow{\mathbf{b}}_{h}$ : bias
- compute output $\overrightarrow{\mathbf{y}}_{t}$:
\[\begin{aligned} \overrightarrow{\mathbf{y}}_{t} &= \text{softmax}(\mathbf{V}\cdot\overrightarrow{\mathbf{z}}_{t}+\overrightarrow{\mathbf{b}}_{o}) \end{aligned}\]
- $\text{softmax}$ : activation function
- $\overrightarrow{\mathbf{z}}_{t}$ : hidden state @ $t$
- $\mathbf{V}$ : weight matrix of hidden state @ $t$
- $\overrightarrow{\mathbf{b}}_{o}$ : bias
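
As a concrete sketch, the two equations above can be written in NumPy; the dimensions, initialization, and helper names (`rnn_step`, `softmax`) below are illustrative assumptions, not part of the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim = 4, 8, 3  # illustrative sizes

# parameters of the two equations: U, W, b_h for the hidden state; V, b_o for the output
U = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)
V = rng.normal(scale=0.1, size=(output_dim, hidden_dim))
b_o = np.zeros(output_dim)

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def rnn_step(x_t, z_prev):
    z_t = np.tanh(U @ x_t + W @ z_prev + b_h)  # z_t = tanh(U x_t + W z_{t-1} + b_h)
    y_t = softmax(V @ z_t + b_o)               # y_t = softmax(V z_t + b_o)
    return z_t, y_t

# unroll over a toy sequence of length 5, reusing the same weights at every step
z = np.zeros(hidden_dim)
for x in rng.normal(size=(5, input_dim)):
    z, y = rnn_step(x, z)
print(y)  # a probability distribution over output_dim classes
```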
LSTM
- Vanilla RNNs suffer from the problem of long-term dependencies: as the sequence gets longer, the initial order information is not preserved due to the vanishing gradient.
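
A small numeric sketch of the mechanism, assuming a hypothetical input-free RNN: backpropagating through $T$ tanh steps multiplies the gradient by $\mathbf{W}^{\top}\cdot\text{diag}(1-\overrightarrow{\mathbf{z}}_{t}^{2})$ at every step, so its norm shrinks geometrically when the recurrent weights are small:

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_dim, T = 8, 50
W = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))

# forward pass (no input, for simplicity), keeping the hidden states
z = rng.normal(size=hidden_dim)
states = []
for _ in range(T):
    z = np.tanh(W @ z)
    states.append(z)

# backpropagate a unit gradient from step T back to step 1
grad = np.ones(hidden_dim)
for z in reversed(states):
    grad = W.T @ (grad * (1 - z**2))  # local Jacobian of z_t = tanh(W z_{t-1})

print(np.linalg.norm(grad))  # vanishingly small: early steps receive almost no signal
```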
- LSTM (**L**ong **S**hort-**T**erm **M**emory) is a technique to alleviate the vanishing gradient through gate adjustment:
- forget gate : generates the forget rule
- input gate : generates the remember rule and the cell state update
- cell state : determines how much to remember and how much to forget
- output gate : generates the hidden state and the output
- forget gate:
\[\begin{aligned} \overrightarrow{\mathbf{f}}_{t} &= \sigma(\mathbf{U}_{f}\cdot\overrightarrow{\mathbf{x}}_{t}+\mathbf{W}_{f}\cdot\overrightarrow{\mathbf{z}}_{t-1}+\overrightarrow{\mathbf{b}}_{f}) \end{aligned}\]
- input gate:
\[\begin{aligned} \overrightarrow{\mathbf{i}}_{t} &= \sigma(\mathbf{U}_{i}\cdot\overrightarrow{\mathbf{x}}_{t}+\mathbf{W}_{i}\cdot\overrightarrow{\mathbf{z}}_{t-1}+\overrightarrow{\mathbf{b}}_{i}) \\ \tilde{\mathbf{c}}_{t} &= \text{tanh}(\mathbf{U}_{c}\cdot\overrightarrow{\mathbf{x}}_{t}+\mathbf{W}_{c}\cdot\overrightarrow{\mathbf{z}}_{t-1}+\overrightarrow{\mathbf{b}}_{c}) \end{aligned}\]
- cell state:
\[\begin{aligned} \overrightarrow{\mathbf{c}}_{t} &= \overrightarrow{\mathbf{f}}_{t}\odot\overrightarrow{\mathbf{c}}_{t-1}+\overrightarrow{\mathbf{i}}_{t}\odot\tilde{\mathbf{c}}_{t} \end{aligned}\]
- output gate:
\[\begin{aligned} \overrightarrow{\mathbf{o}}_{t} &= \sigma(\mathbf{U}_{o}\cdot\overrightarrow{\mathbf{x}}_{t}+\mathbf{W}_{o}\cdot\overrightarrow{\mathbf{z}}_{t-1}+\overrightarrow{\mathbf{b}}_{o}) \\ \overrightarrow{\mathbf{z}}_{t} &= \overrightarrow{\mathbf{o}}_{t}\odot\text{tanh}(\overrightarrow{\mathbf{c}}_{t}) \end{aligned}\]
- $\sigma$ : sigmoid activation function
- $\odot$ : element-wise product
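
A minimal NumPy sketch of one cell step implementing the four gate equations above; the dimensions, initialization, and helper names (`lstm_step`, `sigmoid`, `affine`) are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim = 4, 8  # illustrative sizes

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# one (U, W, b) triple per gate: forget, input, candidate cell, output
params = {g: (rng.normal(scale=0.1, size=(hidden_dim, input_dim)),
              rng.normal(scale=0.1, size=(hidden_dim, hidden_dim)),
              np.zeros(hidden_dim))
          for g in ("f", "i", "c", "o")}

def lstm_step(x_t, z_prev, c_prev):
    def affine(g):
        U, W, b = params[g]
        return U @ x_t + W @ z_prev + b
    f_t = sigmoid(affine("f"))          # forget gate
    i_t = sigmoid(affine("i"))          # input gate
    c_tilde = np.tanh(affine("c"))      # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde  # cell state: forget old, add new
    o_t = sigmoid(affine("o"))          # output gate
    z_t = o_t * np.tanh(c_t)            # new hidden state
    return z_t, c_t

# unroll over a toy sequence
z, c = np.zeros(hidden_dim), np.zeros(hidden_dim)
for x in rng.normal(size=(5, input_dim)):
    z, c = lstm_step(x, z, c)
print(z)
```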
Source
- https://dgkim5360.tistory.com/entry/understanding-long-short-term-memory-lstm-kr