
Tuesday, 26 November 2024

Difference Between Return Sequences and Return States for LSTMs in Keras

 The Keras deep learning library provides an implementation of the Long Short-Term Memory, or LSTM, recurrent neural network.

As part of this implementation, the Keras API provides access to both return sequences and return state. The difference between these two pieces of data, and when to use each, can be confusing when designing sophisticated recurrent neural network models, such as the encoder-decoder model.

In this tutorial, you will discover the difference between return sequences and return states for LSTM layers in the Keras deep learning library.

After completing this tutorial, you will know:

  • That return sequences return the hidden state output for each input time step.
  • That return state returns the hidden state output and cell state for the last input time step.
  • That return sequences and return state can be used at the same time.

Kick-start your project with my new book Long Short-Term Memory Networks With Python, including step-by-step tutorials and the Python source code files for all examples.

Let’s get started.

Understand the Difference Between Return Sequences and Return States for LSTMs in Keras
Photo by Adrian Curt Dannemann, some rights reserved.

Tutorial Overview

This tutorial is divided into 4 parts; they are:

  1. Long Short-Term Memory
  2. Return Sequences
  3. Return States
  4. Return States and Sequences

Long Short-Term Memory

The Long Short-Term Memory, or LSTM, is a recurrent neural network that is composed of internal gates.

Unlike other recurrent neural networks, the network’s internal gates allow the model to be trained successfully using backpropagation through time, or BPTT, and avoid the vanishing gradients problem.

In the Keras deep learning library, LSTM layers can be created using the LSTM() class.

When creating a layer of LSTM memory units, you can specify the number of memory units within the layer.

Each unit or cell within the layer has an internal cell state, often abbreviated as “c“, and outputs a hidden state, often abbreviated as “h“.

The Keras API allows you to access these data, which can be useful or even required when developing sophisticated recurrent neural network architectures, such as the encoder-decoder model.

For the rest of this tutorial, we will look at the API for accessing these data.

Return Sequences

Each LSTM cell will output one hidden state h for each input.

We can demonstrate this in Keras with a very small model with a single LSTM layer that itself contains a single LSTM cell.

In this example, we will have one input sample with 3 time steps and one feature observed at each time step:
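# minimal sketch; the variable names inputs1 and lstm1 are illustrative
inputs1 = Input(shape=(3, 1))
lstm1 = LSTM(1)(inputs1)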

The complete example is listed below.
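A minimal sketch (assuming the standalone keras package; under TensorFlow 2 the same classes are available via tensorflow.keras):

from numpy import array
from keras.models import Model
from keras.layers import Input, LSTM

# define a model with a single LSTM layer containing a single cell
inputs1 = Input(shape=(3, 1))
lstm1 = LSTM(1)(inputs1)
model = Model(inputs=inputs1, outputs=lstm1)

# one input sample with 3 time steps and 1 feature per step
data = array([0.1, 0.2, 0.3]).reshape((1, 3, 1))

# make a prediction; the output has shape (1, 1), a single hidden state value
print(model.predict(data))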

Note: all examples in this post use the Keras functional API.

Running the example outputs a single hidden state for the input sequence with 3 time steps.

Your specific output value will differ given the random initialization of the LSTM weights and cell state.

It is possible to access the hidden state output for each input time step.

This can be done by setting the return_sequences attribute to True when defining the LSTM layer, as follows:
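# illustrative: the layer now returns the hidden state for every input time step
lstm1 = LSTM(1, return_sequences=True)(inputs1)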

We can update the previous example with this change.

The full code listing is provided below.
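As before, a minimal sketch with illustrative names:

from numpy import array
from keras.models import Model
from keras.layers import Input, LSTM

inputs1 = Input(shape=(3, 1))
# return the hidden state output for every input time step
lstm1 = LSTM(1, return_sequences=True)(inputs1)
model = Model(inputs=inputs1, outputs=lstm1)

data = array([0.1, 0.2, 0.3]).reshape((1, 3, 1))
# the output now has shape (1, 3, 1): one hidden state per time step
print(model.predict(data))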

Running the example returns a sequence of 3 values, one hidden state output for each input time step for the single LSTM cell in the layer.

You must set return_sequences=True when stacking LSTM layers so that the second LSTM layer has a three-dimensional sequence input. For more details, see the related post on stacked LSTM networks.

You may also need to access the sequence of hidden state outputs when predicting a sequence of outputs with a Dense output layer wrapped in a TimeDistributed layer. See the related post on the TimeDistributed layer for more details.

Return States

The output of an LSTM cell or layer of cells is called the hidden state.

This is confusing, because each LSTM cell retains an internal state that is not output, called the cell state, or c.

Generally, we do not need to access the cell state unless we are developing sophisticated models where subsequent layers may need to have their cell state initialized with the final cell state of another layer, such as in an encoder-decoder model.

Keras provides the return_state argument to the LSTM layer that will provide access to the hidden state output (state_h) and the cell state (state_c). For example:
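# illustrative names; the layer call now returns three tensors
lstm1, state_h, state_c = LSTM(1, return_state=True)(inputs1)
model = Model(inputs=inputs1, outputs=[lstm1, state_h, state_c])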

This may look confusing because both lstm1 and state_h refer to the same hidden state output. The reason for these two tensors being separate will become clear in the next section.

We can demonstrate access to the hidden and cell states of the cells in the LSTM layer with a worked example listed below.
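A minimal sketch, following the same pattern as the earlier examples:

from numpy import array
from keras.models import Model
from keras.layers import Input, LSTM

inputs1 = Input(shape=(3, 1))
# return_state=True yields the layer output, the last hidden state, and the last cell state
lstm1, state_h, state_c = LSTM(1, return_state=True)(inputs1)
model = Model(inputs=inputs1, outputs=[lstm1, state_h, state_c])

data = array([0.1, 0.2, 0.3]).reshape((1, 3, 1))
# returns a list of 3 arrays, each of shape (1, 1)
print(model.predict(data))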

Running the example returns 3 arrays:

  1. The LSTM hidden state output for the last time step.
  2. The LSTM hidden state output for the last time step (again).
  3. The LSTM cell state for the last time step.

The hidden state and the cell state could in turn be used to initialize the states of another LSTM layer with the same number of cells.
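For example, a sketch of passing the states along (decoder_inputs is a hypothetical second input tensor; the receiving layer must have the same number of units):

# initialize a second LSTM layer with the final states of the first
decoder = LSTM(1)(decoder_inputs, initial_state=[state_h, state_c])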

Return States and Sequences

We can access both the sequence of hidden state outputs and the final hidden and cell states at the same time.

This can be done by configuring the LSTM layer to both return sequences and return states.

The complete example is listed below.
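A minimal sketch combining both arguments:

from numpy import array
from keras.models import Model
from keras.layers import Input, LSTM

inputs1 = Input(shape=(3, 1))
# return both the full sequence of hidden states and the final hidden and cell states
lstm1, state_h, state_c = LSTM(1, return_sequences=True, return_state=True)(inputs1)
model = Model(inputs=inputs1, outputs=[lstm1, state_h, state_c])

data = array([0.1, 0.2, 0.3]).reshape((1, 3, 1))
# lstm1 has shape (1, 3, 1); state_h and state_c each have shape (1, 1)
print(model.predict(data))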

Running the example, we can see now why the LSTM output tensor and hidden state output tensor are declared separately.

The layer returns the hidden state for each input time step, then separately, the hidden state output for the last time step and the cell state for the last input time step.

This can be confirmed by seeing that the last value in the returned sequences (first array) matches the value in the hidden state (second array).

Further Reading

This section provides more resources on the topic if you are looking to go deeper.

Summary

In this tutorial, you discovered the difference between return sequences and return states for LSTM layers in the Keras deep learning library.

Specifically, you learned:

  • That return sequences return the hidden state output for each input time step.
  • That return state returns the hidden state output and cell state for the last input time step.
  • That return sequences and return state can be used at the same time.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.
