From b3683bff04d235c7b26003e6a5bad6e06f5cec24 Mon Sep 17 00:00:00 2001 From: Marco Kuhlmann <marco.kuhlmann@liu.se> Date: Sun, 21 Jan 2024 23:30:26 +0100 Subject: [PATCH] Add a notebook on LSTM usage patterns --- .../lstm-usage-patterns.ipynb | 459 ++++++++++++++++++ 1 file changed, 459 insertions(+) create mode 100644 sessions/lstm-usage-patterns/lstm-usage-patterns.ipynb diff --git a/sessions/lstm-usage-patterns/lstm-usage-patterns.ipynb b/sessions/lstm-usage-patterns/lstm-usage-patterns.ipynb new file mode 100644 index 0000000..c714cbf --- /dev/null +++ b/sessions/lstm-usage-patterns/lstm-usage-patterns.ipynb @@ -0,0 +1,459 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "503fac0b", + "metadata": {}, + "source": [ + "# LSTM usage patterns" + ] + }, + { + "cell_type": "markdown", + "id": "ab4b84dd", + "metadata": {}, + "source": [ + "We have seen three usage patterns for recurrent neural networks: as an *encoder*, as a *transducer*, and as a *decoder*. In this notebook you will learn how to realise the *Encoder* and the *Transducer* patterns in PyTorch with an LSTM architecture. The *Decoder* pattern will be featured in Unit 3." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f249a44c", + "metadata": {}, + "outputs": [], + "source": [ + "import torch\n", + "\n", + "from torch import nn as nn" + ] + }, + { + "cell_type": "markdown", + "id": "fc0381b3", + "metadata": {}, + "source": [ + "## Sample input" + ] + }, + { + "cell_type": "markdown", + "id": "5834a7e9", + "metadata": {}, + "source": [ + "To illustrate the two patterns, we use an input batch `x` containing a single sequence with three elements, each of which is a vector of size five." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "da5d63b5", + "metadata": {}, + "outputs": [], + "source": [ + "x = torch.rand(1, 3, 5)" + ] + }, + { + "cell_type": "markdown", + "id": "1c9ff0f7", + "metadata": {}, + "source": [ + "Here is what our concrete `x` looks like:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "638a1350", + "metadata": {}, + "outputs": [], + "source": [ + "x" + ] + }, + { + "cell_type": "markdown", + "id": "ff3064a0", + "metadata": {}, + "source": [ + "## Model" + ] + }, + { + "cell_type": "markdown", + "id": "5034b5b2", + "metadata": {}, + "source": [ + "Next, we define the LSTM model. In PyTorch, the LSTM architecture is implemented by the class [`nn.LSTM`](https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html).\n", + "\n", + "We use an LSTM with an *input_size* of 5 and a *hidden_size* of 2. This LSTM will process the sequence of 5-dimensional vectors in `x` and map each input vector to a hidden state in the form of a 2-dimensional vector.\n", + "\n", + "By default, an [`nn.LSTM`](https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html) expects its input to have the shape (*sequence_length*, *batch_size*, *input_size*). For our purposes, it is easier to instead take the input in the form (*batch_size*, *sequence_length*, *input_size*). To get this behaviour, we set the `batch_first` argument to `True`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9a0d71bd", + "metadata": {}, + "outputs": [], + "source": [ + "model = nn.LSTM(5, 2, batch_first=True)" + ] + }, + { + "cell_type": "markdown", + "id": "9af73c8d", + "metadata": {}, + "source": [ + "## Output" + ] + }, + { + "cell_type": "markdown", + "id": "4100af6c", + "metadata": {}, + "source": [ + "We are now ready to feed the example input to our model:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0f2b1260", + "metadata": {}, + "outputs": [], + "source": [ + "output, (h_n, c_n) = model.forward(x)" + ] + }, + { + "cell_type": "markdown", + "id": "d527b307", + "metadata": {}, + "source": [ + "The result of the `forward()` method has two components:\n", + "\n", + "The first component is a tensor `output` that holds the hidden states computed by the LSTM, for each position of the input sequence. Consequently, the shape of `output` is (*batch_size*, *sequence_length*, *hidden_size*)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "82e99af6", + "metadata": {}, + "outputs": [], + "source": [ + "output.shape" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0f8d1312", + "metadata": {}, + "outputs": [], + "source": [ + "output" + ] + }, + { + "cell_type": "markdown", + "id": "d9319d5d", + "metadata": {}, + "source": [ + "The second component is a pair of tensors `h_n` and `c_n` which represent the final hidden state and cell state of the LSTM, respectively. These are the hidden state and cell state computed at the last position of the input sequence. 
Their common shape is (1, *batch_size*, *hidden_size*):" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fd0fffb5", + "metadata": {}, + "outputs": [], + "source": [ + "h_n.shape" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9a60ce57", + "metadata": {}, + "outputs": [], + "source": [ + "c_n.shape" + ] + }, + { + "cell_type": "markdown", + "id": "2ff9e1ab", + "metadata": {}, + "source": [ + "We can verify that the only element of `h_n` is indeed identical to the last row of `output`, that is, the hidden state at the last sequence position:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b923f2d4", + "metadata": {}, + "outputs": [], + "source": [ + "h_n[0]" + ] + }, + { + "cell_type": "markdown", + "id": "ba4224d0", + "metadata": {}, + "source": [ + "**🤔 Question 1: Batch size**\n", + "\n", + "> How do the concrete shapes of `output`, `h_n` and `c_n` change when you process a batch of seven sequences instead of just one?\n", + "\n", + "**🤔 Question 2: Stacked LSTMs**\n", + "\n", + "> The [`nn.LSTM`](https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html) class supports stacked LSTMs with multiple layers. How do the shapes of `output`, `h_n` and `c_n` change when you define the model to have three layers? How can you then get the final state of the final layer?" 
+ ] + }, + { + "cell_type": "markdown", + "id": "69d255b0", + "metadata": {}, + "source": [ + "## Encoder" + ] + }, + { + "cell_type": "markdown", + "id": "bef42573", + "metadata": {}, + "source": [ + "To realise the *Encoder* pattern, we simply return the final hidden state:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "979c6c3b", + "metadata": {}, + "outputs": [], + "source": [ + "def encode(model, x):\n", + "    output, (h_n, c_n) = model.forward(x)\n", + "    return h_n[-1]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8e7bb697", + "metadata": {}, + "outputs": [], + "source": [ + "y = encode(model, x)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ad2ce429", + "metadata": {}, + "outputs": [], + "source": [ + "y.shape" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b418b972", + "metadata": {}, + "outputs": [], + "source": [ + "y" + ] + }, + { + "cell_type": "markdown", + "id": "bc970eec", + "metadata": {}, + "source": [ + "**🤔 Question 3: Bi-directional LSTMs**\n", + "\n", + "> In addition to stacking, the [`nn.LSTM`](https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html) class also supports bi-directional networks. How do the shapes of `output`, `h_n` and `c_n` change in that case? How can you get the final states for the two uni-directional networks?" + ] + }, + { + "cell_type": "markdown", + "id": "1df1b2d0", + "metadata": {}, + "source": [ + "## Transducer" + ] + }, + { + "cell_type": "markdown", + "id": "7a6043b0", + "metadata": {}, + "source": [ + "To realise a *Transducer*, we return the complete output tensor `output`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c82a322c", + "metadata": {}, + "outputs": [], + "source": [ + "def transduce(model, x):\n", + "    output, (h_n, c_n) = model.forward(x)\n", + "    return output" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1407166d", + "metadata": {}, + "outputs": [], + "source": [ + "y = transduce(model, x)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "61653322", + "metadata": {}, + "outputs": [], + "source": [ + "y.shape" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a1024adb", + "metadata": {}, + "outputs": [], + "source": [ + "y" + ] + }, + { + "cell_type": "markdown", + "id": "59f053aa", + "metadata": {}, + "source": [ + "## Manual unrolling" + ] + }, + { + "cell_type": "markdown", + "id": "59888b2b", + "metadata": {}, + "source": [ + "Recall that an RNN implements a recursive computation on sequences: Starting from an initial hidden state $h_0$, at each sequence position $i$, it consumes the previous hidden state $h_{i-1}$ and the current input $x_i$ to compute an output $y_i$ and a next hidden state $h_i$. We say that the RNN is ‘unrolled’ over a sequence of inputs.\n", + "\n", + "In both the encoder and the transducer, the unrolling happened ‘behind the scenes’ when calling the `forward()` method. In some use cases, however, we may want to have more control and do the unrolling manually. (One example is the Encoder–Decoder architecture that you will learn about in Unit 5.)\n", + "\n", + "The code in the next cell implements a function `unroll()` that computes the unrolling step-by-step, and at each position $i$ yields the next output $y_i$." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fff6503e", + "metadata": {}, + "outputs": [], + "source": [ + "def unroll(model, h_0, c_0, x):\n", + "    # Maintain the previous hidden state and cell state\n", + "    h, c = h_0, c_0\n", + "\n", + "    # Loop over all positions in the sequence\n", + "    for i in range(x.shape[1]):\n", + "        # Get the one-element sub-sequence of x for the current position i\n", + "        x_i = x[:, i:i+1, :]\n", + "\n", + "        # Do one step of the unrolling\n", + "        output, (h, c) = model.forward(x_i, (h, c))\n", + "\n", + "        # Yield the current output\n", + "        yield output" + ] + }, + { + "cell_type": "markdown", + "id": "bee6a25c", + "metadata": {}, + "source": [ + "When calling the `unroll()` function, we need to specify an initial hidden state and cell state. The default initial states are tensors of zeros." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "665f5758", + "metadata": {}, + "outputs": [], + "source": [ + "h_0, c_0 = torch.zeros(1, 1, 2), torch.zeros(1, 1, 2)" + ] + }, + { + "cell_type": "markdown", + "id": "ed510ea8", + "metadata": {}, + "source": [ + "We can now verify that the manual unrolling produces the same output as the automatic unrolling that we used earlier:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4d4bd47e", + "metadata": { + "scrolled": false + }, + "outputs": [], + "source": [ + "for output in unroll(model, h_0, c_0, x):\n", + "    print(output)" + ] + }, + { + "cell_type": "markdown", + "id": "9d56b012", + "metadata": {}, + "source": [ + "That’s all, folks!" 
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.10" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} -- GitLab