Add a notebook on LSTM usage patterns

b3683bff · Marco Kuhlmann · 880d07eb · b3683bff
Commit b3683bff authored 1 year ago by Marco Kuhlmann
--- a/sessions/lstm-usage-patterns/lstm-usage-patterns.ipynb
+++ b/sessions/lstm-usage-patterns/lstm-usage-patterns.ipynb
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "503fac0b",
+   "metadata": {},
+   "source": [
+    "# LSTM usage patterns"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ab4b84dd",
+   "metadata": {},
+   "source": [
+    "We have seen three usage patterns for recurrent neural networks: as an *encoder*, as a *transducer*, and as a *decoder*. In this notebook you will learn how to realise the *Encoder* and the *Transducer* patterns in PyTorch with an LSTM architecture. The *Decoder* pattern will be featured in Unit&nbsp;3."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f249a44c",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import torch\n",
+    "\n",
+    "from torch import nn"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "fc0381b3",
+   "metadata": {},
+   "source": [
+    "## Sample input"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5834a7e9",
+   "metadata": {},
+   "source": [
+    "To illustrate the two patterns, we use an input batch `x` containing a single sequence with three elements, each of which is a vector of size five."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "da5d63b5",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "x = torch.rand(1, 3, 5)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1c9ff0f7",
+   "metadata": {},
+   "source": [
+    "Here is what our concrete `x` looks like:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "638a1350",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "x"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ff3064a0",
+   "metadata": {},
+   "source": [
+    "## Model"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5034b5b2",
+   "metadata": {},
+   "source": [
+    "Next, we define the LSTM model. In PyTorch, the LSTM architecture is implemented by the class [`nn.LSTM`](https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html).\n",
+    "\n",
+    "We use an LSTM with an *input_size* of&nbsp;5 and a *hidden_size* of&nbsp;2. This LSTM will process the sequence of 5-dimensional vectors in&nbsp;`x` and map each input vector to a hidden state in the form of a 2-dimensional vector.\n",
+    "\n",
+    "By default, an [`nn.LSTM`](https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html) expects its input to have the shape (*sequence_length*, *batch_size*, *input_size*). For our purposes, it is easier to instead take the input in the form (*batch_size*, *sequence_length*, *input_size*). To get this behaviour, we set the `batch_first` argument to `True`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "9a0d71bd",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "model = nn.LSTM(5, 2, batch_first=True)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "9af73c8d",
+   "metadata": {},
+   "source": [
+    "## Output"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4100af6c",
+   "metadata": {},
+   "source": [
+    "We are now ready to feed the example input to our model:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "0f2b1260",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "output, (h_n, c_n) = model.forward(x)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d527b307",
+   "metadata": {},
+   "source": [
+    "The result of the `forward()` method has two components:\n",
+    "\n",
+    "The first component is a tensor `output` that holds the hidden states computed by the LSTM, for each position of the input sequence. Consequently, the shape of `output` is (*batch_size*, *sequence_length*, *hidden_size*)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "82e99af6",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "output.shape"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "0f8d1312",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "output"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d9319d5d",
+   "metadata": {},
+   "source": [
+    "The second component is a pair of tensors `h_n` and `c_n` which represent the final hidden state and cell state of the LSTM, respectively. These are the hidden state and cell state computed at the last position of the input sequence. Their common shape is (1, *batch_size*, *hidden_size*):"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "fd0fffb5",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "h_n.shape"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "9a60ce57",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "c_n.shape"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2ff9e1ab",
+   "metadata": {},
+   "source": [
+    "We can verify that (the only element of) `h_n` is indeed identical to the last row of `output`:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b923f2d4",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "h_n[0]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ba4224d0",
+   "metadata": {},
+   "source": [
+    "**🤔 Question 1: Batch size**\n",
+    "\n",
+    "> How do the concrete shapes of `output`, `h_n` and `c_n` change when you process a batch of seven sequences instead of just one?\n",
+    "\n",
+    "**🤔 Question 2: Stacked LSTMs**\n",
+    "\n",
+    "> The [`nn.LSTM`](https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html) class supports stacked LSTMs with multiple layers. How do the shapes of `output`, `h_n` and `c_n` change when you define the model to have three layers? How can you then get the final state of the final layer?"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "69d255b0",
+   "metadata": {},
+   "source": [
+    "## Encoder"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "bef42573",
+   "metadata": {},
+   "source": [
+    "To realise the *Encoder* pattern, we simply return the final hidden state:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "979c6c3b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def encode(model, x):\n",
+    "    output, (h_n, c_n) = model.forward(x)\n",
+    "    return h_n[-1]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "8e7bb697",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "y = encode(model, x)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ad2ce429",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "y.shape"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b418b972",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "y"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "bc970eec",
+   "metadata": {},
+   "source": [
+    "**🤔 Question 3: Bi-directional LSTMs**\n",
+    "\n",
+    "> In addition to stacking, the [`nn.LSTM`](https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html) class also supports bi-directional networks. How do the shapes of `output`, `h_n` and `c_n` change in that case? How can you get the final states for the two uni-directional networks?"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1df1b2d0",
+   "metadata": {},
+   "source": [
+    "## Transducer"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7a6043b0",
+   "metadata": {},
+   "source": [
+    "To realise a *Transducer*, we return the complete output tensor `output`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "c82a322c",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def transduce(model, x):\n",
+    "    output, (h_n, c_n) = model.forward(x)\n",
+    "    return output"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "1407166d",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "y = transduce(model, x)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "61653322",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "y.shape"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "a1024adb",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "y"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "59f053aa",
+   "metadata": {},
+   "source": [
+    "## Manual unrolling"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "59888b2b",
+   "metadata": {},
+   "source": [
+    "Recall that an RNN implements a recursive computation on sequences: Starting from an initial hidden state $h_0$, at each sequence position&nbsp;$i$, it consumes the previous hidden state $h_{i-1}$ and the current input $x_i$ to compute an output $y_i$ and a next hidden state $h_i$. We say that the RNN is ‘unrolled’ over a sequence of inputs.\n",
+    "\n",
+    "In both the encoder and the transducer, the unrolling happened ‘behind the scenes’ when calling the `forward()` method. In some use cases, however, we may want to have more control and do the unrolling manually. (One example is the Encoder–Decoder architecture that you will learn about in Unit&nbsp;5.)\n",
+    "\n",
+    "The code in the next cell implements a function `unroll()` that computes the unrolling step-by-step, and at each position&nbsp;$i$ yields the next output $y_i$."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "fff6503e",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def unroll(model, h_0, c_0, x):\n",
+    "    # Maintain the previous hidden state and cell state\n",
+    "    h, c = h_0, c_0\n",
+    "\n",
+    "    # Loop over all positions in the sequence\n",
+    "    for i in range(x.shape[1]):\n",
+    "        # Get the one-element sub-sequence of x for the current position i\n",
+    "        x_i = x[:, i:i+1, :]\n",
+    "\n",
+    "        # Do one step of the unrolling\n",
+    "        output, (h, c) = model.forward(x_i, (h, c))\n",
+    "\n",
+    "        # Yield the current output\n",
+    "        yield output"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "bee6a25c",
+   "metadata": {},
+   "source": [
+    "When calling the `unroll()` function, we need to specify an initial hidden state and cell state. The default initial states are tensors of zeros."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "665f5758",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "h_0, c_0 = torch.zeros(1, 1, 2), torch.zeros(1, 1, 2)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ed510ea8",
+   "metadata": {},
+   "source": [
+    "We can now verify that the manual unrolling produces the same output as the automatic unrolling that we used earlier:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "4d4bd47e",
+   "metadata": {
+    "scrolled": false
+   },
+   "outputs": [],
+   "source": [
+    "for output in unroll(model, h_0, c_0, x):\n",
+    "    print(output)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "9d56b012",
+   "metadata": {},
+   "source": [
+    "That’s all, folks!"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.10"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
+%% Cell type:markdown id:503fac0b tags:
+
+# LSTM usage patterns
+
+%% Cell type:markdown id:ab4b84dd tags:
+
+We have seen three usage patterns for recurrent neural networks: as an *encoder*, as a *transducer*, and as a *decoder*. In this notebook you will learn how to realise the *Encoder* and the *Transducer* patterns in PyTorch with an LSTM architecture. The *Decoder* pattern will be featured in Unit&nbsp;3.
+
+%% Cell type:code id:f249a44c tags:
+
+``` python
+import torch
+
+from torch import nn
+```
+
+%% Cell type:markdown id:fc0381b3 tags:
+
+## Sample input
+
+%% Cell type:markdown id:5834a7e9 tags:
+
+To illustrate the two patterns, we use an input batch `x` containing a single sequence with three elements, each of which is a vector of size five.
+
+%% Cell type:code id:da5d63b5 tags:
+
+``` python
+x = torch.rand(1, 3, 5)
+```
+
+%% Cell type:markdown id:1c9ff0f7 tags:
+
+Here is what our concrete `x` looks like:
+
+%% Cell type:code id:638a1350 tags:
+
+``` python
+x
+```
+
+%% Cell type:markdown id:ff3064a0 tags:
+
+## Model
+
+%% Cell type:markdown id:5034b5b2 tags:
+
+Next, we define the LSTM model. In PyTorch, the LSTM architecture is implemented by the class [`nn.LSTM`](https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html).
+
+We use an LSTM with an *input_size* of&nbsp;5 and a *hidden_size* of&nbsp;2. This LSTM will process the sequence of 5-dimensional vectors in&nbsp;`x` and map each input vector to a hidden state in the form of a 2-dimensional vector.
+
+By default, an [`nn.LSTM`](https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html) expects its input to have the shape (*sequence_length*, *batch_size*, *input_size*). For our purposes, it is easier to instead take the input in the form (*batch_size*, *sequence_length*, *input_size*). To get this behaviour, we set the `batch_first` argument to `True`.
+
+%% Cell type:code id:9a0d71bd tags:
+
+``` python
+model = nn.LSTM(5, 2, batch_first=True)
+```
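To make the effect of `batch_first` concrete, here is a minimal sketch that runs the same data through two LSTMs of the same sizes, one with `batch_first=True` and one with the default setting. The variable names `lstm_batch_first` and `lstm_seq_first` are ours and only serve the illustration; the two models have independent random weights, so only the shapes are comparable, not the values.

```python
import torch
from torch import nn

# One sequence of three 5-dimensional vectors, batch dimension first
x = torch.rand(1, 3, 5)

# Same sizes as in the notebook, with and without batch_first
lstm_batch_first = nn.LSTM(5, 2, batch_first=True)
lstm_seq_first = nn.LSTM(5, 2)  # default: batch_first=False

# With batch_first=True, x is read as (batch_size, sequence_length, input_size)
out_bf, _ = lstm_batch_first(x)

# With the default, the same data must first be transposed to
# (sequence_length, batch_size, input_size)
out_sf, _ = lstm_seq_first(x.transpose(0, 1))

print(out_bf.shape)  # torch.Size([1, 3, 2])
print(out_sf.shape)  # torch.Size([3, 1, 2])
```

Note that `batch_first` only affects the layout of the input and of `output`; the final hidden state and cell state keep the same layout either way.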
+
+%% Cell type:markdown id:9af73c8d tags:
+
+## Output
+
+%% Cell type:markdown id:4100af6c tags:
+
+We are now ready to feed the example input to our model:
+
+%% Cell type:code id:0f2b1260 tags:
+
+``` python
+output, (h_n, c_n) = model.forward(x)
+```
+
+%% Cell type:markdown id:d527b307 tags:
+
+The result of the `forward()` method has two components:
+
+The first component is a tensor `output` that holds the hidden states computed by the LSTM, for each position of the input sequence. Consequently, the shape of `output` is (*batch_size*, *sequence_length*, *hidden_size*).
+
+%% Cell type:code id:82e99af6 tags:
+
+``` python
+output.shape
+```
+
+%% Cell type:code id:0f8d1312 tags:
+
+``` python
+output
+```
+
+%% Cell type:markdown id:d9319d5d tags:
+
+The second component is a pair of tensors `h_n` and `c_n` which represent the final hidden state and cell state of the LSTM, respectively. These are the hidden state and cell state computed at the last position of the input sequence. Their common shape is (1, *batch_size*, *hidden_size*):
+
+%% Cell type:code id:fd0fffb5 tags:
+
+``` python
+h_n.shape
+```
+
+%% Cell type:code id:9a60ce57 tags:
+
+``` python
+c_n.shape
+```
+
+%% Cell type:markdown id:2ff9e1ab tags:
+
+We can verify that (the only element of) `h_n` is indeed identical to the last row of `output`:
+
+%% Cell type:code id:b923f2d4 tags:
+
+``` python
+h_n[0]
+```
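Instead of comparing the printed values by eye, the check can also be done programmatically. The following is a minimal sketch that rebuilds the sample input and model from above and compares `h_n[0]` with the hidden state at the last position of `output`; the variable names `last_output` and `same` are ours. We use `torch.allclose()` rather than an exact comparison to stay on the safe side with floating-point values.

```python
import torch
from torch import nn

# Sample input and model as defined earlier in the notebook
x = torch.rand(1, 3, 5)
model = nn.LSTM(5, 2, batch_first=True)
output, (h_n, c_n) = model(x)

# Hidden state at the last sequence position, for every sequence in the batch
last_output = output[:, -1, :]

# h_n[0] holds the final hidden state of the (only) layer
same = torch.allclose(h_n[0], last_output)
print(same)
```

There is no corresponding check for `c_n`, because `output` contains hidden states only, not cell states.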
+
+%% Cell type:markdown id:ba4224d0 tags:
+
+**🤔 Question 1: Batch size**
+
+> How do the concrete shapes of `output`, `h_n` and `c_n` change when you process a batch of seven sequences instead of just one?
+
+**🤔 Question 2: Stacked LSTMs**
+
+> The [`nn.LSTM`](https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html) class supports stacked LSTMs with multiple layers. How do the shapes of `output`, `h_n` and `c_n` change when you define the model to have three layers? How can you then get the final state of the final layer?
+
+%% Cell type:markdown id:69d255b0 tags:
+
+## Encoder
+
+%% Cell type:markdown id:bef42573 tags:
+
+To realise the *Encoder* pattern, we simply return the final hidden state:
+
+%% Cell type:code id:979c6c3b tags:
+
+``` python
+def encode(model, x):
+    output, (h_n, c_n) = model.forward(x)
+    return h_n[-1]
+```
+
+%% Cell type:code id:8e7bb697 tags:
+
+``` python
+y = encode(model, x)
+```
+
+%% Cell type:code id:ad2ce429 tags:
+
+``` python
+y.shape
+```
+
+%% Cell type:code id:b418b972 tags:
+
+``` python
+y
+```
+
+%% Cell type:markdown id:bc970eec tags:
+
+**🤔 Question 3: Bi-directional LSTMs**
+
+> In addition to stacking, the [`nn.LSTM`](https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html) class also supports bi-directional networks. How do the shapes of `output`, `h_n` and `c_n` change in that case? How can you get the final states for the two uni-directional networks?
+
+%% Cell type:markdown id:1df1b2d0 tags:
+
+## Transducer
+
+%% Cell type:markdown id:7a6043b0 tags:
+
+To realise a *Transducer*, we return the complete output tensor `output`.
+
+%% Cell type:code id:c82a322c tags:
+
+``` python
+def transduce(model, x):
+    output, (h_n, c_n) = model.forward(x)
+    return output
+```
+
+%% Cell type:code id:1407166d tags:
+
+``` python
+y = transduce(model, x)
+```
+
+%% Cell type:code id:61653322 tags:
+
+``` python
+y.shape
+```
+
+%% Cell type:code id:a1024adb tags:
+
+``` python
+y
+```
+
+%% Cell type:markdown id:59f053aa tags:
+
+## Manual unrolling
+
+%% Cell type:markdown id:59888b2b tags:
+
+Recall that an RNN implements a recursive computation on sequences: Starting from an initial hidden state $h_0$, at each sequence position&nbsp;$i$, it consumes the previous hidden state $h_{i-1}$ and the current input $x_i$ to compute an output $y_i$ and a next hidden state $h_i$. We say that the RNN is ‘unrolled’ over a sequence of inputs.
+
+In both the encoder and the transducer, the unrolling happened ‘behind the scenes’ when calling the `forward()` method. In some use cases, however, we may want to have more control and do the unrolling manually. (One example is the Encoder–Decoder architecture that you will learn about in Unit&nbsp;5.)
+
+The code in the next cell implements a function `unroll()` that computes the unrolling step-by-step, and at each position&nbsp;$i$ yields the next output $y_i$.
+
+%% Cell type:code id:fff6503e tags:
+
+``` python
+def unroll(model, h_0, c_0, x):
+    # Maintain the previous hidden state and cell state
+    h, c = h_0, c_0
+
+    # Loop over all positions in the sequence
+    for i in range(x.shape[1]):
+        # Get the one-element sub-sequence of x for the current position i
+        x_i = x[:, i:i+1, :]
+
+        # Do one step of the unrolling
+        output, (h, c) = model.forward(x_i, (h, c))
+
+        # Yield the current output
+        yield output
+```
+
+%% Cell type:markdown id:bee6a25c tags:
+
+When calling the `unroll()` function, we need to specify an initial hidden state and cell state. The default initial states are tensors of zeros.
+
+%% Cell type:code id:665f5758 tags:
+
+``` python
+h_0, c_0 = torch.zeros(1, 1, 2), torch.zeros(1, 1, 2)
+```
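The claim about the default initial states can be checked directly. The sketch below, which rebuilds the sample input and model from above, calls the model once without initial states and once with explicit all-zero states, and compares the results. The names `out_default` and `out_explicit` are ours.

```python
import torch
from torch import nn

# Sample input and model as defined earlier in the notebook
x = torch.rand(1, 3, 5)
model = nn.LSTM(5, 2, batch_first=True)

# Explicit all-zero initial states of shape (1, batch_size, hidden_size)
h_0, c_0 = torch.zeros(1, 1, 2), torch.zeros(1, 1, 2)

# Without initial states, the LSTM falls back to zeros
out_default, _ = model(x)

# Passing the zero states explicitly should give the same output
out_explicit, _ = model(x, (h_0, c_0))

print(torch.allclose(out_default, out_explicit))
```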
+
+%% Cell type:markdown id:ed510ea8 tags:
+
+We can now verify that the manual unrolling produces the same output as the automatic unrolling that we used earlier:
+
+%% Cell type:code id:4d4bd47e tags:
+
+``` python
+for output in unroll(model, h_0, c_0, x):
+    print(output)
+```
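The comparison can also be automated. The following self-contained sketch repeats the `unroll()` function from above, concatenates the per-position outputs along the sequence dimension, and compares the result with the output of a single call on the whole sequence. The names `manual` and `automatic` and the tolerance `atol=1e-6` are our choices for this illustration.

```python
import torch
from torch import nn

def unroll(model, h_0, c_0, x):
    # Same step-by-step unrolling as in the notebook
    h, c = h_0, c_0
    for i in range(x.shape[1]):
        x_i = x[:, i:i+1, :]
        output, (h, c) = model(x_i, (h, c))
        yield output

x = torch.rand(1, 3, 5)
model = nn.LSTM(5, 2, batch_first=True)
h_0, c_0 = torch.zeros(1, 1, 2), torch.zeros(1, 1, 2)

# Each yielded output has shape (1, 1, 2); join them along the sequence axis
manual = torch.cat(list(unroll(model, h_0, c_0, x)), dim=1)

# Automatic unrolling over the whole sequence in one call
automatic, _ = model(x)

print(manual.shape)  # torch.Size([1, 3, 2])
print(torch.allclose(manual, automatic, atol=1e-6))
```

We compare with a small tolerance because the two code paths may differ slightly in floating-point rounding.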
+
+%% Cell type:markdown id:9d56b012 tags:
+
+That’s all, folks!