From 842a97d3cec9c0ce512dcb427669fa1fd43b41cd Mon Sep 17 00:00:00 2001
From: Marco Kuhlmann <marco.kuhlmann@liu.se>
Date: Tue, 9 Jan 2024 20:31:19 +0100
Subject: [PATCH] Add sessions/introduction-to-pytorch

---
 .../introduction-to-pytorch.ipynb | 884 ++++++++++++++++++
 1 file changed, 884 insertions(+)
 create mode 100644 sessions/introduction-to-pytorch/introduction-to-pytorch.ipynb

diff --git a/sessions/introduction-to-pytorch/introduction-to-pytorch.ipynb b/sessions/introduction-to-pytorch/introduction-to-pytorch.ipynb
new file mode 100644
index 0000000..6d923a6
--- /dev/null
+++ b/sessions/introduction-to-pytorch/introduction-to-pytorch.ipynb
@@ -0,0 +1,884 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Introduction to PyTorch"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The purpose of this notebook is to introduce you to the basics of [PyTorch](https://pytorch.org), the deep learning framework we will be using for the labs. Many good introductions to PyTorch are available online. This notebook focuses on those basics that you will encounter in the labs. Beyond it, you will also need to get comfortable with the [PyTorch documentation](https://pytorch.org/docs/stable/).\n",
+    "\n",
+    "We start by importing the PyTorch module:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import torch"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The following code prints the current version of the module:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "print(torch.__version__)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The version of PyTorch at the time of writing this notebook was 2.1.\n",
+    "\n",
+    "## Tensors\n",
+    "\n",
+    "The fundamental data structure in PyTorch is the **tensor**, a multi-dimensional matrix containing elements of a single numerical data type. Tensors are similar to *arrays* as you may know them from NumPy or MATLAB.\n",
+    "\n",
+    "### Creating tensors\n",
+    "\n",
+    "One way to create a tensor is to call the function [`torch.tensor()`](https://pytorch.org/docs/stable/generated/torch.tensor.html) on a Python list or NumPy array.\n",
+    "\n",
+    "The code in the following cell creates a 2-dimensional tensor with 4 elements."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "x = torch.tensor([[0, 1], [2, 3]])\n",
+    "x"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Each tensor has a *shape*, which specifies the number and sizes of its dimensions:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "x.shape"
+   ]
+  },
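+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "As a small illustrative aside (not needed for the labs): besides `shape`, a tensor also exposes the number of its dimensions and the total number of its elements, via the attribute `ndim` and the method [`numel()`](https://pytorch.org/docs/stable/generated/torch.numel.html):"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "print(x.ndim)   # number of dimensions; here 2\n",
+    "x.numel()       # total number of elements; here 4"
+   ]
+  },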
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Each tensor also has a *data type* for its elements. [More information about data types](https://pytorch.org/docs/stable/tensors.html#data-types)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "x.dtype"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "When creating a tensor, you can explicitly pass the intended data type as a keyword argument:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "y = torch.tensor([[0, 1], [2, 3]], dtype=torch.float)\n",
+    "y.dtype"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "For many data types, there also exists a specialised constructor:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "z = torch.FloatTensor([[0, 1], [2, 3]])\n",
+    "z.dtype"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### More creation operations\n",
+    "\n",
+    "Create a 3D-tensor of the specified shape, filled with zeros:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "x = torch.zeros(2, 3, 5)\n",
+    "x"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Create a 3D-tensor filled with random values:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "x = torch.rand(2, 3, 5)\n",
+    "x"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Create a tensor with the same shape as another one, but filled with ones:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "y = torch.ones_like(x)\n",
+    "y # shape: [2, 3, 5]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "For a complete list of tensor-creating operations, see [Creation ops](https://pytorch.org/docs/stable/torch.html#creation-ops).\n",
+    "\n",
+    "### Embrace vectorisation!\n",
+    "\n",
+    "Iteration or “looping” is one of the most useful techniques for processing data in Python. However, you should **not loop over tensors**. Instead, try to *vectorise* your operations. Looping over tensors is slow, while vectorised operations on tensors are fast (and can be made even faster when the code runs on a GPU). To illustrate this point, let us create a 1D-tensor containing the first 1M integers:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "x = torch.arange(1000000)\n",
+    "x"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Summing up the elements of the tensor using a loop is relatively slow:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sum(x)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Doing the same thing using a tensor operation is much faster:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "x.sum()"
+   ]
+  },
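+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "If you want to measure the difference on your own machine, you can time the two variants with IPython’s `%timeit` magic. (This is just an illustration; the exact numbers will depend on your hardware.)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%timeit sum(x)    # Python-level loop over one million elements\n",
+    "%timeit x.sum()   # single vectorised tensor operation"
+   ]
+  },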
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Indexing and slicing\n",
+    "\n",
+    "To access the contents of a tensor, you can use an extended version of Python’s syntax for indexing and slicing. Essentially the same syntax is used by NumPy. For more information, see [Indexing on ndarrays](https://numpy.org/doc/stable/user/basics.indexing.html).\n",
+    "\n",
+    "To illustrate indexing and slicing, we create a 3D-tensor with random numbers:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "x = torch.rand(2, 3, 5)\n",
+    "x"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Index an element by a 3D-coordinate; this gives a 0D-tensor:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "x[0,1,2]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "(If you want the result as a non-tensor, use the method [`item()`](https://pytorch.org/docs/stable/generated/torch.Tensor.item.html#torch.Tensor.item).)\n",
+    "\n",
+    "Index the second element; this gives a 2D-tensor:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "x[1]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Index the second-to-last element:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "x[-2]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Slice out the sub-tensor with elements from index 1 onwards; this gives a 3D-tensor:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "x[1:]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Here is a more complex example of slicing. As in Python, the colon `:` selects all indices of a dimension."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "x[:,:,2:4]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The syntax for indexing and slicing is very powerful. For example, the same effect as in the previous cell can be obtained with the following code, which uses the ellipsis (`...`) to match all dimensions but the ones explicitly mentioned:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "x[...,2:4]"
+   ]
+  },
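+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "As an illustrative extra (the labs mostly use the slicing syntax above), tensors also support NumPy-style *boolean mask* indexing, which selects all elements for which a condition holds:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "mask = x > 0.5   # boolean tensor with the same shape as x\n",
+    "x[mask]          # 1D tensor containing all elements greater than 0.5"
+   ]
+  },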
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Creating views\n",
+    "\n",
+    "You will sometimes want to use a tensor with a different shape than its initial shape. In these situations, you can **re-shape** the tensor or create a **view** of the tensor. The latter is preferable because a view shares its data with the base tensor and thus does not require copying.\n",
+    "\n",
+    "We create a 3D-tensor of 12 random values:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "x = torch.rand(2, 3, 2)\n",
+    "x"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Create a view of this tensor as a 2D-tensor:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "x.view(3, 4)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "When creating a view, the special size `-1` is inferred from the other sizes:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "x.view(3, -1)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Modifying a view affects the data in the base tensor:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "y = torch.rand(2, 3, 2)\n",
+    "z = y.view(3, 4)\n",
+    "z[2, 3] = 42\n",
+    "y"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### More viewing operations\n",
+    "\n",
+    "There are a few other useful methods that create views. [More information about views](https://pytorch.org/docs/stable/tensor_view.html)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "x = torch.rand(2, 3, 5)\n",
+    "x"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The [`permute()`](https://pytorch.org/docs/stable/generated/torch.permute.html) method returns a view of the base tensor with some of its dimensions permuted. In the example, we maintain the first dimension but swap the second and the third dimension:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "y = x.permute(0, 2, 1)\n",
+    "print(y)\n",
+    "y.shape"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The [`unsqueeze()`](https://pytorch.org/docs/stable/generated/torch.unsqueeze.html) method returns a tensor with a dimension of size one inserted at the specified position. This is useful, for example, when training neural networks and you want to create a batch containing just one example."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "y = x.unsqueeze(0)\n",
+    "print(y)\n",
+    "y.shape"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The inverse operation to [`unsqueeze()`](https://pytorch.org/docs/stable/generated/torch.unsqueeze.html) is [`squeeze()`](https://pytorch.org/docs/stable/generated/torch.squeeze.html):"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "y = y.squeeze(0)\n",
+    "print(y)\n",
+    "y.shape"
+   ]
+  },
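+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "One more view-creating method worth knowing (shown here only as a brief illustration) is [`transpose()`](https://pytorch.org/docs/stable/generated/torch.transpose.html), which swaps exactly two dimensions. For the tensor above, it has the same effect as the `permute()` call:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "y = x.transpose(1, 2)   # swap dimensions 1 and 2; equivalent to x.permute(0, 2, 1)\n",
+    "y.shape"
+   ]
+  },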
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Re-shaping tensors\n",
+    "\n",
+    "In some cases, you cannot create a view and need to explicitly re-shape a tensor. In particular, this happens when the elements of the intended view would not be laid out contiguously in the base tensor’s memory."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "x = torch.rand(2, 3, 5)\n",
+    "x"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We permute the tensor `x` to create a new tensor `y` in which the data is no longer consecutive in memory:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "y = x.permute(0, 2, 1)\n",
+    "# y = y.view(-1) # raises a runtime error\n",
+    "y"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "When it is not possible to create a view of a tensor, you can explicitly re-shape it, which will *copy* the data if necessary:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "y = x.permute(0, 2, 1)\n",
+    "y = y.reshape(-1)\n",
+    "y"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Modifying a reshaped tensor *will not necessarily* change the data in the base tensor. This depends on whether the reshaped tensor is a copy of the base tensor or a view."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "y = torch.rand(2, 3, 2)\n",
+    "# z = y.permute(0, 1, 2).reshape(-1) # z is a view of y => data is shared\n",
+    "z = y.permute(0, 2, 1).reshape(-1) # z is a copy of y => data is not shared\n",
+    "z[0] = 42\n",
+    "y"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Computing with tensors\n",
+    "\n",
+    "Now that you know how to create tensors and extract data from them, we can turn to actual computations on tensors.\n",
+    "\n",
+    "### Element-wise operations\n",
+    "\n",
+    "Unary mathematical operations defined on numbers can be “lifted” to tensors by applying them element-wise. This includes multiplication by a constant, exponentiation (`**`), taking roots ([`torch.sqrt()`](https://pytorch.org/docs/stable/generated/torch.sqrt.html)), and the logarithm ([`torch.log()`](https://pytorch.org/docs/stable/generated/torch.log.html))."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "x = torch.rand(2, 3)\n",
+    "print(x)\n",
+    "x * 2 # element-wise multiplication by 2"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Similarly, we can apply binary mathematical operations to tensors, as long as they have the same shape. For example, the Hadamard product of two tensors $X$ and $Y$ is the tensor $X \\odot Y$ obtained by the element-wise multiplication of the elements of $X$ and $Y$."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "x = torch.rand(2, 3)\n",
+    "y = torch.rand(2, 3)\n",
+    "torch.mul(x, y) # shape: [2, 3]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The Hadamard product can be written more succinctly as follows:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "x * y"
+   ]
+  },
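+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The other arithmetic operators work element-wise in the same way. A small illustration (not needed for the labs):"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "print(x + y)   # element-wise addition\n",
+    "print(x - y)   # element-wise subtraction\n",
+    "x / y          # element-wise division"
+   ]
+  },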
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Matrix product\n",
+    "\n",
+    "When computing the matrix product of two matrices $X$ and $Y$, the size of the last dimension of $X$ must match the size of the first dimension of $Y$. If $X$ has shape $[m, n]$ and $Y$ has shape $[n, p]$, the result has shape $[m, p]$; that is, the concatenation of the two shapes with the matching dimensions removed."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "x = torch.rand(2, 3)\n",
+    "y = torch.rand(3, 5)\n",
+    "torch.matmul(x, y) # shape: [2, 5]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The matrix product can be written more succinctly as follows:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "x @ y"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Sum and argmax\n",
+    "\n",
+    "Let us define a tensor of random numbers:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "x = torch.rand(2, 3, 5)\n",
+    "x"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "You have already seen that we can compute the sum of all elements of a tensor:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "torch.sum(x)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "There is a second form of the sum operation where we can specify the dimension along which the sum should be computed. This will return a tensor with the specified dimension removed."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "torch.sum(x, dim=0) # shape: [3, 5]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "torch.sum(x, dim=1) # shape: [2, 5]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The same idea also applies to the operation [`torch.argmax()`](https://pytorch.org/docs/stable/generated/torch.argmax.html), which returns the index of the component with the maximal value along the specified dimension."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "torch.argmax(x) # index of the largest element, with x treated as a flattened tensor"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "torch.argmax(x, dim=0) # indices of the largest elements along the first dimension"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Concatenating tensors\n",
+    "\n",
+    "A list or tuple of tensors can be combined into one long tensor by concatenation."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "x = torch.rand(2, 3)\n",
+    "y = torch.rand(3, 3)\n",
+    "z = torch.cat((x, y))\n",
+    "print(z)\n",
+    "z.shape"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "You can also concatenate along a specific dimension:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "x = torch.rand(2, 2)\n",
+    "y = torch.rand(2, 2)\n",
+    "print(x)\n",
+    "print(y)\n",
+    "print(torch.cat((x, y), dim=0)) # shape: [4, 2]\n",
+    "print(torch.cat((x, y), dim=1)) # shape: [2, 4]"
+   ]
+  },
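+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "A close relative of concatenation (included here as an illustrative extra) is [`torch.stack()`](https://pytorch.org/docs/stable/generated/torch.stack.html), which joins tensors of equal shape along a *new* dimension:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "torch.stack((x, y))   # shape: [2, 2, 2]; the new leading dimension indexes x and y"
+   ]
+  },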
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Broadcasting\n",
+    "\n",
+    "The term *broadcasting* describes how PyTorch treats tensors with different shapes. In short, if a PyTorch operation supports broadcasting, then its tensor arguments can be automatically expanded to be of equal sizes (without making copies of the data). In many situations, this can avoid explicit looping.\n",
+    "\n",
+    "In the simplest case, two tensors have the same shape. This is the case for the matrix `x @ W` and the bias vector `b` in the linear model below:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "x = torch.rand(1, 2)\n",
+    "W = torch.rand(2, 3)\n",
+    "b = torch.rand(1, 3)\n",
+    "z = x @ W # shape: [1, 3]\n",
+    "z = z + b # shape: [1, 3]\n",
+    "print(z)\n",
+    "z.shape"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Now suppose that we do not have a single input `x` but a whole batch (a matrix) of inputs `X`. Watch what happens when adding the bias vector `b`:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "X = torch.rand(5, 2)\n",
+    "Z = X @ W # shape: [5, 3]\n",
+    "Z = Z + b # shape: [5, 3]; broadcasting happens here!\n",
+    "print(Z)\n",
+    "Z.shape"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "In the example, broadcasting expands the shape of `b` from $[1, 3]$ into $[5, 3]$. The matrix `Z` is formed by effectively adding `b` *to each row* of `X @ W`. However, this is not implemented by a Python loop but happens implicitly through broadcasting.\n",
+    "\n",
+    "PyTorch uses the same broadcasting semantics as NumPy. [More information about broadcasting](https://numpy.org/doc/stable/user/basics.broadcasting.html)\n",
+    "\n",
+    "## Final note\n",
+    "\n",
+    "There is a lot more to learn about PyTorch, but after working through this notebook, you should be in a good position to take on the labs. Have a look at the [PyTorch documentation](https://pytorch.org/docs/stable/) for further details and more examples."
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
-- 
GitLab