diff --git a/labs/l1/Our_Work/NLP-L1.ipynb b/labs/l1/Our_Work/NLP-L1.ipynb
new file mode 100644
index 0000000000000000000000000000000000000000..09a319211ab42d891ced0c96319f89bb7f229e9e
--- /dev/null
+++ b/labs/l1/Our_Work/NLP-L1.ipynb
@@ -0,0 +1,765 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# L1: Word representations"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "In this lab you will implement the **skip-gram model with negative sampling (SGNS)** from Lecture 1.4, and use it to train word embeddings on the text of the [Simple English Wikipedia](https://simple.wikipedia.org/wiki/Main_Page).\n",
+    "\n",
+    "⚠️ The dataset for this lab contains 18M tokens. This is very little as far as word embedding datasets are concerned – for example, the original word2vec model was pre-trained on 100B tokens. In spite of this, you will need to think about efficiency when processing the data and training your models. In particular, wherever possible you should use iterators rather than lists, and vectorize operations (using [NumPy](https://numpy.org) or [PyTorch](https://pytorch.org)) as much as possible."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Load the data"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The data for this lab comes as a bz2-compressed plain text file. It consists of 1,163,769 sentences, with one sentence per line and tokens separated by spaces. The cell below contains a wrapper class `SimpleWikiDataset` that can be used to iterate over the sentences (lines) in the text file. On the Python side of things, each sentence is represented as a list of tokens (strings)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import bz2\n",
+    "\n",
+    "class SimpleWikiDataset():\n",
+    "    \n",
+    "    def __init__(self, max_sentences=None):\n",
+    "        self.max_sentences = max_sentences\n",
+    "    \n",
+    "    def __iter__(self):\n",
+    "        with bz2.open('simplewiki.txt.bz2', 'rt') as sentences:\n",
+    "            for i, sentence in enumerate(sentences):\n",
+    "                if self.max_sentences and i >= self.max_sentences:\n",
+    "                    break\n",
+    "                yield sentence.split()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Using this class, we define two variants of the dataset: the full dataset and a minimal version with the first 1% of the sentences in the full dataset. The latter will be useful to test code without running it on the full dataset."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Dataset with all sentences (N = 1,163,769)\n",
+    "full_dataset = SimpleWikiDataset()\n",
+    "\n",
+    "# Minimal dataset\n",
+    "mini_dataset = SimpleWikiDataset(max_sentences=11638)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The next code cell defines a generator function that allows you to iterate over all tokens in a dataset:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def tokens(sentences):\n",
+    "    for sentence in sentences:\n",
+    "        for token in sentence:\n",
+    "            yield token"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "To illustrate how to use this function, here is code that prints the number of tokens in the full dataset:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "17594885\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(sum(1 for t in tokens(full_dataset)))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Problem 1: Build the vocabulary and frequency table"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Your first task is to construct the embedding **vocabulary** – the set of unique words that will receive an embedding. Because you will eventually need to map words to vector dimensions, you will represent the vocabulary as a dictionary that maps words (strings) to a contiguous range of integers.\n",
+    "\n",
+    "Along with the vocabulary, you will also construct the **frequency table**, that is, the table that holds the absolute frequencies (counts) in the data, for all words in your vocabulary. This will simply be an array of integers, indexed by the word ids in the vocabulary."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "To construct the vocabulary and the frequency table, complete the skeleton code in the cell below:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "\n",
+    "def make_vocab_and_counts(sentences, min_count=5):\n",
+    "    dictt = {}\n",
+    "    word_freq = np.array([])\n",
+    "    i = 0\n",
+    "\n",
+    "    for sentence in sentences:\n",
+    "        for token in sentence:\n",
+    "            if token in dictt:\n",
+    "                dictt[token] -= 1\n",
+    "            else:\n",
+    "                dictt[token] = -1\n",
+    "\n",
+    "    for sentence in sentences:\n",
+    "        for token in sentence:\n",
+    "            if token in dictt and dictt[token] <= -min_count:\n",
+    "                word_freq=np.append(word_freq,[-dictt[token]])\n",
+    "                #print(token)\n",
+    "                dictt[token] = i\n",
+    "                i += 1\n",
+    "            elif token in dictt and dictt[token] > -min_count and dictt[token] < 0:\n",
+    "                dictt.pop(token, 0)\n",
+    "    \n",
+    "    return dictt, word_freq"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Your code should comply with the following specification:\n",
+    "\n",
+    "**make_vocab_and_counts** (*sentences*, *min_count* = 5)\n",
+    "\n",
+    "> Reads from an iterable of *sentences* (lists of string tokens) and returns a pair *vocab*, *counts* where *vocab* is a dictionary representing the vocabulary and *counts* is a 1D-[ndarray](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html) with the absolute frequencies (counts) of the words in the vocabulary. The dictionary *vocab* maps words to a contiguous range of integers starting at&nbsp;0. In the *counts* array, the entry at index $i$ is the count of that word in *vocab* which maps to $i$. Words that occur less than *min_count* times are excluded from the vocabulary."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 🤞 Test your code\n",
+    "\n",
+    "To test your code, print the sizes of the vocabularies constructed from the two datasets, as well as the count totals. The correct vocabulary size for the minimal dataset is 3,231; for the full dataset, the correct vocabulary size is 73,339. The correct totals are 155,818 for the minimal dataset and 17,297,355 for the full dataset."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "73339"
+      ]
+     },
+     "execution_count": 7,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "_, count = make_vocab_and_counts(full_dataset)\n",
+    "len(count)"
+   ]
+  },
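+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The cell below sketches one possible way to run the full comparison: it prints the vocabulary size and the count total for both datasets, which you can check against the reference numbers quoted above. (Traversing the full dataset takes a little while.)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Expected: mini 3,231 / 155,818 and full 73,339 / 17,297,355 (see above)\n",
+    "for name, dataset in [('mini', mini_dataset), ('full', full_dataset)]:\n",
+    "    vocab, counts = make_vocab_and_counts(dataset)\n",
+    "    print(name, len(vocab), counts.sum())"
+   ]
+  },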
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Problem 2: Preprocess the data"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Your next task is to preprocess the training data. This involves the following:\n",
+    "\n",
+    "* Discard words that are not in the vocabulary\n",
+    "* Map each word to its vocabulary id\n",
+    "* Randomly discard words according to the subsampling strategy covered in Lecture&nbsp;1.4\n",
+    "* Discard sentences that have become empty\n",
+    "\n",
+    "As a reminder, the subsampling strategy involves discarding tokens $w$ with probability\n",
+    "\n",
+    "$$\n",
+    "P(w) = \\max (0, 1-\\sqrt{tN/\\#(w)})\n",
+    "$$\n",
+    "\n",
+    "where $\\#(w)$ is the count of $w$, $N$ is the total number of counts, and $t$ is the chosen threshold (default value: 0.001)."
+   ]
+  },
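+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "To get a feel for the formula, the short sketch below computes the discard probability for a frequent and for a rare word; the counts are made up purely for illustration."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "\n",
+    "# Illustration only: with N = 1,000,000 tokens and t = 0.001, a word seen\n",
+    "# 200,000 times is discarded most of the time, while a word seen 1,000 times\n",
+    "# is never discarded (the probability is clipped at 0).\n",
+    "N, t = 1_000_000, 0.001\n",
+    "for count in [200_000, 1_000]:\n",
+    "    p_discard = max(0.0, 1.0 - np.sqrt(t * N / count))\n",
+    "    print(count, round(p_discard, 3))"
+   ]
+  },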
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The cell below contains skeleton code for a generator function `preprocess`:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 75,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "#preprocess\n",
+    "import random\n",
+    "def preprocess(vocab, counts, sentences, threshold=0.001):\n",
+    "    N = np.sum(np.fromiter(counts,dtype=np.dtype((int,1))))\n",
+    "    for sentence in sentences:\n",
+    "        new_sentence = np.array([]) #Not sure this should be an npArray or a normal one\n",
+    "        for token in sentence:\n",
+    "            if token in vocab and random.random() > 1-threshold*N / counts[vocab[token]] :\n",
+    "                new_sentence = np.append(new_sentence, vocab[token])\n",
+    "        if len(new_sentence) > 0:\n",
+    "            yield new_sentence"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Extend this skeleton code into a function that implements the preprocessing. Your code should comply with the following specification:\n",
+    "\n",
+    "**preprocess** (*vocab*, *counts*, *sentences*, *threshold* = 0.001)\n",
+    "\n",
+    "> Reads from an iterable of *sentences* (lists of string tokens) and yields the preprocessed sentences as non-empty lists of word ids (integers). Words not in *vocab* are discarded. The remaining words are randomly discarded according to the subsampling strategy with the given *threshold*. In the non-empty sentences, each token is replaced by its id in the vocabulary.\n",
+    "\n",
+    "**⚠️ Please observe** that your function should *yield* the preprocessed sentences, not return a list with all of them. That is, we ask you to write a *generator function*. If you have not worked with generators and iterators before, now is a good time to read up on them. [More information about generators](https://wiki.python.org/moin/Generators)"
+   ]
+  },
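+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "If generator functions are new to you, the toy example below (unrelated to the lab data) contrasts returning a complete list with yielding one item at a time."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def squares_as_list(n):\n",
+    "    # Builds the whole list in memory before returning it\n",
+    "    return [i * i for i in range(n)]\n",
+    "\n",
+    "def squares_as_generator(n):\n",
+    "    # Yields one value at a time; nothing is stored beyond the current value\n",
+    "    for i in range(n):\n",
+    "        yield i * i\n",
+    "\n",
+    "print(squares_as_list(5))\n",
+    "print(list(squares_as_generator(5)))  # consuming the generator gives the same values"
+   ]
+  },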
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 🤞 Test your code\n",
+    "\n",
+    "Test your code by comparing the total number of tokens in the preprocessed version of each dataset with the corresponding number for the original data. The former should be ca. 59% of the latter for the minimal dataset, and ca. 69% for the full dataset. The exact percentage will vary slightly because of the randomness in the sampling. You may want to repeat your computation several times."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 74,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "155818\n",
+      "85873\n"
+     ]
+    }
+   ],
+   "source": [
+    "#test\n",
+    "vocab, counts = make_vocab_and_counts(mini_dataset)\n",
+    "summ=0\n",
+    "for i in preprocess(vocab, counts, mini_dataset, threshold=0.001):\n",
+    "    summ+=len(i)\n",
+    "print(summ)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Problem 3: Generate the training examples"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Your next task is to translate the preprocessed sentences into training examples for the skip-gram model: both *positive examples* (target word–context word pairs actually observed in the data) and *negative examples* (pairs randomly sampled from a noise distribution).\n",
+    "\n",
+    "**⚠️ We expect that solving this problem will take you the longest time in this lab.**\n",
+    "\n",
+    "### General strategy\n",
+    "\n",
+    "The general plan for solving this problem is to implement a generator function that traverses the preprocessed sentences, at each position of the text samples a window, and then extracts all positive examples from it. For each positive example, the function also generates $k$ negative examples, where $k$ is a hyperparameter. Finally, all examples (positive and negative) are combined into the tensor representation described below.\n",
+    "\n",
+    "### Representation\n",
+    "\n",
+    "How should you represent a batch of training examples? Writing $B$ for the batch size, the obvious choice would be to represent the inputs as a matrix of shape $[B, 2]$ and the output labels (positive/negative) as a vector of length $B$. This representation would be quite wasteful on the input side, however, as each target word (index) from a positive example would have to be repeated in all negative samples. For example ($k=3$):"
+   ]
+  },
+  {
+   "cell_type": "raw",
+   "metadata": {},
+   "source": [
+    "tensor([[34,  237],    # positive example 1\n",
+    "        [34, 2561],    # negative example 1.1\n",
+    "        [34,   39],    # negative example 1.2\n",
+    "        [34,  903],    # negative example 1.3\n",
+    "        [34, 2036],    # positive example 2\n",
+    "        [34, 2132],    # negative example 2.1\n",
+    "        [34,  576],    # negative example 2.2\n",
+    "        [34, 2437]])   # negative example 2.3"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Here you will use a different representation: First, instead of a single input batch, there will be a *pair* of input batches – a vector for the target words and a matrix for the context words. If the target word vector has length $B$, the context word matrix has shape $[B, 1+k]$. The $i$th element of the target word vector is the target word for *all* context words in the $i$th row of the context word matrix: the first column of that row comes from a positive example, the remaining columns come from the $k$ negative samples. Accordingly, the batch with the output labels will be a matrix of the same shape as the context word matrix, with its first column set to&nbsp;1 and its remaining columns set to&nbsp;0. Corresponding to the example above:"
+   ]
+  },
+  {
+   "cell_type": "raw",
+   "metadata": {},
+   "source": [
+    "# input batch component 1: target word vector\n",
+    "tensor([34, 34])\n",
+    "\n",
+    "# input batch component 2: context word matrix\n",
+    "tensor([[237, 2561, 39, 903], [2036, 2132, 576, 2437]])\n",
+    "\n",
+    "# output labels\n",
+    "tensor([[1, 0, 0, 0], [1, 0, 0, 0]])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "For the present problem, you will only be concerned with the two input batches; the output batch will be constructed in the training procedure. In fact, for a fixed batch size $B$, that batch is always exactly the same, so you will only have to build it once.\n",
+    "\n",
+    "### Negative sampling\n",
+    "\n",
+    "Recall from Lecture&nbsp;1.4 that the probability of a word $c$ to be selected as the context word in a negative sample is proportional to its exponentiated count $\\#(c)^\\alpha$, where $\\alpha$ is a hyperparameter (default value: 0.75).\n",
+    "\n",
+    "To implement negative sampling from this distribution, you can follow a standard recipe: Start by pre-computing an array containing the *cumulative sums* of the exponentiated counts. Then, generate a random cumulative count $n$, and find that index in the pre-computed array at which $n$ should be inserted to keep the array sorted. That index identifies the sampled context word.\n",
+    "\n",
+    "All operations in this recipe can be implemented efficiently in PyTorch; the relevant functions are [`torch.cumsum`](https://pytorch.org/docs/stable/generated/torch.cumsum.html) and [`torch.searchsorted`](https://pytorch.org/docs/stable/generated/torch.searchsorted.html). For optimal efficiency, you should sample all $B \\times k$ negative examples in a batch at once."
+   ]
+  },
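+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Before wiring the recipe into `training_examples`, it can be helpful to try it on a toy example. The sketch below uses made-up counts and draws a small batch of negative samples with `torch.cumsum` and `torch.searchsorted`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import torch\n",
+    "\n",
+    "# Toy counts for a 4-word vocabulary (illustration only)\n",
+    "toy_counts = torch.tensor([10.0, 5.0, 1.0, 20.0])\n",
+    "\n",
+    "# Cumulative sums of the exponentiated counts\n",
+    "cum_weights = torch.cumsum(toy_counts ** 0.75, dim=0)\n",
+    "\n",
+    "# Draw 3 x 2 random cumulative counts and locate them in the cumulative array;\n",
+    "# the resulting indices are the sampled (negative) context words\n",
+    "noise = torch.rand(3, 2) * cum_weights[-1]\n",
+    "negatives = torch.searchsorted(cum_weights, noise)\n",
+    "print(negatives)"
+   ]
+  },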
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Here is skeleton code for this problem:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def training_examples(vocab, counts, sentences, window=5, num_ns=5, batch_size=1<<19, ns_exponent=0.75):\n",
+    "    # TODO: Replace the next line with your own code\n",
+    "    yield torch.zeros(batch_size).long(), torch.zeros(batch_size, 1 + num_ns).long()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 80,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import torch\n",
+    "import torch.nn as nn\n",
+    "import torch.nn.functional as F\n",
+    "def training_examples(vocab, counts, sentences, window=5, num_ns=5, batch_size=1<<19, ns_exponent=0.75):\n",
+    "    target = torch.zeros(batch_size).long()\n",
+    "    context = torch.zeros(batch_size, 1 + num_ns).long()\n",
+    "    exp_count = []\n",
+    "    for i in counts:\n",
+    "        exp_count.append(i**ns_exponent)\n",
+    "    com_sums = torch.cumsum(torch.from_numpy(np.array(exp_count)),dim=0)\n",
+    "    max_sum = com_sums[-1]\n",
+    "    pos = 0\n",
+    "    for sentence in sentences:\n",
+    "        for k in range(0,len(sentence)):\n",
+    "            wind = random.randint(1,window)\n",
+    "            for i in range(-wind,wind):\n",
+    "                if i != 0 and k+1 >= 0 and k+i < len(sentence):\n",
+    "                    target.index_copy_(0,torch.tensor([pos]),torch.tensor([sentence[k]]).long())\n",
+    "                    #TODO Readd negative, and fix last batch\n",
+    "                    #target[pos] = sentence[k]\n",
+    "                    #context[pos][0] = sentence[k+i]\n",
+    "                    #for u in range(0, num_ns):\n",
+    "                    #    context[pos][u+1] = torch.searchsorted(random.random()*max_sum)\n",
+    "                    pos += 1\n",
+    "                    if pos == batch_size:\n",
+    "                        yield target, context\n",
+    "                        pos = 0\n",
+    "                        target = torch.zeros(batch_size).long()\n",
+    "                        context =  torch.zeros(batch_size, 1 + num_ns).long()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Your code should comply with the following specification:\n",
+    "\n",
+    "**training_examples** (*vocab*, *counts*, *sentences*, *window* = 5, *num_ns* = 5, *batch_size* = 524,288, *ns_exponent*=0.75)\n",
+    "\n",
+    "> Reads from an iterable of *sentences* (lists of string tokens), preprocesses them using the function implemented in Problem&nbsp;2, and then yields pairs of input batches for gradient-based training, represented as described above. Each batch contains *batch_size* positive examples. The parameter *window* specifies the maximal distance between a target word and a context word in a positive example; the actual window size around any given target word is sampled uniformly at random. The parameter *num_ns* specifies the number of negative samples per positive sample. The parameter *ns_exponent* specifies the exponent in the negative sampling (called $\\alpha$ above)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 🤞 Test your code\n",
+    "\n",
+    "To test your code, compare the total number of positive samples (across all batches) to the total number of tokens in the (un-preprocessed) minimal dataset. The ratio between these two values should be ca. 2.64. If you can spare the time, you can make the same comparison on the full dataset; here, the expected ratio is 3.25. As before, the numbers may vary slightly because of randomness, so you may want to run the comparison more than once."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 81,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "384000\n"
+     ]
+    }
+   ],
+   "source": [
+    "summ = 0\n",
+    "\n",
+    "vocab, counts = make_vocab_and_counts(mini_dataset)\n",
+    "preProc = preprocess(vocab, counts, mini_dataset, threshold=0.001)\n",
+    "for target, context in training_examples(vocab, counts, preProc, window=5, num_ns=5, batch_size=1000, ns_exponent=0.75):\n",
+    "    summ += len(target)\n",
+    "print(summ)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    " Problem 4: Implement the model"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Now it is time to implement the skip-gram model as such. The cell below contains skeleton code for this. As you will recall from Lecture&nbsp;1.4, the core of the implementation is formed by two embedding layers: one for the target word representations, and one for the context word representations. Your task is to implement the missing `forward()` method."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import torch\n",
+    "import torch.nn as nn\n",
+    "import torch.nn.functional as F\n",
+    "\n",
+    "class SGNSModel(nn.Module):\n",
+    "    \n",
+    "    def __init__(self, vocab, embedding_dim):\n",
+    "        super().__init__()\n",
+    "        self.vocab = vocab\n",
+    "        self.w = nn.Embedding(len(vocab), embedding_dim)\n",
+    "        self.c = nn.Embedding(len(vocab), embedding_dim)\n",
+    "    \n",
+    "    def forward(self, w, c):\n",
+    "        # TODO: Replace the next line with your own code\n",
+    "        return torch.zeros_like(c, dtype=torch.float, requires_grad=True)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Your implementation of the `forward()` method should comply with the following specification:\n",
+    "\n",
+    "**forward** (*self*, *w*, *c*)\n",
+    "\n",
+    "> The input to this methods is a tensor *w* with target words of shape $[B]$ and a tensor *c* with context words of shape $[B, 1+k]$, where $B$ is the batch size and $k$ is the number of negative samples. The two tensors are structured as explained for Problem&nbsp;3. The output of the method is a tensor $D$ of shape $[B, k+1]$ where entry $D_{ij}$ is the dot product between the embedding vector for the $i$th target word and the embedding vector for the context word in row $i$, column $j$.\n",
+    "\n",
+    "**💡 Hint:** To compute a dot product $x^\\top y$, you can first compute the Hadamard product $z = x \\odot y$ and then sum up the elements of $z$."
+   ]
+  },
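+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "One possible way to realise this computation is sketched below with toy sizes: look up both embeddings, broadcast the target vectors over the context columns, and sum the element-wise products over the embedding dimension. Inside the model, the same three lines would use `self.w` and `self.c`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import torch\n",
+    "import torch.nn as nn\n",
+    "\n",
+    "# Standalone sketch with toy sizes (illustration only)\n",
+    "B, k, V, D = 2, 3, 10, 4\n",
+    "w_emb = nn.Embedding(V, D)   # target-word embeddings\n",
+    "c_emb = nn.Embedding(V, D)   # context-word embeddings\n",
+    "\n",
+    "w = torch.randint(V, (B,))        # [B]\n",
+    "c = torch.randint(V, (B, 1 + k))  # [B, 1+k]\n",
+    "\n",
+    "t = w_emb(w)   # [B, D]\n",
+    "u = c_emb(c)   # [B, 1+k, D]\n",
+    "# Broadcast the target vectors over the context columns and sum over D\n",
+    "scores = (t.unsqueeze(1) * u).sum(dim=-1)   # [B, 1+k]\n",
+    "print(scores.shape)   # torch.Size([2, 4])"
+   ]
+  },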
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 🤞 Test your code\n",
+    "\n",
+    "Test your code by creating an instance of the model, and check that `forward` returns the expected result on random input tensors *w* and *c*. To help you, the following function will return a random example from the first 100 examples produced by `training_examples`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# TODO: Test your code here"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "\n",
+    "def random_example(vocab, counts, sentences):\n",
+    "    skip = np.random.randint(100)\n",
+    "    for i, example in enumerate(training_examples(vocab, counts, sentences, num_ns=1, batch_size=5)):\n",
+    "        if i >= skip:\n",
+    "            break\n",
+    "    return example"
+   ]
+  },
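+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "One possible shape-and-value check is sketched below; it assumes that the cells above have been run and that `forward()` has been implemented as specified."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "vocab, counts = make_vocab_and_counts(mini_dataset)\n",
+    "w, c = random_example(vocab, counts, mini_dataset)\n",
+    "\n",
+    "model = SGNSModel(vocab, embedding_dim=10)\n",
+    "scores = model(w, c)\n",
+    "print(scores.shape)   # expected: torch.Size([5, 2]) with batch_size=5 and num_ns=1\n",
+    "\n",
+    "# Compare one entry against an explicit dot product\n",
+    "manual = (model.w(w[:1]) * model.c(c[:1, 0])).sum()\n",
+    "print(torch.isclose(scores[0, 0], manual))"
+   ]
+  },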
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Problem 5: Train the model"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Once you have a working model, it is time to train it. The training loop for the skip-gram model will be very similar to the prototypical training loop that you saw in Lecture&nbsp;0.6, with two things to note:\n",
+    "\n",
+    "First, instead of categorical cross entropy, you will use binary cross entropy. Just like the standard implementation of the softmax classifier, the skip-gram model does not include a final non-linearity, so you should use [`binary_cross_entropy_with_logits()`](https://pytorch.org/docs/1.9.1/generated/torch.nn.functional.binary_cross_entropy_with_logits.html).\n",
+    "\n",
+    "The second thing to note is that you will have to create the tensor with the output labels, as explained already in Problem&nbsp;3. This should be a matrix of size $[B, 1+k]$ whose first column contains $1$s and whose remaining columns contains $0$s."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Here is skeleton code for the training loop, including default values for the most important hyperparameters:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import torch\n",
+    "import torch.nn.functional as F\n",
+    "import torch.optim as optim\n",
+    "\n",
+    "def train(sentences, embedding_dim=50, window=5, num_ns=5, batch_size=1<<19, n_epochs=1, lr=1e-1):\n",
+    "    # Create the vocabulary and the counts\n",
+    "    vocab, counts = make_vocab_and_counts(sentences)\n",
+    "    \n",
+    "    # Initialize the model\n",
+    "    model = SGNSModel(vocab, embedding_dim)\n",
+    "    \n",
+    "    # Initialize the optimizer. Here we use Adam rather than plain SGD\n",
+    "    optimizer = optim.Adam(model.parameters(), lr=lr)\n",
+    "    \n",
+    "    # TODO: Add your code here\n",
+    "    \n",
+    "    return model"
+   ]
+  },
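+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "A minimal sketch of a completed `train` function is given below. It assumes the `training_examples` generator from Problem&nbsp;3 and the batch layout described there; it is one possible way to fill in the skeleton, not necessarily the reference solution."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import torch\n",
+    "import torch.nn.functional as F\n",
+    "import torch.optim as optim\n",
+    "\n",
+    "def train(sentences, embedding_dim=50, window=5, num_ns=5, batch_size=1<<19, n_epochs=1, lr=1e-1):\n",
+    "    vocab, counts = make_vocab_and_counts(sentences)\n",
+    "    model = SGNSModel(vocab, embedding_dim)\n",
+    "    optimizer = optim.Adam(model.parameters(), lr=lr)\n",
+    "\n",
+    "    for epoch in range(n_epochs):\n",
+    "        batches = training_examples(vocab, counts, sentences, window=window, num_ns=num_ns, batch_size=batch_size)\n",
+    "        for i, (w, c) in enumerate(batches):\n",
+    "            # Labels: the first column (positive example) is 1, the remaining columns (negative samples) are 0.\n",
+    "            # Built per batch here because the final batch may be smaller than batch_size.\n",
+    "            labels = torch.zeros_like(c, dtype=torch.float)\n",
+    "            labels[:, 0] = 1.0\n",
+    "\n",
+    "            optimizer.zero_grad()\n",
+    "            loss = F.binary_cross_entropy_with_logits(model(w, c), labels)\n",
+    "            loss.backward()\n",
+    "            optimizer.step()\n",
+    "            print(f'epoch {epoch + 1}, batch {i + 1}: loss {loss.item():.4f}')\n",
+    "\n",
+    "    return model"
+   ]
+  },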
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "To show you how `train` is meant to be used, the code in the next cell trains a model on the minimal dataset."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "model = train(mini_dataset, n_epochs=1)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 🤞 Test your code\n",
+    "\n",
+    "Test your implementation of the training loop by training a model on the minimal dataset. This should only take a few seconds. You will not get useful word vectors, but you will be able to see whether your code runs without errors.\n",
+    "\n",
+    "Once you have passed this test, you can train a model on the full dataset. Print the loss to check that the model is actually learning; if the loss is not decreasing, try to find the problem before wasting time (and energy) on useless training.\n",
+    "\n",
+    "Training on the full dataset will take some time – on a CPU, you should expect 10–40 minutes per epoch, depending on hardware. To give you some guidance: The total number of positive examples is approximately 58M, and the batch size is chosen so that each batch contains roughly 10% of these examples. To speed things up, you can train using a GPU; our reference implementation runs in less than 2 minutes per epoch on [Colab](http://colab.research.google.com)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# TODO: Train your model on the full dataset here"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Problem 6: Analyse the embeddings (reflection)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Now that you have a trained model, you will probably be curious to see what it has learned. You can inspect your embeddings using the [Embedding Projector](http://projector.tensorflow.org). To that end, click on the ‘Load’ button, which will open up a dialogue with instructions for how to upload embeddings from your computer.\n",
+    "\n",
+    "You will need to upload two tab-separated files. To create them, you can use the following code:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def save_model(model):\n",
+    "    # Extract the embedding vectors as a NumPy array\n",
+    "    embeddings = model.w.weight.detach().numpy()\n",
+    "    \n",
+    "    # Create the word–vector pairs\n",
+    "    items = sorted((i, w) for w, i in model.vocab.items())\n",
+    "    items = [(w, e) for (i, w), e in zip(items, embeddings)]\n",
+    "    \n",
+    "    # Write the embeddings and the word labels to files\n",
+    "    with open('vectors.tsv', 'wt') as fp1, open('metadata.tsv', 'wt') as fp2:\n",
+    "        for w, e in items:\n",
+    "            print('\\t'.join('{:.5f}'.format(x) for x in e), file=fp1)\n",
+    "            print(w, file=fp2)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Take some time to explore the embedding space. In particular, inspect the local neighbourhoods of words that you are curious about, say the 10 closest neighbours. Document your exploration in a short reflection piece (ca. 150&nbsp;words). Respond to the following prompts:\n",
+    "\n",
+    "* Which words did you try? Which results did you get? Did you do anything else than inspecting local neighbourhoods?\n",
+    "* Based on what you know about word embeddings, did you expect your results? How do you explain them?\n",
+    "* What did you learn? How, exactly, did you learn it? Why does this learning matter?"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "*TODO: Enter your text here*"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "👍 Well done!"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.8.10"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}