%% Cell type:markdown id: tags:
# L3: Part-of-speech tagging
%% Cell type:markdown id: tags:
Part-of-speech tagging is the task of labelling the words (tokens) of a sentence with parts-of-speech such as noun, adjective, and verb. In this lab you will implement the simple, autoregressive fixed-window tagger that was presented in Lecture 3.2.
%% Cell type:markdown id: tags:
## The data set
%% Cell type:markdown id: tags:
The data set for the lab is the English Web Treebank from the [Universal Dependencies Project](http://universaldependencies.org), a corpus containing more than 16,000 sentences (254,000 tokens) annotated with, among other things, parts-of-speech. The Universal Dependencies Project distributes its data in the [CoNLL-U format](https://universaldependencies.org/format.html), but for this lab we have converted the data into a simpler format: words and their part-of-speech tags are separated by tabs, sentences are separated by empty lines. The code in the next cell defines a container class for data with this format.
%% Cell type:code id: tags:
``` python
class Dataset():

    def __init__(self, filename):
        self.filename = filename

    def __iter__(self):
        tmp = []
        with open(self.filename, 'rt', encoding='utf-8') as lines:
            for line in lines:
                line = line.rstrip()
                if line:
                    tmp.append(tuple(line.split('\t')))
                else:
                    yield tmp
                    tmp = []
```
%% Cell type:markdown id: tags:
We load the training data and the development data for this lab:
%% Cell type:code id: tags:
``` python
train_data = Dataset('train.txt')
dev_data = Dataset('dev.txt')
```
%% Cell type:markdown id: tags:
Both data sets consist of **tagged sentences**. On the Python side of things, a tagged sentence is represented as a list of string pairs, where the first component of each pair represents a word token and the second component represents the word’s tag. The possible tags are listed and exemplified in the [Annotation Guidelines](http://universaldependencies.org/u/pos/all.html) of the Universal Dependencies Project. Run the next code cell to see an example of a tagged sentence.
%% Cell type:code id: tags:
``` python
list(train_data)[42]
```
%% Cell type:markdown id: tags:
## Tagger interface
%% Cell type:markdown id: tags:
The tagger that you will implement in this lab follows a simple interface with just one method:
%% Cell type:code id: tags:
``` python
class Tagger(object):

    def predict(self, sentence):
        raise NotImplementedError
```
%% Cell type:markdown id: tags:
The single method of this interface has the following specification:
**predict** (*self*, *sentence*)
> Returns the list of predicted tags (a list of strings) for a single *sentence* (a list of string tokens).
One trivial implementation of this interface is a tagger that always predicts the same tag for every word, independently of the input:
%% Cell type:code id: tags:
``` python
class ConstantTagger(Tagger):

    def __init__(self, the_tag):
        self.the_tag = the_tag

    def predict(self, words):
        return [self.the_tag] * len(words)
```
%% Cell type:markdown id: tags:
## Problem 1: Implement an evaluation function
%% Cell type:markdown id: tags:
Your first task is to implement a function that computes the accuracy of a tagger on gold-standard data.
%% Cell type:code id: tags:
``` python
def accuracy(tagger, gold_data):
    # TODO: Replace the next line with your own code
    return 0.0
```
%% Cell type:markdown id: tags:
Your implementation should conform to the following specification:
**accuracy** (*tagger*, *gold_data*)
> Computes the accuracy of the *tagger* on the gold-standard data *gold_data* (an iterable of tagged sentences) and returns it as a float. Recall that the accuracy is defined as the percentage of tokens to which the tagger assigns the correct tag (as per the gold standard).
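%% Cell type:markdown id: tags:
For concreteness, the cell below shows one minimal sketch of a function with this behaviour. It is only an illustration, not the required implementation; it assumes that the accuracy is returned as a fraction between 0 and 1 (so that a printed value of 0.1669 corresponds to the 16.69% mentioned in the test below).
``` python
def accuracy_sketch(tagger, gold_data):
    # Count correctly predicted tags over all tokens in the gold-standard data.
    correct = 0
    total = 0
    for tagged_sentence in gold_data:
        words = [word for word, _ in tagged_sentence]
        gold_tags = [tag for _, tag in tagged_sentence]
        for pred_tag, gold_tag in zip(tagger.predict(words), gold_tags):
            correct += int(pred_tag == gold_tag)
            total += 1
    return correct / total
```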
%% Cell type:markdown id: tags:
### 🤞 Test your code
Test your code by computing the accuracy on the development set of a trivial tagger that tags each word as a noun. The expected value is 16.69%.
%% Cell type:code id: tags:
``` python
def test1():
    tagger1 = ConstantTagger('NOUN')  # will tag each word as a noun
    print('{:.4f}'.format(accuracy(tagger1, dev_data)))

test1()
```
%% Cell type:markdown id: tags:
## Problem 2: Implement a baseline
%% Cell type:markdown id: tags:
Before you start working on the tagger as such, we ask you to first implement a simple baseline:
> Tag each input word with the most frequent tag for that word in the training data. If an input word does not occur in the training data, tag it with the overall most frequent tag in the training data. Break ties by choosing the tag that comes first in alphabetical order.
To implement the baseline, you need to implement both a class `BaselineTagger` and a function `train_baseline`. A `BaselineTagger` has two fields: a dictionary mapping each word in the training data to the most frequent tag for that word, and a string representing the fallback tag (the overall most frequent tag in the training data). Both of these fields are set in the `train_baseline` function.
%% Cell type:code id: tags:
``` python
class BaselineTagger(Tagger):

    def __init__(self):
        self.most_frequent = {}
        self.fallback = None

    def predict(self, words):
        # TODO: Replace the next line with your own code
        super().predict(words)

def train_baseline(train_data):
    # TODO: Replace the next line with your own code
    return BaselineTagger()
```
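%% Cell type:markdown id: tags:
As a point of reference, here is one possible sketch of `train_baseline` (shown under a different name so as not to pre-empt your own solution). It counts tags per word and overall, and breaks ties alphabetically by minimising over the key `(-count, tag)`:
``` python
from collections import Counter

def train_baseline_sketch(train_data):
    # Sketch only: fills in the two fields of a BaselineTagger as described above.
    tagger = BaselineTagger()
    per_word = {}
    overall = Counter()
    for tagged_sentence in train_data:
        for word, tag in tagged_sentence:
            per_word.setdefault(word, Counter())[tag] += 1
            overall[tag] += 1
    # Most frequent tag per word; ties go to the alphabetically first tag.
    for word, counts in per_word.items():
        tagger.most_frequent[word] = min(counts.items(), key=lambda item: (-item[1], item[0]))[0]
    tagger.fallback = min(overall.items(), key=lambda item: (-item[1], item[0]))[0]
    return tagger
```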
%% Cell type:markdown id: tags:
### 🤞 Test your code
Test your implementation by computing the accuracy of the baseline tagger on the development data. The expected value is 85.61%.
%% Cell type:code id: tags:
``` python
def test2():
    tagger2 = train_baseline(train_data)
    print('{:.4f}'.format(accuracy(tagger2, dev_data)))

test2()
```
%% Cell type:markdown id: tags:
## Problem 3: Create the vocabularies
%% Cell type:markdown id: tags:
As in previous labs, you will need an explicit representation of your vocabulary. Here we actually have two vocabularies: one for the words and one for the tags. Both should be represented as dictionaries that map words/tags to a contiguous range of integers, starting at zero.
The next cell contains skeleton code for a function `make_vocabs` that constructs the two vocabularies from gold-standard data. The code cell also defines a name for the ‘unknown word’ (`UNK`) and for an additional pseudoword that you will use as a placeholder for undefined values (`PAD`).
%% Cell type:code id: tags:
``` python
PAD = '<pad>'
UNK = '<unk>'

def make_vocabs(gold_data):
    # TODO: Replace the next line with your own code
    return {}, {}
```
%% Cell type:markdown id: tags:
Complete the code according to the following specification:
**make_vocabs** (*gold_data*)
> Returns a pair of dictionaries mapping the unique words and tags in the gold-standard data *gold_data* (an iterable over tagged sentences) to contiguous ranges of integers starting at zero. The word dictionary contains the pseudowords `PAD` (index&nbsp;0) and `UNK` (index&nbsp;1); the tag dictionary contains `PAD` (index&nbsp;0).
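%% Cell type:markdown id: tags:
A minimal sketch of a function with this behaviour is shown below, purely for illustration. The specification does not fix the order of the remaining words and tags; the sketch assumes they are numbered in the order in which they are first encountered, which also matches the assumption made in Problem 5.
``` python
def make_vocabs_sketch(gold_data):
    # Sketch only: PAD gets index 0 (and UNK index 1 in the word vocabulary);
    # all other words/tags are numbered in order of first occurrence.
    vocab_words = {PAD: 0, UNK: 1}
    vocab_tags = {PAD: 0}
    for tagged_sentence in gold_data:
        for word, tag in tagged_sentence:
            if word not in vocab_words:
                vocab_words[word] = len(vocab_words)
            if tag not in vocab_tags:
                vocab_tags[tag] = len(vocab_tags)
    return vocab_words, vocab_tags
```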
%% Cell type:markdown id: tags:
### 🤞 Test your code
Test your implementation by computing the total number of unique words and tags in the training data (including the pseudowords). The expected values are 19,674&nbsp;words and 18&nbsp;tags.
%% Cell type:code id: tags:
``` python
def test3():
    vocab_words, vocab_tags = make_vocabs(train_data)
    print('words: {}, tags: {}'.format(len(vocab_words), len(vocab_tags)))

test3()
```
%% Cell type:markdown id: tags:
## Problem 4: Fixed-window tagger
%% Cell type:markdown id: tags:
Your main task in this lab is to implement a complete, autoregressive part-of-speech tagger based on the fixed-window architecture. This implementation has four parts: the fixed-window model; a tagger that uses the fixed-window model to make predictions; a function that generates training examples for the tagger; and the training function.
**⚠️ We expect that solving this problem will take you the longest time in this lab.**
%% Cell type:markdown id: tags:
### Problem 4.1: Implement the fixed-window model
%% Cell type:markdown id: tags:
The architecture of the fixed-window model is presented in Lecture&nbsp;3.2. An input to the network takes the form of a $k$-dimensional vector of word ids and/or tag ids. The $i$th component of this vector is mapped to an $e_i$-dimensional embedding vector. These vectors are concatenated to form a vector of length $e_1 + \cdots + e_k$, which is sent through a feed-forward network with a single hidden layer and a rectified linear unit (ReLU).
#### Default features
We ask you to implement a fixed-window model with the following features ($k=4$):
0. current word
1. previous word
2. next word
3. tag predicted for the previous word
Whenever the value of a feature is undefined, you should use the special value `PAD`.
#### Embedding specifications
To make your implementation of the fixed-window model useful for a range of different applications (including the parser that you will build in lab&nbsp;4), it should support other feature sets than the default model. To this end, the constructor of your model should accept a list of what we call *embedding specifications*. An embedding specification is a triple $(m, n, e)$ consisting of three integers. Such a triple specifies that the model should include $m$ instances of an embedding from $n$ items to vectors of size $e$. All of the $m$ instances are to share their weights. In this lab, the embeddings will be embeddings for words and tags. For example, to instantiate the default feature model, you would initialise the model with the following specifications:
```
[(3, num_words, word_dim), (1, num_tags, tag_dim)]
```
This specifies that the model should use 3 instances of an embedding from *num_words* words to vectors of length *word_dim*, and 1 instance of an embedding from *num_tags* tags to vectors of length *tag_dim*. All 3 instances of the word embedding would share their weights. If you instead wanted word embeddings with separate weights, you would initialise the model with the following specifications:
```
[(1, num_words, word_dim), (1, num_words, word_dim), (1, num_words, word_dim), (1, num_tags, tag_dim)]
```
We recommend that you initialise the weights of each embedding with values drawn from $\mathcal{N}(0, 10^{-2})$.
#### Hyperparameters
The network architecture introduces a number of hyperparameters. The following choices are reasonable defaults:
* width of each word embedding: 50
* width of each tag embedding: 10
* size of the hidden layer: 100
%% Cell type:markdown id: tags:
The next cell contains skeleton code for the implementation of the fixed-window model.
%% Cell type:code id: tags:
``` python
import torch.nn as nn

class FixedWindowModel(nn.Module):

    def __init__(self, embedding_specs, hidden_dim, output_dim):
        # TODO: Replace the next line with your own code
        super().__init__()

    def forward(self, features):
        # TODO: Replace the next line with your own code
        return super().forward(features)
```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
Your implementation should meet the following specification: Your implementation should meet the following specification:
**__init__** (*self*, *embedding_specs*, *hidden_dim*, *output_dim*) **__init__** (*self*, *embedding_specs*, *hidden_dim*, *output_dim*)
> A fixed-window model is initialized with a list of specifications for the embeddings the network should use (*embedding_specs*), the size of the hidden layer (*hidden_dim*), and the size of the output layer (*output_dim*). > A fixed-window model is initialized with a list of specifications for the embeddings the network should use (*embedding_specs*), the size of the hidden layer (*hidden_dim*), and the size of the output layer (*output_dim*).
**forward** (*self*, *features*) **forward** (*self*, *features*)
> Computes the network output for a given feature representation *features*. This is a tensor of shape $B \times k$ where $B$ is the batch size (number of samples in the batch) and $k$ is the total number of embeddings specified upon initialisation. For example, for the default feature model, $k=4$, as this model includes 3 (weight-sharing) word embeddings and 1 tag embedding. > Computes the network output for a given feature representation *features*. This is a tensor of shape $B \times k$ where $B$ is the batch size (number of samples in the batch) and $k$ is the total number of embeddings specified upon initialisation. For example, for the default feature model, $k=4$, as this model includes 3 (weight-sharing) word embeddings and 1 tag embedding.
#### 💡 Hint on the implementation #### 💡 Hint on the implementation
You will have to construct embeddings based on the embedding specifications. It is natural to store these embeddings them in a list- or dictionary-valued attribute of the `FixedWindowModel` object. However, in order to expose the embeddings to the auto-differentiation magic of PyTorch (so that their weights are updated during training), you must instead store them in an [`nn.ModuleList`](https://pytorch.org/docs/stable/nn.html#torch.nn.ModuleList) or [`nn.ModuleDict`](https://pytorch.org/docs/stable/nn.html#torch.nn.ModuleDict). You will have to construct embeddings based on the embedding specifications. It is natural to store these embeddings them in a list- or dictionary-valued attribute of the `FixedWindowModel` object. However, in order to expose the embeddings to the auto-differentiation magic of PyTorch (so that their weights are updated during training), you must instead store them in an [`nn.ModuleList`](https://pytorch.org/docs/stable/nn.html#torch.nn.ModuleList) or [`nn.ModuleDict`](https://pytorch.org/docs/stable/nn.html#torch.nn.ModuleDict).
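%% Cell type:markdown id: tags:
To make the shapes and the weight sharing concrete, here is one possible sketch of the model, shown under a different class name so as not to pre-empt your own solution. The grouping of the feature columns and the reading of $\mathcal{N}(0, 10^{-2})$ as a variance (standard deviation $0.1$) are assumptions of this sketch, not part of the specification.
``` python
import torch
import torch.nn as nn

class FixedWindowModelSketch(nn.Module):

    def __init__(self, embedding_specs, hidden_dim, output_dim):
        super().__init__()
        # One nn.Embedding per specification; each is shared by m feature columns.
        self.embeddings = nn.ModuleList()
        self.num_columns = []
        concat_dim = 0
        for m, n, e in embedding_specs:
            embedding = nn.Embedding(n, e)
            nn.init.normal_(embedding.weight, mean=0.0, std=0.1)  # N(0, 1e-2), read as variance
            self.embeddings.append(embedding)
            self.num_columns.append(m)
            concat_dim += m * e
        self.hidden = nn.Linear(concat_dim, hidden_dim)
        self.output = nn.Linear(hidden_dim, output_dim)

    def forward(self, features):
        # features: LongTensor of shape (B, k), with k = sum of all m values
        parts = []
        offset = 0
        for embedding, m in zip(self.embeddings, self.num_columns):
            columns = features[:, offset:offset + m]                # (B, m)
            parts.append(embedding(columns).flatten(start_dim=1))   # (B, m * e)
            offset += m
        concatenated = torch.cat(parts, dim=1)                      # (B, e_1 + ... + e_k)
        return self.output(torch.relu(self.hidden(concatenated)))
```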
%% Cell type:markdown id: tags:
### Problem 4.2: Implement the tagger
%% Cell type:markdown id: tags:
The next step is to implement the tagger itself. The tagger will use the simple algorithm that was presented in Lecture&nbsp;2.3: It processes an input sentence from left to right, and at each position, predicts the tag for the current word based on the features extracted from the current feature window.
%% Cell type:code id: tags:
``` python
class FixedWindowTagger(Tagger):

    def __init__(self, vocab_words, vocab_tags, output_dim, word_dim=50, tag_dim=10, hidden_dim=100):
        # TODO: Replace the next line with your own code
        raise NotImplementedError

    def featurize(self, words, i, pred_tags):
        # TODO: Replace the next line with your own code
        raise NotImplementedError

    def predict(self, words):
        # TODO: Replace the next line with your own code
        raise NotImplementedError
```
%% Cell type:markdown id: tags:
Complete the skeleton code by implementing the methods of this interface:
**__init__** (*self*, *vocab_words*, *vocab_tags*, *output_dim*, *word_dim* = 50, *tag_dim* = 10, *hidden_dim* = 100)
> Creates a new fixed-window model of appropriate dimensions and sets up any other data structures that you consider relevant. The parameters *vocab_words* and *vocab_tags* are the word vocabulary and tag vocabulary, *output_dim* is the size of the model’s output layer, and *hidden_dim* is the size of its hidden layer. The parameters *word_dim* and *tag_dim* specify the embedding width for the word embeddings and tag embeddings.
**featurize** (*self*, *words*, *i*, *pred_tags*)
> Extracts features from the specified tagger configuration according to the default feature model. The configuration is specified in terms of the words in the input sentence (*words*, a list of word ids), the position of the current word (*i*), and the list of already predicted tags (*pred_tags*, a list of tag ids). Returns a tensor that can be fed to the fixed-window model.
**predict** (*self*, *words*)
> Processes the input sentence *words* (a list of string tokens) and makes calls to the fixed-window model to predict the tag of each word. Returns the list of the predicted tags (strings).
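%% Cell type:markdown id: tags:
The following sketch illustrates one way these methods could fit together for the default feature model. It is not the required implementation; in particular, the attribute names (`self.model`, `self.vocab_words`, `self.vocab_tags`) and the choice of embedding specifications are assumptions of the sketch.
``` python
import torch

class FixedWindowTaggerSketch(Tagger):

    def __init__(self, vocab_words, vocab_tags, output_dim, word_dim=50, tag_dim=10, hidden_dim=100):
        self.vocab_words = vocab_words
        self.vocab_tags = vocab_tags
        # Default feature model: 3 weight-sharing word embeddings + 1 tag embedding.
        embedding_specs = [(3, len(vocab_words), word_dim), (1, len(vocab_tags), tag_dim)]
        self.model = FixedWindowModel(embedding_specs, hidden_dim, output_dim)

    def featurize(self, words, i, pred_tags):
        pad_word = self.vocab_words[PAD]
        pad_tag = self.vocab_tags[PAD]
        return torch.tensor([
            words[i],                                           # current word
            words[i - 1] if i > 0 else pad_word,                # previous word
            words[i + 1] if i + 1 < len(words) else pad_word,   # next word
            pred_tags[i - 1] if i > 0 else pad_tag,             # tag predicted for the previous word
        ])

    def predict(self, words):
        word_ids = [self.vocab_words.get(w, self.vocab_words[UNK]) for w in words]
        id_to_tag = {idx: tag for tag, idx in self.vocab_tags.items()}
        pred_tag_ids = []
        for i in range(len(word_ids)):
            features = self.featurize(word_ids, i, pred_tag_ids).unsqueeze(0)  # batch of size 1
            with torch.no_grad():
                scores = self.model(features)
            pred_tag_ids.append(scores.argmax(dim=1).item())
        return [id_to_tag[idx] for idx in pred_tag_ids]
```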
%% Cell type:markdown id: tags:
### Problem 4.3: Generate the training examples
%% Cell type:markdown id: tags:
Your next task is to implement a function that generates the training examples for the tagger. You will train the tagger as usual, using minibatch training.
%% Cell type:code id: tags:
``` python
def training_examples(vocab_words, vocab_tags, gold_data, tagger, batch_size=100):
    # TODO: Replace the next line with your own code
    raise NotImplementedError
```
%% Cell type:markdown id: tags:
Your code should comply with the following specification:
**training_examples** (*vocab_words*, *vocab_tags*, *gold_data*, *tagger*, *batch_size* = 100)
> Iterates through the given *gold_data* (an iterable of tagged sentences), encodes it into word ids and tag ids using the specified vocabularies *vocab_words* and *vocab_tags*, and then yields batches of training examples for gradient-based training. Each batch contains *batch_size* examples, except for the last batch, which may contain fewer examples. Each example in the batch is created by a call to the `featurize` function of the *tagger*.
%% Cell type:code id: tags:
``` python
import torch

def training_examples(vocab_words, vocab_tags, gold_data, tagger, batch_size=100, shuffle=False):
    bx = []
    by = []
    for tagged_sentence in gold_data:
        # Separate the words and the gold-standard tags
        words, gold_tags = zip(*tagged_sentence)
        # Encode words and tags using the vocabularies
        words = [vocab_words.get(w, vocab_words[UNK]) for w in words]
        gold_tags = [vocab_tags[t] for t in gold_tags]
        # Simulate a run of the tagger over the sentence, collecting training examples
        pred_tags = []
        for i, gold_tag in enumerate(gold_tags):
            bx.append(tagger.featurize(words, i, pred_tags))
            by.append(gold_tag)
            if len(bx) >= batch_size:
                bx = torch.stack(bx)
                by = torch.LongTensor(by)
                if shuffle:
                    random_indices = torch.randperm(len(bx))
                    yield bx[random_indices], by[random_indices]
                else:
                    yield bx, by
                bx = []
                by = []
            pred_tags.append(gold_tag)  # teacher forcing!
    # Check whether there is an incomplete batch
    if bx:
        bx = torch.stack(bx)
        by = torch.LongTensor(by)
        if shuffle:
            random_indices = torch.randperm(len(bx))
            yield bx[random_indices], by[random_indices]
        else:
            yield bx, by
```
%% Cell type:markdown id: tags:
### Problem 4.4: Training loop
%% Cell type:markdown id: tags:
What remains to be done is the implementation of the training loop. This should be a straightforward generalization of the training loops that you have seen so far. Complete the skeleton code in the cell below:
%% Cell type:code id: tags:
``` python
def train_fixed_window(train_data, n_epochs=2, batch_size=100, lr=1e-2):
    # TODO: Replace the next line with your own code
    return Tagger()
```
%% Cell type:markdown id: tags:
Here is the specification of the training function:
**train_fixed_window** (*train_data*, *n_epochs* = 2, *batch_size* = 100, *lr* = 1e-2)
> Trains a fixed-window tagger from a set of training data *train_data* (an iterable over tagged sentences) using minibatch gradient descent and returns it. The parameters *n_epochs* and *batch_size* specify the number of training epochs and the minibatch size, respectively. Training uses the cross-entropy loss function and the [Adam optimizer](https://pytorch.org/docs/stable/optim.html#torch.optim.Adam) with learning rate *lr*.
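%% Cell type:markdown id: tags:
As an illustration, a minimal training loop following this specification could look roughly as sketched below. It assumes the components from Problems 3 and 4.1–4.3, and in particular that the tagger exposes its fixed-window model as `tagger.model` (an attribute name not fixed by the lab).
``` python
import torch.nn as nn
import torch.optim as optim

def train_fixed_window_sketch(train_data, n_epochs=2, batch_size=100, lr=1e-2):
    # Sketch only: build the vocabularies, create the tagger, and run minibatch training.
    vocab_words, vocab_tags = make_vocabs(train_data)
    tagger = FixedWindowTagger(vocab_words, vocab_tags, len(vocab_tags))
    optimizer = optim.Adam(tagger.model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for epoch in range(n_epochs):
        for bx, by in training_examples(vocab_words, vocab_tags, train_data, tagger, batch_size):
            optimizer.zero_grad()
            loss = loss_fn(tagger.model(bx), by)
            loss.backward()
            optimizer.step()
    return tagger
```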
%% Cell type:markdown id: tags:
The next code cell trains a tagger and evaluates it on the development data:
%% Cell type:code id: tags:
``` python
tagger = train_fixed_window(train_data)
print('{:.4f}'.format(accuracy(tagger, dev_data)))
```
%% Cell type:markdown id: tags:
**⚠️ Your submitted notebook must contain output demonstrating at least 88% accuracy on the development set.**
%% Cell type:markdown id: tags:
## Problem 5: Pre-trained embeddings (reflection)
%% Cell type:markdown id: tags:
Many neural systems for natural language processing use pre-trained word embeddings, either to augment or to replace randomly initialised task-based embeddings. In this problem, you will investigate whether pre-trained embeddings help your part-of-speech tagger.
The file `glove.pt` contains a PyTorch tensor with 50-dimensional pre-trained word embeddings from the [GloVe project](https://nlp.stanford.edu/projects/glove/). You can load this tensor using the command
```
glove = torch.load('glove.pt')
```
and you should be able to use it as a drop-in replacement for your randomly initialized word embeddings, assuming that the words in your vocabulary are numbered in the order in which they are found in the training data. Have a look at the documentation of the class [`nn.Embedding`](https://pytorch.org/docs/stable/generated/torch.nn.Embedding.html) to learn how to do this.
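For illustration only, one possible way to plug the pre-trained tensor in is via `nn.Embedding.from_pretrained`; the attribute name `word_embedding` below is an assumption, not something fixed by this lab:
``` python
import torch
import torch.nn as nn

glove = torch.load('glove.pt')
# Build an embedding layer whose rows are the GloVe vectors; set freeze=False
# if you want the vectors to be fine-tuned during training.
pretrained = nn.Embedding.from_pretrained(glove, freeze=False)
# Hypothetical usage: replace the randomly initialised word embedding of your model.
# model.word_embedding = pretrained
```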
Run experiments to assess the effect that pre-trained embeddings have on (a)&nbsp;the accuracy of the tagger, and (b)&nbsp;the speed of learning, i.e., the number of training examples it takes to reach a certain loss. Document your exploration in a short reflection piece (ca. 150&nbsp;words). Respond to the following prompts:
* How did you integrate the pre-trained embeddings into your system? What did you measure? What results did you get?
* Based on what you know about word embeddings and transfer learning, did you expect your results? How do you explain them?
* What did you learn? How, exactly, did you learn it? Why does this learning matter?
%% Cell type:markdown id: tags:
*TODO: Insert your report here*
%% Cell type:markdown id: tags:
Congratulations on finishing this lab!