%% Cell type:markdown id: tags:
# L3: Part-of-speech tagging
%% Cell type:markdown id: tags:
Part-of-speech tagging is the task of labelling the words (tokens) of a sentence with parts-of-speech such as noun, adjective, and verb. In this lab you will implement the simple, autoregressive fixed-window tagger that was presented in Lecture 3.2.
%% Cell type:markdown id: tags:
## The data set
%% Cell type:markdown id: tags:
The data set for the lab is the English Web Treebank from the [Universal Dependencies Project](http://universaldependencies.org), a corpus containing more than 16,000 sentences (254,000 tokens) annotated with, among other things, parts-of-speech. The Universal Dependencies Project distributes its data in the [CoNLL-U format](https://universaldependencies.org/format.html), but for this lab we have converted the data into a simpler format: words and their part-of-speech tags are separated by tabs, sentences are separated by empty lines. The code in the next cell defines a container class for data with this format.
%% Cell type:code id: tags:
``` python
class Dataset():

    def __init__(self, filename):
        self.filename = filename

    def __iter__(self):
        tmp = []
        with open(self.filename, 'rt', encoding='utf-8') as lines:
            for line in lines:
                line = line.rstrip()
                if line:
                    tmp.append(tuple(line.split('\t')))
                else:
                    yield tmp
                    tmp = []
```
%% Cell type:markdown id: tags:
We load the training data and the development data for this lab:
%% Cell type:code id: tags:
``` python
train_data = Dataset('train.txt')
dev_data = Dataset('dev.txt')
```
%% Cell type:markdown id: tags:
Both data sets consist of **tagged sentences**. On the Python side of things, a tagged sentence is represented as a list of string pairs, where the first component of each pair represents a word token and the second component represents the word’s tag. The possible tags are listed and exemplified in the [Annotation Guidelines](http://universaldependencies.org/u/pos/all.html) of the Universal Dependencies Project. Run the next code cell to see an example of a tagged sentence.
%% Cell type:code id: tags:
``` python
list(train_data)[42]
```
%% Cell type:markdown id: tags:
## Tagger interface
%% Cell type:markdown id: tags:
The tagger that you will implement in this lab follows a simple interface with just one method:
%% Cell type:code id: tags:
``` python
class Tagger(object):

    def predict(self, sentence):
        raise NotImplementedError
```
%% Cell type:markdown id: tags:
The single method of this interface has the following specification:
**predict** (*self*, *sentence*)
> Returns the list of predicted tags (a list of strings) for a single *sentence* (a list of string tokens).
One trivial implementation of this interface is a tagger that always predicts the same tag for every word, independently of the input:
%% Cell type:code id: tags:
``` python
class ConstantTagger(Tagger):

    def __init__(self, the_tag):
        self.the_tag = the_tag

    def predict(self, words):
        return [self.the_tag] * len(words)
```
%% Cell type:markdown id: tags:
## Problem 1: Implement an evaluation function
%% Cell type:markdown id: tags:
Your first task is to implement a function that computes the accuracy of a tagger on gold-standard data.
%% Cell type:code id: tags:
``` python
def accuracy(tagger, gold_data):
    # TODO: Replace the next line with your own code
    return 0.0
```
%% Cell type:markdown id: tags:
Your implementation should conform to the following specification:
**accuracy** (*tagger*, *gold_data*)
> Computes the accuracy of the *tagger* on the gold-standard data *gold_data* (an iterable of tagged sentences) and returns it as a float. Recall that the accuracy is defined as the percentage of tokens to which the tagger assigns the correct tag (as per the gold standard).
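%% Cell type:markdown id: tags:
For concreteness, the cell below shows one minimal sketch of a function with this behaviour. It is only an illustration, not the required implementation; it assumes that the accuracy is returned as a fraction between 0 and 1 (so that a printed value of 0.1669 corresponds to the 16.69% mentioned in the test below).
``` python
def accuracy_sketch(tagger, gold_data):
    # Count correctly predicted tags over all tokens in the gold-standard data.
    correct = 0
    total = 0
    for tagged_sentence in gold_data:
        words = [word for word, _ in tagged_sentence]
        gold_tags = [tag for _, tag in tagged_sentence]
        for pred_tag, gold_tag in zip(tagger.predict(words), gold_tags):
            correct += int(pred_tag == gold_tag)
            total += 1
    return correct / total
```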
%% Cell type:markdown id: tags:
### 🤞 Test your code
Test your code by computing the accuracy on the development set of a trivial tagger that tags each word as a noun. The expected value is 16.69%.
%% Cell type:code id: tags:
``` python
def test1():
    tagger1 = ConstantTagger('NOUN')  # will tag each word as a noun
    print('{:.4f}'.format(accuracy(tagger1, dev_data)))

test1()
```
%% Cell type:markdown id: tags:
## Problem 2: Implement a baseline
%% Cell type:markdown id: tags:
Before you start working on the tagger as such, we ask you to first implement a simple baseline:
> Tag each input word with the most frequent tag for that word in the training data. If an input word does not occur in the training data, tag it with the overall most frequent tag in the training data. Break ties by choosing the tag that comes first in alphabetical order.
To implement the baseline, you need to implement both a class `BaselineTagger` and a function `train_baseline`. A `BaselineTagger` has two fields: a dictionary mapping each word in the training data to the most frequent tag for that word, and a string representing the fallback tag (the overall most frequent tag in the training data). Both of these fields are set in the `train_baseline` function.
%% Cell type:code id: tags:
``` python
class BaselineTagger(Tagger):

    def __init__(self):
        self.most_frequent = {}
        self.fallback = None

    def predict(self, words):
        # TODO: Replace the next line with your own code
        super().predict(words)

def train_baseline(train_data):
    # TODO: Replace the next line with your own code
    return BaselineTagger()
```
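%% Cell type:markdown id: tags:
As a point of reference, here is one possible sketch of `train_baseline` (shown under a different name so as not to pre-empt your own solution). It counts tags per word and overall, and breaks ties alphabetically by minimising over the key `(-count, tag)`:
``` python
from collections import Counter

def train_baseline_sketch(train_data):
    # Sketch only: fills in the two fields of a BaselineTagger as described above.
    tagger = BaselineTagger()
    per_word = {}
    overall = Counter()
    for tagged_sentence in train_data:
        for word, tag in tagged_sentence:
            per_word.setdefault(word, Counter())[tag] += 1
            overall[tag] += 1
    # Most frequent tag per word; ties go to the alphabetically first tag.
    for word, counts in per_word.items():
        tagger.most_frequent[word] = min(counts.items(), key=lambda item: (-item[1], item[0]))[0]
    tagger.fallback = min(overall.items(), key=lambda item: (-item[1], item[0]))[0]
    return tagger
```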
%% Cell type:markdown id: tags:
### 🤞 Test your code
Test your implementation by computing the accuracy of the baseline tagger on the development data. The expected value is 85.61%.
%% Cell type:code id: tags:
``` python
def test2():
    tagger2 = train_baseline(train_data)
    print('{:.4f}'.format(accuracy(tagger2, dev_data)))

test2()
```
%% Cell type:markdown id: tags:
## Problem 3: Create the vocabularies
%% Cell type:markdown id: tags:
As in previous labs, you will need an explicit representation of your vocabulary. Here we actually have two vocabularies: one for the words and one for the tags. Both should be represented as dictionaries that map words/tags to a contiguous range of integers, starting at zero.
The next cell contains skeleton code for a function `make_vocabs` that constructs the two vocabularies from gold-standard data. The code cell also defines a name for the ‘unknown word’ (`UNK`) and for an additional pseudoword that you will use as a placeholder for undefined values (`PAD`).
%% Cell type:code id: tags:
``` python
PAD = '<pad>'
UNK = '<unk>'

def make_vocabs(gold_data):
    # TODO: Replace the next line with your own code
    return {}, {}
```
%% Cell type:markdown id: tags:
Complete the code according to the following specification:
**make_vocabs** (*gold_data*)
> Returns a pair of dictionaries mapping the unique words and tags in the gold-standard data *gold_data* (an iterable over tagged sentences) to contiguous ranges of integers starting at zero. The word dictionary contains the pseudowords `PAD` (index&nbsp;0) and `UNK` (index&nbsp;1); the tag dictionary contains `PAD` (index&nbsp;0).
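%% Cell type:markdown id: tags:
A minimal sketch of a function with this behaviour is shown below, purely for illustration. The specification does not fix the order of the remaining words and tags; the sketch assumes they are numbered in the order in which they are first encountered, which also matches the assumption made in Problem 5.
``` python
def make_vocabs_sketch(gold_data):
    # Sketch only: PAD gets index 0 (and UNK index 1 in the word vocabulary);
    # all other words/tags are numbered in order of first occurrence.
    vocab_words = {PAD: 0, UNK: 1}
    vocab_tags = {PAD: 0}
    for tagged_sentence in gold_data:
        for word, tag in tagged_sentence:
            if word not in vocab_words:
                vocab_words[word] = len(vocab_words)
            if tag not in vocab_tags:
                vocab_tags[tag] = len(vocab_tags)
    return vocab_words, vocab_tags
```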
%% Cell type:markdown id: tags:
### 🤞 Test your code
Test your implementation by computing the total number of unique words and tags in the training data (including the pseudowords). The expected values are 19,674&nbsp;words and 18&nbsp;tags.
%% Cell type:code id: tags:
``` python
def test3():
    vocab_words, vocab_tags = make_vocabs(train_data)
    print('words: {}, tags: {}'.format(len(vocab_words), len(vocab_tags)))

test3()
```
%% Cell type:markdown id: tags:
## Problem 4: Fixed-window tagger
%% Cell type:markdown id: tags:
Your main task in this lab is to implement a complete, autoregressive part-of-speech tagger based on the fixed-window architecture. This implementation has four parts: the fixed-window model; a tagger that uses the fixed-window model to make predictions; a function that generates training examples for the tagger; and the training function.
**⚠️ We expect that solving this problem will take you the longest time in this lab.**
%% Cell type:markdown id: tags:
### Problem 4.1: Implement the fixed-window model
%% Cell type:markdown id: tags:
The architecture of the fixed-window model is presented in Lecture&nbsp;3.2. An input to the network takes the form of a $k$-dimensional vector of word ids and/or tag ids. The $i$th component of this vector is mapped to an $e_i$-dimensional embedding vector. These vectors are concatenated to form a vector of length $e_1 + \cdots + e_k$, which is sent through a feed-forward network with a single hidden layer and a rectified linear unit (ReLU).
#### Default features
We ask you to implement a fixed-window model with the following features ($k=4$):
0. current word
1. previous word
2. next word
3. tag predicted for the previous word
Whenever the value of a feature is undefined, you should use the special value `PAD`.
#### Embedding specifications
To make your implementation of the fixed-window model useful for a range of different applications (including the parser that you will build in lab&nbsp;4), it should support other feature sets than the default model. To this end, the constructor of your model should accept a list of what we call *embedding specifications*. An embedding specification is a triple $(m, n, e)$ consisting of three integers. Such a triple specifies that the model should include $m$ instances of an embedding from $n$ items to vectors of size $e$. All of the $m$ instances are to share their weights. In this lab, the embeddings will be embeddings for words and tags. For example, to instantiate the default feature model, you would initialise the model with the following specifications:
```
[(3, num_words, word_dim), (1, num_tags, tag_dim)]
```
This specifies that the model should use 3 instances of an embedding from *num_words* words to vectors of length *word_dim*, and 1 instance of an embedding from *num_tags* tags to vectors of length *tag_dim*. All 3 instances of the word embedding would share their weights. If you instead wanted word embeddings with separate weights, you would initialise the model with the following specifications:
```
[(1, num_words, word_dim), (1, num_words, word_dim), (1, num_words, word_dim), (1, num_tags, tag_dim)]
```
We recommend that you initialise the weights of each embedding with values drawn from $\mathcal{N}(0, 10^{-2})$.
#### Hyperparameters
The network architecture introduces a number of hyperparameters. The following choices are reasonable defaults:
* width of each word embedding: 50
* width of each tag embedding: 10
* size of the hidden layer: 100
%% Cell type:markdown id: tags:
The next cell contains skeleton code for the implementation of the fixed-window model.
%% Cell type:code id: tags:
``` python
import torch.nn as nn

class FixedWindowModel(nn.Module):

    def __init__(self, embedding_specs, hidden_dim, output_dim):
        # TODO: Replace the next line with your own code
        super().__init__()

    def forward(self, features):
        # TODO: Replace the next line with your own code
        return super().forward(features)
```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
Your implementation should meet the following specification: Your implementation should meet the following specification:
**__init__** (*self*, *embedding_specs*, *hidden_dim*, *output_dim*) **__init__** (*self*, *embedding_specs*, *hidden_dim*, *output_dim*)
> A fixed-window model is initialized with a list of specifications for the embeddings the network should use (*embedding_specs*), the size of the hidden layer (*hidden_dim*), and the size of the output layer (*output_dim*). > A fixed-window model is initialized with a list of specifications for the embeddings the network should use (*embedding_specs*), the size of the hidden layer (*hidden_dim*), and the size of the output layer (*output_dim*).
**forward** (*self*, *features*) **forward** (*self*, *features*)
> Computes the network output for a given feature representation *features*. This is a tensor of shape $B \times k$ where $B$ is the batch size (number of samples in the batch) and $k$ is the total number of embeddings specified upon initialisation. For example, for the default feature model, $k=4$, as this model includes 3 (weight-sharing) word embeddings and 1 tag embedding. > Computes the network output for a given feature representation *features*. This is a tensor of shape $B \times k$ where $B$ is the batch size (number of samples in the batch) and $k$ is the total number of embeddings specified upon initialisation. For example, for the default feature model, $k=4$, as this model includes 3 (weight-sharing) word embeddings and 1 tag embedding.
#### 💡 Hint on the implementation #### 💡 Hint on the implementation
You will have to construct embeddings based on the embedding specifications. It is natural to store these embeddings them in a list- or dictionary-valued attribute of the `FixedWindowModel` object. However, in order to expose the embeddings to the auto-differentiation magic of PyTorch (so that their weights are updated during training), you must instead store them in an [`nn.ModuleList`](https://pytorch.org/docs/stable/nn.html#torch.nn.ModuleList) or [`nn.ModuleDict`](https://pytorch.org/docs/stable/nn.html#torch.nn.ModuleDict). You will have to construct embeddings based on the embedding specifications. It is natural to store these embeddings them in a list- or dictionary-valued attribute of the `FixedWindowModel` object. However, in order to expose the embeddings to the auto-differentiation magic of PyTorch (so that their weights are updated during training), you must instead store them in an [`nn.ModuleList`](https://pytorch.org/docs/stable/nn.html#torch.nn.ModuleList) or [`nn.ModuleDict`](https://pytorch.org/docs/stable/nn.html#torch.nn.ModuleDict).
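%% Cell type:markdown id: tags:
To make the shapes and the weight sharing concrete, here is one possible sketch of the model, shown under a different class name so as not to pre-empt your own solution. The grouping of the feature columns and the reading of $\mathcal{N}(0, 10^{-2})$ as a variance (standard deviation $0.1$) are assumptions of this sketch, not part of the specification.
``` python
import torch
import torch.nn as nn

class FixedWindowModelSketch(nn.Module):

    def __init__(self, embedding_specs, hidden_dim, output_dim):
        super().__init__()
        # One nn.Embedding per specification; each is shared by m feature columns.
        self.embeddings = nn.ModuleList()
        self.num_columns = []
        concat_dim = 0
        for m, n, e in embedding_specs:
            embedding = nn.Embedding(n, e)
            nn.init.normal_(embedding.weight, mean=0.0, std=0.1)  # N(0, 1e-2), read as variance
            self.embeddings.append(embedding)
            self.num_columns.append(m)
            concat_dim += m * e
        self.hidden = nn.Linear(concat_dim, hidden_dim)
        self.output = nn.Linear(hidden_dim, output_dim)

    def forward(self, features):
        # features: LongTensor of shape (B, k), with k = sum of all m values
        parts = []
        offset = 0
        for embedding, m in zip(self.embeddings, self.num_columns):
            columns = features[:, offset:offset + m]                # (B, m)
            parts.append(embedding(columns).flatten(start_dim=1))   # (B, m * e)
            offset += m
        concatenated = torch.cat(parts, dim=1)                      # (B, e_1 + ... + e_k)
        return self.output(torch.relu(self.hidden(concatenated)))
```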
%% Cell type:markdown id: tags:
### Problem 4.2: Implement the tagger
%% Cell type:markdown id: tags:
The next step is to implement the tagger itself. The tagger will use the simple algorithm that was presented in Lecture&nbsp;2.3: It processes an input sentence from left to right, and at each position, predicts the tag for the current word based on the features extracted from the current feature window.
%% Cell type:code id: tags:
``` python
class FixedWindowTagger(Tagger):

    def __init__(self, vocab_words, vocab_tags, output_dim, word_dim=50, tag_dim=10, hidden_dim=100):
        # TODO: Replace the next line with your own code
        raise NotImplementedError

    def featurize(self, words, i, pred_tags):
        # TODO: Replace the next line with your own code
        raise NotImplementedError

    def predict(self, words):
        # TODO: Replace the next line with your own code
        raise NotImplementedError
```
%% Cell type:markdown id: tags:
Complete the skeleton code by implementing the methods of this interface:
**__init__** (*self*, *vocab_words*, *vocab_tags*, *output_dim*, *word_dim* = 50, *tag_dim* = 10, *hidden_dim* = 100)
> Creates a new fixed-window model of appropriate dimensions and sets up any other data structures that you consider relevant. The parameters *vocab_words* and *vocab_tags* are the word vocabulary and tag vocabulary, *output_dim* is the size of the model’s output layer, and *hidden_dim* is the size of its hidden layer. The parameters *word_dim* and *tag_dim* specify the embedding width for the word embeddings and tag embeddings.
**featurize** (*self*, *words*, *i*, *pred_tags*)
> Extracts features from the specified tagger configuration according to the default feature model. The configuration is specified in terms of the words in the input sentence (*words*, a list of word ids), the position of the current word (*i*), and the list of already predicted tags (*pred_tags*, a list of tag ids). Returns a tensor that can be fed to the fixed-window model.
**predict** (*self*, *words*)
> Processes the input sentence *words* (a list of string tokens) and makes calls to the fixed-window model to predict the tag of each word. Returns the list of the predicted tags (strings).
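%% Cell type:markdown id: tags:
The following sketch illustrates one way these methods could fit together for the default feature model. It is not the required implementation; in particular, the attribute names (`self.model`, `self.vocab_words`, `self.vocab_tags`) and the choice of embedding specifications are assumptions of the sketch.
``` python
import torch

class FixedWindowTaggerSketch(Tagger):

    def __init__(self, vocab_words, vocab_tags, output_dim, word_dim=50, tag_dim=10, hidden_dim=100):
        self.vocab_words = vocab_words
        self.vocab_tags = vocab_tags
        # Default feature model: 3 weight-sharing word embeddings + 1 tag embedding.
        embedding_specs = [(3, len(vocab_words), word_dim), (1, len(vocab_tags), tag_dim)]
        self.model = FixedWindowModel(embedding_specs, hidden_dim, output_dim)

    def featurize(self, words, i, pred_tags):
        pad_word = self.vocab_words[PAD]
        pad_tag = self.vocab_tags[PAD]
        return torch.tensor([
            words[i],                                           # current word
            words[i - 1] if i > 0 else pad_word,                # previous word
            words[i + 1] if i + 1 < len(words) else pad_word,   # next word
            pred_tags[i - 1] if i > 0 else pad_tag,             # tag predicted for the previous word
        ])

    def predict(self, words):
        word_ids = [self.vocab_words.get(w, self.vocab_words[UNK]) for w in words]
        id_to_tag = {idx: tag for tag, idx in self.vocab_tags.items()}
        pred_tag_ids = []
        for i in range(len(word_ids)):
            features = self.featurize(word_ids, i, pred_tag_ids).unsqueeze(0)  # batch of size 1
            with torch.no_grad():
                scores = self.model(features)
            pred_tag_ids.append(scores.argmax(dim=1).item())
        return [id_to_tag[idx] for idx in pred_tag_ids]
```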
%% Cell type:markdown id: tags:
### Problem 4.3: Generate the training examples
%% Cell type:markdown id: tags:
Your next task is to implement a function that generates the training examples for the tagger. You will train the tagger as usual, using minibatch training.
%% Cell type:code id: tags:
``` python
def training_examples(vocab_words, vocab_tags, gold_data, tagger, batch_size=100):
    # TODO: Replace the next line with your own code
    raise NotImplementedError
```
%% Cell type:markdown id: tags:
Your code should comply with the following specification:
**training_examples** (*vocab_words*, *vocab_tags*, *gold_data*, *tagger*, *batch_size* = 100)
> Iterates through the given *gold_data* (an iterable of tagged sentences), encodes it into word ids and tag ids using the specified vocabularies *vocab_words* and *vocab_tags*, and then yields batches of training examples for gradient-based training. Each batch contains *batch_size* examples, except for the last batch, which may contain fewer examples. Each example in the batch is created by a call to the `featurize` function of the *tagger*.
%% Cell type:code id: tags:
``` python
import torch

def training_examples(vocab_words, vocab_tags, gold_data, tagger, batch_size=100, shuffle=False):
    bx = []
    by = []
    for tagged_sentence in gold_data:
        # Separate the words and the gold-standard tags
        words, gold_tags = zip(*tagged_sentence)
        # Encode words and tags using the vocabularies
        words = [vocab_words.get(w, vocab_words[UNK]) for w in words]
        gold_tags = [vocab_tags[t] for t in gold_tags]
        # Simulate a run of the tagger over the sentence, collecting training examples
        pred_tags = []
        for i, gold_tag in enumerate(gold_tags):
            bx.append(tagger.featurize(words, i, pred_tags))
            by.append(gold_tag)
            if len(bx) >= batch_size:
                bx = torch.stack(bx)
                by = torch.LongTensor(by)
                if shuffle:
                    random_indices = torch.randperm(len(bx))
                    yield bx[random_indices], by[random_indices]
                else:
                    yield bx, by
                bx = []
                by = []
            pred_tags.append(gold_tag)  # teacher forcing!
    # Check whether there is an incomplete batch
    if bx:
        bx = torch.stack(bx)
        by = torch.LongTensor(by)
        if shuffle:
            random_indices = torch.randperm(len(bx))
            yield bx[random_indices], by[random_indices]
        else:
            yield bx, by
```
%% Cell type:markdown id: tags:
### Problem 4.4: Training loop
%% Cell type:markdown id: tags:
What remains to be done is the implementation of the training loop. This should be a straightforward generalization of the training loops that you have seen so far. Complete the skeleton code in the cell below:
%% Cell type:code id: tags:
``` python
def train_fixed_window(train_data, n_epochs=2, batch_size=100, lr=1e-2):
    # TODO: Replace the next line with your own code
    return Tagger()
```
%% Cell type:markdown id: tags:
Here is the specification of the training function:
**train_fixed_window** (*train_data*, *n_epochs* = 2, *batch_size* = 100, *lr* = 1e-2)
> Trains a fixed-window tagger from a set of training data *train_data* (an iterable over tagged sentences) using minibatch gradient descent and returns it. The parameters *n_epochs* and *batch_size* specify the number of training epochs and the minibatch size, respectively. Training uses the cross-entropy loss function and the [Adam optimizer](https://pytorch.org/docs/stable/optim.html#torch.optim.Adam) with learning rate *lr*.
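%% Cell type:markdown id: tags:
As an illustration, a minimal training loop following this specification could look roughly as sketched below. It assumes the components from Problems 3 and 4.1–4.3, and in particular that the tagger exposes its fixed-window model as `tagger.model` (an attribute name not fixed by the lab).
``` python
import torch.nn as nn
import torch.optim as optim

def train_fixed_window_sketch(train_data, n_epochs=2, batch_size=100, lr=1e-2):
    # Sketch only: build the vocabularies, create the tagger, and run minibatch training.
    vocab_words, vocab_tags = make_vocabs(train_data)
    tagger = FixedWindowTagger(vocab_words, vocab_tags, len(vocab_tags))
    optimizer = optim.Adam(tagger.model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for epoch in range(n_epochs):
        for bx, by in training_examples(vocab_words, vocab_tags, train_data, tagger, batch_size):
            optimizer.zero_grad()
            loss = loss_fn(tagger.model(bx), by)
            loss.backward()
            optimizer.step()
    return tagger
```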
%% Cell type:markdown id: tags:
The next code cell trains a tagger and evaluates it on the development data:
%% Cell type:code id: tags:
``` python
tagger = train_fixed_window(train_data)
print('{:.4f}'.format(accuracy(tagger, dev_data)))
```
%% Cell type:markdown id: tags:
**⚠️ Your submitted notebook must contain output demonstrating at least 88% accuracy on the development set.**
%% Cell type:markdown id: tags:
## Problem 5: Pre-trained embeddings (reflection)
%% Cell type:markdown id: tags:
Many neural systems for natural language processing use pre-trained word embeddings, either to augment or to replace randomly initialised task-based embeddings. In this problem, you will investigate whether pre-trained embeddings help your part-of-speech tagger.
The file `glove.pt` contains a PyTorch tensor with 50-dimensional pre-trained word embeddings from the [GloVe project](https://nlp.stanford.edu/projects/glove/). You can load this tensor using the command
```
glove = torch.load('glove.pt')
```
and you should be able to use it as a drop-in replacement for your randomly initialized word embeddings, assuming that the words in your vocabulary are numbered in the order in which they are found in the training data. Have a look at the documentation of the class [`nn.Embedding`](https://pytorch.org/docs/stable/generated/torch.nn.Embedding.html) to learn how to do this.
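For illustration only, one possible way to plug the pre-trained tensor in is via `nn.Embedding.from_pretrained`; the attribute name `word_embedding` below is an assumption, not something fixed by this lab:
``` python
import torch
import torch.nn as nn

glove = torch.load('glove.pt')
# Build an embedding layer whose rows are the GloVe vectors; set freeze=False
# if you want the vectors to be fine-tuned during training.
pretrained = nn.Embedding.from_pretrained(glove, freeze=False)
# Hypothetical usage: replace the randomly initialised word embedding of your model.
# model.word_embedding = pretrained
```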
Run experiments to assess the effect that pre-trained embeddings have on (a)&nbsp;the accuracy of the tagger, and (b)&nbsp;the speed of learning, i.e., the number of training examples it takes to reach a certain loss. Document your exploration in a short reflection piece (ca. 150&nbsp;words). Respond to the following prompts:
* How did you integrate the pre-trained embeddings into your system? What did you measure? What results did you get?
* Based on what you know about word embeddings and transfer learning, did you expect your results? How do you explain them?
* What did you learn? How, exactly, did you learn it? Why does this learning matter?
%% Cell type:markdown id: tags:
*TODO: Insert your report here*
%% Cell type:markdown id: tags:
Congratulations on finishing this lab!