%% Cell type:markdown id: tags:
# L4X: Feature engineering for part-of-speech tagging
%% Cell type:markdown id: tags:
In this lab, you will practice your skills in feature engineering, the task of identifying useful features for a machine learning system.
%% Cell type:markdown id: tags:
## The data set
%% Cell type:markdown id: tags:
The data for this lab and their representation are the same as for the base lab.
We load the training data and the development data for this lab:
%% Cell type:code id: tags:
``` python
train_data = Dataset('train.txt')
dev_data = Dataset('dev.txt')
```
%% Cell type:markdown id: tags:
## Baseline tagger
%% Cell type:markdown id: tags:
The baseline tagger that you will use in this lab is a pure Python implementation of the perceptron tagger that was presented in Lecture 4.3 and Lecture 4.4. To understand what the code provided here does, and how it might be extended with new features, you should watch these two lectures.
Your first task is to implement a function that computes the accuracy of the tagger on gold-standard data. You have already implemented this function for the base lab, so you should be able to just copy-and-paste it here.
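
As an illustration of what such a function computes, the following is a minimal sketch of token-level accuracy. It does not use the lab's `Dataset` or tagger classes, whose interfaces are not shown here. Instead it assumes, hypothetically, that each gold sentence is a list of `(word, tag)` pairs and that the tagger is a plain function from a list of words to a list of tags. The names `token_accuracy`, `always_noun` and `toy_gold` are made up for this example.

```python
def token_accuracy(predict, gold_sentences):
    """Return the fraction of tokens whose predicted tag equals the gold tag.

    Assumptions (not taken from the lab code): `gold_sentences` is an
    iterable of sentences, each a list of (word, tag) pairs, and `predict`
    maps a list of words to a list of tags of the same length.
    """
    correct = 0
    total = 0
    for sentence in gold_sentences:
        words = [word for word, _ in sentence]
        gold_tags = [tag for _, tag in sentence]
        predicted_tags = predict(words)
        for predicted, gold in zip(predicted_tags, gold_tags):
            if predicted == gold:
                correct += 1
            total += 1
    # Guard against empty input to avoid a division by zero.
    return correct / total if total else 0.0


# Toy data and a trivial "tagger" that labels every word as a noun.
toy_gold = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("cats", "NOUN"), ("sleep", "VERB")],
]


def always_noun(words):
    return ["NOUN"] * len(words)


toy_accuracy = token_accuracy(always_noun, toy_gold)
print('{:.4f}'.format(toy_accuracy))
```

On the toy data, two of the five tokens are nouns, so the trivial tagger reaches an accuracy of 0.4. Counting over tokens rather than averaging per sentence means that long sentences contribute proportionally more to the score.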
%% Cell type:code id: tags:
``` python
def accuracy(tagger, gold_data):
    # TODO: Replace the next line with your own code
    return 0.0
```
%% Cell type:markdown id: tags:
## Problem 2: Feature engineering
%% Cell type:markdown id: tags:
Your main task now is to try to improve the performance of the perceptron tagger by adding new features. The only part of the code that you are allowed to change is the `featurize` method. Provide a short (ca. 150 words) report on what features you added and what results you obtained.
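
To give an idea of the kind of features that are often tried for part-of-speech tagging, here is a minimal sketch of a standalone feature function. The signature, the function name `extra_features`, the feature names, and the boundary markers `<bos>` and `<eos>` are assumptions made for illustration and will likely differ from the `featurize` method in the provided tagger code. No claim is made about how much any of these features helps on this data set.

```python
def extra_features(words, i, prev_tag):
    """Return a list of (name, value) features for the word at position i.

    Hypothetical interface: `words` is the list of tokens in the sentence,
    `i` is the index of the current token, and `prev_tag` is the tag
    assigned to the previous token (or a start marker for i == 0).
    """
    word = words[i]
    lower = word.lower()
    return [
        # Lexical identity and tag context.
        ("word", lower),
        ("prev_tag", prev_tag),
        # Word shape: affixes often signal the word class.
        ("suffix2", lower[-2:]),
        ("suffix3", lower[-3:]),
        ("prefix1", lower[:1]),
        # Orthographic cues.
        ("is_capitalized", word[:1].isupper()),
        ("has_digit", any(ch.isdigit() for ch in word)),
        ("has_hyphen", "-" in word),
        # Neighbouring words, with explicit sentence-boundary markers.
        ("prev_word", words[i - 1].lower() if i > 0 else "<bos>"),
        ("next_word", words[i + 1].lower() if i + 1 < len(words) else "<eos>"),
    ]


example_features = extra_features(["The", "well-known", "tagger"], 1, "DET")
for name, value in example_features:
    print(name, value)
```

Pairing each value with a feature name keeps features of different types from colliding, for example a suffix `ed` and a whole word `ed`. When adapting ideas like these, it is worth adding features one group at a time and re-measuring development-set accuracy, so that the report can state which additions made a difference.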
%% Cell type:markdown id: tags:
**⚠️ Your submitted notebook must contain output demonstrating at least 91% accuracy on the development set.**
%% Cell type:code id: tags:
``` python
tagger = train_perceptron(train_data, n_epochs=3)
print('{:.4f}'.format(accuracy(tagger, dev_data)))
```
%% Cell type:markdown id: tags:
*TODO: Insert your report here*
%% Cell type:markdown id: tags:
## Chocolate Box Challenge
%% Cell type:markdown id: tags:
To participate in the [Chocolate Box Challenge](https://www.kaggle.com/t/c46f4697d9af4d57af1b0db9fd5ebd67), run the next code cell to produce a file `submission.csv` and upload this file to Kaggle.
%% Cell type:code id: tags:
``` python
# Load the test data (without the tags)
test_data = Dataset('test-notags.txt')
# Generate submission.csv with results on both the dev data and the test data