In this assignment, you will take your existing implementation of the bi-affine dependency parser from lab 2 and extend it to support labelled parsing, making only the minimal necessary changes. You will validate your implementation by attempting to replicate the results that [Glavaš and Vulić (2021)](http://dx.doi.org/10.18653/v1/2021.eacl-main.270) report for BERT on the English Web Treebank (EWT).
## Instructions
1. **Understand the architecture**
- Read Section 3 of [Glavaš and Vulić (2021)](http://dx.doi.org/10.18653/v1/2021.eacl-main.270) to see how they compute relation scores.
- You also need to understand how to compute the loss for the relation prediction task; a sketch of the relation scorer and its loss is given after these instructions.
2. **Modify your parser to support labelled parsing**
- Extend or adapt your implementation of the bi-affine layer to support the computation of relation scores.
- Extend or adapt your implementation of the loss function.
- Make only the minimal necessary modifications to your existing parser.
3. **Validate your implementation**
- Attempt to replicate the results reported for BERT on the EWT in Table 1 of [Glavaš and Vulić (2021)](http://dx.doi.org/10.18653/v1/2021.eacl-main.270).
- You only need to replicate the results for the standard setup, not for the adapter setup.
4. **Add your work to your portfolio**
- Include a short report summarising the changes you made and the results of your replication attempt.
- Add your notebook and the report to your lab portfolio and present it at the oral exam.
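To make steps 1 and 2 concrete, here is a minimal sketch of a bi-affine relation scorer and the corresponding relation loss, assuming a PyTorch implementation and the widely used deep bi-affine formulation (Dozat and Manning, 2017). All class, function, and variable names below are illustrative, not taken from the paper or from your lab 2 code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BiaffineRelationScorer(nn.Module):
    """Scores every (dependent, head) pair for every relation label.

    With num_rels = 1 this reduces to an ordinary arc scorer.
    """

    def __init__(self, hidden_dim: int, num_rels: int):
        super().__init__()
        # One (d+1) x (d+1) bi-affine matrix per relation label; the
        # appended constant 1 folds the linear and bias terms into the
        # same bilinear form.
        self.U = nn.Parameter(torch.empty(num_rels, hidden_dim + 1, hidden_dim + 1))
        nn.init.xavier_uniform_(self.U)

    def forward(self, h_dep: torch.Tensor, h_head: torch.Tensor) -> torch.Tensor:
        # h_dep, h_head: (batch, seq_len, hidden_dim)
        ones = h_dep.new_ones(h_dep.shape[:-1] + (1,))
        d = torch.cat([h_dep, ones], dim=-1)   # (B, N, d+1)
        h = torch.cat([h_head, ones], dim=-1)  # (B, N, d+1)
        # scores[b, i, j, r] = score of relation r with head j for dependent i
        return torch.einsum("bix,rxy,bjy->bijr", d, self.U, h)


def relation_loss(rel_scores: torch.Tensor,
                  gold_heads: torch.Tensor,
                  gold_rels: torch.Tensor) -> torch.Tensor:
    """Cross-entropy over relation labels, scored at the gold head positions.

    rel_scores: (B, N, N, R); gold_heads, gold_rels: (B, N). A real
    implementation would also mask padding tokens, e.g. via the
    ignore_index argument of cross_entropy.
    """
    B, N, _, R = rel_scores.shape
    # For each dependent i, pick out the R label scores of its gold head.
    idx = gold_heads.view(B, N, 1, 1).expand(B, N, 1, R)
    scores_at_gold = rel_scores.gather(2, idx).squeeze(2)  # (B, N, R)
    return F.cross_entropy(scores_at_gold.reshape(-1, R), gold_rels.reshape(-1))
```

If your existing arc scorer already follows this pattern, the labelled extension mainly amounts to adding the relation dimension to the bi-affine tensor and adding the second loss term.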
## Hints & considerations
- Computing the arc scores can be seen as a special case of computing the relation scores.
- The overall loss of the parser is the sum of the arc loss and the relation loss, as in the sketch below.
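The following sketch makes both hints concrete, reusing the illustrative `BiaffineRelationScorer` and `relation_loss` from the sketch above; all dimensions and the dummy tensors are assumptions for demonstration only.

```python
import torch
import torch.nn.functional as F

# Reuses BiaffineRelationScorer and relation_loss from the sketch above.
B, N, D, R = 2, 10, 768, 37            # all dimensions here are illustrative
h_dep = torch.randn(B, N, D)           # dependent representations
h_head = torch.randn(B, N, D)          # head representations
gold_heads = torch.randint(0, N, (B, N))
gold_rels = torch.randint(0, R, (B, N))

arc_scorer = BiaffineRelationScorer(D, num_rels=1)  # arcs as a single "relation"
rel_scorer = BiaffineRelationScorer(D, num_rels=R)

arc_scores = arc_scorer(h_dep, h_head).squeeze(-1)  # (B, N, N)
rel_scores = rel_scorer(h_dep, h_head)              # (B, N, N, R)

# Arc loss: cross-entropy over candidate heads for each dependent.
arc_loss = F.cross_entropy(arc_scores.reshape(-1, N), gold_heads.reshape(-1))
# Overall parser loss: sum of the arc loss and the relation loss.
loss = arc_loss + relation_loss(rel_scores, gold_heads, gold_rels)
```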
## Deliverables
- `parser.ipynb` – a notebook containing your parser implementation
- a short report summarising the changes you made and the results of your replication attempt