Add TODO cells for Test your code sections in L1

Merged Riley Capshaw requested to merge l1-test-your-code-todo into master
1 file changed: +30 −0
To test your code, print the sizes of the vocabularies constructed from the two datasets, as well as the count totals. The correct vocabulary sizes are 3,231 for the minimal dataset and 73,339 for the full dataset; the correct count totals are 155,818 and 17,297,355, respectively.
 
%% Cell type:code id: tags:
 
``` python
 
# TODO: Test your code here.
```
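%% Cell type:markdown id: tags:

For illustration, here is a minimal sketch of such a check. It assumes hypothetical names (`minimal_dataset`, `full_dataset`, and a `make_vocab_and_counts()` helper that returns a token-to-index mapping together with a token-to-count mapping), so adapt it to the interface you actually implemented.

%% Cell type:code id: tags:

``` python
# Sketch only: `make_vocab_and_counts`, `minimal_dataset`, and `full_dataset`
# are hypothetical names; substitute your own functions and data.
for name, dataset in [('minimal', minimal_dataset), ('full', full_dataset)]:
    vocab, counts = make_vocab_and_counts(dataset)
    print(f'{name}: vocabulary size = {len(vocab)}, '
          f'count total = {sum(counts.values())}')
```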
%% Cell type:markdown id: tags:
Test your code by comparing the total number of tokens in the preprocessed version of each dataset with the corresponding number for the original data. The former should be ca. 59% of the latter for the minimal dataset, and ca. 69% for the full dataset. The exact percentage will vary slightly because of the randomness in the sampling. You may want to repeat your computation several times.
 
%% Cell type:code id: tags:
 
``` python
 
# TODO: Test your code here.
```
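%% Cell type:markdown id: tags:

As a rough sketch of this comparison (assuming hypothetical names: a `preprocess()` function that returns the subsampled sentences, and `minimal_dataset`/`full_dataset` as lists of token lists), one could write:

%% Cell type:code id: tags:

``` python
# Sketch only: `preprocess`, `minimal_dataset`, and `full_dataset` are
# hypothetical names; adapt them to your own implementation.
for name, dataset in [('minimal', minimal_dataset), ('full', full_dataset)]:
    n_original = sum(len(sentence) for sentence in dataset)
    n_preprocessed = sum(len(sentence) for sentence in preprocess(dataset))
    print(f'{name}: {100 * n_preprocessed / n_original:.1f}% of tokens kept')
```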
%% Cell type:markdown id: tags:
To test your code, compare the total number of positive samples (across all batches) to the total number of tokens in the (un-preprocessed) minimal dataset. The ratio between these two values should be ca. 2.64. If you can spare the time, you can make the same comparison on the full dataset; here, the expected ratio is 3.25. As before, the numbers may vary slightly because of randomness, so you may want to run the comparison more than once.
 
%% Cell type:code id: tags:
 
``` python
 
# TODO: Test your code here.
```
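%% Cell type:markdown id: tags:

The following sketch illustrates one way to run this comparison. It assumes hypothetical names and a hypothetical batch layout: a `training_batches()` function that yields batches of target words `w`, context words `c`, and labels `y`, where `y == 1` marks positive samples. Adjust it to match how your batches are actually structured.

%% Cell type:code id: tags:

``` python
# Sketch only: `training_batches`, `minimal_dataset`, and the (w, c, y) batch
# layout are assumptions; adapt them to your own code.
n_positive = 0
for w, c, y in training_batches(minimal_dataset):
    n_positive += int((y == 1).sum())   # positive samples in this batch

n_tokens = sum(len(sentence) for sentence in minimal_dataset)
print(f'ratio = {n_positive / n_tokens:.2f}')   # expected: ca. 2.64
```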
%% Cell type:markdown id: tags:
Test your code by creating an instance of the model and checking that `forward` returns the expected result on random input tensors *w* and *c*. To help you, the following function will return a random example from the first 100 examples produced by `training_examples`.
%% Cell type:code id: tags:

``` python
 
# TODO: Test your code here.
```
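%% Cell type:markdown id: tags:

As a starting point, here is a minimal sketch of such a check. It assumes a PyTorch implementation with a hypothetical model class `SGNSModel(vocab_size, embedding_dim)` whose `forward` takes index tensors *w* and *c*; the shapes and hyperparameters below are placeholders, so adjust them to your own interface.

%% Cell type:code id: tags:

``` python
import torch

# Sketch only: `SGNSModel` and its constructor arguments are hypothetical;
# use your own model class and whatever shapes `forward` expects.
vocab_size, embedding_dim, batch_size = 3231, 50, 8

model = SGNSModel(vocab_size, embedding_dim)
w = torch.randint(vocab_size, (batch_size,))   # random target-word indices
c = torch.randint(vocab_size, (batch_size,))   # random context-word indices

output = model(w, c)                           # calls forward
print(output.shape)                            # compare against what you expect
```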
 
%% Cell type:markdown id: tags:
Training on the full dataset will take some time – on a CPU, you should expect 10–40 minutes per epoch, depending on hardware. To give you some guidance: The total number of positive examples is approximately 58M, and the batch size is chosen so that each batch contains roughly 10% of these examples. To speed things up, you can train using a GPU; our reference implementation runs in less than 2 minutes per epoch on [Colab](http://colab.research.google.com).
 
%% Cell type:code id: tags:
 
``` python
 
# TODO: Train your model on the full dataset here.
```
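%% Cell type:markdown id: tags:

The cell above is where your own training code goes; the sketch below only illustrates the general shape of a GPU-enabled training loop. It assumes PyTorch and hypothetical names (`SGNSModel`, `training_batches()`, `full_dataset`), and it uses a generic binary cross-entropy loss over positive and negative samples, which may differ from how your `forward` and loss are set up.

%% Cell type:code id: tags:

``` python
import torch

# Sketch only: `SGNSModel`, `training_batches`, and `full_dataset` are
# hypothetical names, and the loss shown here is a generic choice;
# plug in your own model, batching, and loss.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = SGNSModel(73339, 50).to(device)        # vocab size of the full dataset
optimizer = torch.optim.Adam(model.parameters())

for epoch in range(5):                         # number of epochs is a placeholder
    for w, c, y in training_batches(full_dataset):
        w, c, y = w.to(device), c.to(device), y.to(device)
        optimizer.zero_grad()
        logits = model(w, c)
        loss = torch.nn.functional.binary_cross_entropy_with_logits(
            logits, y.float())
        loss.backward()
        optimizer.step()
    print(f'epoch {epoch + 1}: last batch loss = {loss.item():.4f}')
```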
%% Cell type:markdown id: tags: