@@ -111,7 +111,7 @@ when batch_size is large (16, 32, 64, 128), throughput_amp > throughput_tf32.
**Observation 4**: Ideally, throughput would scale linearly with the number of GPUs. In practice, throughput stays below the ideal linear curve as the number of GPUs increases.
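
One way to quantify the gap described in Observation 4 is scaling efficiency: measured throughput divided by the ideal linear throughput. The sketch below is only illustrative. The function names and the numbers in the usage lines are hypothetical and are not taken from the benchmark results.

```python
def ideal_throughput(single_gpu_throughput: float, num_gpus: int) -> float:
    """Throughput expected under perfectly linear scaling."""
    return single_gpu_throughput * num_gpus


def scaling_efficiency(measured_throughput: float,
                       single_gpu_throughput: float,
                       num_gpus: int) -> float:
    """Fraction of the ideal linear throughput actually achieved (1.0 = ideal)."""
    return measured_throughput / ideal_throughput(single_gpu_throughput, num_gpus)


# Hypothetical numbers, for illustration only (not measured results):
# one GPU processes 100 samples/s, four GPUs together process 300 samples/s.
efficiency = scaling_efficiency(300.0, 100.0, 4)
print(f"scaling efficiency: {efficiency:.2f}")
```

An efficiency below 1.0 corresponds to a measured curve that lies under the ideal line; plotting this value against the number of GPUs makes the shortfall easier to compare across batch sizes.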