@@ -60,7 +60,8 @@ We collect benchmark results of throughput (images/sec) for
...
@@ -60,7 +60,8 @@ We collect benchmark results of throughput (images/sec) for
- Dimention = 2
- Dimention = 2
- Nodes = 1, 2
- Nodes = 1, 2
- GPUs = 1 - 8 (for 1 node), 16 (for 2 nodes)
- GPUs = 1 - 8 (for 1 node), 16 (for 2 nodes)
- Batch size = 2, 4, 8, 16, 32, 64, 128
- Batch size = 2, 4, 8, 16, 32, 64, 128
We run 100 iterations for each set of parameters.
We run 100 iterations for each set of parameters.
- Observation 1: throughput_tf32 > throughput_amp when batch_size is small (1, 2, 4, 8); throughput_tf32 < throughput_amp when batch_size is large (16, 32, 64, 128).
- Observation 1: throughput_tf32 > throughput_amp when batch_size is small (1, 2, 4, 8); throughput_tf32 < throughput_amp when batch_size is large (16, 32, 64, 128).
- Observation 2: The coefficient of variation for the 100 iteration is smallest when batch_size = 128.
- Observation 2: The coefficient of variation for the 100 iteration is smallest when batch_size = 128.