@@ -64,6 +64,10 @@ We collect benchmark results of throughput (images/sec) for
- GPUs = 1 - 8 (for 1 node), 16 (for 2 nodes)
- Batch size = 1, 2, 4, 8, 16, 32, 64, 128
TF32 (TensorFloat-32) mode accelerates FP32 convolutions and matrix multiplications by running them on Tensor Cores with a reduced-precision mantissa. TF32 is the default mode for AI training with 32-bit variables on the Ampere GPU architecture.
AMP (Automatic Mixed Precision) offers a significant computational speedup by performing operations in half precision, while keeping a small set of values in single precision to preserve accuracy in critical parts of the network.
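The snippet below is a minimal sketch of how these two modes are typically enabled, assuming the benchmark is implemented in PyTorch (the framework is not stated here); the actual benchmark harness may configure them differently.

```python
import torch

# TF32: on Ampere GPUs these flags default to True, so FP32 matmuls and
# convolutions already execute on Tensor Cores in TF32 mode.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

model = torch.nn.Linear(1024, 1024).cuda()           # stand-in for the benchmarked network
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler()                  # AMP: loss scaling to avoid FP16 underflow

def amp_step(inputs, targets):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():                   # AMP: run ops in FP16 where safe, FP32 elsewhere
        loss = torch.nn.functional.mse_loss(model(inputs), targets)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```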
We run 100 iterations for each set of parameters.
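As an illustration, throughput in images/sec can be measured with a timing loop like the one below; the function name `measure_throughput`, the warm-up count, and the `step_fn` callback are placeholders, not part of the benchmark code.

```python
import time
import torch

def measure_throughput(step_fn, batch_size, iterations=100, warmup=10):
    """Return images/sec averaged over `iterations` timed training steps."""
    for _ in range(warmup):           # warm-up steps are excluded from timing
        step_fn()
    torch.cuda.synchronize()          # make sure warm-up work has finished
    start = time.time()
    for _ in range(iterations):
        step_fn()
    torch.cuda.synchronize()          # wait for all queued GPU work before stopping the clock
    elapsed = time.time() - start
    return batch_size * iterations / elapsed
```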
- Observation 1: when batch_size is small (1, 2, 4, 8), throughput_amp ≈ throughput_tf32; when batch_size is large (16, 32, 64, 128), throughput_amp > throughput_tf32. A likely explanation is that small batches underutilize the GPU, so AMP's faster half-precision math only translates into higher throughput once the batch is large enough to keep the compute units busy.