Commit e1888cab, authored by Xuan Gu, committed by GitHub

Update README.md

parent 59d8fe26
@@ -64,6 +64,10 @@ We collect benchmark results of throughput (images/sec) for
 - GPUs = 1 - 8 (for 1 node), 16 (for 2 nodes)
 - Batch size = 1, 2, 4, 8, 16, 32, 64, 128
+TF32 (TensorFloat32) mode is for accelerating FP32 convolutions and matrix multiplications. TF32 mode is the default option for AI training with 32-bit variables on Ampere GPU architecture.
+AMP (Automatic Mixed Precision) offers significant computational speedup by performing operations in half-precision format, while storing minimal information in single-precision to retain as much information as possible in critical parts of the network.
 We run 100 iterations for each set of parameters.
 - Observation 1: when batch_size is small (1, 2, 4, 8), throughput_amp ≈ throughput_tf32;
 when batch_size is large (16, 32, 64, 128), throughput_amp > throughput_tf32.
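
The README text in this hunk reports throughput in images/sec over 100 iterations per parameter set, but the page does not show how that number is computed. As a minimal sketch only, assuming throughput is defined as batch_size × iterations / elapsed wall-clock time, a measurement loop could look like the following. The names `measure_throughput` and `dummy_step` are hypothetical and do not come from the repository; `dummy_step` is a CPU-only stand-in for one training iteration.

```python
import time
from typing import Callable


def measure_throughput(step: Callable[[int], None], batch_size: int,
                       iterations: int = 100) -> float:
    """Return images/sec for `iterations` calls of `step(batch_size)`.

    Assumption: throughput = batch_size * iterations / elapsed seconds.
    """
    start = time.perf_counter()
    for _ in range(iterations):
        step(batch_size)
    # Guard against a zero interval on very coarse clocks.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return batch_size * iterations / elapsed


def dummy_step(batch_size: int) -> None:
    # Hypothetical stand-in for one training iteration on `batch_size` images.
    # On a GPU, work is launched asynchronously, so the device would need to
    # be synchronized before the clock is read.
    sum(i * i for i in range(100 * batch_size))


if __name__ == "__main__":
    # Batch sizes listed in the README hunk.
    for bs in (1, 2, 4, 8, 16, 32, 64, 128):
        tput = measure_throughput(dummy_step, bs)
        print(f"batch_size={bs:3d}: {tput:.1f} images/sec")
```

Running the same loop once per precision mode and batch size would yield the throughput_tf32 and throughput_amp values that Observation 1 compares. The numbers printed by this sketch reflect the dummy CPU workload only and say nothing about the reported GPU results.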