From e1888cab2f90bbe348fe412b353b5d8fb908073f Mon Sep 17 00:00:00 2001
From: Xuan Gu <xuagu37@gmail.com>
Date: Tue, 18 Oct 2022 11:21:36 +0200
Subject: [PATCH] Update README.md

---
 README.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/README.md b/README.md
index 4c88d3f..24c1af6 100644
--- a/README.md
+++ b/README.md
@@ -64,6 +64,10 @@ We collect benchmark results of throughput (images/sec) for
 - GPUs = 1 - 8 (for 1 node), 16 (for 2 nodes)
 - Batch size = 1, 2, 4, 8, 16, 32, 64, 128
 
+TF32 (TensorFloat32) mode is for accelerating FP32 convolutions and matrix multiplications. TF32 mode is the default option for AI training with 32-bit variables on Ampere GPU architecture.
+
+AMP (Automatic Mixed Precision) offers significant computational speedup by performing operations in half-precision format, while storing minimal information in single-precision to retain as much information as possible in critical parts of the network.
+
 We run 100 iterations for each set of parameters.
 - Observation 1: when batch_size is small (1, 2, 4, 8), throughput_amp ≈ throughput_tf32; when batch_size is large (16, 32, 64, 128), throughput_amp > throughput_tf32.

-- 
GitLab