@@ -64,6 +64,10 @@ We collect benchmark results of throughput (images/sec) for
- GPUs = 1 - 8 (for 1 node), 16 (for 2 nodes)
- Batch size = 1, 2, 4, 8, 16, 32, 64, 128
TF32 (TensorFloat-32) mode accelerates FP32 convolutions and matrix multiplications by running them on Tensor Cores with a reduced-precision mantissa. TF32 is the default mode for AI training with 32-bit variables on the Ampere GPU architecture.
AMP (Automatic Mixed Precision) offers a significant computational speedup by performing operations in half precision, while keeping a small set of values in single precision to preserve accuracy in critical parts of the network.
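The snippet below is a minimal sketch of how these two modes are typically enabled, assuming the benchmark is implemented in PyTorch (the framework is not stated here); the actual benchmark harness may configure them differently.

```python
import torch

# TF32: on Ampere GPUs these flags default to True, so FP32 matmuls and
# convolutions already execute on Tensor Cores in TF32 mode.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

model = torch.nn.Linear(1024, 1024).cuda()           # stand-in for the benchmarked network
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler()                  # AMP: loss scaling to avoid FP16 underflow

def amp_step(inputs, targets):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():                   # AMP: run ops in FP16 where safe, FP32 elsewhere
        loss = torch.nn.functional.mse_loss(model(inputs), targets)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```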
We run 100 iterations for each set of parameters.
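As an illustration, throughput in images/sec can be measured with a timing loop like the one below; the function name `measure_throughput`, the warm-up count, and the `step_fn` callback are placeholders, not part of the benchmark code.

```python
import time
import torch

def measure_throughput(step_fn, batch_size, iterations=100, warmup=10):
    """Return images/sec averaged over `iterations` timed training steps."""
    for _ in range(warmup):           # warm-up steps are excluded from timing
        step_fn()
    torch.cuda.synchronize()          # make sure warm-up work has finished
    start = time.time()
    for _ in range(iterations):
        step_fn()
    torch.cuda.synchronize()          # wait for all queued GPU work before stopping the clock
    elapsed = time.time() - start
    return batch_size * iterations / elapsed
```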
- Observation 1: when batch_size is small (1, 2, 4, 8), throughput_amp ≈ throughput_tf32; when batch_size is large (16, 32, 64, 128), throughput_amp > throughput_tf32. A likely explanation is that small batches underutilize the GPU, so AMP's faster half-precision math only translates into higher throughput once the batch is large enough to keep the compute units busy.