From e1888cab2f90bbe348fe412b353b5d8fb908073f Mon Sep 17 00:00:00 2001
From: Xuan Gu <xuagu37@gmail.com>
Date: Tue, 18 Oct 2022 11:21:36 +0200
Subject: [PATCH] Update README.md

---
 README.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/README.md b/README.md
index 4c88d3f..24c1af6 100644
--- a/README.md
+++ b/README.md
@@ -64,6 +64,10 @@ We collect benchmark results of throughput (images/sec) for
 - GPUs = 1 - 8 (for 1 node), 16 (for 2 nodes)
 - Batch size = 1, 2, 4, 8, 16, 32, 64, 128  
 
+TF32 (TensorFloat-32) mode accelerates FP32 convolutions and matrix multiplications. It is the default mode for AI training with 32-bit variables on the NVIDIA Ampere GPU architecture.   
+
+AMP (Automatic Mixed Precision) speeds up computation by performing operations in half precision, while keeping a minimal set of values in single precision so that critical parts of the network retain as much information as possible.   
+
 We run 100 iterations for each set of parameters.
 - Observation 1: when batch_size is small (1, 2, 4, 8), throughput_amp ≈ throughput_tf32;  
 when batch_size is large (16, 32, 64, 128), throughput_amp > throughput_tf32.  
-- 
GitLab