diff --git a/README.md b/README.md
index 4c88d3fca32ca70a38a330f86b1f682b70cfc9bd..24c1af697bea1950163955c7d2365bf237c3f7c7 100644
--- a/README.md
+++ b/README.md
@@ -64,6 +64,10 @@ We collect benchmark results of throughput (images/sec) for
 - GPUs = 1 - 8 (for 1 node), 16 (for 2 nodes)
 - Batch size = 1, 2, 4, 8, 16, 32, 64, 128  
 
+TF32 (TensorFloat-32) is a math mode that accelerates FP32 convolutions and matrix multiplications by rounding inputs to a reduced-precision mantissa while keeping the full FP32 dynamic range. It is the default mode for AI training with 32-bit variables on the NVIDIA Ampere GPU architecture.
+
+AMP (Automatic Mixed Precision) offers a significant computational speedup by performing most operations in half precision, while keeping a small amount of single-precision state in numerically critical parts of the network to preserve accuracy.
+
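+Both modes can typically be enabled with a few lines of framework code. A minimal sketch, assuming PyTorch (the `torch.backends` flags and `torch.cuda.amp` API are PyTorch-specific; `model`, `inputs`, `targets`, `criterion`, and `optimizer` are placeholders for your own training objects):
+
+```python
+import torch
+
+# TF32: cuDNN convolutions use TF32 by default on Ampere GPUs;
+# TF32 for matmuls may need to be enabled explicitly in recent PyTorch.
+torch.backends.cuda.matmul.allow_tf32 = True
+torch.backends.cudnn.allow_tf32 = True
+
+# AMP: run the forward pass in mixed precision and scale the loss
+# to avoid FP16 gradient underflow during the backward pass.
+scaler = torch.cuda.amp.GradScaler()
+with torch.cuda.amp.autocast():
+    output = model(inputs)              # placeholder model and batch
+    loss = criterion(output, targets)   # placeholder loss function
+scaler.scale(loss).backward()
+scaler.step(optimizer)
+scaler.update()
+```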
 We run 100 iterations for each set of parameters.
 - Observation 1: when batch_size is small (1, 2, 4, 8), throughput_amp ≈ throughput_tf32;  
 when batch_size is large (16, 32, 64, 128), throughput_amp > throughput_tf32.