diff --git a/README.md b/README.md
index 24c1af697bea1950163955c7d2365bf237c3f7c7..8c0f8e11b95e00e50c83164846a39dbe5f0532ce 100644
--- a/README.md
+++ b/README.md
@@ -66,7 +66,7 @@ We collect benchmark results of throughput (images/sec) for
 TF32 (TensorFloat32) mode is for accelerating FP32 convolutions and matrix multiplications. TF32 mode is the default option for AI training with 32-bit variables on Ampere GPU architecture.
-AMP (Automatic Mixed Precision) offers significant computational speedup by performing operations in half-precision format, while storing minimal information in single-precision to retain as much information as possible in critical parts of the network.
+AMP (Automatic Mixed Precision) offers significant computational speedup by performing operations in half-precision (FP16) format, while storing minimal information in single-precision (FP32) to retain as much information as possible in critical parts of the network.
 We run 100 iterations for each set of parameters.
 - Observation 1: when batch_size is small (1, 2, 4, 8), throughput_amp ≈ throughput_tf32;