diff --git a/README.md b/README.md
index b82afce5544ec546d4b0cf339259237daf50d162..cec5b563e56534610f8e22ff101ab840a2ce627b 100644
--- a/README.md
+++ b/README.md
@@ -60,7 +60,8 @@ We collect benchmark results of throughput (images/sec) for
 - Dimention = 2
 - Nodes = 1, 2
 - GPUs = 1 - 8 (for 1 node), 16 (for 2 nodes)
-- Batch size = 2, 4, 8, 16, 32, 64, 128
-We run 100 iterations for each set of parameters.
+- Batch size = 2, 4, 8, 16, 32, 64, 128
+
+We run 100 iterations for each set of parameters.
 - Observation 1: throughput_tf32 > throughput_amp when batch_size is small (1, 2, 4, 8); throughput_tf32 < throughput_amp when batch_size is large (16, 32, 64, 128).
 - Observation 2: The coefficient of variation for the 100 iteration is smallest when batch_size = 128.