diff --git a/README.md b/README.md
index b9077dccd73c883b3674ad21d572c23ec80fa02a..eda7f000eb2c411e2b636e398f8f4a64ec646f2a 100644
--- a/README.md
+++ b/README.md
@@ -99,10 +99,11 @@ when batch_size is large (16, 32, 64, 128), throughput_amp > throughput_tf32.
 
 <img src="https://github.com/xuagu37/Benchmark_nnU-Net_for_PyTorch/blob/main/figures/benchmark_throughput_cv.png" width="400">
 
-- The expected throughput for dim = 2, nodes = 1, gpus = 1, batch_size = 256 would be 670 ± 10 (TF32).
-- The expected throughput for dim = 2, nodes = 1, gpus = 4, batch_size = 256 would be 2600 ± 100 (TF32).
-- The expected throughput for dim = 2, nodes = 1, gpus = 8, batch_size = 256 would be 5150 ± 150 (TF32).
-- The expected throughput for dim = 2, nodes = 2, gpus = 16, batch_size = 128 would be 9250 ± 150 (TF32).
+- The expected throughput for dim = 2, nodes = 1, gpus = 8, batch_size = 256 would be 5130 ± 180 (TF32).
+- The expected throughput for dim = 2, nodes = 2, gpus = 16, batch_size = 128 would be 9300 ± 70 (TF32).
+- The expected throughput for dim = 2, nodes = 3, gpus = 24, batch_size = 128 would be 13880 ± 85 (TF32).
+- The expected throughput for dim = 2, nodes = 4, gpus = 32, batch_size = 128 would be 18500 ± 90 (TF32).
+
 
 **Observation 3**: Ideally, the improvement of throughput would be linear when batch_size increases.
 In practice, throughtput stays below the ideal curve when batch_size > 16.
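As a rough check on how the multi-node figures in this change scale, the sketch below computes per-GPU throughput and scaling efficiency relative to the single-node run. It assumes 8 GPUs per node (as in the 1-, 2- and 3-node lines) and uses only the mean values, ignoring the ± spread. Note also that the single-node run uses batch_size = 256 while the multi-node runs use 128, so the comparison is not strictly like-for-like. The names `GPUS_PER_NODE`, `measured` and `scaling_table` are illustrative and not part of the benchmark repository.

```python
# Mean TF32 throughput for dim = 2, keyed by number of nodes (values from the README diff).
GPUS_PER_NODE = 8  # assumption: 8 GPUs per node, matching the 1-, 2- and 3-node lines
measured = {1: 5130, 2: 9300, 3: 13880, 4: 18500}


def scaling_table(measured, gpus_per_node=GPUS_PER_NODE):
    """Return (nodes, gpus, measured, ideal, efficiency) rows.

    'ideal' assumes perfectly linear scaling of the smallest run's per-GPU throughput.
    """
    base_nodes = min(measured)
    base_per_gpu = measured[base_nodes] / (base_nodes * gpus_per_node)
    rows = []
    for nodes in sorted(measured):
        gpus = nodes * gpus_per_node
        ideal = base_per_gpu * gpus
        efficiency = measured[nodes] / ideal
        rows.append((nodes, gpus, measured[nodes], ideal, efficiency))
    return rows


if __name__ == "__main__":
    for nodes, gpus, tput, ideal, eff in scaling_table(measured):
        print(f"nodes={nodes} gpus={gpus:2d} measured={tput:6.0f} "
              f"ideal={ideal:6.0f} efficiency={eff:.1%}")
```

With these means, the 2-, 3- and 4-node runs come out at roughly 90% of ideal linear scaling.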