From cd49b04717ab5a30f6ce732f3b57a9fcbdaf44b4 Mon Sep 17 00:00:00 2001
From: Xuan Gu <xuagu37@gmail.com>
Date: Mon, 31 Oct 2022 14:23:55 +0100
Subject: [PATCH] Update README.md

---
 README.md | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/README.md b/README.md
index 9c1892b..64da12a 100644
--- a/README.md
+++ b/README.md
@@ -89,13 +89,19 @@ TF32 (TensorFloat32) mode is for accelerating FP32 convolutions and matrix multi
 
 AMP (Automatic Mixed Precision) offers significant computational speedup by performing operations in half-precision (FP16) format, while storing minimal information in single-precision (TF32) to retain as much information as possible in critical parts of the network.
 
-We run 100 iterations for each set of parameters.
-**Observation 1**: when batch_size is small (1, 2, 4, 8), throughput_amp ≈ throughput_tf32;
+We run 100 iterations for each set of parameters. 
+
+**Observation 1**: Ideally, throughput would increase linearly with the number of GPUs.
+In practice, throughput stays below the ideal curve as the number of GPUs increases.
+
+<img src="https://github.com/xuagu37/Benchmark_nnU-Net_for_PyTorch/blob/main/figures/benchmark_throughput_gpus_ideal.png" width="1000">
+
+**Observation 2**: When batch_size is small (1, 2, 4, 8), throughput_amp ≈ throughput_tf32;
 when batch_size is large (16, 32, 64, 128), throughput_amp > throughput_tf32.
 
 <img src="https://github.com/xuagu37/Benchmark_nnU-Net_for_PyTorch/blob/main/figures/benchmark_throughput_batch_size.png" width="400">
 
-**Observation 2**: Benchmark results are more stable when larger batch_size.
+**Observation 3**: Benchmark results are more stable with a larger batch_size.
 
 <img src="https://github.com/xuagu37/Benchmark_nnU-Net_for_PyTorch/blob/main/figures/benchmark_throughput_cv.png" width="400">
 
@@ -105,13 +111,10 @@ when batch_size is large (16, 32, 64, 128), throughput_amp > throughput_tf32.
 - The expected throughput for dim = 2, nodes = 4, gpus = 24, batch_size = 128 would be 18500 ± 90 (TF32).
 
-**Observation 3**: Ideally, the improvement of throughput would be linear when batch_size increases. In practice, throughtput stays below the ideal curve when batch_size > 16.
+**Observation 4**: Ideally, throughput would increase linearly with batch_size. In practice, throughput stays below the ideal curve when batch_size > 16.
 
 <img src="https://github.com/xuagu37/Benchmark_nnU-Net_for_PyTorch/blob/main/figures/benchmark_throughput_batch_size_ideal.png" width="400">
 
-**Observation 4**: Ideally, the improvement of throughput would be linear when the number of GPUs increases. In practice, throughtput stays below the ideal curve when the number of gpus increases.
-
-<img src="https://github.com/xuagu37/Benchmark_nnU-Net_for_PyTorch/blob/main/figures/benchmark_throughput_gpus_ideal.png" width="1000">
 
 #### Notes
-- 
GitLab
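The observations in this patch compare measured throughput with an ideal linear curve (over the number of GPUs and over batch_size) and judge stability from the spread across the 100 iterations. The following is a minimal Python sketch of those quantities, assuming the ideal curve is the smallest configuration's throughput scaled linearly, and assuming stability is expressed as the coefficient of variation (sample standard deviation divided by mean), which the figure name `benchmark_throughput_cv.png` suggests but does not confirm. All function names are hypothetical and are not taken from the Benchmark_nnU-Net_for_PyTorch repository.

```python
"""Hypothetical helpers for reading throughput benchmark figures.

Assumptions (not from the benchmark repository): the ideal curve is a
baseline throughput scaled linearly, and stability is measured as the
coefficient of variation of per-iteration throughput.
"""
from statistics import mean, stdev
from typing import Sequence


def ideal_throughput(baseline: float, factor: float) -> float:
    """Linear-scaling ideal: baseline throughput times the scale factor.

    `factor` is e.g. number_of_gpus / baseline_gpus, or
    batch_size / baseline_batch_size (assumed definition).
    """
    return baseline * factor


def scaling_efficiency(measured: float, baseline: float, factor: float) -> float:
    """Fraction of the ideal linear throughput actually achieved (1.0 = ideal)."""
    return measured / ideal_throughput(baseline, factor)


def coefficient_of_variation(samples: Sequence[float]) -> float:
    """Sample standard deviation divided by the mean; lower means more stable."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate spread")
    return stdev(samples) / mean(samples)


if __name__ == "__main__":
    # Made-up numbers for illustration only, not measurements from this benchmark.
    print(scaling_efficiency(measured=700.0, baseline=100.0, factor=8))  # 0.875
    print(coefficient_of_variation([98.0, 100.0, 102.0]))  # 0.02
```

A scaling efficiency below 1.0 corresponds to a point below the ideal curve; comparing it across GPU counts or batch sizes shows where the gap to linear scaling opens up.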