diff --git a/README.md b/README.md
index 01ecbc57ec5577e3d7d67c6ebe74676b69d601bf..025581c278e0916527ec7f12a7b8b4cd4eb1f5db 100644
--- a/README.md
+++ b/README.md
@@ -84,8 +84,8 @@
 when batch_size is large (16, 32, 64, 128), throughput_amp > throughput_tf32.
 - Observation 2: The coefficient of variation of throughput for the 100 iterations is smallest when batch_size = 128.
 
-Benchmarking with dim = 2, nodes = 1,2, gpus = 8, batch_size = 128 can be used for node health check.
-For example, the expected throughput for dim = 2, nodes = 1, gpus = 8, batch_size = 128 would be ? ± ? (TF32) and ? ± ? (AMP).
+**Benchmarking with dim = 2, nodes = 1, 2, gpus = 8, batch_size = 128 can be used for node health checks.
+For example, the expected throughput for dim = 2, nodes = 1, gpus = 8, batch_size = 128 would be 4700 ± 500 (TF32).**
 
 <img src="https://github.com/xuagu37/Benchmark_nnU-Net_for_PyTorch/blob/main/figures/benchmark_throughput_cv.png" width="400">
 
@@ -95,7 +95,7 @@ For example, the expected throughput for dim = 2, nodes = 1, gpus = 8, batch_siz
 
 - Observation 4: Ideally, the improvement of throughput would be linear when the number of GPUs increases. In practice, throughput stays below the ideal curve as gpus increases.
 
-<img src="https://github.com/xuagu37/Benchmark_nnU-Net_for_PyTorch/blob/main/figures/benchmark_gpus_ideal.png" width="400">
+<img src="https://github.com/xuagu37/Benchmark_nnU-Net_for_PyTorch/blob/main/figures/benchmark_throughput_gpus_ideal.png" width="400">
 
 #### Notes
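
The health-check idea added in this diff (run the benchmark at a fixed configuration and compare the measured throughput against the expected 4700 ± 500 TF32 band) can be turned into a small check script. Below is a minimal sketch, assuming you have already obtained a mean throughput from a benchmark run; the function `node_is_healthy`, its arguments, and the sample measurements are hypothetical illustrations, with only the expected mean and spread taken from the README text above.

```python
# Minimal node-health-check sketch (illustrative, not part of the repository).
# Assumes a completed benchmark run with dim = 2, nodes = 1, gpus = 8,
# batch_size = 128 in TF32 mode; 4700 and 500 are the expected mean and
# spread quoted in the README.

EXPECTED_TF32 = 4700.0   # expected mean throughput (from the README)
TOLERANCE = 500.0        # accepted deviation from the expected mean


def node_is_healthy(measured: float,
                    expected: float = EXPECTED_TF32,
                    tolerance: float = TOLERANCE) -> bool:
    """Return True if the measured throughput lies within expected ± tolerance."""
    return abs(measured - expected) <= tolerance


if __name__ == "__main__":
    # Example measurements (made up): the first falls inside the expected
    # band, the second flags a node worth investigating.
    for throughput in (4650.0, 3900.0):
        status = "OK" if node_is_healthy(throughput) else "SUSPECT"
        print(f"measured {throughput:.0f} -> {status}")
```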