diff --git a/README.md b/README.md
index edd125ec780d15dbdd039a2c25e96c551aa9f740..9915b6edda0820d5e714a9d4a5dc3fc436cea7ca 100644
--- a/README.md
+++ b/README.md
@@ -84,13 +84,12 @@
 when batch_size is large (16, 32, 64, 128), throughput_amp > throughput_tf32.
 
 - Observation 2: The coefficient of variation of throughput for 100 iterations is smallest when batch_size = 128.
 
+<img src="https://github.com/xuagu37/Benchmark_nnU-Net_for_PyTorch/blob/main/figures/benchmark_throughput_cv.png" width="400">
+
 **Benchmarking with dim = 2, nodes = 1, 2, gpus = 8, batch_size = 128 can be used for node health check.**
 - The expected throughput for dim = 2, nodes = 1, gpus = 8, batch_size = 128 would be 4700 ± 500 (TF32).
 - The expected throughput for dim = 2, nodes = 2, gpus = 16, batch_size = 128 would be 9250 ± 150 (TF32).
-
-<img src="https://github.com/xuagu37/Benchmark_nnU-Net_for_PyTorch/blob/main/figures/benchmark_throughput_cv.png" width="400">
-
 - Observation 3: Ideally, the improvement of throughput would be linear when batch_size increases. In practice, throughput stays below the ideal curve when batch_size > 16.
 
 <img src="https://github.com/xuagu37/Benchmark_nnU-Net_for_PyTorch/blob/main/figures/benchmark_throughput_batch_size_ideal.png" width="400">
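
The health-check criterion described in the patched README (measured throughput must fall within the expected range, and run-to-run variation is judged by the coefficient of variation) could be sketched as follows. This is a minimal illustration, not part of the benchmark repository; the helper names `coefficient_of_variation` and `node_is_healthy` are hypothetical, and the expected values are the TF32 numbers quoted above for dim = 2, batch_size = 128.

```python
def coefficient_of_variation(samples):
    """CV = std / mean — the stability metric from Observation 2."""
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / n
    return (variance ** 0.5) / mean


def node_is_healthy(throughput, nodes):
    """Check a measured throughput against the README's expected ranges.

    Expected TF32 throughput for dim = 2, batch_size = 128:
      1 node  /  8 GPUs: 4700 ± 500
      2 nodes / 16 GPUs: 9250 ± 150
    (nodes is hypothetical shorthand for the benchmark configuration.)
    """
    expected = {1: (4700, 500), 2: (9250, 150)}
    mean, tolerance = expected[nodes]
    return abs(throughput - mean) <= tolerance
```

A single-node run reporting, say, 4500 images/s would pass (`node_is_healthy(4500, 1)` is `True`), while 4100 images/s would flag the node for inspection.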