Update README.md

Commit cd49b047 (unverified), authored 2 years ago by Xuan Gu, committed by GitHub 2 years ago
--- a/README.md
+++ b/README.md
@@ -89,13 +89,19 @@ TF32 (TensorFloat32) mode is for accelerating FP32 convolutions and matrix multi

 AMP (Automatic Mixed Precision) offers significant computational speedup by performing operations in half-precision (FP16) format, while storing minimal information in single-precision (FP32) to retain as much information as possible in critical parts of the network.   

-We run 100 iterations for each set of parameters.
-**Observation 1**: when batch_size is small (1, 2, 4, 8), throughput_amp ≈ throughput_tf32;  
+We run 100 iterations for each set of parameters.  
+
+**Observation 1**: Ideally, throughput would scale linearly with the number of GPUs.  
+In practice, throughput stays below the ideal curve as the number of GPUs increases.
+
+<img src="https://github.com/xuagu37/Benchmark_nnU-Net_for_PyTorch/blob/main/figures/benchmark_throughput_gpus_ideal.png" width="1000">
+
+**Observation 2**: when batch_size is small (1, 2, 4, 8), throughput_amp ≈ throughput_tf32;  
 when batch_size is large (16, 32, 64, 128), throughput_amp > throughput_tf32.  

 <img src="https://github.com/xuagu37/Benchmark_nnU-Net_for_PyTorch/blob/main/figures/benchmark_throughput_batch_size.png" width="400">

-**Observation 2**: Benchmark results are more stable with larger batch_size.  
+**Observation 3**: Benchmark results are more stable with larger batch_size.  

 <img src="https://github.com/xuagu37/Benchmark_nnU-Net_for_PyTorch/blob/main/figures/benchmark_throughput_cv.png" width="400">

@@ -105,13 +111,10 @@ when batch_size is large (16, 32, 64, 128), throughput_amp > throughput_tf32.
 - The expected throughput for dim = 2, nodes = 4, gpus = 24, batch_size = 128 would be 18500 ± 90 (TF32).


-**Observation 3**: Ideally, throughput would scale linearly with batch_size. In practice, throughput stays below the ideal curve when batch_size > 16.
+**Observation 4**: Ideally, throughput would scale linearly with batch_size. In practice, throughput stays below the ideal curve when batch_size > 16.

 <img src="https://github.com/xuagu37/Benchmark_nnU-Net_for_PyTorch/blob/main/figures/benchmark_throughput_batch_size_ideal.png" width="400">

-**Observation 4**: Ideally, throughput would scale linearly with the number of GPUs. In practice, throughput stays below the ideal curve as the number of GPUs increases.
-
-<img src="https://github.com/xuagu37/Benchmark_nnU-Net_for_PyTorch/blob/main/figures/benchmark_throughput_gpus_ideal.png" width="1000">


 #### Notes
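
A side note on the two "ideal curve" observations in this change: the gap between measured and ideal throughput can be expressed as a scaling efficiency, that is, measured throughput divided by the throughput a perfectly linear extrapolation from a baseline would give. The sketch below is only an illustration of that arithmetic. The function names and the numbers are hypothetical and are not taken from the benchmark results.

```python
def ideal_throughput(base_throughput, base_units, units):
    """Throughput expected under perfectly linear scaling from a baseline.

    `units` can be a GPU count or a batch size, depending on which
    observation is being checked.
    """
    return base_throughput * units / base_units


def scaling_efficiency(measured, base_throughput, base_units, units):
    """Ratio of measured to ideal throughput; 1.0 means perfectly linear."""
    return measured / ideal_throughput(base_throughput, base_units, units)


# Hypothetical numbers for illustration only, not measured results.
base_gpus, base_tp = 1, 1000.0
measurements = {2: 1900.0, 4: 3600.0, 8: 6400.0}

for gpus, tp in measurements.items():
    ideal = ideal_throughput(base_tp, base_gpus, gpus)
    eff = scaling_efficiency(tp, base_tp, base_gpus, gpus)
    print(f"gpus={gpus}: ideal={ideal:.0f}, measured={tp:.0f}, efficiency={eff:.2f}")
```

An efficiency that falls as the GPU count or batch_size grows corresponds to the measured curve bending away from the ideal line in the figures.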