diff --git a/README.md b/README.md
index edd125ec780d15dbdd039a2c25e96c551aa9f740..9915b6edda0820d5e714a9d4a5dc3fc436cea7ca 100644
--- a/README.md
+++ b/README.md
@@ -84,13 +84,12 @@
 when batch_size is large (16, 32, 64, 128), throughput_amp > throughput_tf32.
 
 - Observation 2: The coefficient of variation of throughput for 100 iterations is smallest when batch_size = 128.
 
+<img src="https://github.com/xuagu37/Benchmark_nnU-Net_for_PyTorch/blob/main/figures/benchmark_throughput_cv.png" width="400">
+
 **Benchmarking with dim = 2, nodes = 1, 2, gpus = 8, batch_size = 128 can be used for node health check.**
 - The expected throughput for dim = 2, nodes = 1, gpus = 8, batch_size = 128 would be 4700 ± 500 (TF32).
 - The expected throughput for dim = 2, nodes = 2, gpus = 16, batch_size = 128 would be 9250 ± 150 (TF32).
-
-<img src="https://github.com/xuagu37/Benchmark_nnU-Net_for_PyTorch/blob/main/figures/benchmark_throughput_cv.png" width="400">
-
 - Observation 3: Ideally, the improvement of throughput would be linear when batch_size increases. In practice, throughput stays below the ideal curve when batch_size > 16.
 
 <img src="https://github.com/xuagu37/Benchmark_nnU-Net_for_PyTorch/blob/main/figures/benchmark_throughput_batch_size_ideal.png" width="400">
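
The health-check criterion described in the patched README (measured throughput must fall within the expected range, and run-to-run variation is judged by the coefficient of variation) could be sketched as follows. This is a minimal illustration, not part of the benchmark repository; the helper names `coefficient_of_variation` and `node_is_healthy` are hypothetical, and the expected values are the TF32 numbers quoted above for dim = 2, batch_size = 128.

```python
def coefficient_of_variation(samples):
    """CV = std / mean — the stability metric from Observation 2."""
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / n
    return (variance ** 0.5) / mean


def node_is_healthy(throughput, nodes):
    """Check a measured throughput against the README's expected ranges.

    Expected TF32 throughput for dim = 2, batch_size = 128:
      1 node  /  8 GPUs: 4700 ± 500
      2 nodes / 16 GPUs: 9250 ± 150
    (nodes is hypothetical shorthand for the benchmark configuration.)
    """
    expected = {1: (4700, 500), 2: (9250, 150)}
    mean, tolerance = expected[nodes]
    return abs(throughput - mean) <= tolerance
```

A single-node run reporting, say, 4500 images/s would pass (`node_is_healthy(4500, 1)` is `True`), while 4100 images/s would flag the node for inspection.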