diff --git a/README.md b/README.md
index 4f274f935b8e29794e10c8ea3a6fd244b74be468..b9077dccd73c883b3674ad21d572c23ec80fa02a 100644
--- a/README.md
+++ b/README.md
@@ -95,12 +95,13 @@ when batch_size is large (16, 32, 64, 128), throughput_amp > throughput_tf32.
 
 <img src="https://github.com/xuagu37/Benchmark_nnU-Net_for_PyTorch/blob/main/figures/benchmark_throughput_batch_size.png" width="400">
 
-**Observation 2**: The coefficient of variation of throughput for 100 iterations is smallest when batch_size = 128.  
+**Observation 2**: Benchmark results are more stable with a larger batch_size.  
 
 <img src="https://github.com/xuagu37/Benchmark_nnU-Net_for_PyTorch/blob/main/figures/benchmark_throughput_cv.png" width="400">
 
-**Benchmarking with dim = 2, nodes = 1, 2, gpus = 8, batch_size = 128 can be used for node health check.** 
-- The expected throughput for dim = 2, nodes = 1, gpus = 8, batch_size = 128 would be 4700 ± 500 (TF32).
+- The expected throughput for dim = 2, nodes = 1, gpus = 1, batch_size = 256 would be 670 ± 10 (TF32).
+- The expected throughput for dim = 2, nodes = 1, gpus = 4, batch_size = 256 would be 2600 ± 100 (TF32).
+- The expected throughput for dim = 2, nodes = 1, gpus = 8, batch_size = 256 would be 5150 ± 150 (TF32).
 - The expected throughput for dim = 2, nodes = 2, gpus = 16, batch_size = 128 would be 9250 ± 150 (TF32).
 
 **Observation 3**: Ideally, throughput would scale linearly as batch_size increases. In practice, throughput stays below the ideal curve when batch_size > 16.
@@ -118,4 +119,4 @@ when batch_size is large (16, 32, 64, 128), throughput_amp > throughput_tf32.
 - For multi-node benchmarking, we need to use the "srun" command and add the line "#SBATCH --ntasks-per-node=8"; otherwise the process will hang.
 - Benchmarking with dim = 2, nodes = 1, gpus = 8, batch_size = 128 takes ~2 minutes.  
 To finish within a minute, we can reduce the number of batches from the default of 150, or try a smaller dataset.
-- On single node, max batch_size is 256; on multi-node, max batch_size is 128.
+- Use as large a batch_size as possible for more stable benchmark results.
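
The stability claim above can be quantified with the coefficient of variation (CV) of throughput over repeated benchmark iterations, i.e. standard deviation divided by mean. A minimal sketch, using hypothetical throughput samples (the values below are illustrative, not actual nnU-Net measurements):

```python
import statistics

# Hypothetical throughput samples (images/s) from repeated benchmark runs;
# these numbers are illustrative, not measured values.
throughputs = [4650, 4720, 4580, 4700, 4690]

mean = statistics.mean(throughputs)
cv = statistics.stdev(throughputs) / mean  # coefficient of variation
print(f"mean = {mean:.0f} images/s, CV = {cv:.1%}")
```

A node would then be considered healthy when the measured mean falls inside the expected range for its configuration (e.g. 5150 ± 150 for 8 GPUs) and the CV is small.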