diff --git a/README.md b/README.md
index eb30ed1bca95f3d0740f2e2b0601d21b6b6a730e..91b936130304aee11a193754f3eba795594727e1 100644
--- a/README.md
+++ b/README.md
@@ -104,3 +104,4 @@ when batch_size is large (16, 32, 64, 128), throughput_amp > throughput_tf32.
 - It seems running directly via singularity shell will give worse performance (when I WFH). We should run it via sbatch script instead.
 - It took around a week to finish all iterations of benchmarking.
 - For multi-node bash script, the line "#SBATCH --ntasks-per-node=8" has to been added; otherwise the process will hang.
+- Benchmarking with dim = 2, nodes = 1, gpus = 8, batch_size = 128 takes ~2 minutes. To finish within a minute, reduce the number of batches from the default of 150 to a smaller value.