diff --git a/README.md b/README.md
index 362ab9ad923798d99281447ac6ac4dcf1c66a63f..ac8120106b1c28e674afb3731aca6e58f9d6b5ee 100644
--- a/README.md
+++ b/README.md
@@ -54,3 +54,13 @@ bash benchmark_nnunet_pytorch_berzelius.sh
 cd /proj/nsc/xuan/ngc/DeepLearningExamples/PyTorch/Segmentation/nnUNet
 sbash benchmark_nnunet_pytorch_berzelius_multi_node.sh
 ```
+
+#### Results  
+Throughput (images/sec) was measured over 100 iterations for each combination of:
+- Dimension = 2
+- Nodes = 1, 2
+- GPUs = 1 to 8 (1 node), 16 (2 nodes)
+- Batch size = 2, 4, 8, 16, 32, 64, 128
+
+Observation 1: `throughput_tf32` > `throughput_amp` for small batch sizes (8 or fewer), while `throughput_tf32` < `throughput_amp` for large batch sizes (16, 32, 64, 128).  
+Observation 2: The coefficient of variation of throughput across the 100 iterations is smallest when the batch size is 128.