diff --git a/README.md b/README.md
index 362ab9ad923798d99281447ac6ac4dcf1c66a63f..ac8120106b1c28e674afb3731aca6e58f9d6b5ee 100644
--- a/README.md
+++ b/README.md
@@ -54,3 +54,13 @@ bash benchmark_nnunet_pytorch_berzelius.sh
 cd /proj/nsc/xuan/ngc/DeepLearningExamples/PyTorch/Segmentation/nnUNet
 sbash benchmark_nnunet_pytorch_berzelius_multi_node.sh
 ```
+
+#### Results
+We collect throughput benchmark results (images/sec) for the following parameters:
+- Dimension = 2
+- Nodes = 1, 2
+- GPUs = 1 to 8 (for 1 node), 16 (for 2 nodes)
+- Batch size = 2, 4, 8, 16, 32, 64, 128
+We run 100 iterations for each set of parameters.
+- Observation 1: throughput_tf32 > throughput_amp when batch_size is small (1, 2, 4, 8); throughput_tf32 < throughput_amp when batch_size is large (16, 32, 64, 128).
+- Observation 2: The coefficient of variation over the 100 iterations is smallest when batch_size = 128.
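
Observation 2 refers to the coefficient of variation, which is the standard deviation divided by the mean. The README does not say how it was computed, so the following is only a minimal sketch: it assumes the sample standard deviation, and the function name `coefficient_of_variation` and the throughput numbers are hypothetical, not measured values.

```python
import statistics


def coefficient_of_variation(samples):
    """Return the sample standard deviation divided by the mean.

    Assumption: sample (n - 1) standard deviation; the README does not
    state which variant was used.
    """
    mean = statistics.fmean(samples)
    if mean == 0:
        raise ValueError("mean is zero; coefficient of variation is undefined")
    return statistics.stdev(samples) / mean


# Hypothetical per-iteration throughputs (images/sec), for illustration only.
steady = [100.0, 102.0, 98.0, 101.0, 99.0]
noisy = [100.0, 130.0, 70.0, 120.0, 80.0]

print(f"steady CV: {coefficient_of_variation(steady):.4f}")
print(f"noisy CV:  {coefficient_of_variation(noisy):.4f}")
```

A lower value means the per-iteration throughput varies less relative to its mean, which is the sense in which batch_size = 128 is reported as the most stable configuration.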