diff --git a/README.md b/README.md
index 6f5a788e878fa85fa9ba3f2b6bed0d5ec2492a5b..4b645c9d1c2d6620325f6f252b34997b2aca7197 100644
--- a/README.md
+++ b/README.md
@@ -51,39 +51,13 @@ sbash benchmark_nnunet_pytorch_berzelius.sh
 sbash benchmark_nnunet_pytorch_berzelius_multi_node.sh
 ```
 
-<!--
-#### For single node
-- Start an interactive session
-```
-interactive -N2 --reservation=nsc-testing -t 600
-```
-
-- Pull the image for Singularity and run
-```
-cd /proj/nsc/xuan/ngc/DeepLearningExamples/PyTorch/Segmentation/nnUNet
-singularity pull nvidia_nnu-net_for_pytorch.sif docker://xuagu37/nvidia_nnu-net_for_pytorch:21.11.0
-singularity shell -B ${PWD}/data:/data -B ${PWD}/results:/results --nv nvidia_nnu-net_for_pytorch.sif
-```
-- Run the benchmark script
-```
-bash benchmark_nnunet_pytorch_berzelius.sh
-```
-
-#### For multi-node
-- Run the benchmark script
-```
-cd /proj/nsc/xuan/ngc/DeepLearningExamples/PyTorch/Segmentation/nnUNet
-sbash benchmark_nnunet_pytorch_berzelius_multi_node.sh
-```
--->
-
 ### Results
 We collect benchmark results of throughput (images/sec) for
 - Precisions = TF32, AMP
 - Dimention = 2
-- Nodes = 1, 2
-- GPUs = 1 - 8 (for 1 node), 16 (for 2 nodes)
-- Batch size = 1, 2, 4, 8, 16, 32, 64, 128
+- Nodes = 1, 2, 3, 4, 5, 6, 7, 8
+- GPUs = 1 - 8 (for 1 node), all gpus (for multi-node)
+- Batch size = 1, 2, 4, 8, 16, 32, 64, 128, 256
 
 TF32 (TensorFloat32) mode is for accelerating FP32 convolutions and matrix multiplications. TF32 mode is the default option for AI training with 32-bit variables on Ampere GPU architecture.
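
Review note: the updated Results list describes a sweep over precision, node count, GPU count, and batch size. The sketch below enumerates that sweep so its size can be checked. It is illustrative only and is not taken from the benchmark scripts. The names `GPUS_PER_NODE`, `gpu_counts`, and `benchmark_grid` are hypothetical, the value of 8 GPUs per node is inferred from the removed "16 (for 2 nodes)" line, and "1 - 8" is assumed to mean every GPU count from 1 to 8.

```python
from itertools import product

# Values copied from the updated Results list in the diff.
PRECISIONS = ["TF32", "AMP"]
DIMENSION = 2
NODES = range(1, 9)                       # 1, 2, ..., 8
BATCH_SIZES = [2 ** k for k in range(9)]  # 1, 2, 4, ..., 256

# Assumption: 8 GPUs per node, inferred from "16 (for 2 nodes)".
GPUS_PER_NODE = 8


def gpu_counts(nodes):
    """Return the GPU counts swept for a given node count.

    Assumption: a single node sweeps every count from 1 to 8, while a
    multi-node run always uses all GPUs on all nodes.
    """
    if nodes == 1:
        return list(range(1, GPUS_PER_NODE + 1))
    return [nodes * GPUS_PER_NODE]


def benchmark_grid():
    """Yield one configuration dict per benchmark run."""
    for precision, nodes, batch_size in product(PRECISIONS, NODES, BATCH_SIZES):
        for gpus in gpu_counts(nodes):
            yield {
                "precision": precision,
                "dim": DIMENSION,
                "nodes": nodes,
                "gpus": gpus,
                "batch_size": batch_size,
            }


if __name__ == "__main__":
    configs = list(benchmark_grid())
    print(f"{len(configs)} configurations")
```

Under these assumptions the sweep has 270 configurations: per precision, 8 x 9 single-node runs plus 7 x 9 multi-node runs.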