# Berzelius nnU-Net Benchmark
The benchmarking is based on [Nvidia NGC nnU-net for Pytorch](https://catalog.ngc.nvidia.com/orgs/nvidia/resources/nnunet_for_pytorch) v21.11.0. The commands below refer to this version through the shell variable `VERSION`.
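Set the variable once in your shell before running the commands:
```
VERSION=21.11.0
```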
### On a local computer (optional)
- Download the code
```
wget --content-disposition https://api.ngc.nvidia.com/v2/resources/nvidia/nnunet_for_pytorch/versions/$VERSION/zip -O /tmp/nnunet_for_pytorch_$VERSION.zip
mkdir ~/DeepLearningExamples/nnunet_for_pytorch_$VERSION
unzip /tmp/nnunet_for_pytorch_$VERSION.zip -d ~/DeepLearningExamples/nnunet_for_pytorch_$VERSION/
cd ~/DeepLearningExamples/nnunet_for_pytorch_$VERSION/
```
- Build the nnU-Net PyTorch Docker container
```
docker build -t nnunet .
```
- Push the container to Docker Hub
```
docker tag nnunet:latest berzeliushub/nvidia_nnu-net_for_pytorch:$VERSION
docker push berzeliushub/nvidia_nnu-net_for_pytorch:$VERSION
```
### On Berzelius
- Create directories
```
cd /proj/nsc_testing/xuan/DeepLearningExamples/
git clone https://gitlab.liu.se/xuagu37/Berzelius-nnU-Net-Benchmark.git
cd Berzelius-nnU-Net-Benchmark
mkdir data results
```
<!-- - Clone the repository
```
cd /proj/nsc_testing/xuan/ngc
git clone https://github.com/NVIDIA/DeepLearningExamples
cd DeepLearningExamples/PyTorch/Segmentation/nnUNet
mkdir data results
``` -->
Docker is not available on Berzelius. We use Apptainer or Enroot.
- Prepare the dataset
With Apptainer
```
apptainer pull nvidia_nnu-net_for_pytorch.sif docker://berzeliushub/nvidia_nnu-net_for_pytorch:$VERSION
apptainer exec --nv -B ${PWD}/data:/data -B ${PWD}/results:/results nvidia_nnu-net_for_pytorch.sif bash -c "cd /workspace/nnunet_pyt && python download.py --task 01 && python preprocess.py --task 01 --dim 2"
```
With Enroot
```
enroot import "docker://berzeliushub/nvidia_nnu-net_for_pytorch:$VERSION"
enroot create --name nnunet berzeliushub+nvidia_nnu-net_for_pytorch+$VERSION.sqsh
enroot start --rw --mount ${PWD}/data:/data --mount ${PWD}/results:/results nnunet bash -c "cd /workspace/nnunet_pyt && python download.py --task 01 && python preprocess.py --task 01 --dim 2"
```
<!-- Using singularity
```
singularity pull nvidia_nnu-net_for_pytorch.sif docker://xuagu37/nvidia_nnu-net_for_pytorch:21.11.0
singularity shell -B ${PWD}/data:/data -B ${PWD}/results:/results --nv nvidia_nnu-net_for_pytorch.sif
```
Or using enroot
```
enroot import 'docker://xuagu37/nvidia_nnu-net_for_pytorch:21.11.0'
enroot create --name nnunet xuagu37+nvidia_nnu-net_for_pytorch+21.11.0.sqsh
enroot start --rw --mount ${PWD}/data:/data --mount ${PWD}/results:/results nnunet
```
- Prepare BraTS dataset (within the image)
```
python download.py --task 01
python preprocess.py --task 01 --dim 2
```
Exit the image.
-->
- For benchmarking purposes, we use copies of a single image
```
bash scripts/copy_data_for_benchmark.sh
```

The input arguments are:

We will average the benchmark performance over the iterations. The maximum usable batch size (without an OOM error) is 256 for single-node and 128 for multi-node runs.
```
cd Berzelius-nnU-Net-Benchmark && mkdir -p sbatch_out
bash scripts/benchmark_sbatch_submit.sh 1 8 100 128
```
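For illustration, the per-iteration JSON logs could be averaged along these lines. This is a sketch, not a script from the repository, and the `throughput_train` field name is an assumption about the log format:
```
# Sketch: average throughput over the TF32 iterations of one configuration.
# Assumes each result file is a JSON object with a "throughput_train" field (assumed name).
for f in results/benchmark_dim2_nodes1_gpus8_batchsize128_tf32_iteration*.json; do
  jq '.throughput_train' "$f"
done | awk '{ sum += $1; n++ } END { if (n > 0) print "mean throughput:", sum / n }'
```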
We run 100 iterations for each set of parameters. Please see the results in benc
**Observation 1**: Ideally, the improvement of throughput would be linear when the number of GPUs increases.
In practice, throughput stays below the ideal curve as the number of GPUs increases.
<img src="https://gitlab.liu.se/xuagu37/Benchmark_nnU-Net_for_PyTorch/-/raw/74d9160cec1caaf2c4531db3ae6096b518229b32/figures/benchmark_throughput_gpus_ideal.png" width="800">
<img src="https://gitlab.liu.se/berzeliushub/Benchmark_nnU-Net_for_PyTorch/-/raw/74d9160cec1caaf2c4531db3ae6096b518229b32/figures/benchmark_throughput_gpus_ideal.png" width="800">
**Observation 2**: when batch_size is small (1, 2, 4, 8), throughput_amp ≈ throughput_tf32;
when batch_size is large (16, 32, 64, 128), throughput_amp > throughput_tf32.
<img src="https://gitlab.liu.se/xuagu37/Benchmark_nnU-Net_for_PyTorch/-/raw/3a4941c09c5280ef3749d44b3af14dcccacc38f7/figures/benchmark_throughput_batch_size.png" width="800">
<img src="https://gitlab.liu.se/berzeliushub/Benchmark_nnU-Net_for_PyTorch/-/raw/3a4941c09c5280ef3749d44b3af14dcccacc38f7/figures/benchmark_throughput_batch_size.png" width="800">
**Observation 3**: Benchmark results are more stable with a larger batch_size.
<img src="https://gitlab.liu.se/xuagu37/Benchmark_nnU-Net_for_PyTorch/-/raw/e62617c63bfb4d167a78faf84156956bbc8f52bb/figures/benchmark_throughput_cv.png" width="800">
<img src="https://gitlab.liu.se/berzeliushub/Benchmark_nnU-Net_for_PyTorch/-/raw/e62617c63bfb4d167a78faf84156956bbc8f52bb/figures/benchmark_throughput_cv.png" width="800">
The coefficient of variation is the ratio of the standard deviation to the mean; it shows the extent of variability relative to the mean of the measurements.
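In symbols, with $\sigma$ the standard deviation and $\mu$ the mean of the measured throughput:

$$c_v = \frac{\sigma}{\mu}$$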
**Observation 4**: Ideally, the improvement of throughput would be linear when batch_size increases. In practice, throughput stays below the ideal curve when batch_size > 16.
<img src="https://gitlab.liu.se/xuagu37/Benchmark_nnU-Net_for_PyTorch/-/raw/ec0f070f718c05d46c6090cc3f8d6ebb29f93725/figures/benchmark_throughput_batch_size_ideal.png" width="800">
<img src="https://gitlab.liu.se/berzeliushub/Benchmark_nnU-Net_for_PyTorch/-/raw/ec0f070f718c05d46c6090cc3f8d6ebb29f93725/figures/benchmark_throughput_batch_size_ideal.png" width="800">
Ref: https://pytorch-lightning.readthedocs.io/en/stable/common/trainer.html
- For multi-node benchmarking, we need to use the `srun` command and add the line `#SBATCH --ntasks-per-node=8`; otherwise the process will hang.
- Use as large a batch_size as possible for a more stable benchmark result: 256 for single-node, 128 for multi-node.
- Benchmarking with dim = 2, nodes = 1, gpus = 8, batch_size = 128, 256 takes ~2mins.
- Specify the paths for the Enroot cache and data; see this [page](https://gitlab.liu.se/berzeliushub/run-pytorch-and-tensorflow-containers-with-nvidia-enroot#set-path-to-user-container-storage).
- (20220222) ```srun enroot ...``` stopped working for the multi-node case. Use pyxis instead; see the script ```benchmark_multi_node.sbatch``` and the sketch below.
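For reference, a minimal sketch of what such a pyxis-based multi-node job script could look like; the image reference, mounts, and benchmark arguments mirror the single-node script below and are assumptions, not the verbatim contents of ```benchmark_multi_node.sbatch```:
```
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --gres=gpu:8
#SBATCH --ntasks-per-node=8   # required for multi-node runs, otherwise the process hangs
#SBATCH --time=0-0:20:00

# pyxis pulls the container and starts one task per GPU; dim/batch_size are example values.
srun --container-image=berzeliushub/nvidia_nnu-net_for_pytorch:21.11.0 \
     --container-mounts=${PWD}/data:/data,${PWD}/results:/results \
     bash -c "cd /workspace/nnunet_pyt && python scripts/benchmark.py --mode train --gpus 8 --dim 2 --batch_size 128 --nodes 2 --logname='benchmark_multi_node_example.json'"
```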
### scripts/benchmark_single_node.sbatch

```
#SBATCH --nodes=1
#SBATCH --gres=gpu:8
#SBATCH --time=0-0:10:00
#####SBATCH --reservation=bt-xuan_1node_20221020_0900
#SBATCH --reservation=devel
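# Positional arguments: $1 = dim, $2 = nodes, $3 = gpus, $4 = batch_size, $5 = iteration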
# For apptainer
#rm -f results/benchmark_dim${1}_nodes${2}_gpus${3}_batchsize${4}_tf32_iteration${5}.json
#apptainer exec --nv -B ${PWD}/data:/data -B ${PWD}/results:/results nvidia_nnu-net_for_pytorch.sif bash -c "cd /workspace/nnunet_pyt && python scripts/benchmark.py --mode train --gpus ${3} --dim ${1} --batch_size ${4} --nodes ${2} --logname="benchmark_dim${1}_nodes${2}_gpus${3}_batchsize${4}_tf32_iteration${5}.json""
rm -f results/benchmark_dim${1}_nodes${2}_gpus${3}_batchsize${4}_tf32_iteration${5}.json
apptainer exec --nv -B ${PWD}/data:/data -B ${PWD}/results:/results nvidia_nnu-net_for_pytorch.sif bash -c "cd /workspace/nnunet_pyt && python scripts/benchmark.py --mode train --gpus ${3} --dim ${1} --batch_size ${4} --nodes ${2} --logname='benchmark_dim${1}_nodes${2}_gpus${3}_batchsize${4}_tf32_iteration${5}.json'"
#rm -f results/benchmark_dim${1}_nodes${2}_gpus${3}_batchsize${4}_amp_iteration${5}.json
#apptainer exec --nv -B ${PWD}/data:/data -B ${PWD}/results:/results nvidia_nnu-net_for_pytorch.sif bash -c "cd /workspace/nnunet_pyt && python scripts/benchmark.py --mode train --gpus ${3} --dim ${1} --batch_size ${4} --nodes ${2} --amp --logname="benchmark_dim${1}_nodes${2}_gpus${3}_batchsize${4}_amp_iteration${5}.json"
rm -f results/benchmark_dim${1}_nodes${2}_gpus${3}_batchsize${4}_amp_iteration${5}.json
apptainer exec --nv -B ${PWD}/data:/data -B ${PWD}/results:/results nvidia_nnu-net_for_pytorch.sif bash -c "cd /workspace/nnunet_pyt && python scripts/benchmark.py --mode train --gpus ${3} --dim ${1} --batch_size ${4} --nodes ${2} --amp --logname='benchmark_dim${1}_nodes${2}_gpus${3}_batchsize${4}_amp_iteration${5}.json'"
# For enroot
rm -f results/benchmark_dim${1}_nodes${2}_gpus${3}_batchsize${4}_tf32_iteration${5}.json
enroot start --rw --mount ${PWD}/data:/data --mount ${PWD}/results:/results nnunet bash -c "cd /workspace/nnunet_pyt && python scripts/benchmark.py --mode train --gpus ${3} --dim ${1} --batch_size ${4} --nodes ${2} --logname='benchmark_dim${1}_nodes${2}_gpus${3}_batchsize${4}_tf32_iteration${5}.json'"
#rm -f results/benchmark_dim${1}_nodes${2}_gpus${3}_batchsize${4}_tf32_iteration${5}.json
#enroot start --rw --mount ${PWD}/data:/data --mount ${PWD}/results:/results nnunet bash -c "cd /workspace/nnunet_pyt && python scripts/benchmark.py --mode train --gpus ${3} --dim ${1} --batch_size ${4} --nodes ${2} --logname='benchmark_dim${1}_nodes${2}_gpus${3}_batchsize${4}_tf32_iteration${5}.json'"
rm -f results/benchmark_dim${1}_nodes${2}_gpus${3}_batchsize${4}_amp_iteration${5}.json
enroot start --rw --mount ${PWD}/data:/data --mount ${PWD}/results:/results nnunet bash -c "cd /workspace/nnunet_pyt && python scripts/benchmark.py --mode train --gpus ${3} --dim ${1} --batch_size ${4} --nodes ${2} --amp --logname='benchmark_dim${1}_nodes${2}_gpus${3}_batchsize${4}_amp_iteration${5}.json'"
#rm -f results/benchmark_dim${1}_nodes${2}_gpus${3}_batchsize${4}_amp_iteration${5}.json
#enroot start --rw --mount ${PWD}/data:/data --mount ${PWD}/results:/results nnunet bash -c "cd /workspace/nnunet_pyt && python scripts/benchmark.py --mode train --gpus ${3} --dim ${1} --batch_size ${4} --nodes ${2} --amp --logname='benchmark_dim${1}_nodes${2}_gpus${3}_batchsize${4}_amp_iteration${5}.json'"
```