benchmark.txt 2.23 KiB
Xuan Gu committed


MODULE_NAME=nnunet_for_pytorch
MODULE_VERSION=21.11.0
WORK_DIR=/proj/nsc_testing/xuan/berzelius-benchmarks/PyTorch/Segmentation/nnUNet
CONTAINER_DIR=/proj/nsc_testing/xuan/containers/${MODULE_NAME}_${MODULE_VERSION}.sif

mkdir -p "$WORK_DIR/data" "$WORK_DIR/results"
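Before running the container commands, it can help to confirm that the image file and the `apptainer` binary are actually present. A minimal sketch; `preflight` is a hypothetical helper and not part of the benchmark repository:

```shell
# Hypothetical helper: succeed only if the image file exists and the
# given runtime command (default: apptainer) is on PATH.
preflight() {
    image=$1
    runtime=${2:-apptainer}
    if [ ! -f "$image" ]; then
        echo "container image not found: $image" >&2
        return 1
    fi
    if ! command -v "$runtime" >/dev/null 2>&1; then
        echo "$runtime not found in PATH" >&2
        return 1
    fi
    return 0
}
```

With the variables defined above, a call such as `preflight "$CONTAINER_DIR" || exit 1` stops early instead of failing partway through a download.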


To download and preprocess the data, run:

apptainer exec --nv -B ${WORK_DIR}/data:/data -B ${WORK_DIR}/results:/results --pwd /workspace/nnunet_pyt $CONTAINER_DIR python download.py --task 01  
apptainer exec --nv -B ${WORK_DIR}/data:/data -B ${WORK_DIR}/results:/results --pwd /workspace/nnunet_pyt $CONTAINER_DIR python /workspace/nnunet_pyt/preprocess.py --task 01 --dim 2
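The commands in these notes all share the same `apptainer exec` prefix, so the prefix can be factored into a small shell function. This is a sketch only: `run_in_container` and the `DRY_RUN` switch are hypothetical names introduced here, and the function assumes `WORK_DIR` and `CONTAINER_DIR` are set as at the top of this file.

```shell
# Hypothetical wrapper around the shared "apptainer exec" prefix.
# Set DRY_RUN to any non-empty value to print the command instead of running it.
run_in_container() {
    : "${WORK_DIR:?WORK_DIR is not set}"
    : "${CONTAINER_DIR:?CONTAINER_DIR is not set}"
    ${DRY_RUN:+echo} apptainer exec --nv \
        -B "${WORK_DIR}/data:/data" \
        -B "${WORK_DIR}/results:/results" \
        --pwd /workspace/nnunet_pyt \
        "$CONTAINER_DIR" "$@"
}
```

For example, `run_in_container python download.py --task 01` would then be equivalent to the full download command, and `DRY_RUN=1` shows what would be executed without starting the container.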
 
Start benchmarking:

apptainer exec --nv -B ${WORK_DIR}/data:/data -B ${WORK_DIR}/results:/results --pwd /workspace/nnunet_pyt $CONTAINER_DIR python scripts/benchmark.py --mode train --gpus 1 --dim 2 --batch_size 256 --amp
apptainer exec --nv -B ${WORK_DIR}/data:/data -B ${WORK_DIR}/results:/results --pwd /workspace/nnunet_pyt $CONTAINER_DIR python scripts/benchmark.py --mode predict --gpus 1 --dim 2 --batch_size 256 --amp



################# Issues #################
# Known issue https://github.com/NVIDIA/DeepLearningExamples/issues/1113
ImportError: cannot import name 'get_num_classes' from 'torchmetrics.utilities.data' (/opt/conda/lib/python3.8/site-packages/torchmetrics/utilities/data.py)

Solution 1: pip install pytorch-lightning==1.5.10. This raises another error when benchmarking predict:
Traceback (most recent call last):
  File "main.py", line 110, in <module>
    trainer.current_epoch = 1
AttributeError: can't set attribute

Solution 2.1: pip install torchmetrics==0.6.0. This raises another error:
  File "main.py", line 34, in <module>
    set_affinity(int(os.getenv("LOCAL_RANK", "0")), args.gpus, mode=args.affinity)
  File "/workspace/nnunet_pyt/utils/gpu_affinity.py", line 376, in set_affinity
    set_socket_unique_affinity(gpu_id, nproc_per_node, cores, "contiguous", balanced)
  File "/workspace/nnunet_pyt/utils/gpu_affinity.py", line 263, in set_socket_unique_affinity
    os.sched_setaffinity(0, ungrouped_affinities[gpu_id])
OSError: [Errno 22] Invalid argument
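
The `OSError: [Errno 22]` above is raised by `os.sched_setaffinity`. One possible cause, which these notes do not confirm, is that the script requests CPU cores outside the set the job is allowed to use (for example when a batch scheduler restricts the job to a cpuset). A minimal, Linux-only sketch for checking the allowed set from the shell; `allowed_cpus` is a hypothetical helper:

```shell
# Hypothetical helper: print the CPU cores the current process may run on.
# Reads /proc/self/status, which exists only on Linux.
allowed_cpus() {
    awk '/^Cpus_allowed_list:/ {print $2}' /proc/self/status
}

echo "Allowed CPUs: $(allowed_cpus)"
echo "Usable core count: $(nproc)"
```

If the allowed list is narrower than the full set of cores on the node, an affinity mask computed from the node's topology may name cores the process cannot use.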

Solution 2.2: comment out lines 32-33 of main.py.
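
One way to apply Solution 2.2 without editing the file by hand is `sed`. The sketch below works on a writable copy of `main.py`; `comment_lines` is a hypothetical helper, and how the modified copy gets back into the container (for example a bind mount over the original) is left open, since these notes do not say how the edit was made.

```shell
# Hypothetical helper: prefix lines START..END of FILE with "# ".
# Uses GNU sed's in-place flag (-i); BSD sed would need -i ''.
comment_lines() {
    file=$1
    start=$2
    end=$3
    sed -i "${start},${end}s/^/# /" "$file"
}

# Example on a copy of main.py (the destination path is an assumption):
# cp /workspace/nnunet_pyt/main.py "$WORK_DIR/main.py"
# comment_lines "$WORK_DIR/main.py" 32 33
```

Checking the result with `sed -n '30,35p'` on the copy before running the benchmark confirms that the intended lines were commented out.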
# Multi-node is not yet supported in 21.11.0; it is only available in the most recent code on GitHub.