MODEL_NAME=maskrcnn_for_pytorch
MODEL_VERSION=latest
MODEL_BASE=/proj/nsc_testing/xuan/containers/nvidia_pytorch_21.12-py3.sif
CONTAINER_DIR=/proj/nsc_testing/xuan/containers/${MODEL_NAME}_${MODEL_VERSION}.sif
DEF_DIR=/proj/nsc_testing/xuan/berzelius-benchmarks/NVIDIA/DeepLearningExamples/PyTorch/Segmentation/MaskRCNN/${MODEL_NAME}_${MODEL_VERSION}.def
WORK_DIR=/proj/nsc_testing/xuan/berzelius-benchmarks/NVIDIA/DeepLearningExamples/PyTorch/Segmentation/MaskRCNN/object_detection
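The commands in this README copy code into `${WORK_DIR}/object_detection` and bind-mount its `data` and `results` subdirectories. Apptainer generally expects bind-mount source directories to exist on the host, so it may help to create them first. The following is a minimal sketch under that assumption; `prepare_dirs` is a hypothetical helper and not part of the repository.

```shell
# Sketch: create the host directories that the copy and bind mounts expect.
# prepare_dirs is a hypothetical helper, not part of the repository.
prepare_dirs() {
    base="$1"
    mkdir -p "${base}/object_detection/data" \
             "${base}/object_detection/results"
}

# Only run when WORK_DIR has been set as in the variable list above.
if [ -n "${WORK_DIR:-}" ]; then
    prepare_dirs "${WORK_DIR}"
fi
```

`mkdir -p` is idempotent, so running this again on an existing tree leaves it unchanged.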
### Make a copy of the code
```
apptainer exec $CONTAINER_DIR bash -c "cp -a /workspace/object_detection/* ${WORK_DIR}/object_detection"
```
apptainer exec --nv -B ${WORK_DIR}/object_detection/data:/data --pwd /data $CONTAINER_DIR bash -c "cp /workspace/object_detection/hashes.md5 /data/ && bash /workspace/object_detection/download_dataset.sh /data"
apptainer exec --nv $CONTAINER_DIR bash -c "cp -a /workspace/object_detection/* ${WORK_DIR}/object_detection/"
apptainer exec --nv -B ${WORK_DIR}/object_detection/data:/datasets/data -B ${WORK_DIR}/object_detection/results:/results --pwd ${WORK_DIR} $CONTAINER_DIR bash scripts/train_benchmark.sh float16 1 True True
apptainer exec --nv -B ${WORK_DIR}/object_detection/data:/datasets/data -B ${WORK_DIR}/object_detection/results:/results --pwd ${WORK_DIR} $CONTAINER_DIR bash scripts/inference_benchmark.sh float16 1
The checkpoint file `results/last_checkpoint` must be removed before starting a new training benchmark run.
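One way to clear the checkpoint from the host side is sketched below. It assumes the results directory is the one bind-mounted to `/results` in the benchmark commands, that is `${WORK_DIR}/object_detection/results`; `clear_checkpoint` is a hypothetical helper name.

```shell
# Sketch: remove the stale checkpoint marker before a new training benchmark.
# clear_checkpoint is a hypothetical helper; the results path is an assumption
# taken from the "-B ...:/results" bind mount used by the benchmark commands.
clear_checkpoint() {
    results_dir="$1"
    # -f keeps the command quiet when no checkpoint exists yet.
    rm -f "${results_dir}/last_checkpoint"
}

# Only run when WORK_DIR has been set.
if [ -n "${WORK_DIR:-}" ]; then
    clear_checkpoint "${WORK_DIR}/object_detection/results"
fi
```

Because `rm -f` ignores a missing file, the helper is safe to call before every training run, including the first one.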