# Model name and version; both are used to name the image and the definition file
MODEL_NAME=maskrcnn_for_pytorch
MODEL_VERSION=latest
# Base NVIDIA PyTorch 21.12 image
MODEL_BASE=/proj/nsc_testing/xuan/containers/nvidia_pytorch_21.12-py3.sif
# Path of the image to build and of its definition file
CONTAINER_DIR=/proj/nsc_testing/xuan/containers/${MODEL_NAME}_${MODEL_VERSION}.sif
DEF_DIR=/proj/nsc_testing/xuan/berzelius-benchmarks/NVIDIA/DeepLearningExamples/PyTorch/Segmentation/MaskRCNN/${MODEL_NAME}_${MODEL_VERSION}.def
# Working directory that holds the data and the benchmark results
WORK_DIR=/proj/nsc_testing/xuan/berzelius-benchmarks/NVIDIA/DeepLearningExamples/PyTorch/Segmentation/MaskRCNN
mkdir -p "$WORK_DIR/data" "$WORK_DIR/results"
apptainer build "$CONTAINER_DIR" "$DEF_DIR"
```
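
Before moving on, it can help to confirm that the build actually produced an image. The following is a minimal sketch, not part of the original workflow; `check_image` is a hypothetical helper name, and it only tests that the file exists and is non-empty.

```shell
# Hypothetical helper (not part of the benchmark scripts):
# succeeds if the given image file exists and is non-empty.
check_image() {
    image="$1"
    if [ -s "$image" ]; then
        echo "Image found: $image"
    else
        echo "Image missing or empty: $image" >&2
        return 1
    fi
}
```

With the variables defined above, it could be called as `check_image "$CONTAINER_DIR" || exit 1` right after `apptainer build`.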
### Downloading and preprocessing the data
```
# Download the dataset into ${WORK_DIR}/data (mounted as /data in the container)
apptainer exec --nv -B ${WORK_DIR}/data:/data --pwd /data $CONTAINER_DIR bash -c "cp /workspace/object_detection/hashes.md5 /data/ && bash /workspace/object_detection/download_dataset.sh /data"
# Copy the model code from the container into the working directory
apptainer exec --nv $CONTAINER_DIR bash -c "cp -a /workspace/object_detection/* ${WORK_DIR}/"
# Run the training benchmark in float16
apptainer exec --nv -B ${WORK_DIR}/data:/datasets/data -B ${WORK_DIR}/results:/results --pwd ${WORK_DIR} $CONTAINER_DIR bash scripts/train_benchmark.sh float16 1 True True
# Run the inference benchmark in float16
apptainer exec --nv -B ${WORK_DIR}/data:/datasets/data -B ${WORK_DIR}/results:/results --pwd ${WORK_DIR} $CONTAINER_DIR bash scripts/inference_benchmark.sh float16 1
```
The checkpoint file `results/last_checkpoint` must be removed before starting a new training benchmark run.