- For benchmarking purposes, we use copies of a single image:
```
bash scripts/copy_data_for_benchmark.sh
...
```
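For reference, the duplication step could look roughly like the sketch below; the source image name, target directory, and copy count are assumptions for illustration, not the repository's actual values (see `scripts/copy_data_for_benchmark.sh` for those).
```
# Hypothetical sketch: duplicate a single preprocessed image N times so
# that every sample read during benchmarking is identical. Paths and N
# are assumptions, not the repository's actual values.
SRC=data/case_0000.nii.gz
N=100
for i in $(seq -w 1 "$N"); do
    cp "$SRC" "data/case_${i}.nii.gz"
done
```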
The input arguments of the benchmark script are `dim`, `nodes`, `gpus`, `batch_size`, and `iterations`.
We average the benchmark performance over the iterations. The maximum usable batch size (without an OOM error) is 256 for single-node and 128 for multi-node.
```
cd Berzelius-nnU-Net-Benchmark && mkdir -p sbatch_out
```
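A benchmark job can then be submitted with `sbatch`; the script name and log path below are assumptions for illustration (only `benchmark_multi_node.sbatch` is named in this document).
```
# Hypothetical submission; the script name is an assumption.
sbatch --output=sbatch_out/%j.out benchmark_single_node.sbatch
# Follow the log of a submitted job (replace <jobid>).
tail -f sbatch_out/<jobid>.out
```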
The coefficient of variation is calculated as the ratio of the standard deviation to the mean; it shows the extent of variability relative to the mean of the measurements.
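As a sketch, assuming the per-iteration throughput values have been collected one per line in a hypothetical file `throughput.txt`, the mean, standard deviation, and coefficient of variation can be computed as:
```
# Mean, population standard deviation, and coefficient of variation of
# per-iteration throughput; throughput.txt (one value per line) is a
# hypothetical file name.
awk '{ s += $1; ss += $1*$1; n++ }
     END { m = s/n; sd = sqrt(ss/n - m*m);
           printf "mean=%.2f sd=%.2f cv=%.4f\n", m, sd, sd/m }' throughput.txt
```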
**Observation 4**: Ideally, throughput would increase linearly with batch_size. In practice, throughput stays below the ideal curve when batch_size > 16.
- For multi-node benchmarking, we need to use the `srun` command, and the line `#SBATCH --ntasks-per-node=8` has to be added; otherwise the process hangs. A minimal header sketch is given after this list.
- Use as large a batch_size as possible for a more stable benchmark result: 256 for single-node, 128 for multi-node.
- Benchmarking with dim = 2, nodes = 1, gpus = 8, batch_size = 128 or 256 takes ~2 minutes.
- Specify the paths for enroot cache and data; see this [page](https://gitlab.liu.se/berzeliushub/run-pytorch-and-tensorflow-containers-with-nvidia-enroot#set-path-to-user-container-storage).
- (20220222) `srun enroot ...` stopped working for the multi-node case. Use pyxis instead; see the script `benchmark_multi_node.sbatch`.
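As mentioned in the list above, a minimal multi-node job header could look like the following sketch; the node count, container image, and training command are illustrative assumptions, not the repository's actual values (see `benchmark_multi_node.sbatch` for those).
```
#!/bin/bash
# Hypothetical multi-node header; all values are illustrative assumptions.
#SBATCH --nodes=2
#SBATCH --gpus-per-node=8
#SBATCH --ntasks-per-node=8   # required for multi-node, otherwise the job hangs
#SBATCH --output=sbatch_out/%j.out

# Launch through pyxis (srun container flags) instead of `srun enroot`.
srun --container-image=nvcr.io/nvidia/pytorch:22.02-py3 \
     python main.py --exec_mode train --dim 2 --batch_size 128
```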