- Use as large batch_size as possible for a more stable benchmark result. For single node, use 256; for multi-node, use 128.
- Use as large batch_size as possible for a more stable benchmark result. For single node, use 256; for multi-node, use 128.
- Benchmarking with dim = 2, nodes = 1, gpus = 8, batch_size = 128, 256 takes ~2mins.
- Benchmarking with dim = 2, nodes = 1, gpus = 8, batch_size = 128, 256 takes ~2mins.
- Specify the paths for enroot cache and data, see this [page](https://gitlab.liu.se/xuagu37/run-pytorch-and-tensorflow-containers-with-nvidia-enroot#set-path-to-user-container-storage).
- Specify the paths for enroot cache and data, see this [page](https://gitlab.liu.se/xuagu37/run-pytorch-and-tensorflow-containers-with-nvidia-enroot#set-path-to-user-container-storage).
- For single node case, command ```srun singularity/enroot``` gives slightly worse throughput performance compared with running ```singularity/enroot``` directly.