diff --git a/README.md b/README.md
index 915ca99787da47eb64fce40ef4b6b2cdce470a78..eb30ed1bca95f3d0740f2e2b0601d21b6b6a730e 100644
--- a/README.md
+++ b/README.md
@@ -103,3 +103,4 @@ when batch_size is large (16, 32, 64, 128), throughput_amp > throughput_tf32.
 #### Notes
 - It seems running directly via singularity shell will give worse performance (when I WFH). We should run it via sbatch script instead.
 - It took around a week to finish all iterations of benchmarking.
+- For the multi-node bash script, the line "#SBATCH --ntasks-per-node=8" has to be added; otherwise the process will hang.
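
For reviewers, here is a minimal sketch of where the new directive might sit in a multi-node sbatch script. Only the `--ntasks-per-node=8` line comes from the note. The job name, the node count, and the commented-out launch command are hypothetical placeholders, not the repository's actual script. The note does not say why the job hangs. One plausible reading, stated here as an assumption, is that the launcher waits for 8 ranks per node that Slurm never starts.

```shell
#!/bin/bash
#SBATCH --job-name=multinode-bench   # hypothetical job name
#SBATCH --nodes=2                    # assumption: any node count greater than 1
#SBATCH --ntasks-per-node=8          # the line from the note above

# Slurm exports these variables inside a job; the defaults only let the
# sketch run outside Slurm for a quick sanity check.
NODES="${SLURM_JOB_NUM_NODES:-2}"
TASKS_PER_NODE="${SLURM_NTASKS_PER_NODE:-8}"

# Total number of tasks the job step is expected to start.
WORLD_SIZE=$(( NODES * TASKS_PER_NODE ))
echo "nodes=${NODES} tasks_per_node=${TASKS_PER_NODE} world_size=${WORLD_SIZE}"

# Hypothetical launch line; substitute the real image and benchmark command:
# srun singularity exec --nv <image.sif> python <benchmark script>
```

Printing the expected task count at the top of the job log makes it easier to notice when the directive is missing, since the printed tasks-per-node value would then fall back to the default instead of coming from Slurm.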