diff --git a/README.md b/README.md
index 9f8b815faa646a37662c1c0f77de97078c9dbe1f..6f5a788e878fa85fa9ba3f2b6bed0d5ec2492a5b 100644
--- a/README.md
+++ b/README.md
@@ -89,7 +89,7 @@ TF32 (TensorFloat32) mode is for accelerating FP32 convolutions and matrix multi
 AMP (Automatic Mixed Precision) offers a significant computational speedup by performing operations in half-precision (FP16) format, while storing minimal information in single precision (TF32) to retain as much information as possible in critical parts of the network.
 
-We run 100 iterations for each set of parameters.
+We run 100 iterations for each set of parameters. Please see the results in benchmar_table.xlsx.
 
 **Observation 1**: Ideally, throughput would scale linearly with the number of GPUs. In practice, throughput stays below the ideal curve as the number of GPUs increases.
 
@@ -105,11 +105,7 @@ when batch_size is large (16, 32, 64, 128), throughput_amp > throughput_tf32.
 <img src="https://github.com/xuagu37/Benchmark_nnU-Net_for_PyTorch/blob/main/figures/benchmark_throughput_cv.png" width="400">
 
-- The expected throughput for dim = 2, nodes = 1, gpus = 8, batch_size = 256 would be 5130 ± 180 (TF32).
-- The expected throughput for dim = 2, nodes = 2, gpus = 16, batch_size = 128 would be 9300 ± 70 (TF32).
-- The expected throughput for dim = 2, nodes = 3, gpus = 24, batch_size = 128 would be 13880 ± 85 (TF32).
-- The expected throughput for dim = 2, nodes = 4, gpus = 24, batch_size = 128 would be 18500 ± 90 (TF32).
-
+The coefficient of variation is calculated as the ratio of the standard deviation to the mean. It shows the extent of variability relative to the mean of the population.
 
 **Observation 4**: Ideally, throughput would scale linearly with batch_size. In practice, throughput stays below the ideal curve when batch_size > 16.
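The coefficient of variation added in the second hunk can be sketched in a few lines of Python. The throughput samples below are hypothetical placeholders, not values from the benchmark runs:

```python
from statistics import mean, pstdev

# Hypothetical throughput samples (images/s) from repeated runs
# of one benchmark configuration.
throughputs = [5100.0, 5200.0, 4950.0, 5300.0, 5050.0]

# Coefficient of variation: population standard deviation divided by the mean.
cv = pstdev(throughputs) / mean(throughputs)
print(f"CV = {cv:.4f}")  # prints "CV = 0.0236"
```

A small CV indicates that repeated runs of the same configuration produce stable throughput, which is what makes the reported mean values meaningful.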