Separable version: 70 ms on average with a 21x21 filter, a 15 ms speedup.
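
To illustrate why the separable version does less work, here is a minimal sketch in plain Python (not the lab's actual kernel code). It assumes a box filter with clamped borders; the function names `box_filter_2d`, `box_filter_1d` and `box_filter_separable` are made up for this example. A direct 21x21 filter reads 441 pixels per output pixel, while two 1D passes read 21 + 21 = 42.

```python
def box_filter_2d(img, radius):
    """Direct 2D box filter: (2*radius+1)^2 reads per pixel, borders clamped."""
    h, w = len(img), len(img[0])
    n = (2 * radius + 1) ** 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in range(-radius, radius + 1):
                yy = min(max(y + dy, 0), h - 1)
                for dx in range(-radius, radius + 1):
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
            out[y][x] = total / n
    return out


def box_filter_1d(img, radius, horizontal):
    """One 1D pass: 2*radius+1 reads per pixel along a single axis."""
    h, w = len(img), len(img[0])
    n = 2 * radius + 1
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for d in range(-radius, radius + 1):
                if horizontal:
                    total += img[y][min(max(x + d, 0), w - 1)]
                else:
                    total += img[min(max(y + d, 0), h - 1)][x]
            out[y][x] = total / n
    return out


def box_filter_separable(img, radius):
    """Horizontal pass followed by a vertical pass on the intermediate image."""
    return box_filter_1d(box_filter_1d(img, radius, True), radius, False)
```

Both functions should produce the same image up to floating-point rounding, so the choice between them is purely a matter of cost. How much of the reduced read count shows up as measured time depends on the implementation.
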
### QUESTION: Compare the visual result to that of the box filter. Is the image LP-filtered with the weighted kernel noticeably better?


The Gaussian filter has a nicer look in our opinion. :-)
### QUESTION: What was the difference in time to a box filter of the same size (5x5)?
No noticeable difference in time: 90 ms for both.
...
...
@@ -194,4 +193,4 @@ CPU 0.082142
GPU sorting.
GPU 0.001693
The CPU is faster than the GPU only up to 1024 elements; beyond that the GPU is always faster. A parallelized CPU implementation would run faster than the current version. However, the GPU will still beat the CPU on large inputs, since bitonic sort exploits massive parallelism that the CPU cannot match.
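
As a rough illustration of where the parallelism comes from, here is a sequential Python emulation of a bitonic sorting network. This is a sketch, not the GPU kernel used for the timings above; the function name `bitonic_sort` is made up for this example, and it assumes the input length is a power of two. Every iteration of the innermost `for i` loop touches a distinct pair of elements, so on a GPU each `i` could be handled by its own thread.

```python
def bitonic_sort(values):
    """Sort a power-of-two-length sequence with a bitonic network (ascending)."""
    data = list(values)
    n = len(data)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    k = 2
    while k <= n:            # size of the bitonic sequences being merged
        j = k // 2
        while j > 0:         # distance between compared elements
            # All compare-exchanges in this pass are independent of each other.
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (data[i] > data[partner]) == ascending:
                        data[i], data[partner] = data[partner], data[i]
            j //= 2
        k *= 2
    return data
```

For n elements the network needs log2(n) * (log2(n) + 1) / 2 passes, which is 55 passes for 1024 elements. Each pass is fully parallel, whereas a single CPU thread has to walk through all n indices in every pass.
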