Commit dae5dd98 authored by ylvse560

Add first answers to questions in Lab 3

parent 771eb62b
This is the git repository for the six labs in the course Multicore and GPU programming (TDDD56) at Linköping University.
# Theoretical questions
## Lab 1
* Write a detailed explanation why computation load can be imbalanced and how it affects the global performance.
Each pixel will take an unknown time to compute and therefore it is impossible to balance the load statically; the work must instead be distributed dynamically, with a shared (critical section) or distributed work pool.
![Graph of load balancing methods](Lab1/lab1measurements.png)
## Lab 2
#### Question 1.1: Why does SkePU have a "fused" MapReduce when there already are separate Map and Reduce skeletons? Hint: Think about memory access patterns.
With the fused variant, each input element is read from memory once and folded into the reduction result while it is still in a register or the processor's local cache. Separate Map and Reduce skeletons must instead write the intermediate vector out to memory and read it back, costing an extra pass over the data.
#### Question 1.2: Is there any practical reason to ever use separate Map and Reduce in sequence?
Yes: if the intermediate vector produced by Map is needed elsewhere in the program, the two skeletons must be run separately so that the Map result is actually materialized.
#### Question 1.3: Is there a SkePU backend which is always more efficient to use, or does this depend on the problem size? Why? Either show with measurements or provide a valid reasoning.
No backend is always the most efficient; it depends on the problem size.
CPU: Small problem sizes are faster on the CPU, because its clock frequency is higher and there is no kernel-launch or data-transfer overhead.
GPU: Large problem sizes are faster on the GPU, because its many cores amortize that overhead.
#### Question 1.4: Try measuring the parallel back-ends with measureExecTime exchanged for measureExecTimeIdempotent. This measurement does a "cold run" of the lambda expression before running the proper measurement. Do you see a difference for some backends, and if so, why?
#### Question 2.1: Which version of the averaging filter (unified, separable) is the most efficient? Why?
#### Question 3.1: In data-parallel skeletons like MapOverlap, all elements are processed independently of each other. Is this a good fit for the median filter? Why/why not?
#### Question 3.2: Describe the sequence of instructions executed in your user-function. Is it data dependent? What does this mean for e.g., automatic vectorization, or the GPU backend?
## Lab 3
## Lab 4
## Lab 5
## Lab 6