Draw this picture as quickly as possible!

Apply SIMD instructions
Apply multithreading

One thread per task
Created Threads and ran them in parallel
Runnable interface
start instances
join to wait until threads finish
PiEstimator
for (int i = 0; i < numThreads; i++) {
    threads[i] = new Thread(new PiThread(...));
}
for (Thread t : threads) {
    t.start();
}
for (Thread t : threads) {
    try { t.join(); }
    catch (InterruptedException e) { }
}
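The create, start, join pattern above can be sketched as a complete program. This is only an illustration under stated assumptions: the slides do not show PiThread, so the PiTask class, its constructor arguments, and the Monte Carlo sampling method below are hypothetical stand-ins, not the course's PiEstimator.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;

class PiSketch {
    // Hypothetical task (not the course's PiThread): counts random points
    // in the unit square that fall inside the quarter circle of radius 1.
    static class PiTask implements Runnable {
        private final long samples;
        private final AtomicLong hits; // shared counter, updated once per task

        PiTask(long samples, AtomicLong hits) {
            this.samples = samples;
            this.hits = hits;
        }

        @Override
        public void run() {
            ThreadLocalRandom rng = ThreadLocalRandom.current();
            long local = 0;
            for (long s = 0; s < samples; s++) {
                double x = rng.nextDouble();
                double y = rng.nextDouble();
                if (x * x + y * y <= 1.0) {
                    local++;
                }
            }
            hits.addAndGet(local);
        }
    }

    static double estimatePi(long totalSamples, int numThreads) {
        AtomicLong hits = new AtomicLong();
        long perThread = totalSamples / numThreads;
        Thread[] threads = new Thread[numThreads];
        for (int i = 0; i < numThreads; i++) {
            threads[i] = new Thread(new PiTask(perThread, hits));
        }
        for (Thread t : threads) {
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join(); // wait until this thread has finished
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        // Fraction of hits approximates pi / 4.
        return 4.0 * hits.get() / (perThread * numThreads);
    }

    public static void main(String[] args) {
        System.out.println(estimatePi(4_000_000L, 4));
    }
}
```

Each task keeps a local count and touches the shared counter only once, so the threads rarely contend with each other.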
PiEstimator Performance
n threads | pi estimate | time (ms)
-----------------------------------
        1 |     3.14158 |      8174
        2 |     3.14161 |      4690
        4 |     3.14161 |      2709
        8 |     3.14163 |      1735
       16 |     3.14156 |      1867
       32 |     3.14167 |      1938
       64 |     3.14156 |      1905
      128 |     3.14157 |      1907
      256 |     3.14164 |      1919
-----------------------------------
Best performance when number of threads = number of available processors
Reasons:
Question. What if tasks involve different (unknown) amounts of work?
Threads have significant overhead
When tasks are fairly homogeneous (e.g., computing $\pi$, shortcuts), the previous approach is good
A nice Java feature: thread pools
Executor interface
void execute(Runnable command) method
ExecutorService interface:
ExecutorService Implementations
From java.util.concurrent.Executors:
newFixedThreadPool(int nThreads)
newSingleThreadExecutor()
newCachedThreadPool()
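As a rough comparison of the three factory methods listed above, the sketch below runs the same batch of trivial tasks on each kind of pool. The PoolKinds class and its runOn helper are hypothetical names introduced only for this illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class PoolKinds {
    // Hypothetical helper: submits `tasks` trivial jobs to the pool,
    // waits for them, and returns how many actually ran.
    static int runOn(ExecutorService pool, int tasks) {
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            pool.execute(done::incrementAndGet);
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return done.get();
    }

    public static void main(String[] args) {
        // At most 4 worker threads; extra tasks wait in a queue.
        System.out.println(runOn(Executors.newFixedThreadPool(4), 100));
        // Exactly one worker thread; tasks run one after another.
        System.out.println(runOn(Executors.newSingleThreadExecutor(), 100));
        // Creates threads as needed and reuses idle ones.
        System.out.println(runOn(Executors.newCachedThreadPool(), 100));
    }
}
```

All three return an ExecutorService, so the calling code stays the same; only the threading policy differs.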
Define tasks
public class MyTask implements Runnable {
    ...
    public void run() {
        ...
    }
}
Create a pool, e.g., fixed thread pool
int nThreads = ...;
ExecutorService pool = Executors.newFixedThreadPool(nThreads);
Create and execute tasks
MyTask task = new MyTask(...);
pool.execute(task);
Shutting down the pool
pool.shutdown();
Wait for all pending tasks to complete (like the join() method)
try {
    pool.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
} catch (InterruptedException e) {
    // do nothing
}
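Put together, the steps above (define tasks, create a pool, execute, shut down, wait) might look like the following sketch. The SumTask class and the runAll helper are hypothetical; the tasks deliberately do different amounts of work to show that the pool hands the next queued task to whichever thread becomes free.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class PoolLifecycle {
    // Hypothetical task: sums the integers 1..n into its own result slot.
    static class SumTask implements Runnable {
        private final int n;
        private final long[] results;
        private final int slot;

        SumTask(int n, long[] results, int slot) {
            this.n = n;
            this.results = results;
            this.slot = slot;
        }

        @Override
        public void run() {
            long sum = 0;
            for (int v = 1; v <= n; v++) {
                sum += v;
            }
            results[slot] = sum; // each task writes only its own slot
        }
    }

    static long[] runAll(int[] sizes, int nThreads) {
        long[] results = new long[sizes.length];
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        for (int i = 0; i < sizes.length; i++) {
            pool.execute(new SumTask(sizes[i], results, i));
        }
        pool.shutdown(); // accept no new tasks; queued tasks still run
        try {
            pool.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return results;
    }

    public static void main(String[] args) {
        long[] r = runAll(new int[] {10, 100, 1000}, 2);
        System.out.println(r[0] + " " + r[1] + " " + r[2]);
    }
}
```

Note that shutdown() must be called before awaitTermination(); otherwise the pool never terminates and the wait does not return.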
Shortcuts from Lab 02:
for (int i = 0; i < size; ++i) {
    for (int j = 0; j < size; ++j) {
        float min = Float.MAX_VALUE;
        for (int k = 0; k < size; ++k) {
            float x = matrix[i][k];
            float y = matrix[k][j];
            float z = x + y;
            if (z < min)
                min = z;
        }
        shortcuts[i][j] = min;
    }
}
For fixed row i, col j:
float min = Float.MAX_VALUE;
for (int k = 0; k < size; ++k) {
    float x = matrix[i][k];
    float y = matrix[k][j];
    float z = x + y;
    if (z < min)
        min = z;
}
shortcuts[i][j] = min;
Approach 1:
size * size threads
Approach 2:
availableProcessors()
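The slides give only availableProcessors() for the second approach. One possible reading, sketched below as an assumption rather than the course's solution, is a fixed thread pool with that many threads and one task per row i, so that far fewer than size * size threads are created. The ShortcutsPool class name and the per-row split are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ShortcutsPool {
    // Assumed design: one task per row, run on a pool sized to the machine.
    static float[][] shortcuts(float[][] matrix) {
        int size = matrix.length;
        float[][] result = new float[size][size];
        int nThreads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        for (int row = 0; row < size; row++) {
            final int i = row; // lambdas may only capture effectively final locals
            pool.execute(() -> {
                for (int j = 0; j < size; j++) {
                    float min = Float.MAX_VALUE;
                    for (int k = 0; k < size; k++) {
                        float z = matrix[i][k] + matrix[k][j];
                        if (z < min) {
                            min = z;
                        }
                    }
                    result[i][j] = min; // row i is written by this task only
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return result;
    }

    public static void main(String[] args) {
        float[][] m = {{0, 1, 5}, {1, 0, 1}, {5, 1, 0}};
        float[][] s = shortcuts(m);
        System.out.println(s[0][2]);
    }
}
```

Because each task writes a distinct row of the result, the tasks need no locking among themselves.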
executer-shortcuts.zip
Lab will be posted early next week
Runnable task that uses SIMD parallelism to compute escape times