Lab 09: Fork-Join Pools

Overview

1. Divide and Conquer & Recursion
2. Fork-Join Pools
3. Activity: Analyzing Behavior

Divide and Conquer

Many computational problems can be solved efficiently by:

1. Breaking an instance into two or more smaller instances
2. Solving the smaller instances (maybe recursively)
3. Combining the smaller solutions to solve the original instance

Example 1: Searching Unsorted Array

• Given int[] arr of size N
• Does arr contain 1?
• Idea:
1. divide arr in half
2. search left half for 1
3. search right half for 1
4. return true if step 2 or 3 succeeds
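
The four steps above can be sketched as a sequential recursive method. The class name ArraySearch, the target parameter, and the half-open range convention [lo, hi) are assumptions made for this illustration; they are not part of the lab code.

```java
// Hypothetical helper class; names and signatures are illustrative only.
class ArraySearch {
    // Returns true if target occurs in arr[lo..hi), where hi is exclusive.
    static boolean contains(int[] arr, int target, int lo, int hi) {
        if (hi - lo == 0) {
            return false;                 // empty range: nothing to find
        }
        if (hi - lo == 1) {
            return arr[lo] == target;     // single element: compare directly
        }
        int mid = lo + (hi - lo) / 2;     // 1. divide the range in half
        // 2. search the left half, 3. search the right half,
        // 4. combine: true if either half contains the target
        return contains(arr, target, lo, mid)
            || contains(arr, target, mid, hi);
    }

    // Convenience overload that searches the whole array.
    static boolean contains(int[] arr, int target) {
        return contains(arr, target, 0, arr.length);
    }
}
```

Because || short-circuits, this sequential version skips the right half whenever the left half already contains the target.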

Example 2: MergeSort

• Given int[] arr of size N
• Sort arr in increasing order
• Idea:
1. divide arr in half
2. sort left half
3. sort right half
4. merge sorted halves
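
As a point of comparison for the parallel version later in the lab, here is a minimal sequential sketch of these four steps. The class name SeqMergeSort and the choice to copy only the left half during the merge are assumptions of this sketch, not the lab's reference implementation.

```java
import java.util.Arrays;

// Hypothetical sequential merge sort; names are illustrative only.
class SeqMergeSort {
    // Sorts arr[lo..hi) in increasing order, where hi is exclusive.
    static void sort(int[] arr, int lo, int hi) {
        if (hi - lo <= 1) {
            return;                       // 0 or 1 elements: already sorted
        }
        int mid = lo + (hi - lo) / 2;     // 1. divide arr in half
        sort(arr, lo, mid);               // 2. sort left half
        sort(arr, mid, hi);               // 3. sort right half
        merge(arr, lo, mid, hi);          // 4. merge sorted halves
    }

    // Merges the sorted ranges arr[lo..mid) and arr[mid..hi) in place.
    static void merge(int[] arr, int lo, int mid, int hi) {
        int[] left = Arrays.copyOfRange(arr, lo, mid); // copy of the left half
        int i = 0, j = mid, k = lo;
        while (i < left.length && j < hi) {
            arr[k++] = (left[i] <= arr[j]) ? left[i++] : arr[j++];
        }
        while (i < left.length) {
            arr[k++] = left[i++];         // leftover left elements
        }
        // leftover right elements are already in their final positions
    }
}
```

Calling SeqMergeSort.sort(arr, 0, arr.length) sorts the whole array.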

Other Examples

• Other sorting algorithms
• QuickSort
• Multiplying large numbers
• Multiplying matrices
• Strassen’s algorithm
• Computing Fourier transforms

Observation

Divide-and-conquer often lends itself well to parallelism:

1. Divide instance into smaller instances
2. Solve smaller instances in parallel
3. Combine solutions

Fork-Join Pools

Idea:

• A thread pool with efficient support for forking:
• a task forks subtasks, then joins them to wait for their results
• combine solutions (if necessary)
• Naturally lends itself to recursion

Creating a Fork-Join Pool

Creating a Fork-Join Pool is easy!

• tasks are invoked in the Fork-Join Pool (FJP)
import java.util.concurrent.ForkJoinPool;
...
ForkJoinPool pool = new ForkJoinPool(POOL_SIZE);
...
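
To make the snippet above concrete, the following sketch creates a pool, hands it one small task, and shuts the pool down again. The class name PoolDemo, the method demo, and the choice of one worker thread per available core for POOL_SIZE are assumptions for illustration; the lab may choose POOL_SIZE differently.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;

// Hypothetical demo class; names are illustrative only.
class PoolDemo {
    // Assumption: one worker thread per available core.
    static final int POOL_SIZE = Runtime.getRuntime().availableProcessors();

    static int demo() {
        ForkJoinPool pool = new ForkJoinPool(POOL_SIZE);
        try {
            // submit() hands a task to the pool and returns immediately;
            // join() blocks until the task's result is available.
            ForkJoinTask<Integer> task = pool.submit(() -> 6 * 7);
            return task.join();
        } finally {
            pool.shutdown();              // release the worker threads
        }
    }
}
```

The recursive tasks in the following sections are started with pool.invoke(...) instead, which submits the task and waits for it in a single call.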

Recursive Actions

Tasks without return values = recursive action

• extend RecursiveAction class
• override compute() method

MergeSort as RecursiveAction

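import java.util.concurrent.RecursiveAction;

public class MSTask extends RecursiveAction {
    ...
    public MSTask (double[] data, int min, int max) {
        ...
    }

    @Override
    protected void compute () {
        // base case: a range of at most one element is already sorted
        if (max - min <= 1) {
            ...
            return;
        }

        int mid = min + (max - min) / 2;

        // one subtask per half of the range [min, max)
        MSTask left = new MSTask(data, min, mid);
        MSTask right = new MSTask(data, mid, max);

        left.fork();
        right.fork(); // or can use right.compute()

        left.join();
        right.join(); // leave out if right.compute()

        merge(data, min, mid, max);
    }

    void merge (double[] data, int min, int mid, int max) {
        ...
    }
}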

Invoke with pool.invoke(new MSTask(data, 0, data.length))

Efficiency

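Fork-Join pools are often not as efficient as you’d like them to be: creating and scheduling a task has a cost of its own

To deal with this:

• Use a large “base case”: below some size, solve the instance sequentially instead of forking
• Only use on large instances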
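
Both suggestions can be sketched with a merge sort task that stops forking below a cutoff and sorts small ranges sequentially. The class name CutoffSortTask, the CUTOFF value of 8192, and the use of Arrays.sort as the base case are assumptions of this sketch; a good cutoff has to be found by measurement.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

// Hypothetical task; names and the cutoff value are illustrative only.
class CutoffSortTask extends RecursiveAction {
    static final int CUTOFF = 1 << 13;    // assumed base-case size (8192)

    private final double[] data;
    private final int min, max;           // sorts data[min..max)

    CutoffSortTask(double[] data, int min, int max) {
        this.data = data;
        this.min = min;
        this.max = max;
    }

    @Override
    protected void compute() {
        if (max - min <= CUTOFF) {
            Arrays.sort(data, min, max);  // large base case: no more forking
            return;
        }
        int mid = min + (max - min) / 2;
        CutoffSortTask left = new CutoffSortTask(data, min, mid);
        CutoffSortTask right = new CutoffSortTask(data, mid, max);
        left.fork();                      // left half runs asynchronously
        right.compute();                  // right half runs in this thread
        left.join();                      // wait for the left half
        merge(mid);
    }

    // Merges the sorted ranges data[min..mid) and data[mid..max) in place.
    private void merge(int mid) {
        double[] left = Arrays.copyOfRange(data, min, mid);
        int i = 0, j = mid, k = min;
        while (i < left.length && j < max) {
            data[k++] = (left[i] <= data[j]) ? left[i++] : data[j++];
        }
        while (i < left.length) {
            data[k++] = left[i++];
        }
    }

    static void parallelSort(double[] data) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            pool.invoke(new CutoffSortTask(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }
}
```

Arrays shorter than CUTOFF never fork at all, which covers the second suggestion: on small instances the task behaves exactly like the sequential sort.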

What if we want tasks to return a value?

• extend RecursiveTask<T>: the task returns a value of type T
• similar to RecursiveAction except compute() returns a T
• pool.invoke(someRecursiveTask<T>) now also returns a T
• join() method also returns a T
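
As an illustration of these points, the array search from Example 1 can be written as a RecursiveTask<Boolean>. The class name ContainsTask, the THRESHOLD value, and the helper method contains are assumptions of this sketch.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example task; names and THRESHOLD are illustrative only.
class ContainsTask extends RecursiveTask<Boolean> {
    static final int THRESHOLD = 10_000;  // assumed cutoff for sequential search

    private final int[] arr;
    private final int target, min, max;   // searches arr[min..max)

    ContainsTask(int[] arr, int target, int min, int max) {
        this.arr = arr;
        this.target = target;
        this.min = min;
        this.max = max;
    }

    @Override
    protected Boolean compute() {
        if (max - min <= THRESHOLD) {
            for (int i = min; i < max; ++i) {
                if (arr[i] == target) {
                    return true;
                }
            }
            return false;
        }
        int mid = min + (max - min) / 2;
        ContainsTask left = new ContainsTask(arr, target, min, mid);
        ContainsTask right = new ContainsTask(arr, target, mid, max);
        right.fork();                      // right half runs asynchronously
        boolean inLeft = left.compute();   // compute() returns a Boolean
        boolean inRight = right.join();    // join() returns a Boolean as well
        return inLeft || inRight;
    }

    static boolean contains(int[] arr, int target) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            // invoke() returns the value produced by the root task
            return pool.invoke(new ContainsTask(arr, target, 0, arr.length));
        } finally {
            pool.shutdown();
        }
    }
}
```

Unlike a sequential search, this version always joins the right half, even when the left half has already found the target.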

A Simple Example

Finding the maximum value in an unsorted array

• How to combine results?

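import java.util.concurrent.RecursiveTask;

public class MaxTask extends RecursiveTask<Double> {
    // ranges of at most PARALLEL_LIMIT elements are searched sequentially;
    // Math.max keeps the limit at least 1 so the recursion always terminates
    public static int PARALLEL_LIMIT = Math.max(1, MaxFinder.DATA_SIZE / 1000);

    double[] data;
    int min;
    int max;

    public MaxTask (double[] data, int min, int max) {
        this.data = data;
        this.min = min;
        this.max = max;
    }

    @Override
    protected Double compute () {
        if (max - min <= PARALLEL_LIMIT) {
            return findMax();
        }

        MaxTask left = new MaxTask(data, min, min + (max - min) / 2);
        MaxTask right = new MaxTask(data, min + (max - min) / 2, max);

        // run the right half asynchronously, the left half in this thread
        right.fork();

        double l = left.compute();

        double r = right.join();

        return Math.max(l, r);
    }

    private Double findMax() {
        // Double.MIN_VALUE is the smallest positive double, so it would be
        // wrong for all-negative data; start from negative infinity instead
        double maxValue = Double.NEGATIVE_INFINITY;

        for (int i = min; i < max; ++i) {
            if (maxValue < data[i]) {
                maxValue = data[i];
            }
        }

        return maxValue;
    }
}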

The Competition

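public static double findMax(double[] data) {
    // Double.MIN_VALUE is the smallest positive double, so it would be
    // wrong for all-negative data; start from negative infinity instead
    double max = Double.NEGATIVE_INFINITY;

    for (int i = 0; i < data.length; ++i) {
        if (max < data[i]) {
            max = data[i];
        }
    }

    return max;
}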

An Activity

Compare the run-times of the two methods!

1. Is there a value of DATA_SIZE for which MaxTask is faster?
2. What values of PARALLEL_LIMIT give better performance?
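
One possible way to run the comparison is sketched below. The class MaxTiming, the compact SimpleMaxTask (a stand-in for MaxTask that takes the limit as a constructor argument), the data size, and the number of repetitions are all assumptions of this sketch. It reports the fastest of several runs, because the first runs are typically slowed down by JIT warm-up.

```java
import java.util.Random;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.function.Supplier;

// Hypothetical timing harness; all names here are illustrative only.
class MaxTiming {
    // Compact stand-in for MaxTask with the limit passed as a parameter.
    static class SimpleMaxTask extends RecursiveTask<Double> {
        private final double[] data;
        private final int min, max, limit;

        SimpleMaxTask(double[] data, int min, int max, int limit) {
            this.data = data;
            this.min = min;
            this.max = max;
            this.limit = Math.max(1, limit);  // a limit of 0 would never terminate
        }

        @Override
        protected Double compute() {
            if (max - min <= limit) {
                return sequentialMax(data, min, max);
            }
            int mid = min + (max - min) / 2;
            SimpleMaxTask left = new SimpleMaxTask(data, min, mid, limit);
            SimpleMaxTask right = new SimpleMaxTask(data, mid, max, limit);
            right.fork();
            double l = left.compute();
            double r = right.join();
            return Math.max(l, r);
        }
    }

    static double sequentialMax(double[] data, int min, int max) {
        double best = Double.NEGATIVE_INFINITY;
        for (int i = min; i < max; ++i) {
            if (best < data[i]) {
                best = data[i];
            }
        }
        return best;
    }

    static volatile double sink;  // keeps results "used" so the work is not optimized away

    // Runs work several times and returns the fastest run in milliseconds.
    static double bestMillis(int runs, Supplier<Double> work) {
        long best = Long.MAX_VALUE;
        for (int r = 0; r < runs; ++r) {
            long start = System.nanoTime();
            sink = work.get();
            best = Math.min(best, System.nanoTime() - start);
        }
        return best / 1e6;
    }

    public static void main(String[] args) {
        int dataSize = 5_000_000;         // assumed; vary this for question 1
        int limit = dataSize / 1000;      // assumed; vary this for question 2
        double[] data = new Random(42).doubles(dataSize).toArray();
        ForkJoinPool pool = new ForkJoinPool();
        double seq = bestMillis(10, () -> sequentialMax(data, 0, data.length));
        double par = bestMillis(10,
            () -> pool.invoke(new SimpleMaxTask(data, 0, data.length, limit)));
        System.out.printf("sequential: %.3f ms, fork-join: %.3f ms%n", seq, par);
        pool.shutdown();
    }
}
```

The numbers depend heavily on the machine, the JVM, and the pool size, so treat single measurements with caution.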

Disclaimer:

• the JVM and the hardware are very good at running simple sequential loops like findMax efficiently
• fork-join pools are better suited for more complex tasks…
• …like sorting
• you’ll experience this in the next lab assignment!

Discuss

What did you find?

Simple problem:

• Given a large array of doubles, sort it as quickly as possible
• Use Arrays.sort(...) as a baseline

