Lab 09: Fork-Join Pools

Announcements

Feedback on final project proposals in the next few days!

Overview

Divide and Conquer & Recursion
Fork-Join Pools
Activity: Analyzing Behavior

Divide and Conquer

Many computational problems can be solved efficiently by:

Breaking an instance into two or more smaller instances
Solving the smaller instances (maybe recursively)
Combining the smaller solutions to solve the original instance

Example 1: Searching Unsorted Array

Given int[] arr of size N
Does arr contain 1?
Idea:
1. divide arr in half
2. search left half for 1
3. search right half for 1
4. return true if step 2 or 3 succeeds
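
The four steps above can be sketched as a sequential recursive method. This is only an illustration: the names ContainsSketch and contains, the target parameter, and the half-open range [min, max) are assumptions, not part of the lab code.

```java
class ContainsSketch {
    // Returns true if target occurs in arr[min..max), where max is exclusive.
    static boolean contains(int[] arr, int min, int max, int target) {
        if (max - min <= 0) {
            return false;                 // empty range: nothing to find
        }
        if (max - min == 1) {
            return arr[min] == target;    // single element: check it directly
        }

        int mid = min + (max - min) / 2;  // step 1: divide in half

        // steps 2-4: search each half, succeed if either half succeeds
        return contains(arr, min, mid, target) || contains(arr, mid, max, target);
    }

    public static void main(String[] args) {
        int[] arr = {4, 7, 1, 9};
        System.out.println(contains(arr, 0, arr.length, 1));       // true
        System.out.println(contains(new int[]{4, 7, 9}, 0, 3, 1)); // false
    }
}
```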

Example 2: MergeSort

Given int[] arr of size N
Sort arr in increasing order
Idea:
1. divide arr in half
2. sort left half
3. sort right half
4. merge sorted halves

Other Examples

Other sorting algorithms
- QuickSort
Multiplying large numbers
Multiplying matrices
- Strassen’s algorithm
Computing Fourier transforms

Observation

Divide-and-conquer often lends itself well to parallelism:

Divide instance into smaller instances
Solve smaller instances in parallel
Combine solutions

Fork-Join Pools

Idea:

A thread pool with efficient support for forking:
- divide a task into two or more sub-tasks
- complete sub-tasks
- combine solutions (if necessary)
Naturally lends itself to recursion

Fork/Merge Diagram: Merge Sort

Creating a Fork-Join Pool

Creating a Fork-Join Pool is easy!

Tasks are run by invoking them on the pool

import java.util.concurrent.ForkJoinPool;
...
ForkJoinPool pool = new ForkJoinPool(POOL_SIZE);
...
pool.invoke(new SomeTask(...));
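
A minimal end-to-end sketch of the snippet above. The value of POOL_SIZE, the SquareTask class (a stand-in for SomeTask), and the squares helper are all hypothetical. The task deliberately does not fork; it only shows a task being handed to the pool and the pool being shut down afterwards.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

class PoolSketch {
    // Assumed pool size; the slides leave POOL_SIZE unspecified.
    static final int POOL_SIZE = 4;

    // Hypothetical stand-in for SomeTask: stores i * i at index i.
    static class SquareTask extends RecursiveAction {
        private final long[] data;

        SquareTask(long[] data) {
            this.data = data;
        }

        @Override
        protected void compute() {
            for (int i = 0; i < data.length; ++i) {
                data[i] = (long) i * i;
            }
        }
    }

    static long[] squares(int n) {
        long[] data = new long[n];
        ForkJoinPool pool = new ForkJoinPool(POOL_SIZE);
        try {
            pool.invoke(new SquareTask(data)); // blocks until the task completes
        } finally {
            pool.shutdown();                   // release the pool's worker threads
        }
        return data;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(squares(5))); // [0, 1, 4, 9, 16]
    }
}
```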

Recursive Actions

Tasks without return values = recursive action

extend RecursiveAction class
override compute() method

MergeSort as `RecursiveAction`

import java.util.concurrent.RecursiveAction;

class MSTask extends RecursiveAction {
    ...
    public MSTask (double[] data, int min, int max) {
        ...
    }
	
    @Override
    protected void compute () {
        if (max - min <= 1) {
            ...
            return;
        }
		
        int mid = min + (max - min) / 2;

        MSTask left = new MSTask(data, min, mid);
        MSTask right = new MSTask(data, mid, max);
		
        left.fork();
        right.fork(); // or can use right.compute()
		
        left.join();
        right.join(); // leave out if right.compute()
        
        merge(data, min, mid, max);
    }
	
    void merge (double[] data, int min, int mid, int max) {
        ...	
    }
}

Invoke with pool.invoke(new MSTask(data, 0, data.length))
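
For reference, here is one way the elided parts could be filled in. This is a sketch under assumptions, not the course's reference solution: the names MergeSortSketch, SortTask, and sort are hypothetical, the base case simply returns, and merge copies through a temporary array. It uses the fork/compute/join variant mentioned in the comments above.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

class MergeSortSketch {
    static class SortTask extends RecursiveAction {
        private final double[] data;
        private final int min; // inclusive
        private final int max; // exclusive

        SortTask(double[] data, int min, int max) {
            this.data = data;
            this.min = min;
            this.max = max;
        }

        @Override
        protected void compute() {
            if (max - min <= 1) {
                return; // zero or one element is already sorted
            }

            int mid = min + (max - min) / 2;

            SortTask left = new SortTask(data, min, mid);
            SortTask right = new SortTask(data, mid, max);

            left.fork();     // schedule the left half on the pool
            right.compute(); // sort the right half in the current thread
            left.join();     // wait for the left half to finish

            merge(data, min, mid, max);
        }
    }

    // Merges the sorted ranges data[min..mid) and data[mid..max) in place.
    static void merge(double[] data, int min, int mid, int max) {
        double[] tmp = new double[max - min];
        int i = min, j = mid, k = 0;
        while (i < mid && j < max) {
            tmp[k++] = (data[i] <= data[j]) ? data[i++] : data[j++];
        }
        while (i < mid) tmp[k++] = data[i++];
        while (j < max) tmp[k++] = data[j++];
        System.arraycopy(tmp, 0, data, min, tmp.length);
    }

    static void sort(double[] data) {
        ForkJoinPool pool = new ForkJoinPool(); // default parallelism
        try {
            pool.invoke(new SortTask(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }
}
```

Forking down to single elements keeps the sketch short, but it creates one task per element, so expect it to be slow on large inputs.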

Efficiency

Often, Fork-Join pools are not as efficient as you’d like them to be

To deal with this:

Use large “base case”
- don’t spend multithreading overhead on breaking up small tasks
Only use on large instances

Still, FJPs can lead to elegant solutions and readable code
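
A sketch of the “large base case” idea: below a cutoff, the task stops forking and does the work in a plain loop. The class names, the square-root workload, and the THRESHOLD value are assumptions chosen for illustration; a good cutoff depends on the machine and on the cost per element.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

class ThresholdSketch {
    // Assumed cutoff: ranges this small are processed sequentially.
    static final int THRESHOLD = 10_000;

    // Hypothetical task: replaces each element with its square root.
    static class SqrtTask extends RecursiveAction {
        private final double[] data;
        private final int min; // inclusive
        private final int max; // exclusive

        SqrtTask(double[] data, int min, int max) {
            this.data = data;
            this.min = min;
            this.max = max;
        }

        @Override
        protected void compute() {
            if (max - min <= THRESHOLD) {
                // Large base case: no forking, just a loop.
                for (int i = min; i < max; ++i) {
                    data[i] = Math.sqrt(data[i]);
                }
                return;
            }

            int mid = min + (max - min) / 2;
            SqrtTask left = new SqrtTask(data, min, mid);
            SqrtTask right = new SqrtTask(data, mid, max);

            left.fork();
            right.compute();
            left.join();
        }
    }

    static void sqrtAll(double[] data) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            pool.invoke(new SqrtTask(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }
}
```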

Recursive Task

What if we want tasks to return a value?

Use RecursiveTask<T>!
- task returns a value of type T
- similar to RecursiveAction except compute() returns a T
pool.invoke(someRecursiveTask<T>) now also returns a T
join() method also returns a T
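
A small sketch of these points, using summation as the workload. SumSketch, SumTask, sum, and the THRESHOLD value are hypothetical names and values. Note that compute(), join(), and pool.invoke(...) each produce a Long here.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class SumSketch {
    // Assumed cutoff below which a range is summed sequentially.
    static final int THRESHOLD = 1_000;

    static class SumTask extends RecursiveTask<Long> {
        private final int[] data;
        private final int min; // inclusive
        private final int max; // exclusive

        SumTask(int[] data, int min, int max) {
            this.data = data;
            this.min = min;
            this.max = max;
        }

        @Override
        protected Long compute() {
            if (max - min <= THRESHOLD) {
                long sum = 0;
                for (int i = min; i < max; ++i) {
                    sum += data[i];
                }
                return sum;
            }

            int mid = min + (max - min) / 2;
            SumTask left = new SumTask(data, min, mid);
            SumTask right = new SumTask(data, mid, max);

            right.fork();            // run the right half asynchronously
            long l = left.compute(); // compute() returns the left result
            long r = right.join();   // join() returns the right result

            return l + r;            // combine the two results
        }
    }

    static long sum(int[] data) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new SumTask(data, 0, data.length)); // invoke returns a Long
        } finally {
            pool.shutdown();
        }
    }
}
```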

A Simple Example

Finding the maximum value in an unsorted array

What is a task?
How to combine results?

MaxTask

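import java.util.concurrent.RecursiveTask;

class MaxTask extends RecursiveTask<Double> {
    // Ranges of at most this many elements are searched sequentially.
    // Math.max keeps the limit at 1 or more, so small DATA_SIZE values
    // cannot produce a limit of 0 (which would never reach the base case).
    public static int PARALLEL_LIMIT = Math.max(1, MaxFinder.DATA_SIZE / 1000);

    double[] data;
    int min; // inclusive
    int max; // exclusive

    public MaxTask (double[] data, int min, int max) {
        this.data = data;
        this.min = min;
        this.max = max;
    }

    @Override
    protected Double compute () {
        if (max - min <= PARALLEL_LIMIT) {
            return findMax();
        }

        int mid = min + (max - min) / 2;

        MaxTask left = new MaxTask(data, min, mid);
        MaxTask right = new MaxTask(data, mid, max);

        right.fork();               // run right half asynchronously

        double l = left.compute();  // run left half in this thread

        double r = right.join();    // wait for right half's result

        return Math.max(l, r);
    }

    private Double findMax() {
        // Double.MIN_VALUE is the smallest *positive* double, so it is
        // the wrong starting point when the data may be zero or negative.
        double maxValue = Double.NEGATIVE_INFINITY;

        for (int i = min; i < max; ++i) {
            if (maxValue < data[i]) {
                maxValue = data[i];
            }
        }

        return maxValue;
    }
}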

The Competition

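    public static double findMax(double[] data) {
        // Start from negative infinity: Double.MIN_VALUE is the smallest
        // positive double, not the most negative one.
        double max = Double.NEGATIVE_INFINITY;

        for (int i = 0; i < DATA_SIZE; ++i) {
            if (max < data[i]) {
                max = data[i];
            }
        }

        return max;
    }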

An Activity

Compare the run-times of the two methods!

Is there a value of DATA_SIZE for which MaxTask is faster?
What values of PARALLEL_LIMIT give better performance?
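
One possible way to take the measurements is a small timing helper with warm-up runs, so the JIT compiler has a chance to optimize both methods before the clock starts. Everything here (TimingSketch, bestOfNanos, sequentialMax, the run counts, the data size) is a hypothetical sketch, not part of the lab code.

```java
import java.util.Random;
import java.util.function.Supplier;

class TimingSketch {
    // Results are stored here so the JIT cannot discard the work as unused.
    static volatile double sink;

    // Runs work `warmups` times untimed, then returns the fastest of `runs` timed runs.
    static long bestOfNanos(Supplier<Double> work, int warmups, int runs) {
        for (int i = 0; i < warmups; ++i) {
            sink = work.get();
        }

        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; ++i) {
            long start = System.nanoTime();
            sink = work.get();
            long elapsed = System.nanoTime() - start;
            best = Math.min(best, elapsed);
        }
        return best;
    }

    // Sequential baseline, written to be independent of MaxFinder.
    static double sequentialMax(double[] data) {
        double max = Double.NEGATIVE_INFINITY;
        for (double value : data) {
            if (max < value) {
                max = value;
            }
        }
        return max;
    }

    public static void main(String[] args) {
        double[] data = new double[1_000_000]; // assumed size; vary it
        Random rng = new Random(42);
        for (int i = 0; i < data.length; ++i) {
            data[i] = rng.nextDouble();
        }

        long nanos = bestOfNanos(() -> sequentialMax(data), 5, 10);
        System.out.println("sequential: " + nanos / 1_000 + " us");

        // To time the fork-join version, pass a lambda such as
        // () -> pool.invoke(new MaxTask(data, 0, data.length)) instead.
    }
}
```

Taking the best of several runs reduces noise from garbage collection and thread scheduling; reporting the median is another reasonable choice.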

Disclaimer:

everything about Java is optimized to execute code like findMax efficiently
fork-join pools are better suited for more complex tasks…
- …like sorting
- you’ll experience this in the next lab assignment!

Discuss

What did you find?

Lab 05: Our Last Lab

Simple problem:

Given a large array of doubles, sort it as quickly as possible
Use Arrays.sort(...) as a baseline

Details to follow…