# Lecture 29: Fork-Join Pools

## Last Time

Sorting by Divide-and-Conquer

• To sort an array:
  • partition into two (or more) sub-arrays
  • sort the parts
  • combine the sorted parts
• Naturally recursive structure

## Today

Divide-and-Conquer in Parallel:

• Fork-Join Pools

## Divide and Conquer

Many computation problems can be solved efficiently by:

1. Breaking an instance into two or more smaller instances
2. Solving the smaller instances (maybe recursively)
3. Combining the smaller solutions to solve the original instance

## Example 1: Searching Unsorted Array

• Given int[] arr of size N
• Does arr contain 1?
• Idea:
1. divide arr in half
2. search left half for 1
3. search right half for 1
4. return true if step 2 or 3 succeeds
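
The four steps above can be sketched sequentially as follows. This is only an illustration: the class name `DCSearch`, the method `contains`, and the half-open range convention `arr[min..max)` are assumptions, not code from the lecture.

```java
// Sequential divide-and-conquer search in an unsorted array (illustrative sketch).
class DCSearch {
    // Returns true if arr[min..max) contains target.
    static boolean contains(int[] arr, int target, int min, int max) {
        if (max - min <= 0) return false;               // empty range
        if (max - min == 1) return arr[min] == target;  // single element
        int mid = min + (max - min) / 2;                // 1. divide in half
        return contains(arr, target, min, mid)          // 2. search left half
            || contains(arr, target, mid, max);         // 3. search right half
    }                                                   // 4. true if either succeeds

    public static void main(String[] args) {
        int[] arr = {4, 7, 1, 9, 3};
        System.out.println(contains(arr, 1, 0, arr.length)); // prints "true"
        System.out.println(contains(arr, 8, 0, arr.length)); // prints "false"
    }
}
```

Because `||` short-circuits, this sequential version skips the right half whenever the left half succeeds.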

## Example 2: MergeSort

• Given int[] arr of size N
• Sort arr in increasing order
• Idea:
1. divide arr in half
2. sort left half
3. sort right half
4. merge sorted halves

## Observation

Divide-and-conquer often lends itself well to parallelism:

1. Divide instance into smaller instances
2. Solve smaller instances in parallel
3. Combine solutions

## Fork-Join Pools

Idea:

• A thread pool with efficient support for forking:
  • divide a task into two or more sub-tasks
  • combine solutions (if necessary)
• Naturally lends itself to recursion

## Creating a Fork-Join Pool

Creating a Fork-Join Pool is easy!

• tasks are invoked in FJP

```java
import java.util.concurrent.ForkJoinPool;
// ...
ForkJoinPool pool = new ForkJoinPool(POOL_SIZE);
// ...
```
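
As a minimal sketch of the whole life cycle, the snippet below creates a pool, invokes one trivial task in it, and shuts the pool down. The value `POOL_SIZE = 4` and the names `PoolDemo` and `Hello` are made up for illustration; `RecursiveAction` is the task type covered in the next section.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

class PoolDemo {
    static final int POOL_SIZE = 4; // assumed value, for illustration only

    // A trivial task that only records that it ran.
    static class Hello extends RecursiveAction {
        volatile boolean ran = false;

        @Override
        protected void compute() {
            ran = true;
        }
    }

    public static void main(String[] args) {
        ForkJoinPool pool = new ForkJoinPool(POOL_SIZE);
        Hello task = new Hello();
        pool.invoke(task);                         // blocks until the task completes
        System.out.println(pool.getParallelism()); // prints "4"
        System.out.println(task.ran);              // prints "true"
        pool.shutdown();                           // no further tasks will be accepted
    }
}
```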


## Recursive Actions

Tasks without return values = recursive action

• extend RecursiveAction class
• override compute() method
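
A small sketch of the pattern, assuming a made-up task that doubles every element of an array in place. The class name `DoubleTask` and the base-case size `THRESHOLD` are hypothetical; only `RecursiveAction`, `compute()`, `fork()`, and `join()` come from the Java library.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

// Hypothetical example task: doubles every element of data[min..max) in place.
class DoubleTask extends RecursiveAction {
    static final int THRESHOLD = 1_000; // assumed base-case size
    private final int[] data;
    private final int min, max;

    DoubleTask(int[] data, int min, int max) {
        this.data = data;
        this.min = min;
        this.max = max;
    }

    @Override
    protected void compute() {
        if (max - min <= THRESHOLD) {       // small range: do the work directly
            for (int i = min; i < max; i++) data[i] *= 2;
            return;
        }
        int mid = min + (max - min) / 2;    // otherwise split into two sub-tasks
        DoubleTask left = new DoubleTask(data, min, mid);
        DoubleTask right = new DoubleTask(data, mid, max);
        left.fork();
        right.fork();
        right.join();                       // wait for both halves to finish
        left.join();
    }

    public static void main(String[] args) {
        int[] data = new int[10_000];
        Arrays.fill(data, 21);
        ForkJoinPool pool = new ForkJoinPool();
        pool.invoke(new DoubleTask(data, 0, data.length));
        pool.shutdown();
        System.out.println(data[0] + " " + data[data.length - 1]); // prints "42 42"
    }
}
```

There is nothing to combine here: each sub-task writes to its own part of the array, so `compute()` has no result to return.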

## MergeSort as RecursiveAction

```java
import java.util.concurrent.RecursiveAction;

class MSTask extends RecursiveAction {
    public MSTask(double[] data, int min, int max) {...}

    @Override
    protected void compute() {
        if (max - min <= 1) {...}    // base case: at most one element
        int mid = min + (max - min) / 2;
        MSTask left = new MSTask(data, min, mid);
        MSTask right = new MSTask(data, mid, max);
        left.fork(); right.fork();   // or can use right.compute()
        left.join(); right.join();   // leave out right.join() if using right.compute()
        merge(data, min, mid, max);
    }
}
```

Invoke with pool.invoke(new MSTask(data, 0, data.length))
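
For reference, here is one way the elided pieces could be filled in so that the example runs. It is a sketch under assumptions, not the lecture's own code: the class is renamed `MergeSortTask` to keep it distinct, the base case simply returns, the `merge` helper is one possible implementation, and the fork/compute variant is used.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

// Hypothetical completion of the merge sort task; sorts data[min..max).
class MergeSortTask extends RecursiveAction {
    private final double[] data;
    private final int min, max;

    MergeSortTask(double[] data, int min, int max) {
        this.data = data;
        this.min = min;
        this.max = max;
    }

    @Override
    protected void compute() {
        if (max - min <= 1) return;          // 0 or 1 elements: already sorted
        int mid = min + (max - min) / 2;
        MergeSortTask left = new MergeSortTask(data, min, mid);
        MergeSortTask right = new MergeSortTask(data, mid, max);
        left.fork();                         // schedule the left half in the pool
        right.compute();                     // sort the right half in this task
        left.join();                         // wait for the left half
        merge(data, min, mid, max);
    }

    // Assumed helper: merges sorted data[min..mid) and data[mid..max) in place.
    static void merge(double[] data, int min, int mid, int max) {
        double[] left = Arrays.copyOfRange(data, min, mid); // copy of the left half
        int i = 0, j = mid, k = min;
        while (i < left.length && j < max) {
            data[k++] = (left[i] <= data[j]) ? left[i++] : data[j++];
        }
        while (i < left.length) data[k++] = left[i++];
        // any remaining right-half elements are already in their final place
    }

    public static void main(String[] args) {
        double[] data = {5.0, 2.0, 9.0, 1.0, 7.0};
        ForkJoinPool pool = new ForkJoinPool();
        pool.invoke(new MergeSortTask(data, 0, data.length));
        pool.shutdown();
        System.out.println(Arrays.toString(data)); // prints "[1.0, 2.0, 5.0, 7.0, 9.0]"
    }
}
```

Recursing all the way down to single elements keeps the sketch short, but it creates a very large number of tiny tasks.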

## fork versus compute

The difference:

• fork() creates new task to be scheduled by the pool
  • must join
• compute() performs computation as part of this task
  • no join necessary
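
To make the difference concrete, the sketch below counts how many times `fork()` is called when a range of size n is split down to single elements. Everything here (`ForkCountTask`, the `forkBoth` flag, the shared counter) is invented for illustration; the tasks do no real work.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical task that only splits its range and counts fork() calls.
class ForkCountTask extends RecursiveAction {
    static final AtomicInteger forks = new AtomicInteger();
    private final int min, max;
    private final boolean forkBoth; // true: fork/fork, false: fork/compute

    ForkCountTask(int min, int max, boolean forkBoth) {
        this.min = min;
        this.max = max;
        this.forkBoth = forkBoth;
    }

    @Override
    protected void compute() {
        if (max - min <= 1) return;  // base case: nothing left to split
        int mid = min + (max - min) / 2;
        ForkCountTask left = new ForkCountTask(min, mid, forkBoth);
        ForkCountTask right = new ForkCountTask(mid, max, forkBoth);
        left.fork();
        forks.incrementAndGet();
        if (forkBoth) {
            right.fork();            // a second task for the pool to schedule
            forks.incrementAndGet();
            right.join();
            left.join();
        } else {
            right.compute();         // runs inside this task, so no join for right
            left.join();
        }
    }

    // Runs one full split of a range of size n and returns the number of forks.
    static int countForks(int n, boolean forkBoth) {
        forks.set(0);
        ForkJoinPool pool = new ForkJoinPool();
        pool.invoke(new ForkCountTask(0, n, forkBoth));
        pool.shutdown();
        return forks.get();
    }

    public static void main(String[] args) {
        System.out.println(countForks(8, true));  // prints "14"
        System.out.println(countForks(8, false)); // prints "7"
    }
}
```

A range of size 8 is split 7 times, so fork/fork hands 14 tasks to the pool while fork/compute hands it 7.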

Question. Why use one or the other?

## What ForkJoinPool Does

• FJP is a thread pool with a target number of worker threads (its parallelism level)
• FJP handles scheduling of tasks
• Employs “work-stealing” strategy to minimize time spent waiting for tasks to complete
• Accounts for dependencies between tasks
• AMP Chapter 16

## Efficiency

Fork-Join pools are often not as efficient as you’d like them to be

To deal with this:

• Use large “base case”
  • don’t spend multithreading overhead on breaking up small tasks
• Only use on large instances

Still, FJPs can lead to elegant solutions and readable code

• Can have better performance if task sizes are irregular

## Recursive Tasks

What if we want tasks to return a value?

• Use RecursiveTask<T>!
  • task returns a value of type T
  • similar to RecursiveAction except compute() returns a T
  • pool.invoke(someRecursiveTask<T>) now also returns a T
  • join() method also returns a T
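
A compact illustration of these points is a Fibonacci task, sketched here with an assumed class name `FibTask`. The base case is deliberately tiny, so this shows the API rather than an efficient computation.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Illustrative RecursiveTask<Integer>: computes the n-th Fibonacci number.
class FibTask extends RecursiveTask<Integer> {
    private final int n;

    FibTask(int n) {
        this.n = n;
    }

    @Override
    protected Integer compute() {            // compute() returns a T (here Integer)
        if (n <= 1) return n;
        FibTask f1 = new FibTask(n - 1);
        f1.fork();                           // schedule fib(n-1) in the pool
        FibTask f2 = new FibTask(n - 2);
        return f2.compute() + f1.join();     // join() returns the forked task's result
    }

    public static void main(String[] args) {
        ForkJoinPool pool = new ForkJoinPool();
        Integer result = pool.invoke(new FibTask(20)); // invoke() also returns a T
        pool.shutdown();
        System.out.println(result); // prints "6765"
    }
}
```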

## A Simple Example

Finding the maximum value in an unsorted array

• What is a task?
• How to combine results?
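
One possible answer, sketched under assumptions: a task is "find the maximum of arr[min..max)", and two results are combined with `Math.max`. The names `MaxTask` and `findMax` and the value of `PARALLEL_LIMIT` are guesses made for illustration and need not match the code in fork-join-pools.zip.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical task: finds the maximum of the non-empty range arr[min..max).
class MaxTask extends RecursiveTask<Integer> {
    static final int PARALLEL_LIMIT = 10_000; // assumed value; tune by experiment
    private final int[] arr;
    private final int min, max;

    MaxTask(int[] arr, int min, int max) {
        this.arr = arr;
        this.min = min;
        this.max = max;
    }

    // Sequential scan, used as the base case.
    static int findMax(int[] arr, int min, int max) {
        int best = arr[min];
        for (int i = min + 1; i < max; i++) best = Math.max(best, arr[i]);
        return best;
    }

    @Override
    protected Integer compute() {
        if (max - min <= PARALLEL_LIMIT) return findMax(arr, min, max);
        int mid = min + (max - min) / 2;
        MaxTask left = new MaxTask(arr, min, mid);
        MaxTask right = new MaxTask(arr, mid, max);
        left.fork();                         // left half runs as a separate task
        int rightMax = right.compute();      // right half runs in this task
        int leftMax = left.join();
        return Math.max(leftMax, rightMax);  // combine the two results
    }

    public static void main(String[] args) {
        int[] arr = new int[1_000_000];
        for (int i = 0; i < arr.length; i++) arr[i] = i % 1000;
        arr[123_456] = 5000;
        ForkJoinPool pool = new ForkJoinPool();
        int result = pool.invoke(new MaxTask(arr, 0, arr.length));
        pool.shutdown();
        System.out.println(result); // prints "5000"
    }
}
```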

## An Activity

Compare the run-times of the two methods!

Download fork-join-pools.zip

1. What values of PARALLEL_LIMIT give better performance?
2. Is there a performance difference for fork/compute compared to fork/fork?

Disclaimer:

• everything about Java is optimized to execute code like findMax efficiently
• fork-join pools are better suited for more complex tasks…

## Next Time

Sorting networks!