Lecture 29: Fork-Join Pools

COSC 273: Parallel and Distributed Computing

Spring 2023

Last Time

Sorting by Divide-and-Conquer

To sort an array
- partition into two (or more) sub-arrays
- sort the parts
- combine the sorted parts
Naturally recursive structure

Today

Divide-and-Conquer in Parallel:

Fork-Join Pools

Divide and Conquer

Many computational problems can be solved efficiently by:

Breaking an instance into two or more smaller instances
Solving the smaller instances (maybe recursively)
Combining the smaller solutions to solve the original instance

Example 1: Searching Unsorted Array

Given int[] arr of size N
Does arr contain 1?
Idea:
1. divide arr in half
2. search left half for 1
3. search right half for 1
4. return true if step 2 or 3 succeeds
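The four steps above can be sketched sequentially as follows. This is a minimal sketch, not code from the course materials: the class name SearchSketch and the helper contains are hypothetical, and the search target is passed as a parameter instead of being fixed to 1.

```java
class SearchSketch {
    // Hypothetical helper: returns true if arr[min..max) contains target.
    static boolean contains(int[] arr, int target, int min, int max) {
        if (max - min <= 0) return false;               // empty range
        if (max - min == 1) return arr[min] == target;  // single element
        int mid = min + (max - min) / 2;                // 1. divide in half
        return contains(arr, target, min, mid)          // 2. search left half
            || contains(arr, target, mid, max);         // 3. search right half
    }                                                   // 4. true if either succeeds

    public static void main(String[] args) {
        int[] arr = {4, 7, 1, 9};
        System.out.println(contains(arr, 1, 0, arr.length)); // true
        System.out.println(contains(arr, 5, 0, arr.length)); // false
    }
}
```

Note that || short-circuits: the right half is only searched if the left half fails.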

Example 2: MergeSort

Given int[] arr of size N
Sort arr in increasing order
Idea:
1. divide arr in half
2. sort left half
3. sort right half
4. merge sorted halves
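A sequential sketch of these four steps, assuming half-open ranges arr[min..max) and a merge through a temporary buffer. The names MergeSortSketch, mergeSort, and merge are illustrative choices, not taken from the course code.

```java
import java.util.Arrays;

class MergeSortSketch {
    // Sorts arr[min..max) in increasing order.
    static void mergeSort(int[] arr, int min, int max) {
        if (max - min <= 1) return;        // 0 or 1 elements: already sorted
        int mid = min + (max - min) / 2;   // 1. divide arr in half
        mergeSort(arr, min, mid);          // 2. sort left half
        mergeSort(arr, mid, max);          // 3. sort right half
        merge(arr, min, mid, max);         // 4. merge sorted halves
    }

    // Merges sorted arr[min..mid) and arr[mid..max) via a temporary buffer.
    static void merge(int[] arr, int min, int mid, int max) {
        int[] tmp = new int[max - min];
        int i = min, j = mid, k = 0;
        while (i < mid && j < max) {
            tmp[k++] = (arr[i] <= arr[j]) ? arr[i++] : arr[j++];
        }
        while (i < mid) tmp[k++] = arr[i++];  // leftover from left half
        while (j < max) tmp[k++] = arr[j++];  // leftover from right half
        System.arraycopy(tmp, 0, arr, min, tmp.length);
    }

    public static void main(String[] args) {
        int[] arr = {5, 2, 9, 1, 3};
        mergeSort(arr, 0, arr.length);
        System.out.println(Arrays.toString(arr)); // [1, 2, 3, 5, 9]
    }
}
```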

Observation

Divide-and-conquer often lends itself well to parallelism:

Divide instance into smaller instances
Solve smaller instances in parallel
Combine solutions

Fork-Join Pools

Idea:

A thread pool with efficient support for forking:
- divide a task into two or more sub-tasks
- complete sub-tasks
- combine solutions (if necessary)
Naturally lends itself to recursion

Fork/Merge Diagram: Merge Sort

Creating a Fork-Join Pool

Creating a Fork-Join Pool is easy!

tasks are invoked in FJP

import java.util.concurrent.ForkJoinPool;
...
ForkJoinPool pool = new ForkJoinPool(POOL_SIZE);
...
pool.invoke(new SomeTask(...));
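A complete, runnable version of the snippet above might look like this. It is only a sketch: FillTask is a hypothetical stand-in for SomeTask that does all of its work without splitting, and the value 4 for POOL_SIZE is an assumption.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

class PoolDemo {
    static final int POOL_SIZE = 4; // assumed value; choose to suit your machine

    // Hypothetical stand-in for SomeTask: fills an array with one value.
    static class FillTask extends RecursiveAction {
        private final int[] data;
        private final int value;

        FillTask(int[] data, int value) {
            this.data = data;
            this.value = value;
        }

        @Override
        protected void compute() {
            Arrays.fill(data, value);
        }
    }

    static int[] run(int n) {
        ForkJoinPool pool = new ForkJoinPool(POOL_SIZE);
        int[] data = new int[n];
        try {
            pool.invoke(new FillTask(data, 7)); // blocks until the task completes
        } finally {
            pool.shutdown();                    // release the pool's threads
        }
        return data;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run(5))); // [7, 7, 7, 7, 7]
    }
}
```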

Recursive Actions

Tasks without return values = recursive action

extend RecursiveAction class
override compute() method

MergeSort as `RecursiveAction`

import java.util.concurrent.RecursiveAction;

class MSTask extends RecursiveAction {
    public MSTask(double[] data, int min, int max) {...}

    @Override
    protected void compute() {
        if (max - min <= 1) {...}  // base case
        int mid = min + (max - min) / 2;
        MSTask left = new MSTask(data, min, mid);
        MSTask right = new MSTask(data, mid, max);
        left.fork(); right.fork(); // or call right.compute() instead of right.fork()
        left.join(); right.join(); // omit right.join() if right.compute() was used
        merge(data, min, mid, max);
    }
}

Invoke with pool.invoke(new MSTask(data, 0, data.length))
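For reference, here is one way the elided pieces could be filled in to get a runnable program. This is a sketch under assumptions, not the course's reference solution: the base case simply returns, merge uses a temporary buffer, the fork/compute variant is used, and the wrapper class ParallelMergeSort and its sort method are hypothetical.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

class ParallelMergeSort {
    static class MSTask extends RecursiveAction {
        private final double[] data;
        private final int min, max; // this task sorts data[min..max)

        MSTask(double[] data, int min, int max) {
            this.data = data;
            this.min = min;
            this.max = max;
        }

        @Override
        protected void compute() {
            if (max - min <= 1) return;  // assumed base case: nothing to sort
            int mid = min + (max - min) / 2;
            MSTask left = new MSTask(data, min, mid);
            MSTask right = new MSTask(data, mid, max);
            left.fork();      // schedule the left half in the pool
            right.compute();  // sort the right half in this task
            left.join();      // wait for the left half before merging
            merge(data, min, mid, max);
        }
    }

    // Merges sorted data[min..mid) and data[mid..max) via a temporary buffer.
    static void merge(double[] data, int min, int mid, int max) {
        double[] tmp = new double[max - min];
        int i = min, j = mid, k = 0;
        while (i < mid && j < max) {
            tmp[k++] = (data[i] <= data[j]) ? data[i++] : data[j++];
        }
        while (i < mid) tmp[k++] = data[i++];
        while (j < max) tmp[k++] = data[j++];
        System.arraycopy(tmp, 0, data, min, tmp.length);
    }

    static void sort(double[] data) {
        ForkJoinPool pool = new ForkJoinPool(); // default parallelism
        try {
            pool.invoke(new MSTask(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        double[] data = {3.0, 1.0, 2.0, 5.0, 4.0};
        sort(data);
        System.out.println(Arrays.toString(data)); // [1.0, 2.0, 3.0, 4.0, 5.0]
    }
}
```

A base case of a single element keeps the sketch short but creates a very large number of tiny tasks, so it is not a good choice for performance.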

`fork` versus `compute`

The difference:

fork() creates new task to be scheduled by the pool
- must join
compute() performs computation as part of this task
- no join necessary

Question. Why use one or the other?
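To experiment with the question, the two patterns can be placed side by side. In this sketch the task IncTask, the flag forkBoth, and the cutoff BASE are all hypothetical; the task just increments every element of an array so that both variants are easy to check.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

class ForkVersusCompute {
    static final int BASE = 1_000; // assumed base-case size

    // Hypothetical task: increments every element of data[min..max).
    static class IncTask extends RecursiveAction {
        private final int[] data;
        private final int min, max;
        private final boolean forkBoth;

        IncTask(int[] data, int min, int max, boolean forkBoth) {
            this.data = data;
            this.min = min;
            this.max = max;
            this.forkBoth = forkBoth;
        }

        @Override
        protected void compute() {
            if (max - min <= BASE) {
                for (int i = min; i < max; i++) data[i]++;
                return;
            }
            int mid = min + (max - min) / 2;
            IncTask left = new IncTask(data, min, mid, forkBoth);
            IncTask right = new IncTask(data, mid, max, forkBoth);
            if (forkBoth) {
                // fork/fork: both halves become new tasks; this task only waits
                left.fork();
                right.fork();
                right.join();
                left.join();
            } else {
                // fork/compute: this task does the right half itself
                left.fork();
                right.compute();
                left.join();
            }
        }
    }

    static int[] run(int n, boolean forkBoth) {
        int[] data = new int[n];
        ForkJoinPool pool = new ForkJoinPool();
        try {
            pool.invoke(new IncTask(data, 0, n, forkBoth));
        } finally {
            pool.shutdown();
        }
        return data;
    }
}
```

Both variants produce the same result; what differs is how many tasks the pool has to schedule.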

What `ForkJoinPool` Does

FJP is a thread pool with a fixed target number of worker threads (its parallelism level)
FJP handles scheduling of tasks
Employs “work-stealing” strategy to minimize time spent waiting for tasks to complete
- Accounts for dependencies between tasks
- AMP Chapter 16

Efficiency

Fork-Join pools are often not as efficient as you’d like them to be

To deal with this:

Use a large “base case”
- don’t spend multithreading overhead on breaking up small tasks
Only use on large instances
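Both suggestions can be sketched in a merge sort variant. Everything here is an assumption made for illustration: the constant SEQUENTIAL_CUTOFF and its value, the class names, and the choice of Arrays.sort as the sequential fallback. A good cutoff has to be found by measuring.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

class CutoffSort {
    static final int SEQUENTIAL_CUTOFF = 10_000; // hypothetical; tune by measuring

    static class SortTask extends RecursiveAction {
        private final double[] data;
        private final int min, max; // this task sorts data[min..max)

        SortTask(double[] data, int min, int max) {
            this.data = data;
            this.min = min;
            this.max = max;
        }

        @Override
        protected void compute() {
            if (max - min <= SEQUENTIAL_CUTOFF) {
                Arrays.sort(data, min, max); // large base case: no more forking
                return;
            }
            int mid = min + (max - min) / 2;
            SortTask left = new SortTask(data, min, mid);
            SortTask right = new SortTask(data, mid, max);
            left.fork();
            right.compute();
            left.join();
            merge(data, min, mid, max);
        }
    }

    // Merges sorted data[min..mid) and data[mid..max) via a temporary buffer.
    static void merge(double[] data, int min, int mid, int max) {
        double[] tmp = new double[max - min];
        int i = min, j = mid, k = 0;
        while (i < mid && j < max) {
            tmp[k++] = (data[i] <= data[j]) ? data[i++] : data[j++];
        }
        while (i < mid) tmp[k++] = data[i++];
        while (j < max) tmp[k++] = data[j++];
        System.arraycopy(tmp, 0, data, min, tmp.length);
    }

    static void sort(double[] data) {
        if (data.length <= SEQUENTIAL_CUTOFF) {
            Arrays.sort(data); // small instance: skip the pool entirely
            return;
        }
        ForkJoinPool pool = new ForkJoinPool();
        try {
            pool.invoke(new SortTask(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }
}
```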

Still, FJPs can lead to elegant solutions and readable code

Can have better performance if task sizes are irregular

Recursive Task

What if we want tasks to return a value?

Use RecursiveTask<T>!
- task returns a value of type T
- similar to RecursiveAction except compute() returns a T
pool.invoke(someRecursiveTask<T>) now also returns a T
join() method also returns a T
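As an illustration, the array search from Example 1 can be written as a RecursiveTask<Boolean>. This is a sketch under assumptions: the names ParallelSearch, SearchTask, and contains, and the base-case size BASE, are hypothetical.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class ParallelSearch {
    static final int BASE = 1_000; // assumed base-case size

    // Hypothetical task: does arr[min..max) contain target?
    static class SearchTask extends RecursiveTask<Boolean> {
        private final int[] arr;
        private final int target, min, max;

        SearchTask(int[] arr, int target, int min, int max) {
            this.arr = arr;
            this.target = target;
            this.min = min;
            this.max = max;
        }

        @Override
        protected Boolean compute() {
            if (max - min <= BASE) {
                for (int i = min; i < max; i++) {
                    if (arr[i] == target) return true;
                }
                return false;
            }
            int mid = min + (max - min) / 2;
            SearchTask left = new SearchTask(arr, target, min, mid);
            SearchTask right = new SearchTask(arr, target, mid, max);
            left.fork();
            boolean inRight = right.compute(); // compute() returns a Boolean
            boolean inLeft = left.join();      // join() returns a Boolean too
            return inLeft || inRight;
        }
    }

    static boolean contains(int[] arr, int target) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            // invoke() returns the value produced by compute()
            return pool.invoke(new SearchTask(arr, target, 0, arr.length));
        } finally {
            pool.shutdown();
        }
    }
}
```

Unlike a sequential scan, this sketch does not stop early: once the halves are forked, both are searched to completion.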

A Simple Example

Finding the maximum value in an unsorted array

What is a task?
How to combine results?
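One possible answer, as a sketch only: a task finds the maximum of a sub-range, and two results are combined with Math.max. The names MaxDemo, MaxTask, and BASE are hypothetical, and this code is not taken from fork-join-pools.zip, whose findMax may be organized differently.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class MaxDemo {
    static final int BASE = 1_000; // assumed base-case size

    // Hypothetical task: maximum of the non-empty range data[min..max).
    static class MaxTask extends RecursiveTask<Integer> {
        private final int[] data;
        private final int min, max;

        MaxTask(int[] data, int min, int max) {
            this.data = data;
            this.min = min;
            this.max = max;
        }

        @Override
        protected Integer compute() {
            if (max - min <= BASE) {
                int best = data[min];
                for (int i = min + 1; i < max; i++) {
                    best = Math.max(best, data[i]);
                }
                return best;
            }
            int mid = min + (max - min) / 2;
            MaxTask left = new MaxTask(data, min, mid);
            MaxTask right = new MaxTask(data, mid, max);
            left.fork();
            int rightMax = right.compute();
            int leftMax = left.join();
            return Math.max(leftMax, rightMax); // combine step
        }
    }

    static int findMax(int[] data) {
        if (data.length == 0) {
            throw new IllegalArgumentException("empty array has no maximum");
        }
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new MaxTask(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }
}
```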

An Activity

Compare the run-times of the two methods!

Download fork-join-pools.zip

What values of PARALLEL_LIMIT give better performance?
Is there a performance difference for fork/compute compared to fork/fork?

Disclaimer:

everything about Java is optimized to execute code like findMax efficiently
fork-join pools are better suited for more complex tasks…

What Happened?

Next Time

Sorting networks!