Will Rosenbaum | Final Option 02: Sorting

Background

Sorting is a fundamental task in computer science. As such, most general-purpose programming languages provide a built-in sorting function. For example, Java provides the Arrays.sort(...) method. The sort method will takes an array numbers (int, double, float, etc.) and sorts the array in-place in ascending order. For example, after calling

double[] data = {4.0, 2.0, 3.0, 1.0};
Arrays.sort(data)

data will store the values {1.0, 2.0. 3.0, 4.0}. The Arrays.sort method is quite efficient: on my laptop, it sorts a random array of 10 million doubles in about 1 second. Nonetheless, there is always room for improvement!

Your Task

In this lab, you will use multithreading and parallelism in order to implement a faster sorting procedure than Arrays.sort (at least for large arrays). Specifically, you should implement a method Sorting.parallelSort(double[] data) that sorts an array of doubles in place. To get started, download the following archive, which includes a program SortingTester.java to test your implementation.

final-sorting.zip

Suggestions and Resources

Many of the fastest practical sorting algorithms use some variant of quicksort. We briefly describe a high-level description of quicksort. For simplicity we assume that elements in the array are pair-wise distinct (i.e., not value is repeated), but your program should handle the case where repeated values are allowed.

Choose a pivot element in the array—typically either some fixed element or the a random element in the array.
Reorder the array so that
- pivot is at index p and
- all elements at indices i < p are smaller than pivot
- all elements at indices i > p are greater than pivot
Recursively sort
- the sub-array from indices 0 to p-1
- the sub-array from indices p+1 to n (where n is the original size of the array).

The recursive sorting in step 3 can again be performed using quicksort. To this end, it is useful to define the general method to take three parameters: the array data to be sorted, as well as indices i and j. A call to quicksort(data, i, j) will return an array such that the elements of data with indices i, i+1,..., j-1 are sorted, while the remaining entries are unchanged.

For our purposes, the quicksort algorithm is appealing for two reasons. First, it sorts elements in-place. That is, all of the operations can be performed on the original array without duplicating data. (In particular, this is true of step 2). Second, the sub-steps of step 3 of the procedure can be performed in parallel.

For a detailed account of implementing and parallelizing quicksort (and other sorting algorithms), see

Sorting and Selection in Sequential and Parallel Data Structures and Algorithms (book chapter available here).

Since quicksort follows a divide-and-conquer approach, you may find the following documentation/tutorial helpful:

Note that there are many different variants of quicksort. Exactly which version you use and how it is implemented can have a significant impact on your program’s performance.

Tips

For reasonably small arrays, Arrays.sort() is likely to be faster than a more complex multi-threaded procedure. You may freely use Arrays.sort() as a sub-routine/base case for your sorting procedure.
If you choose to implement quicksort, the most costly part of the execution is the “reordering” step (step 2). Even though the procedure can be conceptually simple, there are a lot of opportunities for inefficiency. In particular, for a large array you will want to consider the cache performance and memory access pattern to the input.