Final Option 02: Sorting
sorting faster than ever before
Background
Sorting is a fundamental task in computer science. As such, most general-purpose programming languages provide a built-in sorting function. For example, Java provides the Arrays.sort(...)
method. The sort
method will takes an array numbers (int
, double
, float
, etc.) and sorts the array in-place in ascending order. For example, after calling
1
2
double[] data = {4.0, 2.0, 3.0, 1.0};
Arrays.sort(data)
data
will store the values {1.0, 2.0. 3.0, 4.0}
. The Arrays.sort
method is quite efficient: on my laptop, it sorts a random array of 10 million double
s in about 1 second. Nonetheless, there is always room for improvement!
Your Task
In this lab, you will use multithreading and parallelism in order to implement a faster sorting procedure than Arrays.sort
(at least for large arrays). Specifically, you should implement a method Sorting.parallelSort(double[] data)
that sorts an array of double
s in place. To get started, download the following archive, which includes a program SortingTester.java
to test your implementation.
Suggestions and Resources
Many of the fastest practical sorting algorithms use some variant of quicksort. We briefly describe a high-level description of quicksort. For simplicity we assume that elements in the array are pair-wise distinct (i.e., not value is repeated), but your program should handle the case where repeated values are allowed.
- Choose a
pivot
element in the array—typically either some fixed element or the a random element in the array. - Reorder the array so that
pivot
is at indexp
and- all elements at indices
i < p
are smaller thanpivot
- all elements at indices
i > p
are greater thanpivot
- Recursively sort
- the sub-array from indices
0
top-1
- the sub-array from indices
p+1
ton
(wheren
is the original size of the array).
- the sub-array from indices
The recursive sorting in step 3 can again be performed using quicksort. To this end, it is useful to define the general method to take three parameters: the array data
to be sorted, as well as indices i
and j
. A call to quicksort(data, i, j)
will return an array such that the elements of data
with indices i, i+1,..., j-1
are sorted, while the remaining entries are unchanged.
For our purposes, the quicksort algorithm is appealing for two reasons. First, it sorts elements in-place. That is, all of the operations can be performed on the original array without duplicating data. (In particular, this is true of step 2). Second, the sub-steps of step 3 of the procedure can be performed in parallel.
For a detailed account of implementing and parallelizing quicksort (and other sorting algorithms), see
- Sorting and Selection in Sequential and Parallel Data Structures and Algorithms (book chapter available here).
Since quicksort follows a divide-and-conquer approach, you may find the following documentation/tutorial helpful:
Note that there are many different variants of quicksort. Exactly which version you use and how it is implemented can have a significant impact on your program’s performance.
Tips
-
For reasonably small arrays,
Arrays.sort()
is likely to be faster than a more complex multi-threaded procedure. You may freely useArrays.sort()
as a sub-routine/base case for your sorting procedure. -
If you choose to implement quicksort, the most costly part of the execution is the “reordering” step (step 2). Even though the procedure can be conceptually simple, there are a lot of opportunities for inefficiency. In particular, for a large array you will want to consider the cache performance and memory access pattern to the input.