Lecture 06: Sorting by Divide and Conquer II

COSC 311 Algorithms, Fall 2022

$ \def\compare{ {\mathrm{compare}} } \def\swap{ {\mathrm{swap}} } \def\sort{ {\mathrm{sort}} } \def\insert{ {\mathrm{insert}} } \def\true{ {\mathrm{true}} } \def\false{ {\mathrm{false}} } \def\BubbleSort{ {\mathrm{BubbleSort}} } \def\SelectionSort{ {\mathrm{SelectionSort}} } \def\Merge{ {\mathrm{Merge}} } \def\MergeSort{ {\mathrm{MergeSort}} } $

Announcement

Accountability Groups

Overview

  1. MergeSort
  2. Running Time of Merge Sort
  3. QuickSort

Previously

  • Sorting meets Divide & Conquer

  • MergeSort: Divide by Index

    1. divide $a$ into halves at index $m = \lceil (n+1) / 2 \rceil$
    2. sort $a[1..m-1]$ recursively
    3. sort $a[m..n]$ recursively
    4. merge $a[1..m-1]$ and $a[m..n]$ to form sorted array

Pseudocode

# sort values of a between indices i and j-1
MergeSort(a, i, j):
  if j - i = 1 then
    return
  endif
  m <- (i + j) / 2
  MergeSort(a,i,m)
  MergeSort(a,m,j)
  Merge(a,i,m,j)
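
The pseudocode above can be sketched in Python as follows. This is a minimal sketch, not the course's reference implementation: it uses 0-based indexing (so the whole list is sorted by `merge_sort(a, 0, len(a))`), and `merge` is one assumed way to implement Merge, using temporary copies of the two halves.

```python
def merge(a, i, m, j):
    """Merge the sorted runs a[i:m] and a[m:j] so that a[i:j] is sorted."""
    left, right = a[i:m], a[m:j]  # temporary copies: O(j - i) extra space
    l = r = 0
    for t in range(i, j):
        # Take from left if right is used up, or if left's next value is no larger.
        if r == len(right) or (l < len(left) and left[l] <= right[r]):
            a[t] = left[l]
            l += 1
        else:
            a[t] = right[r]
            r += 1


def merge_sort(a, i, j):
    """Sort the values of a at indices i through j-1, in place."""
    if j - i <= 1:       # zero or one value: nothing to do
        return
    m = (i + j) // 2     # integer midpoint
    merge_sort(a, i, m)
    merge_sort(a, m, j)
    merge(a, i, m, j)


data = [5, 2, 9, 1, 5, 6]
merge_sort(data, 0, len(data))
print(data)
```

The base case here is written as `j - i <= 1` rather than `j - i = 1` so that a call on an empty range also terminates.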

Illustration of MergeSort

Correctness of MergeSort

Establish two claims:

Claim 1 (merge). If $a[i..m-1]$ and $a[m..j-1]$ are sorted, then after $\Merge(a, i, m, j)$, $a[i..j-1]$ is sorted.

  • Argued on lecture ticket!

Claim 2. For any indices $i < j$, after calling $\MergeSort(a, i, j)$, $a[i..j-1]$ is sorted.

  • Argue by Induction!

Pseudocode Again

00  # sort values of a between indices i and j-1
01  MergeSort(a, i, j):
02    if j - i = 1 then
03      return
04    endif
05    m <- (i + j) / 2
06    MergeSort(a,i,m)
07    MergeSort(a,m,j)
08    Merge(a,i,m,j)

Inductive Claim

Consider a call $\MergeSort(a, i, j)$ and define its size to be $k = j - i$.

$P(k)$: for every $1 \leq k' \leq k$, every call $\MergeSort(a, i, j)$ of size $k'$ succeeds, i.e., leaves $a[i..j-1]$ sorted

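Base case $k = 1$: the range $a[i..j-1]$ contains a single value, which is already sorted, and the call returns on line 3 without modifying $a$.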

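Inductive step $P(k) \implies P(k+1)$: a call of size $k + 1 \geq 2$ makes recursive calls on lines 6-7 of sizes $m - i$ and $j - m$, each at least $1$ and at most $k$. By $P(k)$, both calls succeed, so $a[i..m-1]$ and $a[m..j-1]$ are sorted. By Claim 1, the merge on line 8 then leaves $a[i..j-1]$ sorted.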

Question

How efficient is MergeSort?

00  # sort values of a between indices i and j-1
01  MergeSort(a, i, j):
02    if j - i = 1 then
03      return
04    endif
05    m <- (i + j) / 2
06    MergeSort(a,i,m)
07    MergeSort(a,m,j)
08    Merge(a,i,m,j)

Analyzing Running Time

00  # sort values of a between indices i and j-1
01  MergeSort(a, i, j):
02    if j - i = 1 then
03      return
04    endif
05    m <- (i + j) / 2
06    MergeSort(a,i,m)
07    MergeSort(a,m,j)
08    Merge(a,i,m,j)

Observation 1. Let $k = j - i$ be the size of the method call $\MergeSort(a, i, j)$. Then the running time is $O(k)$ (dominated by the call to $\Merge$ on line 8) plus the running time of the recursive calls on lines 6-7.

Observation 2. Each of the two recursive calls has size $k / 2$.

  • Assume the size $k$ is a power of 2

Combining Observations

Recall Logarithms (base 2)

Define $\log$ by

  • $\log a = b \iff 2^b = a$

Another way

  • $\log a$ is the number of times $a$ can be divided by $2$ to reach a value of at most $1$.

Facts.

  1. For every constant $c > 0$, $\log n = O(n^c)$.
  2. $\log n \neq O(1)$.
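
The "repeated halving" description of the logarithm can be sketched directly in Python. The function name `halvings` is hypothetical, chosen only for this illustration. For powers of 2 the count equals $\log a$ exactly; for other values it is $\log a$ rounded up.

```python
import math


def halvings(a):
    """Count how many times a must be halved to reach a value of at most 1."""
    count = 0
    while a > 1:
        a /= 2
        count += 1
    return count


for a in [1, 2, 8, 1024]:
    # For powers of 2, the halving count agrees with the base-2 logarithm.
    print(a, halvings(a), math.log2(a))
```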

A Final Calculation

  • Running time $T(n)$ of $\MergeSort(a, 1, n+1)$:

    \[\begin{align*}
    T(n) &= 2 T(n/2) + O(n)\\
    &= 4 T(\frac n 4) + 2 O(\frac n 2) + O(n)\\
    &= 8 T(\frac n 8) + 4 O(\frac n 4) + 2 O(\frac n 2) + O(n)\\
    &\vdots\\
    &= n T(1) + \frac n 2 O(2) + \cdots + 8 O(\frac n 8) + 4 O(\frac n 4) + 2 O(\frac n 2) + O(n)\\
    &= \underbrace{O(n) + O(n) + \cdots + O(n) + O(n) + O(n)}_{\log n + 1 \text{ terms}}\\
    &= O(n \log n)
    \end{align*}\]
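
As a sanity check on this calculation, the recurrence can be evaluated numerically. The sketch below makes two simplifying assumptions that are not in the slides: the $O(n)$ term is taken to be exactly $n$, and $T(1) = 1$. Under those assumptions, and for $n$ a power of 2, the recurrence solves to exactly $n \log n + n$.

```python
def merge_sort_cost(n):
    """Evaluate T(n) = 2 T(n/2) + n with T(1) = 1, for n a power of 2."""
    if n == 1:
        return 1
    return 2 * merge_sort_cost(n // 2) + n


for exponent in range(0, 6):
    n = 2 ** exponent
    # Compare the recurrence with the closed form n * log2(n) + n.
    print(n, merge_sort_cost(n), n * exponent + n)
```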

Picture so Far:

SelectionSort. $O(n^2)$ operations

  • $O(n^2)$ comparisons
  • $O(n)$ swaps

BubbleSort and InsertionSort. $O(n^2)$ operations

  • $O(n^2)$ comparisons
  • $O(n^2)$ swaps

MergeSort. $O(n \log n)$ operations

  • $O(n \log n)$ comparisons
  • $O(n \log n)$ modifications
  • uses $O(n)$ space overhead
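
The difference in swap counts between SelectionSort and BubbleSort can be observed on a small experiment. The sketch below instruments textbook versions of the two algorithms; these are assumed here and may differ in detail from the versions presented earlier in the course. On a reversed input of size $n$, both make $n(n-1)/2$ comparisons, but SelectionSort makes at most $n - 1$ swaps while BubbleSort makes $n(n-1)/2$.

```python
def selection_sort_counts(values):
    """Selection-sort a copy of values; return (comparisons, swaps)."""
    a = list(values)
    comparisons = swaps = 0
    for i in range(len(a) - 1):
        smallest = i
        for j in range(i + 1, len(a)):
            comparisons += 1
            if a[j] < a[smallest]:
                smallest = j
        if smallest != i:  # at most one swap per outer iteration
            a[i], a[smallest] = a[smallest], a[i]
            swaps += 1
    return comparisons, swaps


def bubble_sort_counts(values):
    """Bubble-sort a copy of values; return (comparisons, swaps)."""
    a = list(values)
    comparisons = swaps = 0
    for end in range(len(a) - 1, 0, -1):
        for j in range(end):
            comparisons += 1
            if a[j] > a[j + 1]:  # swap each adjacent out-of-order pair
                a[j], a[j + 1] = a[j + 1], a[j]
                swaps += 1
    return comparisons, swaps


reversed_input = list(range(100, 0, -1))
print(selection_sort_counts(reversed_input))
print(bubble_sort_counts(reversed_input))
```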

QuickSort: Another D&C Sort

Idea. Divide array $a$ by value

  • choose a value $p$ from $a$, the pivot
  • arrange values of $a$ such that:
    • $p$ is at index $k$
    • values $\leq p$ are at indices $i \leq k$
    • values $> p$ are at indices $j > k$
  • recursively sort indices $i < k$
  • recursively sort indices $j > k$
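
To make the "divide by value" step concrete, here is a deliberately simple sketch that builds a new list instead of rearranging in place. The function name `divide_by_value` is hypothetical. It returns the rearranged values together with the index $k$ at which the pivot ends up.

```python
def divide_by_value(values, pivot_index):
    """Arrange values around the pivot values[pivot_index].

    Returns (arranged, k) where arranged[k] is the pivot, every value
    before index k is <= the pivot, and every value after it is > the pivot.
    """
    p = values[pivot_index]
    rest = values[:pivot_index] + values[pivot_index + 1:]
    small = [v for v in rest if v <= p]
    large = [v for v in rest if v > p]
    return small + [p] + large, len(small)


arranged, k = divide_by_value([4, 7, 1, 4, 9, 2], 0)
print(arranged, k)  # [1, 4, 2, 4, 7, 9] 3
```

This version uses $O(n)$ extra space for the new lists. An in-place version does the same rearrangement with swaps.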

QuickSort Illustration

QuickSort Pseudocode

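# sort values of a between indices i and j-1
QuickSort(a, i, j):
  if j - i <= 1 then
    return
  endif
  p <- GetPivot(a, i, j) # select a pivot
  k <- Split(a, i, j, p) # rearrange around p; k is the final index of p
  QuickSort(a, i, k)     # sorts indices i..k-1
  QuickSort(a, k+1, j)   # sorts indices k+1..j-1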
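
A runnable Python sketch of QuickSort follows. It assumes the same convention as MergeSort (a call sorts indices `i` through `j-1`, 0-based), so the left recursive call covers `a[i:k]`. The slides do not specify Split, so `split` below is one assumed implementation: it parks the pivot at the end of the range and sweeps once from left to right. `get_pivot` picks a value uniformly at random from the range.

```python
import random


def get_pivot(a, i, j):
    """Return a pivot value chosen uniformly at random from a[i:j]."""
    return a[random.randrange(i, j)]


def split(a, i, j, p):
    """Rearrange a[i:j] around the value p and return p's final index k.

    Afterwards a[i:k] holds values <= p, a[k] == p, and a[k+1:j] holds values > p.
    """
    at = a.index(p, i, j)                # locate one copy of the pivot
    a[at], a[j - 1] = a[j - 1], a[at]    # park it at the end of the range
    k = i
    for t in range(i, j - 1):
        if a[t] <= p:                    # grow the "<= p" group by one
            a[t], a[k] = a[k], a[t]
            k += 1
    a[k], a[j - 1] = a[j - 1], a[k]      # place the pivot between the groups
    return k


def quick_sort(a, i, j):
    """Sort the values of a at indices i through j-1, in place."""
    if j - i <= 1:
        return
    p = get_pivot(a, i, j)
    k = split(a, i, j, p)
    quick_sort(a, i, k)       # values <= p, excluding the pivot at index k
    quick_sort(a, k + 1, j)   # values > p


data = [3, 8, 1, 9, 3, 5]
quick_sort(data, 0, len(data))
print(data)
```

In this sketch, `split` makes a single pass over the range, so it takes time linear in `j - i`.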

What Is Split Running Time?

What is Worst-Case QS Running Time?

How To Select Pivot?

Random Pivot Selection

GetPivot(a, i, j):
  k <- RandomInt(i, j-1) # uniformly random index between i and j-1
  return a[k]

A Heuristic

What is a “good” pivot choice?

How likely is a good pivot to be chosen?

Careful Analysis

Can show. If each pivot is chosen uniformly at random, then QuickSort uses $O(n \log n)$ operations on average (in expectation over the random pivot choices)

Next Time

  • Lower bounds for sorting
  • RadixSort