Will Rosenbaum | Midterm I Guide

List of Topics

Below is a list of topics that Midterm I will cover. The exam will include the following cheat sheet:

Midterm I Cheat Sheet

In particular, you do not need to memorize anything on the cheat sheet, although you should understand the material on the cheat sheet.

Pseudocode

Write precise pseudocode to describe algorithms.
Simulate (given) pseudocode by hand to determine behavior/output of a procedure.

Relevant reading:

Pseudocode notes

Induction

Understand statement of induction.
Recall structure of a proof by induction (base case and inductive step).
Apply induction to establish correctness of iterative procedures (loop invariants).
Apply induction to establish correctness of recursively defined procedures.

Relevant reading:

Induction notes
AI Appendix A (Quick Review of Proofs By Induction)

Asymptotic Analysis

Understand the meaning of $O$, $\Theta$, and $\Omega$ notation.
Analyze pseudocode to determine $O$ running time of (iterative) procedures.
Write a recurrence relation for the worst-case running time of recursively defined procedures.

Relevant Reading:

Asymptotic Analysis notes
AI Chapter 2 (Asymptotic Notation)
KT Sections 2.2 and 2.4 (Asymptotic Order of Growth and A Survey of Common Running Times)
AA Section 1.1 (Runtime Complexity)

Divide and Conquer

Understand the divide and conquer paradigm/strategy and how it can be applied to solve algorithmic problems.
Devise recursive methods to apply the divide and conquer strategy to a given problem.
Apply the Master Theorem to solve the recurrence relation for a divide and conquer solution.

Relevant Reading:

AI Chapter 3 (Divide-and-Conquer Algorithms)
KT Chapter 5 (Divide and Conquer)
AA Chapter 5 (Divide-and-Conquer Algorithms)

Specific Algorithms and Problems

Sorting
- elementary procedures: InsertionSort, SelectionSort, BubbleSort
- divide and conquer: MergeSort, QuickSort, RadixSort
Divide and Conquer
- binary search
- Karatsuba multiplication
- profit maximization algorithm

Example Questions

Question 1 (invariant for InsertionSort). Consider the following InsertionSort method:

  InsertionSort(a):
    for i = 2 to n do
      j <- i
      while j > 1 and a[j-1] > a[j] do
        swap(a, j-1, j)
        j <- j-1
      endwhile
    endfor

Note that after the $k$th iteration of the inner while loop, $j = i - k$. Use induction (on $k$) to argue that after the $k$th iteration of the inner while loop, $a[j] = a[i-k]$ is the smallest value in $a[j..i]$.

Question 2 (asymptotic analysis). Consider the following functions:

\[\begin{align*} f_1(n) &= 5 n \log n + 16 n\\ f_2(n) &= (\log n)^2\\ f_3(n) &= \begin{cases}2n &n \text{ is odd}\\ \sqrt{n} &n \text{ is even} \end{cases}\\ f_4(n) &\text{ satisfies the recurrence relation } f_4(n) = 7 f_4(n / 2) + 3n^2 \end{align*}\]

Which of the functions above is…

…$O(n)$?
…$O(n^2)$?
…$O(\log n)$?
…$\Theta(n \log n)$?
…$\Omega(n)$?
…$\Omega(n^2)$?
…$\Omega(\log n)$?

(List all functions that apply to each condition.)

Question 3 (computing powers). For a number $x$ and non-negative integer $n$, we define the $n$th power of $x$, written $x^n$, to be the $n$-fold product of $x$:

\[x^n = \underset{n \text{ times}}{\underbrace{x \cdot x \cdots x}},\]

with the convention that $x^0 = 1$. Below are two methods for computing $x^n$. The first method, Exp(x, n), uses simple recursion to compute $x^n$, while the second method, DCExp(x, n), applies a divide and conquer strategy.

  Exp(x, n):
    if n = 0 then
      return 1
    endif
    
    return x * Exp(x, n - 1)

  DCExp(x, n):
    if n = 0 then
      return 1
    endif
    
    val <- DCExp(x, n / 2) 
    
    if n % 2 = 0 then
      return val * val
    else
      return x * val * val
    endif

For the methods above, you may assume that “elementary” arithmetic operations (+, -, *, /, %) are performed in $O(1)$ time. Note that the division n/2 in DCExp is integer division, which always returns an integer value.

Use induction on $n$ to argue that the value returned by Exp(x, n) is $x^n$.
Use big O notation to describe the running time of Exp(x, n) as a function of $n$.
Simulate the code for DCExp by hand to compute DCExp(3, 5). Show your work!
Write a recurrence relation of the form $T(n) = a T(n / b) + f(n)$ for the running time of DCExp(x, n).
Apply the Master Theorem to your solution to part 4 to derive a bound on the running time of DCExp as a function of $n$.

Question 4. Suppose you are given access to a database of $n$ values $v_1, v_2, \ldots, v_n$, where each value is a number from the range $1, 2,\ldots, N$. To maintain privacy, you may not access the values $v_\ell$ directly. Instead, you may only access the database via queries of the form InRange(i, j), which returns the number of values $v_\ell$ that satisfy $i \leq v_\ell \leq j$. Using this limited access to the database, you wish to find the median of the values. (Recall that the median of $n$ values is a number $m$ such that at least half of the values are $\leq m$ and at least half are $\geq m$.)

Suppose $m$ is a median for the database. Given a guess $k$ of the median value, how could you use $O(1)$ queries of the form InRange(i, j) to determine if (1) $k$ is a median, (2) $k < m$, or (3) $k > m$?
Using your solution to 1, devise a divide and conquer algorithm that uses $O(\log N)$ InRange queries to find a median of the values in the database.

Solutions

Question 1 (invariant for InsertionSort). Consider the following InsertionSort method:

  InsertionSort(a):
    for i = 2 to n do
      j <- i
      while j > 1 and a[j-1] > a[j] do
        swap(a, j-1, j)
        j <- j-1
      endwhile
    endfor

Solution. As suggested, we argue by induction on $k$, the number of iterations of the inner while loop. Specifically, we must show the base case (that the claim holds for the smallest value of $k$), and the inductive step (if the claim holds for any value of $k$, then the claim also holds for $k + 1$).

Base case $k = 1$. Consider the first iteration of the while loop. In particular, since we entered the loop, the condition $a[j-1] > a[j]$ is satisfied. At line 5, we swap the values, so after the swap we have $a[j-1] < a[j]$. After decrementing $j$ in line 6, we get $a[j] < a[j+1] (= a[i])$. Thus, $a[j]$ is the smallest element in $a[j..i]$.

Inductive step. Suppose that after iteration $k$ (i.e., when $j = i - k$), $a[j]$ is the smallest value in $a[j..i]$. The next iteration (iteration $k + 1$) of the while loop is only executed if $a[j-1] > a[j]$. In this case, the swap at line 5 is executed, so that after the swap $a[j-1] < a[j]$. The value now stored in $a[j-1]$ is the former $a[j]$, which by the inductive hypothesis is no larger than any entry of $a[j+1..i]$. Since no other entries of $a$ are modified, after the swap $a[j-1]$ is the smallest element in $a[j-1..i]$. When $j$ is decremented in line 6, this means that $a[j]$ is the smallest element in $a[j..i]$.

Since the base case and the inductive step hold, the claim follows from induction.
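
The invariant can also be checked mechanically. The following Python sketch is only an illustration: the function name `insertion_sort_checked` and the 0-based indexing are assumptions that are not part of the pseudocode above. After every iteration of the inner loop it asserts that `a[j]` is the smallest value in `a[j..i]`.

```python
def insertion_sort_checked(a):
    """Sort a copy of `a`, asserting the inner-loop invariant from Question 1."""
    a = list(a)
    for i in range(1, len(a)):            # pseudocode: for i = 2 to n (1-based)
        j = i
        while j > 0 and a[j - 1] > a[j]:  # pseudocode: j > 1 (1-based)
            a[j - 1], a[j] = a[j], a[j - 1]   # swap(a, j-1, j)
            j -= 1
            # Invariant: a[j] is the smallest value in a[j..i] (inclusive).
            assert a[j] == min(a[j:i + 1])
    return a


print(insertion_sort_checked([5, 2, 4, 6, 1, 3]))  # [1, 2, 3, 4, 5, 6]
```

If the invariant ever failed, the `assert` would raise an `AssertionError` before the sort finished.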

Question 2 (asymptotic analysis). Consider the functions $f_1, f_2, f_3, f_4$ defined in the statement of Question 2 above.

Which of the functions above is…

…$O(n)$?
…$O(n^2)$?
…$O(\log n)$?
…$\Theta(n \log n)$?
…$\Omega(n)$?
…$\Omega(n^2)$?
…$\Omega(\log n)$?

Solution.

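First we apply the Master Theorem to $f_4(n)$. In this case, we have $a = 7$, $b = 2$ and $f(n) = 3n^2$. We compute the constant $c = \log_b a = \log_2 7$, which is strictly between 2 and 3. Since $f(n) = O(n^2)$ and $2 < \log_2 7$, we are in case 1 of the Master Theorem. This allows us to conclude that $f_4(n) = O(n^c)$. For the lower bounds, note that the recurrence itself gives $f_4(n) \geq 3n^2$ (assuming $f_4$ is non-negative), so $f_4(n) = \Omega(n^2)$. The standard statement of case 1 in fact gives the tight bound $f_4(n) = \Theta(n^c)$, so $f_4$ is not $O(n^2)$.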

Which of the functions above is…

…$O(n)$?: $f_2, f_3$
…$O(n^2)$?: $f_1, f_2, f_3$
…$O(\log n)$?: none
…$\Theta(n \log n)$?: $f_1$
…$\Omega(n)$?: $f_1, f_4$
…$\Omega(n^2)$?: $f_4$
…$\Omega(\log n)$?: $f_1, f_2, f_3, f_4$
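
As a sanity check on the bound for $f_4$, the sketch below evaluates the recurrence numerically on powers of 2. The base case $f_4(1) = 1$ is an assumption made for illustration, since the question does not specify one. The second printed column, $f_4(n) / n^{\log_2 7}$, stays bounded, while the third column, $f_4(n) / n^2$, keeps growing.

```python
import math


def f4(n):
    """Evaluate f4(n) = 7*f4(n/2) + 3*n^2 with the assumed base case f4(1) = 1."""
    if n <= 1:
        return 1
    return 7 * f4(n // 2) + 3 * n * n


c = math.log2(7)  # the Master Theorem exponent, roughly 2.807
for k in range(1, 11):
    n = 2 ** k
    # Columns: n, f4(n) / n^c, f4(n) / n^2
    print(n, f4(n) / n ** c, f4(n) / n ** 2)
```

A different positive base case changes the constants but not the growth rate.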

Question 3 (computing powers). For a number $x$ and non-negative integer $n$, we define the $n$th power of $x$, written $x^n$, to be the $n$-fold product of $x$:

\[x^n = \underset{n \text{ times}}{\underbrace{x \cdot x \cdots x}},\]

  Exp(x, n):
    if n = 0 then
      return 1
    endif
    
    return x * Exp(x, n - 1)

  DCExp(x, n):
    if n = 0 then
      return 1
    endif
    
    val <- DCExp(x, n / 2) 
    
    if n % 2 = 0 then
      return val * val
    else
      return x * val * val
    endif

Use induction on $n$ to argue that the value returned by Exp(x, n) is $x^n$.
Use big O notation to describe the running time of Exp(x, n) as a function of $n$.
Simulate the code for DCExp by hand to compute DCExp(3, 5). Show your work!
Write a recurrence relation of the form $T(n) = a T(n / b) + f(n)$ for the running time of DCExp(x, n).
Apply the Master Theorem to your solution to part 4 to derive a bound on the running time of DCExp as a function of $n$.

Solution.

Use induction on $n$ to argue that the value returned by Exp(x, n) is $x^n$.

Base case. In the base case $n = 0$, the method returns $1 = x^0$ at line 3, as desired.

Inductive step. Suppose that for some value of $n$, Exp(x, n) returns $x^n$. Then Exp(x, n+1) returns $x * \mathrm{Exp}(x, n) = x * x^n = x^{n+1}$, as claimed. The first equality holds by the inductive hypothesis.

Since the base case and inductive step hold, the claim follows.

Use big O notation to describe the running time of Exp(x, n) as a function of $n$.

The running time is $O(n)$ (assuming arithmetic is performed in $O(1)$ time): each call does $O(1)$ work and makes a single recursive call with $n$ decreased by 1, so Exp(x, n) makes $n + 1$ calls in total.

Simulate the code for DCExp by hand to compute DCExp(3, 5). Show your work!
- DCExp(3, 5) sets val <- DCExp(3, 2) at line 6
  - DCExp(3, 2) sets val <- DCExp(3, 1) at line 6
    - DCExp(3, 1) sets val <- DCExp(3, 0) at line 6
      - DCExp(3, 0) returns 1 at line 3
    - val <- 1 at line 6
    - DCExp(3, 1) returns 3 * 1 * 1 = 3 at line 11
  - val <- 3 at line 6
  - DCExp(3, 2) returns 3 * 3 = 9 at line 9
- val <- 9 at line 6
- DCExp(3, 5) returns 3 * 9 * 9 = 243 at line 11

Write a recurrence relation of the form $T(n) = a T(n / b) + f(n)$ for the running time of DCExp(x, n).

\[T(n) = 1 \cdot T(n / 2) + O(1)\]

Apply the Master Theorem to your solution to part 4 to derive a bound on the running time of DCExp as a function of $n$.

Since $c = \log_b a = \log_2 1 = 0$, we have $f(n) = O(1) = O(n^0) = O(n^c)$, so we are in case 2 of the Master Theorem. Thus, the theorem gives us a running time of $O(\log n)$.
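
To see the difference between the two running times concretely, here is a Python sketch of both methods. The `calls` dictionary and the snake_case names are assumptions added for illustration; they count recursive calls and are not part of the original pseudocode.

```python
calls = {"exp": 0, "dc_exp": 0}  # recursive-call counters (illustrative only)


def exp(x, n):
    """Direct translation of Exp: one recursive call on n - 1."""
    calls["exp"] += 1
    if n == 0:
        return 1
    return x * exp(x, n - 1)


def dc_exp(x, n):
    """Direct translation of DCExp: one recursive call on n // 2."""
    calls["dc_exp"] += 1
    if n == 0:
        return 1
    val = dc_exp(x, n // 2)  # integer division, as in the pseudocode
    if n % 2 == 0:
        return val * val
    return x * val * val


print(dc_exp(3, 5))  # 243, matching the hand simulation

calls["exp"] = calls["dc_exp"] = 0
exp(2, 200)
dc_exp(2, 200)
print(calls)  # {'exp': 201, 'dc_exp': 9}
```

For $n = 200$, the simple method makes $n + 1 = 201$ calls, while the divide and conquer method makes only 9, one for each of $200, 100, 50, 25, 12, 6, 3, 1, 0$.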

Question 4.

Suppose $m$ is a median for the database. Given a guess $k$ of the median value, how could you use $O(1)$ queries of the form InRange(i, j) to determine if (1) $k$ is a median, (2) $k < m$, or (3) $k > m$?

Solution. Observe that calling InRange(k, N) computes the number of elements in the database whose value is at least $k$ and InRange(1, k) computes the number of elements whose value is at most $k$. Thus if $\mathrm{InRange}(k, N) \geq n / 2$ and $\mathrm{InRange}(1, k) \geq n / 2$, then $k$ is a median. Otherwise, if $\mathrm{InRange}(1, k) < n / 2$, then $k < m$ for any median $m$, and if $\mathrm{InRange}(k, N) < n / 2$, then $k > m$ for any median $m$. This uses two queries, which is $O(1)$.
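
The three-way test can be sketched in Python as follows. The helper `make_in_range` simulates the database oracle over a hidden list, and `compare_guess` is a hypothetical name; both are assumptions for illustration. Comparing `2 * count` with `n` avoids any rounding when $n$ is odd.

```python
def make_in_range(values):
    """Return a simulated InRange oracle over a hidden list of values."""
    def in_range(i, j):
        return sum(1 for v in values if i <= v <= j)
    return in_range


def compare_guess(in_range, k, N):
    """Return 0 if k is a median, -1 if k < m, and +1 if k > m for every median m."""
    n = in_range(1, N)         # total number of values
    at_most = in_range(1, k)   # number of values <= k
    at_least = in_range(k, N)  # number of values >= k
    if 2 * at_most >= n and 2 * at_least >= n:
        return 0
    return -1 if 2 * at_most < n else 1


oracle = make_in_range([2, 7, 7, 9])
print(compare_guess(oracle, 5, 10), compare_guess(oracle, 7, 10), compare_guess(oracle, 8, 10))
# -1 0 1
```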

Using your solution to 1, devise a divide and conquer algorithm that uses $O(\log N)$ InRange queries to find a median of the values in the database.

Solution.

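   FindMedian():
     return FindMedian(1, N)

   # Find a median value between i and j (inclusive) using binary search
   FindMedian(i, j):
      n <- InRange(1, N) # total number of elements in database
      k <- (i + j) / 2   # integer division
      left <- InRange(1, k)  # number of values <= k
      right <- InRange(k, N) # number of values >= k
      # in the comparisons below, n / 2 is exact (not integer) division
      if left >= n / 2 and right >= n / 2 then
         return k
      else if right < n / 2 then
         # k is larger than every median, so search below k
         return FindMedian(i, k - 1)
      else
         # k is smaller than every median, so search above k
         return FindMedian(k + 1, j)
      endif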
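
A runnable Python sketch of this binary search is given below. It is an iterative variant rather than a literal translation of the recursive pseudocode, and the simulated oracle `counting_oracle` (which also counts queries) is an assumption for illustration. The guess $k$ is excluded from the remaining range once it is known not to be a median, which guarantees that the range shrinks at every step.

```python
def counting_oracle(values):
    """Return (in_range, count): a simulated InRange oracle and its query counter."""
    count = [0]

    def in_range(i, j):
        count[0] += 1
        return sum(1 for v in values if i <= v <= j)

    return in_range, count


def find_median(in_range, N):
    """Binary search over 1..N for a median, using only in_range queries."""
    n = in_range(1, N)  # total number of values (one query)
    lo, hi = 1, N
    while lo <= hi:
        k = (lo + hi) // 2
        at_most = in_range(1, k)   # number of values <= k
        at_least = in_range(k, N)  # number of values >= k
        if 2 * at_most >= n and 2 * at_least >= n:
            return k
        if 2 * at_least < n:
            hi = k - 1  # k is above every median
        else:
            lo = k + 1  # k is below every median
    raise ValueError("search range 1..N is empty")


oracle, count = counting_oracle([3, 1, 4, 1, 5, 9, 2, 6])
m = find_median(oracle, 1000)
print(m, count[0])
```

Each loop iteration makes two queries and halves the range, so the total number of queries is at most $1 + 2(\lfloor \log_2 N \rfloor + 1) = O(\log N)$.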