Midterm I Guide
Your guide to the first midterm
List of Topics
Below is a list of topics that Midterm I will cover. The exam will include the following cheat sheet:
You do not need to memorize anything on the cheat sheet, but you should understand the material on it.
Pseudocode
- Write precise pseudocode to describe algorithms.
- Simulate (given) pseudocode by hand to determine behavior/output of a procedure.
Relevant reading:
Induction
- Understand statement of induction.
- Recall structure of a proof by induction (base case and inductive step).
- Apply induction to establish correctness of iterative procedures (loop invariants).
- Apply induction to establish correctness of recursively defined procedures.
Relevant reading:
- Induction notes
- AI Appendix A (Quick Review of Proofs By Induction)
Asymptotic Analysis
- Understand the meaning of $O$, $\Theta$, and $\Omega$ notation.
- Analyze pseudocode to determine $O$ running time of (iterative) procedures.
- Write a recurrence relation for worst-case running time of recursively defined procedures.
Relevant Reading:
- Asymptotic Analysis notes
- AI Chapter 2 (Asymptotic Notation)
- KT Sections 2.2 and 2.4 (Asymptotic Order of Growth and A Survey of Common Running Times)
- AA Section 1.1 (Runtime Complexity)
Divide and Conquer
- Understand the divide and conquer paradigm/strategy and how it can be applied to solve algorithm problems.
- Devise recursive methods to apply the divide and conquer strategy to a given problem.
- Apply the Master Theorem to solve the recurrence relation for a divide and conquer solution.
Relevant Reading:
- AI Chapter 3 (Divide-and-Conquer Algorithms)
- KT Chapter 5 (Divide and Conquer)
- AA Chapter 5 (Divide-and-Conquer Algorithms)
Specific Algorithms and Problems
- Sorting
  - elementary procedures: InsertionSort, SelectionSort, BubbleSort
  - divide and conquer: MergeSort, QuickSort, RadixSort
- Divide and Conquer
  - binary search
  - Karatsuba multiplication
  - profit maximization algorithm
Example Questions
Question 1 (invariant for InsertionSort). Consider the following InsertionSort method:

    1  InsertionSort(a):
    2    for i = 2 to n do
    3      j <- i
    4      while j > 1 and a[j-1] > a[j] do
    5        swap(a, j-1, j)
    6        j <- j-1
    7      endwhile
    8    endfor

Note that after the \(k\)th iteration of the inner while loop, \(j = i - k\). Use induction (on \(k\)) to argue that after the \(k\)th iteration of the inner while loop, \(a[j] = a[i-k]\) is the smallest value in \(a[j..i]\).
Question 2 (asymptotic analysis). Consider the following functions:
\[\begin{align*} f_1(n) &= 5 n \log n + 16 n\\ f_2(n) &= (\log n)^2\\ f_3(n) &= \begin{cases}2n &n \text{ is odd}\\ \sqrt{n} &n \text{ is even} \end{cases}\\ f_4(n) &\text{ satisfies the recurrence relation } f_4(n) = 7 f_4(n / 2) + 3n^2 \end{align*}\]
Which of the functions above is…
- …\(O(n)\)?
- …\(O(n^2)\)?
- …\(O(\log n)\)?
- …\(\Theta(n \log n)\)?
- …\(\Omega(n)\)?
- …\(\Omega(n^2)\)?
- …\(\Omega(\log n)\)?
(List all functions that apply to each condition.)
Question 3 (computing powers). For a number \(x\) and non-negative integer \(n\), we define the \(n\)th power of \(x\), \(x^n\), to be the \(n\)-fold product of \(x\):
\[x^n = \underset{n \text{ times}}{\underbrace{x \cdot x \cdots x}},\]
with the convention that \(x^0 = 1\). Below are two methods for computing \(x^n\). The first method, Exp(x, n), uses a simple recursive method to compute \(x^n\), while the second method, DCExp(x, n), applies a divide and conquer strategy.

    1  Exp(x, n):
    2    if n = 0 then
    3      return 1
    4    endif
    5
    6    return x * Exp(x, n - 1)

    1   DCExp(x, n):
    2     if n = 0 then
    3       return 1
    4     endif
    5
    6     val <- DCExp(x, n / 2)
    7
    8     if n % 2 = 0 then
    9       return val * val
    10    else
    11      return x * val * val

For the methods above, you may assume that “elementary” arithmetic operations (+, -, *, /, %) are performed in \(O(1)\) time. Note that the division n/2 in DCExp is integer division, which always returns an integer value.
1. Use induction on \(n\) to argue that the value returned by Exp(x, n) is \(x^n\).
2. Use big O notation to describe the running time of Exp(x, n) as a function of \(n\).
3. Simulate the code for DCExp by hand to compute DCExp(3, 5). Show your work!
4. Write a recurrence relation of the form \(T(n) = a T(n / b) + f(n)\) for the running time of DCExp(x, n).
5. Apply the Master Theorem to your solution to part 4 to derive a bound on the running time of DCExp as a function of \(n\).
Question 4. Suppose you are given access to a database of \(n\) values \(v_1, v_2, \ldots, v_n\), where each value is a number from the range \(1, 2,\ldots, N\). To maintain privacy, you may not access the values \(v_\ell\) directly. Instead, you may only access the database via queries of the form InRange(i, j), which returns the number of values \(v_\ell\) that satisfy \(i \leq v_\ell \leq j\). Using this limited access to the database, you wish to find the median of the values. (Recall that the median of \(n\) values is a number \(m\) such that at least half of the values are \(\leq m\) and at least half are \(\geq m\).)
1. Suppose \(m\) is a median for the database. Given a guess \(k\) of the median value, how could you use \(O(1)\) queries of the form InRange(i, j) to determine if (1) \(k\) is a median, (2) \(k < m\), or (3) \(k > m\)?
2. Using your solution to part 1, devise a divide and conquer algorithm that uses \(O(\log N)\) InRange queries to find a median of the values in the database.
Solutions
Question 1 (invariant for InsertionSort). Consider the following InsertionSort method:

    1  InsertionSort(a):
    2    for i = 2 to n do
    3      j <- i
    4      while j > 1 and a[j-1] > a[j] do
    5        swap(a, j-1, j)
    6        j <- j-1
    7      endwhile
    8    endfor

Note that after the \(k\)th iteration of the inner while loop, \(j = i - k\). Use induction (on \(k\)) to argue that after the \(k\)th iteration of the inner while loop, \(a[j] = a[i-k]\) is the smallest value in \(a[j..i]\).
Solution. As suggested, we argue by induction on \(k\), the number of iterations of the inner while loop. Specifically, we must show the base case (that the claim holds for the smallest value of \(k\)), and the inductive step (if the claim holds for any value of \(k\), then the claim also holds for \(k + 1\)).
Base case \(k = 1\). Consider the first iteration of the while loop. In particular, since we entered the loop, the condition \(a[j-1] > a[j]\) is satisfied. At line 5, we swap the values, so after the swap we have \(a[j-1] < a[j]\). After decrementing \(j\) in line 6, we get \(a[j] < a[j+1] (= a[i])\). Thus, \(a[j]\) is the smallest element in \(a[j..i]\).
Inductive step. Suppose that after iteration \(k\) (i.e., \(j = i - k\)), \(a[j]\) is the smallest value in \(a[j..i]\). The next iteration (iteration \(k + 1\)) of the while loop is only executed if \(a[j-1] > a[j]\). In this case, the swap at line 5 is executed, so that after the swap \(a[j-1] < a[j]\). The new \(a[j-1]\) is the old \(a[j]\), which by the inductive hypothesis is at most every value in \(a[j+1..i]\). Since no other entries of \(a\) are modified, after the swap \(a[j-1]\) is the smallest element in \(a[j-1..i]\). When \(j\) is decremented in line 6, this means that \(a[j]\) is the smallest element in \(a[j..i]\).
Since the base case and the inductive step hold, the claim follows from induction.
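The invariant can also be checked empirically. The following is a minimal Python sketch, not part of the exam material: it transcribes the pseudocode to 0-based indexing (so the loop bounds are shifted by one) and asserts the claim after every iteration of the inner loop. The function name `insertion_sort_checked` is made up for illustration.

```python
def insertion_sort_checked(a):
    """Sort a copy of `a` with InsertionSort, asserting the Question 1 invariant."""
    a = list(a)
    for i in range(1, len(a)):              # pseudocode i = 2..n, shifted to 0-based
        j = i
        while j > 0 and a[j - 1] > a[j]:
            a[j - 1], a[j] = a[j], a[j - 1]  # swap(a, j-1, j)
            j -= 1
            # After k = i - j iterations, a[j] is the smallest value in a[j..i].
            assert a[j] == min(a[j:i + 1])
    return a


print(insertion_sort_checked([5, 2, 4, 6, 1, 3]))
```

A passing assertion on a few inputs is only a sanity check; the induction argument above is what establishes the claim for all inputs.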
Question 2 (asymptotic analysis). Consider the following functions:
\[\begin{align*} f_1(n) &= 5 n \log n + 16 n\\ f_2(n) &= (\log n)^2\\ f_3(n) &= \begin{cases}2n &n \text{ is odd}\\ \sqrt{n} &n \text{ is even} \end{cases}\\ f_4(n) &\text{ satisfies the recurrence relation } f_4(n) = 7 f_4(n / 2) + 3n^2 \end{align*}\]
Which of the functions above is…
- …\(O(n)\)?
- …\(O(n^2)\)?
- …\(O(\log n)\)?
- …\(\Theta(n \log n)\)?
- …\(\Omega(n)\)?
- …\(\Omega(n^2)\)?
- …\(\Omega(\log n)\)?
Solution.
First we apply the Master Theorem to \(f_4(n)\). In this case, we have \(a = 7\), \(b = 2\) and \(f(n) = 3n^2\). We compute the constant \(c = \log_2 7 \approx 2.81\), which is strictly between 2 and 3. Since \(f(n) = O(n^2)\) and \(2 < \log_2 7\), we are in case 1 of the Master Theorem. This allows us to conclude that \(f_4(n) = \Theta(n^c)\). In particular, \(f_4\) grows strictly faster than \(n^2\), which is what the \(\Omega\) answers below rely on.
Which of the functions above is…
- …\(O(n)\)?: \(f_2, f_3\)
- …\(O(n^2)\)?: \(f_1, f_2, f_3\)
- …\(O(\log n)\)?: none
- …\(\Theta(n \log n)\)?: \(f_1\)
- …\(\Omega(n)\)?: \(f_1, f_4\)
- …\(\Omega(n^2)\)?: \(f_4\)
- …\(\Omega(\log n)\)?: \(f_1, f_2, f_3, f_4\)
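As a numerical sanity check of the Master Theorem step for \(f_4\), the recurrence can be evaluated directly at powers of two. This is a rough Python sketch under one assumption that the question does not state: the base case \(f_4(1) = 1\). Any positive constant base case would give the same growth rate.

```python
import math


def f4(n):
    """Evaluate f4(n) = 7*f4(n/2) + 3*n^2 for n a power of two.

    The base case f4(1) = 1 is an assumption; the question does not fix it.
    """
    if n == 1:
        return 1
    return 7 * f4(n // 2) + 3 * n * n


c = math.log2(7)  # the Master Theorem exponent log_b(a), about 2.807

for k in (5, 10, 15, 20):
    n = 2 ** k
    # If f4(n) = Theta(n^c), the first ratio settles near a constant,
    # while the second ratio (against n^2) keeps growing.
    print(k, f4(n) / n ** c, f4(n) / n ** 2)
```

The first ratio leveling off while the second keeps growing is consistent with \(f_4\) being \(\Omega(n^2)\) but not \(O(n^2)\).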
Question 3 (computing powers). For a number \(x\) and non-negative integer \(n\), we define the \(n\)th power of \(x\), \(x^n\), to be the \(n\)-fold product of \(x\):
\[x^n = \underset{n \text{ times}}{\underbrace{x \cdot x \cdots x}},\]
with the convention that \(x^0 = 1\). Below are two methods for computing \(x^n\). The first method, Exp(x, n), uses a simple recursive method to compute \(x^n\), while the second method, DCExp(x, n), applies a divide and conquer strategy.

    1  Exp(x, n):
    2    if n = 0 then
    3      return 1
    4    endif
    5
    6    return x * Exp(x, n - 1)

    1   DCExp(x, n):
    2     if n = 0 then
    3       return 1
    4     endif
    5
    6     val <- DCExp(x, n / 2)
    7
    8     if n % 2 = 0 then
    9       return val * val
    10    else
    11      return x * val * val

For the methods above, you may assume that “elementary” arithmetic operations (+, -, *, /, %) are performed in \(O(1)\) time. Note that the division n/2 in DCExp is integer division, which always returns an integer value.
1. Use induction on \(n\) to argue that the value returned by Exp(x, n) is \(x^n\).
2. Use big O notation to describe the running time of Exp(x, n) as a function of \(n\).
3. Simulate the code for DCExp by hand to compute DCExp(3, 5). Show your work!
4. Write a recurrence relation of the form \(T(n) = a T(n / b) + f(n)\) for the running time of DCExp(x, n).
5. Apply the Master Theorem to your solution to part 4 to derive a bound on the running time of DCExp as a function of \(n\).
Solution.
1. Use induction on \(n\) to argue that the value returned by Exp(x, n) is \(x^n\).

   Base case. In the base case \(n = 0\), the method returns \(1 = x^0\) at line 3, as desired.

   Inductive step. Suppose that for some value of \(n\), Exp(x, n) returns \(x^n\). Then Exp(x, n+1) returns \(x * \mathrm{Exp}(x, n) = x * x^n = x^{n+1}\), as claimed. The first equality holds by the inductive hypothesis.

   Since the base case and inductive step hold, the claim follows.

2. Use big O notation to describe the running time of Exp(x, n) as a function of \(n\).

   The running time is \(O(n)\) (assuming arithmetic is performed in \(O(1)\) time): each call does \(O(1)\) work and makes one recursive call with \(n\) decreased by 1, for \(n + 1\) calls in total.

3. Simulate the code for DCExp by hand to compute DCExp(3, 5). Show your work!

   - DCExp(3, 5) sets val <- DCExp(3, 2) at line 6
     - DCExp(3, 2) sets val <- DCExp(3, 1) at line 6
       - DCExp(3, 1) sets val <- DCExp(3, 0) at line 6
         - DCExp(3, 0) returns 1 at line 3
       - val <- 1 at line 6
       - DCExp(3, 1) returns 3 * 1 * 1 = 3 at line 11
     - val <- 3 at line 6
     - DCExp(3, 2) returns 3 * 3 = 9 at line 9
   - val <- 9 at line 6
   - DCExp(3, 5) returns 3 * 9 * 9 = 243 at line 11

4. Write a recurrence relation of the form \(T(n) = a T(n / b) + f(n)\) for the running time of DCExp(x, n).

   \[T(n) = 1 \cdot T(n / 2) + O(1)\]

5. Apply the Master Theorem to your solution to part 4 to derive a bound on the running time of DCExp as a function of \(n\).

   Since \(c = \log_b a = \log_2 1 = 0\), we have \(f(n) = O(1) = O(n^0) = O(n^c)\), so we are in case 2 of the Master Theorem. Thus, the theorem gives us a running time of \(O(n^c \log n) = O(\log n)\).
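To see the gap between \(O(n)\) and \(O(\log n)\) concretely, both methods can be transcribed into Python and instrumented. This is an illustrative sketch only: the functions return a pair (value, number of calls), which is an addition for counting purposes and not part of the original pseudocode.

```python
def exp(x, n):
    """Exp(x, n): return (x^n, number of calls made)."""
    if n == 0:
        return 1, 1
    val, calls = exp(x, n - 1)
    return x * val, calls + 1


def dc_exp(x, n):
    """DCExp(x, n): return (x^n, number of calls made)."""
    if n == 0:
        return 1, 1
    val, calls = dc_exp(x, n // 2)  # integer division, as in the pseudocode
    if n % 2 == 0:
        return val * val, calls + 1
    return x * val * val, calls + 1


print(dc_exp(3, 5))        # (243, 4): the four calls of the hand simulation
print(exp(2, 500)[1])      # 501 calls, one per value of n from 500 down to 0
print(dc_exp(2, 500)[1])   # 10 calls: 500, 250, 125, 62, 31, 15, 7, 3, 1, 0
```

The call counts grow like \(n + 1\) for Exp and roughly \(\log_2 n + 2\) for DCExp, which matches the bounds from parts 2 and 5.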
Question 4. Suppose you are given access to a database of \(n\) values \(v_1, v_2, \ldots, v_n\), where each value is a number from the range \(1, 2,\ldots, N\). To maintain privacy, you may not access the values \(v_\ell\) directly. Instead, you may only access the database via queries of the form InRange(i, j), which returns the number of values \(v_\ell\) that satisfy \(i \leq v_\ell \leq j\). Using this limited access to the database, you wish to find the median of the values. (Recall that the median of \(n\) values is a number \(m\) such that at least half of the values are \(\leq m\) and at least half are \(\geq m\).)
1. Suppose \(m\) is a median for the database. Given a guess \(k\) of the median value, how could you use \(O(1)\) queries of the form InRange(i, j) to determine if (1) \(k\) is a median, (2) \(k < m\), or (3) \(k > m\)?

   Solution. Observe that calling InRange(k, N) computes the number of elements in the database whose value is at least \(k\), and InRange(0, k) computes the number of elements whose value is at most \(k\). (If \(n\) is not known in advance, the single query InRange(0, N) returns it.) Thus if \(\mathrm{InRange}(k, N) \geq n / 2\) and \(\mathrm{InRange}(0, k) \geq n / 2\), then \(k\) is a median. Otherwise, if \(\mathrm{InRange}(0, k) < n / 2\), then \(k < m\) for any median \(m\), and if \(\mathrm{InRange}(k, N) < n / 2\), then \(k > m\) for any median \(m\).
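2. Using your solution to part 1, devise a divide and conquer algorithm that uses \(O(\log N)\) InRange queries to find a median of the values in the database.

   Solution.

       FindMedian():
         return FindMedian(0, N)

       # Find a median value between i and j using binary search
       FindMedian(i, j):
         n <- InRange(0, N)  # total number of elements in database
         k <- (i + j) / 2
         left <- InRange(0, k)
         right <- InRange(k, N)
         if left >= n / 2 and right >= n / 2 then
           return k
         else if right < n / 2 then
           return FindMedian(i, k - 1)
         else
           return FindMedian(k + 1, j)

   Here (i + j) / 2 is integer division, while the comparisons with n / 2 use exact (unrounded) division. In the last two cases \(k\) is known not to be a median, so it is excluded from the recursive call. The search range therefore shrinks by at least half with every call, and each call makes \(O(1)\) queries, for \(O(\log N)\) queries in total.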
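The binary search can be tried out against a mock database. The sketch below is one possible Python rendering under stated assumptions: the `Database` class and its `queries` counter are hypothetical stand-ins invented for illustration, the search is written as a loop rather than recursively, the total \(n\) is queried once up front, and the comparisons use `2 * left >= n` to avoid rounding in \(n / 2\).

```python
class Database:
    """Hypothetical stand-in for the private database; only in_range is public."""

    def __init__(self, values):
        self._values = list(values)
        self.queries = 0  # counts in_range calls, for illustration only

    def in_range(self, i, j):
        """InRange(i, j): number of stored values v with i <= v <= j."""
        self.queries += 1
        return sum(1 for v in self._values if i <= v <= j)


def find_median(db, N):
    """Return a median of the values in db, all assumed to lie in 1..N."""
    n = db.in_range(1, N)          # total number of values
    lo, hi = 1, N                  # invariant: some median lies in lo..hi
    while lo <= hi:
        k = (lo + hi) // 2
        left = db.in_range(1, k)   # number of values <= k
        right = db.in_range(k, N)  # number of values >= k
        if 2 * left >= n and 2 * right >= n:
            return k
        if 2 * right < n:          # too few values >= k: every median is < k
            hi = k - 1
        else:                      # too few values <= k: every median is > k
            lo = k + 1
    raise ValueError("no median found; are all values in 1..N?")


db = Database([7, 3, 9, 3, 12, 5, 8])
print(find_median(db, 16), db.queries)
```

For the sample values, whose sorted order is 3, 3, 5, 7, 8, 9, 12, the search probes \(k = 8, 4, 6, 7\) and returns 7. Each probe costs two queries, so the total stays proportional to \(\log N\).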