Lecture 09: Amortized Analysis and Sorted Sets

Announcements

Midterm Exam due Friday at 5:00
OH on Zoom today

Overview

  1. Amortized Analysis
  2. Sorted Sets

Last Time

Considered ArraySimpleStack

public class ArraySimpleStack<E> implements SimpleStack<E> {
    private int capacity;
    private int size = 0;
    private Object[] contents;
	
	...

    public void push(E x) {
	if (size == capacity) {
	    increaseCapacity();
	}

	contents[size] = x;
	++size;
    }

increaseCapacity()

    private void increaseCapacity() {
	
    	// create a new array with larger capacity
    	Object[] bigContents = new Object[2 * capacity];

    	// copy contents to bigContents
    	for (int i = 0; i < capacity; ++i) {
    	    bigContents[i] = contents[i];
    	}

    	// set contents to refer to the new array
    	contents = bigContents;

    	// update this.capacity accordingly
    	capacity = 2 * capacity;
    }

Puzzle

What is the running time of

    SimpleStack<Integer> stk = new ArraySimpleStack<Integer>();
    for (int i = 1; i <= n; i++) {
        stk.push(i);
    }

Time per push

Assessment

    SimpleStack<Integer> stk = new ArraySimpleStack<Integer>();
    for (int i = 1; i <= n; i++) {
        stk.push(i);
    }
  • Worst case running time of push is $O(n)$
    • $n$ pushes have running time $n \cdot O(n) = O(n^2)$
  • But vast majority of calls to push are performed in $O(1)$ time
  • When we call push $n$ times empirical running time looks like $O(n)$, not $O(n^2)$.
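
The gap between the $O(n^2)$ bound and the observed behavior can be checked by counting work directly. The sketch below is not the course's ArraySimpleStack. It is a hypothetical stand-in (CountingStack, with an assumed initial capacity of 1) that uses the same doubling strategy and counts how many element copies occur during resizes over $n$ pushes.

```java
// Hypothetical stand-in for ArraySimpleStack that counts resize copies.
class CountingStack {
    private int capacity = 1;                 // assumed initial capacity
    private int size = 0;
    private Object[] contents = new Object[capacity];
    private long copies = 0;                  // elements copied by all resizes so far

    public void push(Object x) {
        if (size == capacity) {
            increaseCapacity();
        }
        contents[size] = x;
        ++size;
    }

    private void increaseCapacity() {
        Object[] bigContents = new Object[2 * capacity];
        for (int i = 0; i < capacity; ++i) {
            bigContents[i] = contents[i];
            ++copies;                         // one unit of copying work
        }
        contents = bigContents;
        capacity = 2 * capacity;
    }

    public long getCopies() {
        return copies;
    }

    // Performs n pushes on a fresh stack and reports the total number of copies.
    static long copiesForPushes(int n) {
        CountingStack stk = new CountingStack();
        for (int i = 1; i <= n; i++) {
            stk.push(i);
        }
        return stk.getCopies();
    }

    public static void main(String[] args) {
        for (int n = 1_000; n <= 1_000_000; n *= 10) {
            System.out.println(n + " pushes, " + copiesForPushes(n) + " copies");
        }
    }
}
```

Under these assumptions the total number of copies stays below $2n$, so the total work for $n$ pushes grows linearly rather than quadratically.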

Amortized Analysis

Convention

“cost” of an operation $\approx$ running time of operation

A More Refined Analysis

Idea. Don’t look at worst-case cost of each operation individually

  • instead look at worst-case cost of any sequence of operations

  • amortized cost is the average cost per operation of any such sequence

Amortized Analysis, IRL

Cost of living:

  • Rent = $1,800 on first of month
  • Groceries = $100 each week
  • Lunch = $5 each day

Income:

  • I get paid $100 daily (tax free)

Question:

  • Some days, I need to pay $1,905… how can I afford to live on $100 a day?!?

Banker’s View

Open a bank account!

Each day, can do:

  1. pay expense out of pocket (from day’s pay)
  2. deposit money into bank account
  3. withdraw money from bank account to pay expense

I can afford to live off $100 a day if every day I can pay that day’s expenses and maintain non-negative bank account balance

$\implies$ amortized cost of living is (at most) $100 / day

Example push(x)

Assume initially size = capacity = 1

	for (int i = 1; i <= n; i++) {
	    stk.push(i);
	}

What are costs of operations?

Banker’s View of Amortized Analysis

  • each operation $\mathrm{op}$ has associated cost, $\mathrm{cost}(\mathrm{op})$
  • have an account $A$ with balance $\mathrm{bal}(A)$
    • must maintain $\mathrm{bal}(A) \geq 0$
  • amortized cost of $\mathrm{op}$ is:
$$\mathrm{ac}(\mathrm{op}) = \mathrm{cost}(\mathrm{op}) + \mathrm{bal}(A') - \mathrm{bal}(A)$$
  • $A$ = account before $\mathrm{op}$, $A'$ = account after

Analysis of push

    public void push(E x) {
	if (size == capacity) {
	    increaseCapacity();
	}

	contents[size] = x;
	++size;
    }
  • cost of increaseCapacity when size $= n$ is $C_n$
    • new capacity is $2 n$

What is $C_n$ in big O notation?

increaseCapacity() Code

    private void increaseCapacity() {
	
    	// create a new array with larger capacity
    	Object[] bigContents = new Object[2 * capacity];

    	// copy contents to bigContents
    	for (int i = 0; i < capacity; ++i) {
    	    bigContents[i] = contents[i];
    	}

    	// set contents to refer to the new array
    	contents = bigContents;

    	// update this.capacity accordingly
    	capacity = 2 * capacity;
    }

Accounting for push

    public void push(E x) {
	if (size == capacity) {
	    increaseCapacity();
	}

	contents[size] = x;
	++size;
    }

Suppose current capacity is $n$

  • last resize at $n / 2$

Each push until next resize

  1. pay $\mathrm{cost}(\texttt{push})$
  2. add money to account

Question. How much to add?

Question

At next increaseCapacity() call, what is account balance?

How to pay $C_n$ for increaseCapacity()?

Final Analysis

If the last resize happened at size $n/2$, then on each push until size is $n$:

  1. pay cost of push
  2. add $C_n / (n/2) = 2 C_n / n$ to account

On push when size is $n$

  1. pay cost of push
  2. remove $C_n$ from $A$ to pay for increaseCapacity()

In both scenarios

$$\mathrm{ac}(\mathrm{op}) = \mathrm{cost}(\mathrm{op}) + \mathrm{bal}(A') - \mathrm{bal}(A) = O(1)$$
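
The banker's argument can also be checked mechanically. The following is a minimal sketch under stated assumptions, not the course's code: writing one element costs 1 unit, copying one element costs 1 unit (so $C_n = n$), the initial capacity is 1, and every push deposits 2 units, including the push that triggers a resize. That last choice is a slight simplification of the scheme above; it keeps the balance non-negative from the very first push. The class name BankerSimulation and its method are hypothetical.

```java
// Simulates the account balance while pushing onto a doubling array.
class BankerSimulation {
    private static final long DEPOSIT = 2;    // assumed deposit per push

    // Runs n pushes, checks that each push has amortized cost 3 and that the
    // balance never goes negative, and returns the final balance.
    static long simulate(int n) {
        long capacity = 1;                    // assumed initial capacity
        long size = 0;
        long balance = 0;
        for (int i = 0; i < n; i++) {
            long before = balance;
            long cost = 1;                    // cost of writing the new element
            balance += DEPOSIT;
            if (size == capacity) {
                cost += capacity;             // C_n = n copies for the resize
                balance -= capacity;          // withdraw C_n to pay for it
                capacity *= 2;
            }
            size++;
            long amortized = cost + balance - before;   // ac(op)
            if (amortized != 3) {
                throw new IllegalStateException("unexpected amortized cost " + amortized);
            }
            if (balance < 0) {
                throw new IllegalStateException("balance went negative");
            }
        }
        return balance;
    }

    public static void main(String[] args) {
        System.out.println("final balance after 1,000,000 pushes: " + simulate(1_000_000));
    }
}
```

In this model an ordinary push has cost 1 and raises the balance by 2, while a resizing push at size $n$ has cost $1 + n$ and changes the balance by $2 - n$. Both give an amortized cost of 3, which is $O(1)$.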

So We Should Have Expected

More Generally

Amortized complexity is a measure of average cost of operations when averaged over any sequence of operations

Moral. Even if individual operations can be expensive, if expensive operations are infrequent, then a data structure may still be efficient.
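
For the push loop, the average can be worked out directly. Assume, for illustration only, that writing one element and copying one element each cost 1 unit and that the initial capacity is 1. A resize then happens exactly when the size is a power of two $2^j \leq n - 1$, and it copies $2^j$ elements. With $k = \lfloor \log_2 (n-1) \rfloor$, the total cost of $n \geq 2$ pushes is

$$T(n) = n + \sum_{j=0}^{k} 2^j = n + 2^{k+1} - 1 \leq n + 2(n-1) - 1 < 3n$$

so the average cost per push is less than 3, which is $O(1)$.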

Sorted Sets

Thought Experiment

  • Oxford English Dictionary (OED)
    • contains 300,000 entries
    • 20 volumes, 21,000+ pages
    • includes history and earliest known usage of words
  • Complete Works of Shakespeare
    • 1 volume
    • 1,300 pages

Question. I read that Shakespeare coined the term indistinguishable. Would it be faster to search OED or Shakespeare to see if Shakespeare used indistinguishable?

Another Question

What makes searching a dictionary for a word preferable to searching a novel?

Previously

You’ve considered 2 set implementations

  1. LinkedSimpleUSet
    • elements stored in order first read
  2. MTFSimpleUSet
    • elements stored in order last accessed

Time to access an element is proportional to its position in list

  • this is the best we can do for a linked list

Faster Finding

In dictionary example:

  • words are sorted alphabetically
  • to search, start at middle of book
  • because of sorting, know to jump forward or backward

Data Structure?

A set that stores elements in sorted order

  • assume elements can be sorted in some way

What data structure allows us to “jump” as in searching a dictionary?

Formalizing an ADT

Sorted Set ADT

  • all of the functionality of a SimpleUSet
    • add, remove, find
  • assumes elements can be compared
    • for any two elements $x, y$, have $x = y$, $x > y$, or $x < y$
  • additional methods
    • findMin
    • findMax
  • slightly different behavior
    • find(x) returns the smallest element y in the set that is no smaller than x, i.e., the smallest y with $y \geq x$ (null if no such y)
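
As a point of comparison, Java's built-in java.util.TreeSet offers similar operations. The sketch below assumes find(x) is meant as a successor search (the smallest element y with $y \geq x$), which corresponds to TreeSet's ceiling method; first and last play the roles of findMin and findMax. The class name SortedSetDemo and the sample values are illustrative only.

```java
import java.util.TreeSet;

// Illustrates sorted-set queries using the standard library's TreeSet.
class SortedSetDemo {
    // Builds a small sample set containing 10, 20, and 30.
    static TreeSet<Integer> sample() {
        TreeSet<Integer> set = new TreeSet<>();
        set.add(30);
        set.add(10);
        set.add(20);
        return set;
    }

    public static void main(String[] args) {
        TreeSet<Integer> set = sample();
        System.out.println(set.ceiling(20));  // 20: x itself is in the set
        System.out.println(set.ceiling(21));  // 30: smallest element >= 21
        System.out.println(set.ceiling(31));  // null: no element >= 31
        System.out.println(set.first());      // 10: the minimum
        System.out.println(set.last());       // 30: the maximum
    }
}
```

One difference to keep in mind: on an empty TreeSet, first and last throw NoSuchElementException rather than returning null.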

The Comparable Interface

To indicate that elements of class E can be compared, E must implement the Comparable<E> interface:

public interface Comparable<T> {
    int compareTo(T o);
}

This interface is built into Java!

Interpretation:

  • x.compareTo(y) < 0 indicates that x is “smaller than” y
  • x.compareTo(y) > 0 indicates that x is “larger than” y
  • x.compareTo(y) == 0 indicates that x is semantically equivalent to y
    • should have x.compareTo(y) == 0 if and only if x.equals(y)

Example Integer

public class Integer implements Comparable<Integer> {
    private int value;

    @Override
    public int compareTo(Integer x) {
        // compare explicitly: value - x.value can overflow
        if (value < x.value) return -1;
        if (value > x.value) return 1;
        return 0;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Integer)) return false;
        Integer x = (Integer) o;
        return (value == x.value);
    }
}
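
A note on comparing int fields: a tempting shortcut is to return the difference of the two values, but the subtraction can overflow and report the wrong sign. The sketch below (the class and method names are hypothetical) contrasts that shortcut with the standard library's Integer.compare.

```java
// Shows how subtraction-based comparison can give the wrong sign on overflow.
class CompareOverflowDemo {
    // Shortcut comparison: wraps around when a - b overflows.
    static int bySubtraction(int a, int b) {
        return a - b;
    }

    // Overflow-safe comparison from the standard library.
    static int bySafeCompare(int a, int b) {
        return Integer.compare(a, b);
    }

    public static void main(String[] args) {
        int a = Integer.MIN_VALUE;
        int b = 1;
        // a is smaller than b, so a correct comparison must be negative.
        System.out.println(bySubtraction(a, b));  // positive: wrong sign
        System.out.println(bySafeCompare(a, b));  // negative: correct sign
    }
}
```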

Sorted Set Interface

public interface SimpleSSet<E extends Comparable<E>> extends SimpleUSet<E>  {

    @Override /* comments explain how find differs from parent method */
    E find(E x);


    E findMin();

    
    E findMax();
}

Implementing SimpleSSet

Question. What data structure should we use to store elements?

How to…

add(x)?

remove(x)?

find(x)?

Common Functionality

Find the index where x would be located

  • unambiguous because set is sorted, elements are unique

Define int getIndex(x) method

How to find(x)?

How to add(x)?

Observation

Once getIndex is implemented, add, remove, and find can be made to work

  • alternative (correct) implementations of getIndex will not affect add/remove/find code

Suggestion. First implement/test with simple getIndex method, then design/test more sophisticated getIndex implementations
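
A first version along these lines might look like the following. This is a minimal sketch, not the course's ArraySimpleSSet: the class name LinearSSetSketch is hypothetical, it uses java.util.ArrayList as the backing store to avoid resizing code, the method signatures are assumptions, and find is assumed to return the smallest element that is at least x.

```java
import java.util.ArrayList;

// Sorted set sketch where add, remove, and find all rely on getIndex.
class LinearSSetSketch<E extends Comparable<E>> {
    private final ArrayList<E> contents = new ArrayList<>();  // kept sorted

    // Smallest index i with contents.get(i) >= x, or size() if there is none.
    // Simple linear scan; a faster search could replace it without touching
    // the methods below.
    int getIndex(E x) {
        int i = 0;
        while (i < contents.size() && contents.get(i).compareTo(x) < 0) {
            i++;
        }
        return i;
    }

    // True if index i holds an element equal to x.
    private boolean holds(int i, E x) {
        return i < contents.size() && contents.get(i).compareTo(x) == 0;
    }

    E find(E x) {
        int i = getIndex(x);
        return i < contents.size() ? contents.get(i) : null;
    }

    boolean add(E x) {
        int i = getIndex(x);
        if (holds(i, x)) {
            return false;          // already present; sets hold unique elements
        }
        contents.add(i, x);        // insert at i, shifting later elements right
        return true;
    }

    boolean remove(E x) {
        int i = getIndex(x);
        if (!holds(i, x)) {
            return false;          // not present
        }
        contents.remove(i);        // remove by index, shifting later elements left
        return true;
    }

    int size() {
        return contents.size();
    }
}
```

Because only getIndex knows how the position is found, swapping in a more sophisticated search later should not require changes to add, remove, or find.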

Maxim. Premature optimization is the root of all evil.

  • Donald Knuth (who has also attributed it to Tony Hoare)

ArraySimpleSSet Implementation

See code!

More Efficient getIndex

How to search a sorted array like a dictionary?

Recursive getIndex()

  • See: ArraySimpleSSet.java
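
For readers without the course file at hand, here is one possible recursive formulation. It is a sketch under assumptions, not the contents of ArraySimpleSSet.java: the class name BinarySearchSketch and the method signatures are hypothetical, and the array is assumed to be sorted, fully populated, and free of duplicates.

```java
// Recursive binary search for the position of x in a sorted array.
class BinarySearchSketch {
    // Returns the smallest index i in [lo, hi) with a[i] >= x, or hi if none.
    // Invariant: every element before lo is < x, every element from hi on is >= x.
    static <E extends Comparable<E>> int getIndex(E[] a, E x, int lo, int hi) {
        if (lo >= hi) {
            return lo;                           // empty range: answer found
        }
        int mid = lo + (hi - lo) / 2;            // midpoint without int overflow
        if (a[mid].compareTo(x) < 0) {
            return getIndex(a, x, mid + 1, hi);  // x belongs to the right of mid
        } else {
            return getIndex(a, x, lo, mid);      // x belongs at mid or to its left
        }
    }

    // Convenience overload that searches the whole array.
    static <E extends Comparable<E>> int getIndex(E[] a, E x) {
        return getIndex(a, x, 0, a.length);
    }

    public static void main(String[] args) {
        Integer[] a = {10, 20, 30, 40};
        System.out.println(getIndex(a, 25));     // index where 25 would be inserted
    }
}
```

Each recursive call halves the range being searched, which is the "jump to the middle" step from the dictionary example.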

USet and SSet Find Running Times

Compare running times of find between unordered set implementation (ArraySimpleUSet) and ArraySimpleSSet with binary search

  • With linear search, the two implementations have almost exactly the same running time because the elements were added in sorted order!

Time to Find: USet vs SSet

Time to Find SSet

Time to Find SSet (Long)

Next Time

  • More discussion of recursion
  • More detailed analysis of binary search