Lecture 7: Running Times, Empirical and Formal

Overview

  1. Recap of Last Time
    • Sets
  2. Empirical Running Time
  3. Formal Running Times

Last Time

Introduced SimpleUSet ADT/interface

public interface SimpleUSet<E> {
    int size();
    boolean isEmpty();
    boolean add(E x);
    E remove(E x);
    E find(E x);
}
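
As a reminder of the intended semantics, here is a small client-code sketch (a hypothetical demo method; any implementation of the interface would do):

static void demo(SimpleUSet<String> set) {
    boolean first = set.add("a");              // true: "a" was not present
    boolean second = set.add(new String("a")); // false: an equal element is already stored
    String stored = set.find("a");             // the instance the set stores
    String removed = set.remove("a");          // removes and returns that stored instance
}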

Considered an implementation built from a SimpleList

ListSimpleUSet Code

public class ListSimpleUSet<E> implements SimpleUSet<E> {
    private SimpleList<E> list;

    public ListSimpleUSet(SimpleList<E> list) { this.list = list; }

    public boolean add(E x) {
        // check if x is already in the set
        for (int i = 0; i < list.size(); i++) {
            if (x.equals(list.get(i))) { return false; }
        }

        list.add(list.size(), x);
        return true;
    }
}
}

Performance vs List Implementation

Why Such a Big Difference?

    public boolean add(E x) {
        // check if x is already in the set; note that list.get(i) is
        // called on every loop iteration, and its cost depends on the
        // list implementation
        for (int i = 0; i < list.size(); i++) {
            if (x.equals(list.get(i))) { return false; }
        }

        list.add(list.size(), x);
        return true;
    }

Comparing Get Methods

public class ArraySimpleList<E> implements SimpleList<E> {
    private Object[] contents;

    public E get(int i) {
        // direct array indexing: a constant number of operations
        return (E) contents[i];
    }
}

public class LinkedSimpleList<E> implements SimpleList<E> {
    private Node<E> head = null;

    private Node<E> getNode(int i) {
        Node<E> cur = head;
        // follow i next-references starting from the head
        for (int j = 0; j < i; ++j) {
            cur = cur.next;
        }
        return cur;
    }

    public E get(int i) {
        // cost grows with i: getNode follows i links
        return getNode(i).value;
    }
}

Today

Making our empirical running time analysis (more) formal

  • How can we predict the (trend of) running times of a program?
  • Under what conditions will our prediction be correct?
  • When do we require empirical investigation?

Efficiency of a procedure vs efficiency of execution

But First

An aside about equality!

Remark on == vs .equals()

Consider the following code:

Integer i = new Integer(10);
Integer j = new Integer(10);

if (i == j) {
    System.out.println("They're the same!");
} else {
    System.out.println("They're not the same!");
}

What does it print?

What about now?

Integer i = new Integer(10);
Integer j = new Integer(10);

if (i.equals(j)) {
    System.out.println("They're the same!");
} else {
    System.out.println("They're not the same!");
}

Question

What is the difference between i == j and i.equals(j)?

Terminology

  • i == j connotes literal equality
    • i and j refer to the same object
  • i.equals(j) connotes semantic equivalence
    • i and j behave in the same way

Usage

When defining a new class, by default == and .equals have the same effect:

  • both are true only when their arguments refer to the same object instance

We can override the default implementation to make the equals method do something different (see the sketch below)

  • E.g., for Integers, i.equals(j) is true exactly when i and j store the same int value
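
A minimal sketch of such an override (a hypothetical Point class, not from the lecture):

public class Point {
    private final int x, y;

    public Point(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object o) {
        if (this == o) { return true; }              // literal equality
        if (!(o instanceof Point)) { return false; } // also covers o == null
        Point p = (Point) o;
        return this.x == p.x && this.y == p.y;       // semantic equivalence
    }

    // whenever equals is overridden, hashCode should be kept consistent with it
    @Override
    public int hashCode() { return 31 * x + y; }
}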

For SimpleUSet<E>

Must be sure to use the equals method, not ==!

SimpleUSet<Integer> set; // assume set refers to an (initially empty) implementation
set.add(new Integer(2));
set.add(new Integer(2));

Result should be that set contains only one element whose value is 2.

Find and Remove

find and remove return instances stored by the set

E.g., this:

public E find(E x) {
    E y;
    for (int i = 0; i < list.size(); ++i) {
        y = list.get(i);
        if (x.equals(y)) {
            return y; // return the instance stored by the set
        }
    }
    return null;
}

Not this:

public E find(E x) {
    E y;
    for (int i = 0; i < list.size(); ++i) {
        y = list.get(i);
        if (x.equals(y)) {
            return x; // wrong: returns the argument, not the stored instance
        }
    }
    return null;
}
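
The distinction matters because x and y can be equals-equivalent yet distinct objects; callers of find and remove may rely on getting back exactly the instance the set stores.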

Back to Running Times

Question

What elementary operations can a computer perform?

Computer Architecture: An Oversimplification

Elementary Operations

Computers can:

  1. read from memory to register
  2. write from register to memory
  3. perform basic arithmetic and logical operations
  4. branch (if-then-else)
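
For a sense of scale, even a single Java statement decomposes into a handful of these operations (an informal sketch; the exact breakdown depends on the compiler and hardware):

int a = 2, b = 3;
int x = a + b; // read a and b (memory to registers), add them (arithmetic),
               // then write the sum back to x (register to memory)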

What Determines Running Time?

  1. Number of elementary operations
  2. Speed of computer clock
  3. Time (clock cycles) per operation

Which Can Programmer Control?

  1. Number of elementary operations
  2. Speed of computer clock
  3. Time (clock cycles) per operation

Our Goal

Understand trends in running times from analyzing algorithms/programs

  • specific values (running times) may differ from computer to computer
  • trends are robust

How does performance scale with instance size?

Our Assumptions

The following operations require a constant number of CPU cycles

  1. reading and writing primitive data types
    • int, long, float, double, char, boolean, references
  2. performing arithmetic/logical operations on primitive data types

  3. making method calls

The following operations scale linearly with instance size (both kinds of cost are illustrated in the snippet below)

  1. initializing arrays and strings

  2. creating new object instances
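
For instance, under these assumptions (informal cost annotations; the variable names are illustrative):

int n = 1000;
int a = 5;              // constant time: writing a primitive
boolean b = (a > 0);    // constant time: a logical operation on primitives
double d = a * 2.5;     // constant time: arithmetic on primitives
int[] arr = new int[n]; // linear time: initializing an array of size n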

Question

With these assumptions (and only these assumptions), how can we measure the quality of a procedure?

Asymptotic Analysis

Idea

Consider running time $\sim$ # elementary operations

  • cannot know the cost of each individual operation
  • trend (running time vs instance size) should not depend on cost of individual operations
  • we want a measure of running time that:
    • ignores constant factors
    • ignores lower order terms

Big O Notation

$f, g$ are functions from natural numbers $\mathbf{N}$ to reals $\mathbf{R}$

E.g.,

  • $n$ = input size
  • $f(n)$ = worst-case running time on a particular computer, or the number of elementary operations performed

Informally, we write $f = O(g)$ to mean “$f$ scales no faster than $g$”

  • much weaker than $f \leq g$ because scaling ignores constants

Big O, Formally

Definition. Let $f, g : \mathbf{N} \to \mathbf{R}^+$. We write $f = O(g)$ (read: “$f$ is (big) O of $g$”) if there exist a natural number $N$ and a constant $C \in \mathbf{R}^+$ such that for all $n \geq N$, we have

$$f(n) \leq C \cdot g(n)$$

Example 1: List Add at Front

Example 2: List Add at Random

Properties of $O$

  1. if $f(n) \leq c$ for all $n$ ($c$ constant), then $f = O(1)$

  2. if $f(n) \leq g(n)$ for all $n$, then $f = O(g)$

  3. if $f = O(g)$, then for all constants $c$, $c f = O(g)$

  4. if $f, h = O(g)$ then $f + h = O(g)$

  5. if $f_1 = O(g_1)$ and $f_2 = O(g_2)$, then $f_1 \cdot f_2 = O(g_1 \cdot g_2)$

Consequence:

  • if $a \leq b$, then $n^a = O(n^b)$
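
As a sanity check, property 4 follows directly from the definition: if $f(n) \leq C_1 \cdot g(n)$ for all $n \geq N_1$ and $h(n) \leq C_2 \cdot g(n)$ for all $n \geq N_2$, then

$$f(n) + h(n) \leq (C_1 + C_2) \cdot g(n) \quad \text{for all } n \geq \max(N_1, N_2)$$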

Example

Show: $10 n^2 + 100 n + 1000 = O(n^2)$
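
One possible derivation, using the definition directly: for $n \geq 1$ we have $n \leq n^2$ and $1 \leq n^2$, so

$$10 n^2 + 100 n + 1000 \leq 10 n^2 + 100 n^2 + 1000 n^2 = 1110 \cdot n^2$$

Taking $N = 1$ and $C = 1110$ in the definition shows $10 n^2 + 100 n + 1000 = O(n^2)$.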

Running Time Analysis

Assumptions

  1. primitive operations take time $O(1)$

  2. initializing objects of size $n$ (e.g., arrays of primitive data types) takes time $O(n)$

What is Running Time of add(i, x)?

    public void add(int i, E x) {
        if (i < 0 || i > size) { throw new IndexOutOfBoundsException(); }
        Node<E> nd = new Node<E>(); // a node has constant size: O(1)
        nd.value = x;

        if (i == 0) {
            // adding at the front: O(1)
            nd.next = this.head;
            this.head = nd;
        } else {
            // getNode(i - 1) follows i - 1 links: O(i)
            Node<E> pred = getNode(i - 1);
            Node<E> succ = pred.next;
            pred.next = nd;
            nd.next = succ;
        }

        ++size;
    }

Running Time of getNode(i)?

    private Node<E> getNode(int i) {
        // check if i is a valid index
        if (i < 0 || i >= size) return null;

        Node<E> cur = head;

        // find the i-th successor of the head:
        // the loop runs i times, so the cost grows linearly with i
        for (int j = 0; j < i; ++j) {
            cur = cur.next;
        }

        return cur;
    }

What About ArraySimpleList?

    public void add(int i, E x) {
        if (i > size || i < 0) { throw new IndexOutOfBoundsException(); }
        if (size == capacity) {
            increaseCapacity(); // occasionally expensive; see below
        }
        ++size;
        Object cur = x;
        // shift elements at indices i..size-1 one slot to the right:
        // size - i iterations
        for (int j = i; j < size; ++j) {
            Object next = contents[j];
            contents[j] = cur;
            cur = next;
        }
    }

increaseCapacity()?

    private void increaseCapacity() {
        // create a new array with larger capacity
        Object[] bigContents = new Object[2 * capacity];

        // copy contents to bigContents: capacity iterations
        for (int i = 0; i < capacity; ++i) {
            bigContents[i] = contents[i];
        }

        // set contents to refer to the new array
        contents = bigContents;

        // update this.capacity accordingly
        capacity = 2 * capacity;
    }

Puzzle

What is the running time of

    static void buildList(SimpleList<Integer> list, int size) {
        for (int i = 0; i < size; i++) {
            list.add(i, list.size());
        }
    }

when list is an ArraySimpleList?

Does the analysis look right?
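
A hint, assuming the doubling strategy from increaseCapacity above: each call list.add(i, list.size()) appends at the end, so the shifting loop does almost no work, and the only nontrivial cost is copying during capacity increases. If the capacity doubles from $1$ up to roughly $n$, the total copying work is

$$1 + 2 + 4 + \cdots + n < 2n = O(n)$$

so buildList runs in total time $O(n)$, even though a single call to add can cost $O(n)$ in the worst case.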