Lecture 7: Running Times, Empirical and Formal

Overview

  1. Recap of Last Time
    • Sets
  2. Empirical Running Time
  3. Formal Running Times

Last Time

Introduced SimpleUSet ADT/interface

public interface SimpleUSet<E> {
    int size();
    boolean isEmpty();
    boolean add(E x);
    E remove(E x);
    E find(E x);
}
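
As a reminder of the intended semantics, here is a small client-code sketch (a hypothetical demo method; any implementation of the interface would do):

static void demo(SimpleUSet<String> set) {
    boolean first = set.add("a");              // true: "a" was not present
    boolean second = set.add(new String("a")); // false: an equal element is already stored
    String stored = set.find("a");             // the instance the set stores
    String removed = set.remove("a");          // removes and returns that stored instance
}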

Considered an implementation built from a SimpleList

ListSimpleUSet Code

public class ListSimpleUSet<E> implements SimpleUSet<E> {
    private SimpleList<E> list;

    public ListSimpleUSet(SimpleList<E> list) { this.list = list; }

    public boolean add(E x) {
        // check if x is already in the set
        for (int i = 0; i < list.size(); i++) {
            if (x.equals(list.get(i))) { return false; }
        }

        list.add(list.size(), x);
        return true;
    }
}
}

Performance vs List Implementation

Why Such a Big Difference?

    public boolean add(E x) {
        // check if x is already in the set; note that list.get(i) is
        // called on every loop iteration, and its cost depends on the
        // list implementation
        for (int i = 0; i < list.size(); i++) {
            if (x.equals(list.get(i))) { return false; }
        }

        list.add(list.size(), x);
        return true;
    }

Comparing Get Methods

public class ArraySimpleList<E> implements SimpleList<E> {
    private Object[] contents;

    public E get(int i) {
        // direct array indexing: a constant number of operations
        return (E) contents[i];
    }
}

public class LinkedSimpleList<E> implements SimpleList<E> {
    private Node<E> head = null;

    private Node<E> getNode(int i) {
        Node<E> cur = head;
        // follow i next-references starting from the head
        for (int j = 0; j < i; ++j) {
            cur = cur.next;
        }
        return cur;
    }

    public E get(int i) {
        // cost grows with i: getNode follows i links
        return getNode(i).value;
    }
}

Today

Making our empirical running time analysis (more) formal

  • How can we predict the (trend of) running times of a program?
  • Under what conditions will our prediction be correct?
  • When do we require empirical investigation?

Efficiency of a procedure vs efficiency of execution

But First

An aside about equality!

Remark on == vs .equals()

Consider the following code:

Integer i = new Integer(10);
Integer j = new Integer(10);

if (i == j) {
    System.out.println("They're the same!");
} else {
    System.out.println("They're not the same!");
}

What does it print?

What about now?

Integer i = new Integer(10);
Integer j = new Integer(10);

if (i.equals(j)) {
    System.out.println("They're the same!");
} else {
    System.out.println("They're not the same!");
}

Question

What is the difference between i == j and i.equals(j)?

Terminology

  • i == j connotes literal equality
    • i and j refer to the same object
  • i.equals(j) connotes semantic equivalence
    • i and j behave in the same way

Usage

When defining a new class, by default == and .equals have the same effect:

  • both are true only when their arguments refer to the same object instance

We can override the default implementation to make the equals method do something different (see the sketch below)

  • E.g., for Integers, i.equals(j) is true exactly when i and j store the same int value
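
A minimal sketch of such an override (a hypothetical Point class, not from the lecture):

public class Point {
    private final int x, y;

    public Point(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object o) {
        if (this == o) { return true; }              // literal equality
        if (!(o instanceof Point)) { return false; } // also covers o == null
        Point p = (Point) o;
        return this.x == p.x && this.y == p.y;       // semantic equivalence
    }

    // whenever equals is overridden, hashCode should be kept consistent with it
    @Override
    public int hashCode() { return 31 * x + y; }
}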

For SimpleUSet<E>

Must be sure to use the equals method, not ==!

SimpleUSet<Integer> set; // assume set refers to an (initially empty) implementation
set.add(new Integer(2));
set.add(new Integer(2));

Result should be that set contains only one element whose value is 2.

Find and Remove

find and remove return instances stored by the set

E.g., this:

public E find(E x) {
    E y;
    for (int i = 0; i < list.size(); ++i) {
        y = list.get(i);
        if (x.equals(y)) {
            return y; // return the instance stored by the set
        }
    }
    return null;
}

Not this:

public E find(E x) {
    E y;
    for (int i = 0; i < list.size(); ++i) {
        y = list.get(i);
        if (x.equals(y)) {
            return x; // wrong: returns the argument, not the stored instance
        }
    }
    return null;
}
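
The distinction matters because x and y can be equals-equivalent yet distinct objects; callers of find and remove may rely on getting back exactly the instance the set stores.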

Back to Running Times

Question

What elementary operations can a computer perform?

Computer Architecture: An Oversimplification

Elementary Operations

Computers can:

  1. read from memory to register
  2. write from register to memory
  3. perform basic arithmetic and logical operations
  4. branch (if-then-else)
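
For a sense of scale, even a single Java statement decomposes into a handful of these operations (an informal sketch; the exact breakdown depends on the compiler and hardware):

int a = 2, b = 3;
int x = a + b; // read a and b (memory to registers), add them (arithmetic),
               // then write the sum back to x (register to memory)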

What Determines Running Time?

  1. Number of elementary operations
  2. Speed of computer clock
  3. Time (clock cycles) per operation

Which Can Programmer Control?

  1. Number of elementary operations
  2. Speed of computer clock
  3. Time (clock cycles) per operation

Our Goal

Understand trends in running times from analyzing algorithms/programs

  • specific values (running times) may differ from computer to computer
  • trends are robust

How does performance scale with instance size?

Our Assumptions

The following operations require a constant number of CPU cycles

  1. reading and writing primitive data types
    • int, long, float, double, char, boolean, references
  2. performing arithmetic/logical operations on primitive data types

  3. making method calls

The following operations scale linearly with instance size (both kinds of cost are illustrated in the snippet below)

  1. initializing arrays and strings

  2. creating new object instances
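
For instance, under these assumptions (informal cost annotations; the variable names are illustrative):

int n = 1000;
int a = 5;              // constant time: writing a primitive
boolean b = (a > 0);    // constant time: a logical operation on primitives
double d = a * 2.5;     // constant time: arithmetic on primitives
int[] arr = new int[n]; // linear time: initializing an array of size n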

Question

With these assumptions (and only these assumptions), how can we measure the quality of a procedure?

Asymptotic Analysis

Idea

Consider running time $\sim$ # elementary operations

  • cannot know the cost of each individual operation
  • trend (running time vs instance size) should not depend on cost of individual operations
  • we want a measure of running time that:
    • ignores constant factors
    • ignores lower order terms

Big O Notation

$f, g$ are functions from natural numbers $\mathbf{N}$ to reals $\mathbf{R}$

E.g.,

  • $n$ = input size
  • $f(n)$ = worst-case running time on a particular computer, or the number of elementary operations performed

Informally, we write $f = O(g)$ to mean “$f$ scales no faster than $g$”

  • much weaker than $f \leq g$ because scaling ignores constants

Big O, Formally

Definition. Let $f, g : \mathbf{N} \to \mathbf{R}^+$. We write $f = O(g)$ (read: “$f$ is (big) O of $g$”) if there exist a natural number $N$ and a constant $C \in \mathbf{R}^+$ such that for all $n \geq N$, we have

$$f(n) \leq C \cdot g(n)$$

Example 1: List Add at Front

Example 2: List Add at Random

Properties of $O$

  1. if $f(n) \leq c$ for all $n$ ($c$ constant), then $f = O(1)$

  2. if $f(n) \leq g(n)$ for all $n$, then $f = O(g)$

  3. if $f = O(g)$, then for all constants $c$, $c f = O(g)$

  4. if $f, h = O(g)$ then $f + h = O(g)$

  5. if $f_1 = O(g_1)$ and $f_2 = O(g_2)$, then $f_1 \cdot f_2 = O(g_1 \cdot g_2)$

Consequence:

  • if $a \leq b$, then $n^a = O(n^b)$
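
As a sanity check, property 4 follows directly from the definition: if $f(n) \leq C_1 \cdot g(n)$ for all $n \geq N_1$ and $h(n) \leq C_2 \cdot g(n)$ for all $n \geq N_2$, then

$$f(n) + h(n) \leq (C_1 + C_2) \cdot g(n) \quad \text{for all } n \geq \max(N_1, N_2)$$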

Example

Show: $10 n^2 + 100 n + 1000 = O(n^2)$
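
One possible derivation, using the definition directly: for $n \geq 1$ we have $n \leq n^2$ and $1 \leq n^2$, so

$$10 n^2 + 100 n + 1000 \leq 10 n^2 + 100 n^2 + 1000 n^2 = 1110 \cdot n^2$$

Taking $N = 1$ and $C = 1110$ in the definition shows $10 n^2 + 100 n + 1000 = O(n^2)$.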

Running Time Analysis

Assumptions

  1. primitive operations take time $O(1)$

  2. initializing objects of size $n$ (e.g., arrays of primitive data types) takes time $O(n)$

What is Running Time of add(i, x)?

    public void add(int i, E x) {
        if (i < 0 || i > size) { throw new IndexOutOfBoundsException(); }
        Node<E> nd = new Node<E>(); // a node has constant size: O(1)
        nd.value = x;

        if (i == 0) {
            // adding at the front: O(1)
            nd.next = this.head;
            this.head = nd;
        } else {
            // getNode(i - 1) follows i - 1 links: O(i)
            Node<E> pred = getNode(i - 1);
            Node<E> succ = pred.next;
            pred.next = nd;
            nd.next = succ;
        }

        ++size;
    }

Running Time of getNode(i)?

    private Node<E> getNode(int i) {
        // check if i is a valid index
        if (i < 0 || i >= size) return null;

        Node<E> cur = head;

        // find the i-th successor of the head:
        // the loop runs i times, so the cost grows linearly with i
        for (int j = 0; j < i; ++j) {
            cur = cur.next;
        }

        return cur;
    }

What About ArraySimpleList?

    public void add(int i, E x) {
        if (i > size || i < 0) { throw new IndexOutOfBoundsException(); }
        if (size == capacity) {
            increaseCapacity(); // occasionally expensive; see below
        }
        ++size;
        Object cur = x;
        // shift elements at indices i..size-1 one slot to the right:
        // size - i iterations
        for (int j = i; j < size; ++j) {
            Object next = contents[j];
            contents[j] = cur;
            cur = next;
        }
    }

increaseCapacity()?

    private void increaseCapacity() {
        // create a new array with larger capacity
        Object[] bigContents = new Object[2 * capacity];

        // copy contents to bigContents: capacity iterations
        for (int i = 0; i < capacity; ++i) {
            bigContents[i] = contents[i];
        }

        // set contents to refer to the new array
        contents = bigContents;

        // update this.capacity accordingly
        capacity = 2 * capacity;
    }

Puzzle

What is the running time of

    static void buildList(SimpleList<Integer> list, int size) {
        for (int i = 0; i < size; i++) {
            list.add(i, list.size());
        }
    }

when list is an ArraySimpleList?

Does the analysis look right?
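
A hint, assuming the doubling strategy from increaseCapacity above: each call list.add(i, list.size()) appends at the end, so the shifting loop does almost no work, and the only nontrivial cost is copying during capacity increases. If the capacity doubles from $1$ up to roughly $n$, the total copying work is

$$1 + 2 + 4 + \cdots + n < 2n = O(n)$$

so buildList runs in total time $O(n)$, even though a single call to add can cost $O(n)$ in the worst case.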