Will Rosenbaum | Asymptotic Analysis and Big O Notation

Motivation

In this note, we describe a method of measuring the efficiency of a procedure using asymptotic analysis. We formally define “big O” notation for qualitatively comparing the growth of functions (that will represent the running times of our programs). This view abstracts away details of hardware executing the program, and focuses on how the time complexity (e.g., running time) scales with the size of the input. The approach disregards details of executions—essentially ignoring constant factors that affect the running time—but allows us to reason about efficiency in a way that is (almost) independent of the actual hardware executing a procedure.

Big O Notation

Throughout this section, we will use $f, g$ and $h$ to refer to functions from the natural numbers $\mathbf{N} = \{0, 1, 2, \ldots\}$ to the positive real numbers $\mathbf{R}^+$. The interpretation is that $f(n)$ might be the running time of some method with an input of size $n$.

Formal Definition

Definition. Let $f, g : \mathbf{N} \to \mathbf{R}^+$ be functions from the natural numbers to the positive real numbers. Then we write $f = O(g)$ (pronounced “$f$ is (big) oh of $g$”) if there exist a natural number $N$ and a constant $C > 0$ such that for all $n \geq N$, we have

\[f(n) \leq C \cdot g(n).\]

Informally, this definition says that $f = O(g)$ means that for sufficiently large values of $n$ (i.e., $n \geq N$), $f(n)$ is no more than a constant factor ($C$) larger than $g(n)$.

Arguing directly from the definition, in order to show that functions $f$ and $g$ satisfy $f = O(g)$, we must find values $N$ and $C$ such that $f(n) \leq C \cdot g(n)$ whenever $n \geq N$. In the following section, we will derive properties of $O$ that allow us to rigorously justify our calculations without needing to refer directly to the definition above.

Example. Consider the functions $f(n) = 10 n^3 + 7$ and $g(n) = n^3$. Notice that $f(n) \geq g(n)$ for all (non-negative) values of $n$. Thus from the definition of $O$, we have $g = O(f)$. Indeed, taking $N = 0$ and $C = 1$, the definition is satisfied.

On the other hand, we claim that $f = O(g)$ as well. To prove this directly from the definition, we must find suitable values $N$ and $C$ satisfying the definition above. Since $f(n) > 10 n^3 = 10 \cdot g(n)$, we must use some $C > 10$ (otherwise $f(n) \leq C \cdot g(n)$ will not be satisfied). Consider taking $C = 11$. We can write

\[\begin{align*} 11 g(n) &= 11 n^3\\ &= 10 n^3 + n^3\\ &\geq 10 n^3 + 2^3 \quad (\text{if } n \geq 2)\\ &= 10 n^3 + 8\\ &> 10 n^3 + 7\\ &= f(n). \end{align*}\]

Thus, so long as $n \geq 2$, we have $f(n) \leq 11 \cdot g(n)$. Therefore, $f = O(g)$, where the definition is satisfied for $N = 2$ and $C = 11$.
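The choice of $N = 2$ and $C = 11$ can also be sanity-checked numerically. The following is a minimal sketch, not a proof: it only tests finitely many values of $n$, and the class and method names (BigOWitnessCheck, holdsOnRange) are hypothetical names chosen for illustration.

```java
// Sanity check (not a proof): test f(n) <= C * g(n) on a finite range,
// where f(n) = 10 n^3 + 7 and g(n) = n^3 as in the example above.
class BigOWitnessCheck {
    static long f(long n) { return 10 * n * n * n + 7; }

    static long g(long n) { return n * n * n; }

    // Returns true if f(n) <= c * g(n) for every n with from <= n <= to.
    static boolean holdsOnRange(long c, long from, long to) {
        for (long n = from; n <= to; ++n) {
            if (f(n) > c * g(n)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(holdsOnRange(11, 2, 1000)); // true: N = 2, C = 11 works
        System.out.println(holdsOnRange(11, 0, 1000)); // false: fails at n = 0
        System.out.println(holdsOnRange(10, 2, 1000)); // false: C = 10 is too small
    }
}
```

The second and third calls show why the witnesses matter: the inequality fails for $n < 2$ when $C = 11$, and it fails for every $n$ when $C = 10$.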

The computations above are rather ad hoc. In general, there will be many possible values of $N$ and $C$ for which $f$ and $g$ can be shown to satisfy the definition. Below, we will describe general properties from which $O$-relationships can be determined rigorously without devolving to ad hoc algebra.

Abuse of Notation. We often use the notation $O(g)$ to refer to “some function $h$ that satisfies $h = O(g)$.” In the example above, we showed that $f(n) = 10 n^3 + 7$ satisfies $f = O(n^3)$. If we were considering the function $h(n) = 2 n^4 + 3 n^3 + 7 = 2 n^4 + f(n)$, we may just as well write $h(n) = 2 n^4 + O(n^3)$. This shorthand will prove convenient when we don’t want to write out explicit terms of functions whose precise values are unknown or will be subsumed by an application of $O$ later on.

Properties

Here, we prove some useful properties of big O notation.

Proposition 1. Suppose $f, g, f_1, f_2, g_1, g_2, h$ are all functions from $\mathbf{N}$ to $\mathbf{R}^+$, and that $a > 0$ is a constant in $\mathbf{R}^+$. Then the following hold:

1. If $f(n) \leq a$ for all $n$, then $f = O(1)$.
2. If $f(n) \leq g(n)$ for all $n$, then $f = O(g)$.
3. If $f = O(g)$, then $a \cdot f = O(g)$.
4. If $f = O(g)$ and $g = O(h)$, then $f = O(h)$.
5. If $f = O(g)$, then $f + O(g) = O(g)$ and $g + O(f) = O(g)$; in particular, $f + g = O(g)$.
6. If $f_1 = O(g_1)$ and $f_2 = O(g_2)$, then $f_1 \cdot f_2 = O(g_1 \cdot g_2)$.

Proof. We prove each assertion above in turn.

1. Since $f(n) \leq a$ for all $n$, taking $g(n) = 1$, we have $f(n) \leq a \cdot g(n)$ for all $n$. Thus, the definition of $f = O(g)$ is satisfied with $N = 0$ and $C = a$, so that $f = O(1)$.
2. If $f(n) \leq g(n)$ for all $n$, then the definition of $f = O(g)$ is satisfied with $N = 0$ and $C = 1$.
3. Suppose $f = O(g)$, and suppose $N'$ and $C'$ are values for which the definition of $O$ is satisfied. That is, for all $n \geq N'$, we have $f(n) \leq C' \cdot g(n)$. Then for $n \geq N'$, we also have $a \cdot f(n) \leq a C' \cdot g(n)$. Therefore, the definition of $a \cdot f = O(g)$ is satisfied for $N = N'$ and $C = a \cdot C'$.
4. Suppose the definition $f = O(g)$ is satisfied with values $N_f$ and $C_f$. That is, for $n \geq N_f$, we have $f(n) \leq C_f \cdot g(n)$. Similarly, suppose $g = O(h)$ with $N_g$ and $C_g$: for $n \geq N_g$ we have $g(n) \leq C_g \cdot h(n)$. Then, for $n \geq \max(N_f, N_g)$, we have both $f(n) \leq C_f \cdot g(n)$ and $g(n) \leq C_g \cdot h(n)$. Combining the last two inequalities, we obtain $f(n) \leq C_f \cdot (C_g \cdot h(n)) = C_f C_g \cdot h(n)$. Therefore, the definition of $f = O(h)$ is satisfied for $N = \max(N_f, N_g)$ and $C = C_f \cdot C_g$.
5. Suppose $f = O(g)$ and $h$ is any function satisfying $h = O(g)$. Suppose the definition of $f = O(g)$ is satisfied for $N_f$ and $C_f$, i.e., $f(n) \leq C_f \cdot g(n)$ for all $n \geq N_f$. Similarly suppose the definition of $h = O(g)$ is satisfied for values $N_h$ and $C_h$: $h(n) \leq C_h \cdot g(n)$ for all $n \geq N_h$. Observe that taking $N = \max(N_f, N_h)$, we have that for all $n \geq N$, both $f(n) \leq C_f \cdot g(n)$ and $h(n) \leq C_h \cdot g(n)$. Thus, for $n \geq N$, we have $f(n) + h(n) \leq C_f \cdot g(n) + C_h \cdot g(n) = (C_f + C_h) g(n)$. Therefore, the definition of $f + h = O(g)$ is satisfied for $N = \max(N_f, N_h)$ and $C = C_f + C_h$.

Now suppose $f = O(g)$, and $h = O(f)$. Then by property 4 above, we have $h = O(g)$. Therefore, $g + h = g + O(g)$. Applying the first assertion of 5 (proven in the paragraph above), we get $g + O(g) = O(g)$, so that $g + O(f) = g + O(g) = O(g)$, as claimed.
6. Suppose $f_1 = O(g_1)$ is satisfied for $N_1$ and $C_1$, and that $f_2 = O(g_2)$ is satisfied for $N_2$ and $C_2$. Then for $N = \max(N_1, N_2)$, and all $n \geq N$, we have $f_1(n) \leq C_1 \cdot g_1(n)$, and $f_2(n) \leq C_2 \cdot g_2(n)$. Therefore, $(f_1 \cdot f_2)(n) = f_1(n) \cdot f_2(n) \leq (C_1 g_1(n)) (C_2 g_2(n)) = (C_1 C_2) \cdot (g_1 \cdot g_2)(n)$. Therefore, $f_1 \cdot f_2 = O(g_1 \cdot g_2)$ is satisfied for $N = \max(N_1, N_2)$ and $C = C_1 \cdot C_2$.

So all of the properties hold, as desired. $\Box$
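As a concrete illustration of how the witnesses combine in the proof of transitivity (if $f = O(g)$ and $g = O(h)$, then $f = O(h)$), take the earlier example $f(n) = 10 n^3 + 7$, which satisfies $f = O(n^3)$ with $N_f = 2$ and $C_f = 11$. Since $n^3 \leq n^4$ for every natural number $n$, we also have $n^3 = O(n^4)$ with $N_g = 0$ and $C_g = 1$. The proof then yields $N = \max(2, 0) = 2$ and $C = 11 \cdot 1 = 11$:

\[f(n) \leq 11 \cdot n^3 \leq 11 \cdot n^4 \quad \text{for all } n \geq 2,\]

so that $f = O(n^4)$.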

Using the properties above, we can more simply (yet just as rigorously) argue about big O notation. For example, Property 2 above also gives the following useful consequence:

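Corollary. Suppose $a, b$ are constants with $a \leq b$. Then $n^a = O(n^b)$. Indeed, $n^a \leq n^b$ for all $n \geq 1$, so the definition is satisfied with $N = 1$ and $C = 1$.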

Example. Suppose $f$ is a second degree polynomial. That is, $f(n) = a n^2 + b n + c$ for some constants $a, b$ and $c$, where $a > 0$. Then $f = O(n^2)$. To see this, we compute:

\[\begin{align*} f(n) &= a n^2 + b n + c\\ &= a n^2 + b n + O(1)\\ &= a n^2 + O(n)\\ &= O(n^2). \end{align*}\]

Each manipulation above is justified as follows:

The first equality is the definition of $f$.
The second equality holds by property 1.
The third equality holds by property 3 (which implies that $b n = O(n)$), and property 5.
The fourth equality holds by property 3 (which implies that $a n^2 = O(n^2)$), and property 5.

More generally, we can argue that any degree $k$ polynomial (i.e., a function $f$ of the form $f(n) = a_k n^k + a_{k-1} n^{k-1} + \cdots + a_1 n + a_0$) satisfies $f = O(n^k)$. Proving this fact in a mathematically rigorous way requires applying mathematical induction, but you can freely use this fact going forward.
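For readers curious about the induction, here is a sketch of the inductive step, under the simplifying assumption that all coefficients $a_j$ are positive. The base case $k = 0$ is property 1. For $k \geq 1$, suppose every polynomial of degree $k - 1$ is $O(n^{k-1})$. Then

\[\begin{align*} f(n) &= a_k n^k + (a_{k-1} n^{k-1} + \cdots + a_1 n + a_0)\\ &= a_k n^k + O(n^{k-1})\\ &= a_k n^k + O(n^k)\\ &= O(n^k). \end{align*}\]

The second line is the inductive hypothesis. The third line uses $n^{k-1} = O(n^k)$ (the corollary above) together with property 4. The last line uses property 3 (which implies that $a_k n^k = O(n^k)$) and property 5.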

When is $f \neq O(g)$?

The notation $f = O(g)$ is in some sense a weak condition on $f$ and $g$. That is, $f$ could be much, much larger than $g$, yet we still have $f = O(g)$. For example, take $f(n) = 10^{100} n^2$ and $g(n) = 10^{-100} n^2$. You should convince yourself that we have $f = O(g)$. But $g$ is always much smaller than $f$: $g(n) / f(n) = 10^{-200}$ for all $n \geq 1$, which is a very tiny number indeed! So you might (rightfully) be concerned that $f = O(g)$ is too weak a condition to be useful. For example, is it ever the case that $f$ is not $O(g)$?

Proposition 2. Suppose $a$ and $b$ are real values satisfying $a < b$. Then $n^b \neq O(n^a)$. That is, $n^b$ is not $O(n^a)$.

Given a proposition such as Proposition 2—a claim that a particular definition is not satisfied—it is common to apply a technique called “proof by contradiction”. That is, in order to prove that $n^b \neq O(n^a)$, we assume that the opposite is true—namely $n^b = O(n^a)$—and derive a contradiction from this assumption.

Proof. Suppose for the sake of contradiction that $n^b = O(n^a)$. Then, from the definition of $O$, there exist a natural number $N$ and a constant $C$ such that for all $n \geq N$, we have $n^b \leq C \cdot n^a$. Dividing both sides of this expression by $n^a$ (for $n \geq 1$), we get the equivalent expression $n^b / n^a = n^{b - a} \leq C$. Since $b - a > 0$, we can raise both sides of this expression to the power $1 / (b - a)$ to get

\[n = (n^{b - a})^{1 / (b - a)} \leq C^{1 / (b - a)}.\]

Thus, for all $n > C^{1 / (b - a)}$, the inequality $n^b \leq C \cdot n^a$ fails to hold. In particular, it fails for some $n \geq N$, which contradicts our assumption. Therefore, $n^b \neq O(n^a)$, as desired. $\Box$
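To make the argument concrete, the following sketch computes, for a given constant $C$, a value of $n$ beyond the threshold $C^{1 / (b - a)}$ and checks that the bound fails there. It is only an illustration under stated assumptions: it uses floating-point arithmetic, it assumes $a < b$ and $C > 0$, and the names PowerGrowthDemo, firstFailure and boundHolds are hypothetical.

```java
// Illustration of Proposition 2: no constant C can make n^b <= C * n^a
// hold for all large n when a < b.
class PowerGrowthDemo {
    // Smallest natural number n >= 1 that is strictly greater than c^(1/(b-a)).
    // Assumes a < b and c > 0.
    static long firstFailure(double a, double b, double c) {
        double threshold = Math.pow(c, 1.0 / (b - a));
        return Math.max(1, (long) Math.floor(threshold) + 1);
    }

    // Does the inequality n^b <= c * n^a hold for this particular n?
    static boolean boundHolds(double a, double b, double c, long n) {
        return Math.pow(n, b) <= c * Math.pow(n, a);
    }

    public static void main(String[] args) {
        double a = 1, b = 2, c = 1000;
        long n = firstFailure(a, b, c);
        System.out.println(n);                          // 1001
        System.out.println(boundHolds(a, b, c, n - 1)); // true:  1000^2 <= 1000 * 1000
        System.out.println(boundHolds(a, b, c, n));     // false: 1001^2 >  1000 * 1001
    }
}
```

Choosing a larger $C$ only pushes the failure point further out; it never removes it.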

Analysis of Code

In this section, we apply big O notation in order to describe the running times of procedures. In order to deduce anything about how long it will take a procedure to complete on a given input, we must formalize our assumptions about the running times of certain primitive operations. The actual running times of these primitive operations may vary drastically between different machines executing the same code. Big O notation abstracts away from the particular running times of primitive operations while allowing us to reason formally about how the running time of the procedure scales with the size of its input.

Assumptions

In order to apply Big O notation to express the running time of code, we make the following assumptions on the running times of operations in Java. We assume that the following operations are performed in $O(1)$ time:

reading, writing, creating, and modifying variables that store primitive data types,
- the primitive data types in Java are byte, short, int, long, float, double, boolean, char
performing an arithmetic or logical operation on a primitive data type,
making method calls and executing branching operations (i.e., if-then-else),
assigning and modifying references,
reading/writing a value from/to a particular index of an array.

The following operations’ running times scale linearly with the size (i.e., number of primitive elements) of the data structure. That is, if the object stores/represents $n$ primitive data types, then the running time of the following operations is $O(n)$:

initializing arrays and Strings (of length/capacity $n$),
creating/initializing new object instances.

Note. In Java, a String is backed internally by an array holding its characters. However, Strings are immutable: once a String is created its value cannot be modified. Instead, all String modification operations create a new String object with the desired contents. For example, given a char[] chArray of length $n$ and a String str of length $n$, an assignment such as chArray[0] = 'a' takes time $O(1)$, while making a String str2 whose contents are the same as str, except that the first letter is changed to 'a', takes time $O(n)$.
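The contrast in the note can be sketched as follows. The helper withFirstChar is a hypothetical name introduced for illustration, and it assumes its argument is non-empty; it is one of several ways to build the modified String.

```java
// Modifying a char[] in place versus building a modified copy of a String.
class StringVsCharArray {
    // Returns a new String equal to str with its first character replaced by c.
    // Assumes str is non-empty. A new String is built, so this takes O(n) time.
    static String withFirstChar(String str, char c) {
        return c + str.substring(1);
    }

    public static void main(String[] args) {
        char[] chArray = {'h', 'e', 'l', 'l', 'o'};
        chArray[0] = 'a'; // O(1): overwrites a single array cell in place

        String str = "hello";
        String str2 = withFirstChar(str, 'a'); // O(n): copies the other characters

        System.out.println(new String(chArray)); // aello
        System.out.println(str2);                // aello
        System.out.println(str);                 // hello (the original is unchanged)
    }
}
```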

Examples

We now apply our running time analysis and big O notation to describe the qualitative running time of some actual code.

Example. Consider the following method, which takes as its parameter an array of ints, and returns the sum of the values stored in the array.

int sumContents (int[] values) {
    int size = values.length;
    int sum = 0;
    for (int i = 0; i < size; ++i) {
        sum += values[i];
    }

    return sum;
}

We let $n$ denote the length of the array, i.e., the value of values.length. The assignments in lines 2 and 3 both run in time $O(1)$ (note that values.length reads and returns the value of an instance variable length). Each iteration of the for loop in lines 4–6 requires time $O(1)$: there is $O(1)$ overhead per iteration for modifying the value of i and checking the condition i < size, and the expression sum += values[i] runs in time $O(1)$. Since there are $n$ iterations of the for loop, the overall running time is

\[O(1) + n \cdot O(1) = O(n).\]
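One way to observe this linear scaling without timing anything is to count loop iterations. The sketch below instruments the same loop with a counter; the class and method names (SumIterationCount, countIterations) are hypothetical, and the counter is an addition that is not part of sumContents itself.

```java
// Counts how many times the loop body of sumContents executes.
class SumIterationCount {
    // Same loop structure as sumContents, but returns the number of loop
    // iterations performed instead of the sum.
    static long countIterations(int[] values) {
        int size = values.length;
        int sum = 0;
        long iterations = 0;
        for (int i = 0; i < size; ++i) {
            sum += values[i];
            ++iterations; // one O(1) loop body per element
        }
        return iterations;
    }

    public static void main(String[] args) {
        // Doubling the input length doubles the number of iterations.
        for (int n = 1000; n <= 4000; n *= 2) {
            System.out.println(n + " -> " + countIterations(new int[n]));
        }
    }
}
```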

Example (adding to ArraySimpleList). In our ArraySimpleList implementation of SimpleList, we had the following add(i, x) method:

    public void add(int i, E x) {
	// i is a valid index if it is between 0 and size
	if (i > size || i < 0) {
	    throw new IndexOutOfBoundsException();
	}

	// check if we need to increase the capacity before inserting
	// the element
	if (size == capacity) {
	    increaseCapacity();
	}

	++size;

	// insert x by setting contents[i] to x and moving each
	// element previously at index j >= i to index j + 1.
	Object cur = x;
	for (int j = i; j < size; ++j) {
	    Object next = contents[j];
	    contents[j] = cur;
	    cur = next;
	}
    }

For simplicity, let us assume that (1) the index i is valid, so that the exception is not thrown in line 4, and (2) the capacity of the contents array is strictly larger than size so that increaseCapacity() is not called in line 10. (We will consider the increaseCapacity() method in a forthcoming lecture on amortized analysis.) With these assumptions, the operations performed in lines 3–17 each take time $O(1)$. Note that the expression Object cur = x at line 17 does not create a new object, but simply stores a reference in the variable cur; thus, the operation only takes $O(1)$ time. Each iteration of the for loop in lines 18–22 runs in time $O(1)$, as the loop body performs $O(1)$ primitive operations. If the list has size $n$ before the insertion, then size equals $n + 1$ when the loop starts (because of ++size in line 13), so the loop performs $n - i + 1$ iterations. Therefore, the total running time is

\[O(1) + (n - i + 1) \cdot O(1) = O(n - i + 1).\]

Note that we could have bounded $O(n - i + 1) = O(n)$. While this is true, the expression above is more precise. Specifically, it shows that adding to the end of the list may be more efficient than adding to the front. For example, adding at index $i = n$ is completed in time $O(1)$, while adding at index $0$ is only $O(n)$.
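The dependence on $i$ can be checked by counting iterations of the shifting loop. The sketch below reproduces only the loop bounds of add(i, x), not the list itself; ArrayShiftCount and shiftIterations are hypothetical names, and $n$ denotes the number of elements stored before the insertion.

```java
// Counts iterations of the shifting loop in the array-based add(i, x).
class ArrayShiftCount {
    // n: number of elements stored before the insertion.
    // i: insertion index, assumed to satisfy 0 <= i <= n.
    static int shiftIterations(int n, int i) {
        int size = n + 1; // corresponds to ++size in add
        int iterations = 0;
        for (int j = i; j < size; ++j) {
            ++iterations; // one O(1) swap of cur and contents[j] per iteration
        }
        return iterations;
    }

    public static void main(String[] args) {
        int n = 1000;
        System.out.println(shiftIterations(n, n)); // 1: adding at the end
        System.out.println(shiftIterations(n, 0)); // 1001: adding at the front
    }
}
```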

Example (adding to LinkedSimpleList). Here is the code for our LinkedSimpleList implementation of add and the getNode helper method:

    public void add(int i, E x) {
        ....

	Node<E> nd = new Node<E>();
	nd.value = x;

	if (i == 0) {
	    nd.next = this.head;
	    this.head = nd;
	} else {
	    Node<E> pred = getNode(i - 1);
	    Node<E> succ = pred.next;
	    pred.next = nd;
	    nd.next = succ;
	    
	}

	++size;
    }

    ...

    private Node<E> getNode(int i) {
	// check if i is a valid index
	if (i < 0 || i >= size) return null;
	
	Node<E> cur = head;

	// find the i-th successor of the head
	for (int j = 0; j < i; ++j) {
	    cur = cur.next;
	}

	return cur;	
    }
	
    ...
	
    private class Node<E> {
	Node<E> next;
	E value;
    }

Once again, we assume that the size of the list is $n$ and $i$ is a valid index (i.e., $0 \leq i \leq n$). Observe that every statement in the add method except the call to getNode(i - 1) in line 11 is performed in time $O(1)$. In particular, the creation of a new Node at line 4 only requires $O(1)$ time since a Node stores two references. As for the analysis of getNode(i), lines 25 and 27 both run in time $O(1)$. Each iteration of the loop in lines 30–32 can be performed in time $O(1)$, and $i$ iterations are performed. Therefore the overall running time of add(i, x) is $O(1) + O(i) = O(i + 1)$.
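The cost of getNode can likewise be made visible by counting how many next references are followed. The following is a self-contained sketch, not the LinkedSimpleList class itself: LinkedHopCount, build and hops are hypothetical names, the Node class here stores an int for simplicity, and hops assumes $0 \leq i < n$.

```java
// Counts the reference hops performed when walking to the i-th node.
class LinkedHopCount {
    static class Node {
        Node next;
        int value;
    }

    // Builds a chain of n nodes with values 0..n-1 and returns its head
    // (null if n == 0).
    static Node build(int n) {
        Node head = null;
        for (int k = n - 1; k >= 0; --k) {
            Node nd = new Node();
            nd.value = k;
            nd.next = head;
            head = nd;
        }
        return head;
    }

    // Follows i next-references starting from head, as the loop in getNode(i)
    // does, and returns the number of hops. Assumes the chain has more than i nodes.
    static int hops(Node head, int i) {
        Node cur = head;
        int count = 0;
        for (int j = 0; j < i; ++j) {
            cur = cur.next;
            ++count;
        }
        return count;
    }

    public static void main(String[] args) {
        Node head = build(1000);
        System.out.println(hops(head, 0));   // 0: the head is reached immediately
        System.out.println(hops(head, 999)); // 999: the last node needs a full walk
    }
}
```

Since add(i, x) calls getNode(i - 1) when $i > 0$, an insertion near the front follows only a few references, while an insertion near the end walks almost the whole list.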

The previous two examples demonstrate the use of asymptotic analysis (i.e., the application of big O notation) to understand the qualitative running time of methods. They show that the two implementations of SimpleList, ArraySimpleList and LinkedSimpleList, differ in the relative efficiency of add depending on where the element is to be added. In particular, adding to the last few indices of an ArraySimpleList can be performed in time $O(1)$, while adding to the first few indices of a LinkedSimpleList can be performed in $O(1)$ time. The experiments we ran in lecture gave empirical justification of this analysis.

On the other hand, big O notation is somewhat limited, in that it abstracts away the precise constants that affect the running times of procedures, such as the amount of time required for a given computer to execute a primitive operation, and the number of such operations used in an execution. Thus, the analysis above cannot tell us, for example, if add(n / 2, x) will be faster for an ArraySimpleList or LinkedSimpleList; both operations run in time $O(n / 2) = O(n)$. Empirically, we saw that the array implementation was faster in this regime.