Lecture 11: Binary (Search) Trees

Overview

Review of Binary Search
Binary Trees
Binary Search Trees

Last Time

Searching a Sorted Array: find(18)

       0  1  2  3  4   5   6   7   8   9   10  11  12  13  14  15
	   
arr = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53]

Binary Search Method

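// Search the sorted array arr for x among indices i (inclusive) to j (exclusive).
// Assumes i < j. Returns the largest element <= x in that range,
// or arr[i] if every element in the range is larger than x.
static int binarySearch(int[] arr, int x, int i, int j) {
    // the range holds a single element: nothing left to halve
    if (j == i + 1) { return arr[i]; }

    int k = i + (j - i) / 2;  // midpoint, written to avoid int overflow in i + j

    if (arr[k] <= x) {
        return binarySearch(arr, x, k, j);  // continue in arr[k..j)
    } else {
        return binarySearch(arr, x, i, k);  // continue in arr[i..k)
    }
}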

Running Time of Binary Search

Worst case running time is $O(\log n)$

Why?
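
One way to answer, sketched under the assumption that each call does a constant amount of work $c$ besides its single recursive call: every call at least halves the range size $n = j - i$, so the running time $T(n)$ satisfies

$$T(n) \le T(\lceil n/2 \rceil) + c, \qquad T(1) = c.$$

Unrolling the recurrence gives $T(n) \le c \, (\lceil \log_2 n \rceil + 1)$, which is $O(\log n)$.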

Time to Find: USet vs SSet

Time to Find SSet

Time to Find SSet (Long)

Lingering Question

Binary search allows us to find elements in a sorted array quickly:

$O(\log n)$ time versus $O(n)$ time previously

Sorted arrays are still costly to modify:

add method is $O(n)$, worst case
remove method is $O(n)$, worst case

Question. Can we perform all operations efficiently?

Comparing Arrays and Linked Lists

Array:

gives $O(1)$ access by index
- allows “jumping” for binary search
cost of “random” access: modifications move elements
- add/remove are $O(n)$

Linked List:

searching is $O(n)$ worst case
once location is determined, modification is $O(1)$

Question. Can we get the best of both worlds?

Leading Question

What is the array access pattern of binary search?

which indices are accessed first?
which indices are accessed second?
…

       0  1  2  3  4   5   6   7   8   9   10  11  12  13  14  15
	   
arr = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53]

Observation

Binary search accesses indices hierarchically

index $n/2$ is always accessed first
one of indices $n/4$, $3n/4$ is always accessed second
one of indices $n/8$, $3n/8$, $5n/8$, $7n/8$ is always accessed third
…
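
The access pattern can be checked directly. The following is a minimal sketch that repeats the loop structure of the binarySearch method from earlier in the lecture and records each midpoint it inspects. The class name ProbeOrderSketch and the method name probes are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

class ProbeOrderSketch {
    // Returns the indices k inspected, in order, when searching arr[i..j) for x.
    // Same halving rule as the lecture's binarySearch; assumes i < j.
    static List<Integer> probes(int[] arr, int x, int i, int j) {
        List<Integer> visited = new ArrayList<>();
        while (j != i + 1) {
            int k = (i + j) / 2;   // midpoint of the current range
            visited.add(k);
            if (arr[k] <= x) {
                i = k;             // continue in the right half
            } else {
                j = k;             // continue in the left half
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        int[] arr = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53};
        System.out.println(probes(arr, 18, 0, arr.length));  // prints [8, 4, 6, 7]
    }
}
```

For the 16-element array above, every search starts at index 8 and then moves to index 4 or index 12, matching the hierarchy described in the observation.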

Idea:

store elements hierarchically, rather than linearly
- arrays store elements consecutively
- maintaining sortedness means moving entries around for each add/remove

Want: more flexibility to add/remove elements

Hierarchical Picture

       0  1  2  3  4   5   6   7   8   9   10  11  12  13  14  15
	   
arr = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53]

More Formally

Like a linked list, store elements in associated nodes

Unlike a linked list, references no longer form a path

Each node has:

a left child
a right child
a parent
a comparable value

Keep track of root node (top of hierarchy)
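
The node description above might be written in Java roughly as follows. This is only a sketch: the names TreeSketch and Node are invented here, and the value is assumed to be an int (as in the binarySearch example) rather than a general comparable type.

```java
class TreeSketch {
    // One node of the hierarchy: three references plus the stored value.
    static class Node {
        Node left;    // left child, or null if there is none
        Node right;   // right child, or null if there is none
        Node parent;  // parent, or null for the root
        int value;    // stored value, compared with <

        Node(int value) {
            this.value = value;
        }
    }

    Node root;  // top of the hierarchy; null while the tree is empty
}
```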

Sorted Set Example

$S = \{2, 3, 5, 7, 13, 15, 17, 19\}$

How to Find?

Given previous structure, how do we find(11)?

How to Add?

Given previous structure, how do we add(11)?

How to Remove?

Given previous structure, how do we remove(2)?

How to Remove?

Given previous structure, how do we remove(15)?

Formalizing Things

A binary tree consists of

a collection of nodes
a distinguished node called the root
each node has
- a parent (null only for root)
- a left child (possibly null)
- a right child (possibly null)

Constraints:

if u is a child of v, then v is u’s parent
every node has the root as an ancestor
- $\implies$ no cycles!
- every node is a descendant of the root

Tree Terminology

a node without children is a leaf
a node that is not a leaf is internal
depth of a node is its distance to the root
- depth of tree is max depth of any node

Height

The height of a node is its max distance to a descendant leaf

height of leaf = 0
height of internal node is 1 + maximum height of children
height of tree = height of root
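
The recursive definition of height translates almost line for line into code. A minimal sketch, assuming a hypothetical Node class with left and right references, and using the convention that an absent child has height -1 so that a leaf comes out as 0:

```java
class HeightSketch {
    static class Node {
        Node left, right;
        int value;

        Node(int value) {
            this.value = value;
        }
    }

    // Height of the subtree rooted at v; a missing subtree (null) counts as -1.
    static int height(Node v) {
        if (v == null) {
            return -1;
        }
        // 1 + maximum height of the children; gives 0 for a leaf
        return 1 + Math.max(height(v.left), height(v.right));
    }
}
```

The height of the tree is then height(root).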

So Far

Specified structure of binary trees
No assumptions about values stored in trees
Trees are incredibly useful and flexible data structures
- represent hierarchies
  - file structure in computer
  - dependency of method calls
  - representing arithmetic expressions

Next up: representing sorted collections

Binary Search Trees

Assume values stored in nodes are comparable with $<$

given (values of) any two nodes $u$ and $v$, exactly one of $u < v$, $v < u$, or $v = u$ holds

A tree is a binary search tree (BST) if for every node $v$:

if $u$ is a left descendant of $v$, then $u < v$
if $w$ is a right descendant of $v$, then $w > v$
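
Note that the condition is about all descendants, not just children. One way to test it, sketched here with invented names (BstCheckSketch, isBst) and int values, is to pass down the open interval that every value in a subtree must fall into:

```java
class BstCheckSketch {
    static class Node {
        Node left, right;
        int value;

        Node(int value) {
            this.value = value;
        }
    }

    // True if the tree rooted at root satisfies the BST property.
    static boolean isBst(Node root) {
        // long bounds so that every int value lies strictly inside the interval
        return isBst(root, Long.MIN_VALUE, Long.MAX_VALUE);
    }

    // Every value in the subtree rooted at v must satisfy lo < value < hi.
    static boolean isBst(Node v, long lo, long hi) {
        if (v == null) {
            return true;
        }
        if (v.value <= lo || v.value >= hi) {
            return false;
        }
        // left descendants must be smaller than v, right descendants larger
        return isBst(v.left, lo, v.value) && isBst(v.right, v.value, hi);
    }
}
```

For example, a root 7 whose left child 3 has a right child 9 fails the check: 9 is larger than its parent 3, but it is a left descendant of 7.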

BST Example

Searching a BST

How to find(x) in a BST? What is find complexity?
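
One possible answer, as a sketch rather than the definitive method: start at the root and use the BST property to decide at each node whether to go left or right. The names below are invented, values are assumed to be ints, and this version looks for an exact match only.

```java
class BstFindSketch {
    static class Node {
        Node left, right;
        int value;

        Node(int value) {
            this.value = value;
        }
    }

    // Returns the node storing x in the BST rooted at root, or null if x is absent.
    static Node find(Node root, int x) {
        Node v = root;
        while (v != null && v.value != x) {
            // smaller values can only be in the left subtree, larger ones in the right
            v = (x < v.value) ? v.left : v.right;
        }
        return v;
    }
}
```

Each iteration moves one level down the tree, so the loop runs at most (height of the tree) + 1 times.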

Adding to a BST

How to add(x) in a BST? What is add complexity?
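
One possible answer, again only a sketch with invented names and int values: search for x as in find, and attach a new leaf at the spot where the search falls off the tree. This version assumes duplicates are rejected.

```java
class BstAddSketch {
    static class Node {
        Node left, right, parent;
        int value;

        Node(int value) {
            this.value = value;
        }
    }

    Node root;  // null while the tree is empty

    // Adds x as a new leaf; returns false if x is already stored.
    boolean add(int x) {
        Node p = null;   // last node visited on the search path
        Node v = root;
        while (v != null) {
            if (x == v.value) {
                return false;  // already present
            }
            p = v;
            v = (x < v.value) ? v.left : v.right;
        }
        Node u = new Node(x);
        u.parent = p;
        if (p == null) {
            root = u;          // tree was empty
        } else if (x < p.value) {
            p.left = u;
        } else {
            p.right = u;
        }
        return true;
    }
}
```

As with find, the work is proportional to the length of the search path, so it is bounded by the height of the tree. No existing elements are moved.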

Removing From a BST I

How to remove(y)…

… if y is a leaf?

Removing From a BST II

How to remove(y)…

… if y has one child?

Removing From a BST III

How to remove(y)…

… if y has two children?
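
The three cases (leaf, one child, two children) can be handled together. The sketch below shows one common approach, not necessarily the one developed in lecture: for two children, copy the smallest value of the right subtree into y and remove that node instead, which has at most one child. All names are invented and values are assumed to be ints.

```java
class BstRemoveSketch {
    static class Node {
        Node left, right, parent;
        int value;

        Node(int value) {
            this.value = value;
        }
    }

    Node root;

    // Helper for building examples: plain BST insertion, duplicates ignored.
    void add(int x) {
        Node p = null, v = root;
        while (v != null) {
            if (x == v.value) return;
            p = v;
            v = (x < v.value) ? v.left : v.right;
        }
        Node u = new Node(x);
        u.parent = p;
        if (p == null) root = u;
        else if (x < p.value) p.left = u;
        else p.right = u;
    }

    // Puts repl (possibly null) where old currently hangs in the tree.
    void splice(Node old, Node repl) {
        if (old.parent == null) root = repl;
        else if (old.parent.left == old) old.parent.left = repl;
        else old.parent.right = repl;
        if (repl != null) repl.parent = old.parent;
    }

    // Removes the value stored at node y from the tree.
    void remove(Node y) {
        if (y.left != null && y.right != null) {
            // two children: find the smallest node s in the right subtree,
            // move its value into y, and remove s instead
            Node s = y.right;
            while (s.left != null) s = s.left;
            y.value = s.value;
            y = s;  // s has no left child
        }
        // y now has at most one child (none if it is a leaf)
        Node child = (y.left != null) ? y.left : y.right;
        splice(y, child);
    }
}
```

After the search for y, only a constant number of references change in the leaf and one-child cases; the two-children case additionally walks down the right subtree.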

How to Print Elements in Order?
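
One possible answer is an in-order traversal: handle the left subtree, then the node itself, then the right subtree. The sketch below uses invented names and collects the values into a list so the result is easy to inspect; replacing out.add(v.value) with System.out.println(v.value) would print them instead.

```java
import java.util.ArrayList;
import java.util.List;

class InOrderSketch {
    static class Node {
        Node left, right;
        int value;

        Node(int value) {
            this.value = value;
        }
    }

    // Appends the values of the subtree rooted at v to out in increasing order,
    // assuming the subtree is a BST.
    static void inOrder(Node v, List<Integer> out) {
        if (v == null) {
            return;
        }
        inOrder(v.left, out);   // everything smaller than v
        out.add(v.value);       // v itself
        inOrder(v.right, out);  // everything larger than v
    }

    // Convenience wrapper returning a fresh list.
    static List<Integer> inOrder(Node root) {
        List<Integer> out = new ArrayList<>();
        inOrder(root, out);
        return out;
    }
}
```

Each node is visited exactly once, so the traversal takes $O(n)$ time for a tree with $n$ nodes.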

How to Find Next Largest?

Given node $v$ in BST $T$, which node stores the next largest value, i.e., the smallest value greater than $v$'s?
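
One possible answer, sketched with invented names and int values, uses the parent references: if $v$ has a right child, the answer is the leftmost node of the right subtree; otherwise it is the first ancestor reached by climbing up out of a left subtree.

```java
class SuccessorSketch {
    static class Node {
        Node left, right, parent;
        int value;

        Node(int value) {
            this.value = value;
        }
    }

    // Helper for building examples: attach children and set their parent pointers.
    static Node link(Node p, Node l, Node r) {
        p.left = l;
        p.right = r;
        if (l != null) l.parent = p;
        if (r != null) r.parent = p;
        return p;
    }

    // Returns the node with the smallest value greater than v's,
    // or null if v stores the largest value in the tree.
    static Node successor(Node v) {
        if (v.right != null) {
            // smallest value in the right subtree
            Node s = v.right;
            while (s.left != null) s = s.left;
            return s;
        }
        // climb while we are a right child; the next parent (if any) is larger
        Node u = v;
        while (u.parent != null && u.parent.right == u) {
            u = u.parent;
        }
        return u.parent;
    }
}
```

Both branches follow a single path in the tree, so the cost is bounded by the height of the tree.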