Lecture 11: Binary Search Trees & Balance

Overview

  1. Binary Trees
  2. Binary Search Trees
  3. (Height) Balanced Binary Trees

Last Time: Binary Tree

A binary tree consists of

  • a collection of nodes
  • a distinguished node called the root
  • each node has
    • a parent (null only for root)
    • a left child (possibly null)
    • a right child (possibly null)

Constraints:

  1. if u is a child of v, then v is u’s parent
  2. every node has the root as an ancestor
    • $\implies$ no cycles!
    • every node is a descendant of the root

Tree Terminology

  • a node without children is a leaf
  • a node that is not a leaf is internal
  • depth of a node is its distance to the root
    • depth of tree is max depth of any node

Height

The height of a node is its max distance to a descendant leaf

  • height of leaf = 0
  • height of internal node is 1 + maximum height of children
  • height of tree = height of root
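
The recursive definition of height translates almost directly into code. The following is a minimal sketch, not part of the lecture: the `Node` class and the `height` function are hypothetical names, and giving an empty subtree height -1 is an assumed convention chosen so that a leaf comes out with height 0.

```python
class Node:
    """Hypothetical node type: a value plus optional left/right children."""
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def height(node):
    """Height of the subtree rooted at node.

    Assumed convention: an empty subtree has height -1, so a leaf
    evaluates to 1 + max(-1, -1) = 0, matching the definition above.
    """
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))

leaf = Node(7)
tree = Node(1, Node(2, Node(4)), Node(3))  # longest path: 1 -> 2 -> 4
```

Here `height(leaf)` is 0 and `height(tree)` is 2.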

Picture of Tree

So Far

  • Specified structure of binary trees

  • No assumptions about values stored in trees

  • Trees are incredibly useful and flexible data structures

    • represent hierarchies
      • file structure in computer
      • dependency of method calls
      • arithmetic expressions

Next up: representing sorted collections

Binary Search Trees

Assume values stored in nodes are comparable with $<$

  • given (values of) any two nodes $u$ and $v$, exactly one of $u < v$, $v < u$, or $v = u$ holds

A tree is a binary search tree (BST) if for every node $v$:

  • if $u$ is a left descendant of $v$, then $u < v$
  • if $w$ is a right descendant of $v$, then $w > v$
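
Note that the BST condition constrains all descendants, not just children. One way to check it is to pass down the range of values allowed in each subtree. The sketch below is illustrative only: `Node` and `is_bst` are hypothetical names, and `None` is used to mean "no bound".

```python
class Node:
    """Hypothetical node type: a value plus optional left/right children."""
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def is_bst(node, low=None, high=None):
    """True if the subtree is a BST whose values all lie strictly
    between low and high (None means unbounded on that side)."""
    if node is None:
        return True
    if low is not None and not (low < node.value):
        return False
    if high is not None and not (node.value < high):
        return False
    # Left descendants must stay below node.value, right ones above it.
    return (is_bst(node.left, low, node.value)
            and is_bst(node.right, node.value, high))

good = Node(3, Node(2, Node(1)), Node(4, None, Node(5)))
# In `bad`, every child compares correctly with its own parent, but 4 is
# a left descendant of 3, so the tree is not a BST.
bad = Node(3, Node(2, None, Node(4)), Node(5))
```

Checking only parent/child pairs would wrongly accept `bad`, which is why the bounds are carried down the recursion.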

BST Example

Searching a BST

How to find(x) in a BST? What is find running time?
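
One possible answer, sketched under the assumption of a hypothetical `Node` class with `value`, `left`, and `right` fields: start at the root and use the BST property to discard one subtree at every step. The search follows a single root-to-leaf path, so it makes at most $h + 1$ comparisons-with-a-node in a tree of height $h$.

```python
class Node:
    """Hypothetical node type: a value plus optional left/right children."""
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def find(root, x):
    """Return the node storing x, or None if x is not in the tree."""
    node = root
    while node is not None:
        if x < node.value:
            node = node.left       # x can only be in the left subtree
        elif node.value < x:
            node = node.right      # x can only be in the right subtree
        else:
            return node            # neither smaller nor larger: found
    return None

tree = Node(3, Node(2, Node(1)), Node(4, None, Node(5)))
```

For example, `find(tree, 5)` walks 3, 4, 5 and returns the node holding 5, while `find(tree, 6)` falls off the tree and returns `None`.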

Adding to a BST

How to add(x) in a BST? What is add running time?
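
A sketch of one way to add: search for $x$ as in find, and attach a new leaf at the point where the search falls off the tree. The names `Node` and `add` are hypothetical, and ignoring duplicates is an assumption (it matches using the tree as a set). Like find, this walks one root-to-leaf path, so the work is proportional to the height.

```python
class Node:
    """Hypothetical node type with the parent pointer described earlier."""
    def __init__(self, value, parent=None):
        self.value = value
        self.parent = parent
        self.left = None
        self.right = None

def add(root, x):
    """Insert x as a new leaf and return the root of the tree."""
    if root is None:
        return Node(x)             # empty tree: x becomes the root
    node = root
    while True:
        if x < node.value:
            if node.left is None:
                node.left = Node(x, parent=node)
                return root
            node = node.left
        elif node.value < x:
            if node.right is None:
                node.right = Node(x, parent=node)
                return root
            node = node.right
        else:
            return root            # x already present: nothing to do

root = None
for x in [3, 2, 4, 5, 1]:
    root = add(root, x)
```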

Removing From a BST I

How to remove(y)

… if y is a leaf?

Removing From a BST II

How to remove(y)

… if y has one child?

Removing From a BST III

How to remove(y)

… if y has two children?

What is remove Running Time?
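
The three cases can be handled by one recursive function. This is a sketch under assumptions, not necessarily the approach taken in lecture: `Node`, `remove`, and `to_list` are hypothetical names, and the two-children case is resolved by copying in the successor (the smallest value in the right subtree); using the predecessor would work equally well.

```python
class Node:
    """Hypothetical node type: a value plus optional left/right children."""
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def remove(root, y):
    """Remove y from the subtree rooted at root; return the new subtree root."""
    if root is None:
        return None                          # y is not in the tree
    if y < root.value:
        root.left = remove(root.left, y)
    elif root.value < y:
        root.right = remove(root.right, y)
    elif root.left is None:
        return root.right                    # leaf, or only a right child
    elif root.right is None:
        return root.left                     # only a left child
    else:
        # Two children: overwrite with the successor's value, then
        # remove the successor, which has no left child.
        succ = root.right
        while succ.left is not None:
            succ = succ.left
        root.value = succ.value
        root.right = remove(root.right, succ.value)
    return root

def to_list(node):
    """Values of the subtree in sorted order (helper for inspection)."""
    if node is None:
        return []
    return to_list(node.left) + [node.value] + to_list(node.right)

tree = Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5), Node(7)))
tree = remove(tree, 4)   # two-children case at the root
```

After the call, the root holds 5 and `to_list(tree)` is `[1, 2, 3, 5, 6, 7]`. In this sketch, the search for $y$ and the search for its successor together travel down a single root-to-leaf path, so the work is proportional to the height.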

How to Print Elements in Order?
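
One standard answer is an in-order traversal: visit the left subtree, then the node, then the right subtree. By the BST property this visits values in increasing order. The sketch below uses hypothetical names (`Node`, `in_order`) and collects the values in a list instead of printing them directly; each node is visited once, so the traversal takes $O(n)$ time.

```python
class Node:
    """Hypothetical node type: a value plus optional left/right children."""
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def in_order(node, out=None):
    """Append the subtree's values to out in sorted order and return out."""
    if out is None:
        out = []
    if node is not None:
        in_order(node.left, out)    # everything smaller comes first
        out.append(node.value)
        in_order(node.right, out)   # everything larger comes after
    return out

tree = Node(3, Node(2, Node(1)), Node(4, None, Node(5)))
print(in_order(tree))  # prints [1, 2, 3, 4, 5]
```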

Running Times

If $T$ is a tree of height $h$, what is the running time of…

  • find?
  • add?
  • remove?

Sequence of Ops Determines Structure

Consider $S = \{1, 2, 3, 4, 5\}$. What tree do we get if we add in order $3, 2, 4, 5, 1$? What about $2, 5, 1, 3, 4$?
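
The two insertion orders can be tried out directly. The following sketch uses hypothetical helpers (`Node`, `add`, `build`, `height`) and the assumed convention that an empty tree has height -1.

```python
class Node:
    """Hypothetical node type: a value plus optional left/right children."""
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def add(root, x):
    """Recursive BST insert; returns the root of the updated subtree."""
    if root is None:
        return Node(x)
    if x < root.value:
        root.left = add(root.left, x)
    elif root.value < x:
        root.right = add(root.right, x)
    return root

def build(values):
    """Insert the values one at a time, in the order given."""
    root = None
    for x in values:
        root = add(root, x)
    return root

def height(node):
    """Height of the subtree; an empty subtree counts as -1."""
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))

first = build([3, 2, 4, 5, 1])
second = build([2, 5, 1, 3, 4])
print(height(first), height(second))  # prints "2 3"
```

The first order gives root 3 with the chains 3, 2, 1 on the left and 3, 4, 5 on the right (height 2). The second gives root 2 with left child 1 and the path 2, 5, 3, 4 on the right (height 3). Both trees store the same set.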

What add Sequence Has Max Height?

Assume elements are $1, 2, 3,\ldots,n$…
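
As an experiment, the sketch below compares adding $1, 2, \ldots, n$ in increasing order with a "median first" order. All names (`Node`, `add`, `build`, `height`, `median_first`) are hypothetical, and the median-first order is only an illustrative choice of a sequence that produces a short tree.

```python
class Node:
    """Hypothetical node type: a value plus optional left/right children."""
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def add(root, x):
    """Iterative BST insert; returns the root of the tree."""
    if root is None:
        return Node(x)
    node = root
    while True:
        if x < node.value:
            if node.left is None:
                node.left = Node(x)
                return root
            node = node.left
        elif node.value < x:
            if node.right is None:
                node.right = Node(x)
                return root
            node = node.right
        else:
            return root

def build(values):
    root = None
    for x in values:
        root = add(root, x)
    return root

def height(node):
    """Height of the subtree; an empty subtree counts as -1."""
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))

def median_first(lo, hi):
    """List lo..hi so that each range's middle element precedes the rest."""
    if lo > hi:
        return []
    mid = (lo + hi) // 2
    return [mid] + median_first(lo, mid - 1) + median_first(mid + 1, hi)

n = 15
sorted_tree = build(range(1, n + 1))      # each new value goes to the far right
spread_tree = build(median_first(1, n))   # 8, 4, 2, 1, 3, 6, ...
print(height(sorted_tree), height(spread_tree))  # prints "14 3"
```

Increasing order produces a single right-leaning path of height $n - 1$, the largest height possible with $n$ nodes. Decreasing order does the same on the left.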

Have We Failed?

If:

  1. operation sequence determines height, and
  2. height can be as large as $n-1$

Then:

  • add, remove, find are $O(n)$ in the worst case

For find, this is worse than a sorted array, where binary search takes $O(\log n)$

What can we do about it?

Restructuring Trees

Idea. When we modify the tree (add or remove), restructure the tree to maintain balance

  • use fact that there are many valid BSTs

Challenges.

  1. What structure do we want?
    • how does structure guarantee efficient operations?
  2. How do we check the structure, and how do we modify the tree to maintain it?
  3. Can we restructure tree efficiently?

Coming Up

A binary tree $T$ is height balanced, or an AVL tree (Adelson-Velsky & Landis), if for every node $v$ with children $u$ and $w$, we have $\vert h(u) - h(w)\vert \leq 1$.
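
The balance condition can be checked in a single pass that computes heights bottom-up. This is a minimal sketch with hypothetical names (`Node`, `check`, `is_height_balanced`). It assumes a missing child counts as a subtree of height -1, a convention the definition above does not state explicitly.

```python
class Node:
    """Hypothetical node type: a value plus optional left/right children."""
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def check(node):
    """Return (height, balanced) for the subtree rooted at node.

    Assumed convention: an empty subtree has height -1 and is balanced.
    """
    if node is None:
        return -1, True
    left_height, left_ok = check(node.left)
    right_height, right_ok = check(node.right)
    ok = left_ok and right_ok and abs(left_height - right_height) <= 1
    return 1 + max(left_height, right_height), ok

def is_height_balanced(root):
    return check(root)[1]

balanced = Node(2, Node(1), Node(3))
chain = Node(1, None, Node(2, None, Node(3)))  # heights -1 vs 1 at the root
```

Here `is_height_balanced(balanced)` is `True` and `is_height_balanced(chain)` is `False`, since the root of `chain` has subtrees whose heights differ by 2.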

We’ll show:

  1. Any AVL tree with $n$ nodes has height $h = O(\log n)$
  2. After a single add/remove operation, AVL property can be restored in $O(\log n)$ time

As a result

  • AVL trees implement add, remove, and find for sorted sets all in time $O(\log n)$