Lecture 23: Huffman Codes and Graphs

Announcement

HW 10 will be the last required assignment
Deadline pushed back to May 13th (last day of classes)
Optional HW 11 will be posted soon

Overview

Huffman Implementation Details
Graphs

Last Time

Goal. Re-encode a (text) file to minimize file size.

Prefix Codes:

Each (used) char gets assigned a binary codeword
No codeword is a prefix of any other codeword

a -> 00
b -> 010
c -> 011
d -> 101
e -> 110
f -> 1110
g -> 1111
h -> 100

Prefix Codes as Trees

a -> 00
b -> 010
c -> 011
d -> 101
e -> 110
f -> 1110
g -> 1111
h -> 100

Example

Use previous tree to decode 111000011110
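
A minimal Java sketch of how decoding can be mechanized, using the codeword table above. Instead of walking the tree, it accumulates bits until they match a codeword. This is unambiguous because no codeword is a prefix of another. The names PrefixDecoder and decode are illustrative assumptions, not part of the lecture.

```java
import java.util.HashMap;
import java.util.Map;

class PrefixDecoder {
    // Codeword table from the slide, stored as codeword -> character.
    static final Map<String, Character> CODE = new HashMap<>();
    static {
        CODE.put("00", 'a');
        CODE.put("010", 'b');
        CODE.put("011", 'c');
        CODE.put("101", 'd');
        CODE.put("110", 'e');
        CODE.put("1110", 'f');
        CODE.put("1111", 'g');
        CODE.put("100", 'h');
    }

    // Reads bits left to right and emits a character as soon as the
    // bits read so far form a complete codeword.
    static String decode(String bits) {
        StringBuilder out = new StringBuilder();
        StringBuilder current = new StringBuilder();
        for (char bit : bits.toCharArray()) {
            current.append(bit);
            Character c = CODE.get(current.toString());
            if (c != null) {
                out.append(c);
                current.setLength(0); // start the next codeword
            }
        }
        if (current.length() > 0) {
            throw new IllegalArgumentException("input ends in the middle of a codeword");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("111000011110"));
    }
}
```

Walking the tree bit by bit (left on 0, right on 1, emit at a leaf) gives the same result without building intermediate strings.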

Huffman Coding

Idea. Start with all characters together with their frequency counts

each distinct character corresponds to a leaf in encoding tree
each node gets a weight
- weight of a leaf = frequency count
- weight of internal node = sum of frequencies of its descendant leaves

Then: form a tree by “merging” nodes by adding a parent

pick the two lightest nodes without parents: $u$, $w$
create a parent $v$ for $u$ and $w$
repeat until all nodes are connected in a single tree

Huffman, Illustrated

Build Huffman tree for text ABAAABBAACCBAAADEA

Huffman More Formally

Node stores:

a char c (0 if internal Node)
an int weight
left (0) and right (1) child (both null if leaf)

Huffman Procedure

Create a Node for each distinct character in text, weight is character frequency
Add Nodes to a collection c
While c.size() > 1:
- remove 2 nodes from c with smallest weights: u, w
- create new node v
  - v’s children are u and w
  - v.weight = u.weight + w.weight
- add v to c
Set tree root to unique Node in c
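
The procedure above can be sketched in Java as follows. This is one possible rendering, not the required homework design: it assumes java.util.PriorityQueue as the collection (so the two lightest nodes are removed with poll()), and the names HuffmanSketch and buildTree are made up for illustration. Ties between equal weights are broken arbitrarily, so the tree shape may differ from one drawn by hand.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

class HuffmanSketch {
    static class Node {
        char c;            // 0 if internal node
        int weight;        // frequency count (leaf) or sum of children (internal)
        Node left, right;  // both null if leaf

        Node(char c, int weight, Node left, Node right) {
            this.c = c;
            this.weight = weight;
            this.left = left;
            this.right = right;
        }
    }

    static Node buildTree(String text) {
        if (text.isEmpty()) {
            throw new IllegalArgumentException("text must be non-empty");
        }
        // Count how often each distinct character occurs.
        Map<Character, Integer> freq = new HashMap<>();
        for (char ch : text.toCharArray()) {
            freq.merge(ch, 1, Integer::sum);
        }
        // Min-heap ordered by weight: poll() removes a lightest node.
        PriorityQueue<Node> pq =
            new PriorityQueue<>((a, b) -> Integer.compare(a.weight, b.weight));
        for (Map.Entry<Character, Integer> e : freq.entrySet()) {
            pq.add(new Node(e.getKey(), e.getValue(), null, null));
        }
        // Repeatedly merge the two lightest nodes under a new parent.
        while (pq.size() > 1) {
            Node u = pq.poll();
            Node w = pq.poll();
            pq.add(new Node((char) 0, u.weight + w.weight, u, w));
        }
        return pq.poll(); // the unique remaining node is the root
    }

    public static void main(String[] args) {
        Node root = buildTree("ABAAABBAACCBAAADEA");
        System.out.println(root.weight); // prints 18, the length of the text
    }
}
```

With a binary heap, each removal and insertion costs logarithmic time in the number of nodes, which is why a priority queue is a natural (though not the only) choice for the collection.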

Question

Given a Huffman tree, how do we compute the resulting encoded size?
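
One way to approach this question: every occurrence of a character costs as many bits as the depth of its leaf, so the encoded size in bits is $\sum_{\ell} \mathrm{weight}(\ell) \cdot \mathrm{depth}(\ell)$, summed over all leaves $\ell$. The Java sketch below computes this sum recursively. The class and method names are assumptions for illustration, and the tree is built by hand from the frequencies of ABAAABBAACCBAAADEA (A: 10, B: 4, C: 2, D: 1, E: 1).

```java
class EncodedSize {
    static class Node {
        int weight;
        Node left, right; // both null if leaf

        Node(int weight, Node left, Node right) {
            this.weight = weight;
            this.left = left;
            this.right = right;
        }
    }

    // Sum of (leaf weight * leaf depth) over all leaves below node.
    // Note: a tree that is a single leaf has depth 0 and yields 0 bits.
    static long encodedBits(Node node, int depth) {
        if (node == null) {
            return 0;
        }
        if (node.left == null && node.right == null) {
            return (long) node.weight * depth;
        }
        return encodedBits(node.left, depth + 1)
             + encodedBits(node.right, depth + 1);
    }

    // Hand-built Huffman tree for leaf weights 10, 4, 2, 1, 1.
    static Node exampleTree() {
        Node a = new Node(10, null, null);
        Node b = new Node(4, null, null);
        Node c = new Node(2, null, null);
        Node d = new Node(1, null, null);
        Node e = new Node(1, null, null);
        Node de = new Node(2, d, e);
        Node cde = new Node(4, c, de);
        Node bcde = new Node(8, b, cde);
        return new Node(18, bcde, a);
    }

    public static void main(String[] args) {
        System.out.println(encodedBits(exampleTree(), 0)); // prints 32
    }
}
```

Here the 18-character text encodes to 10·1 + 4·2 + 2·3 + 1·4 + 1·4 = 32 bits, not counting whatever is needed to store the code itself.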

Remarkable Fact

Theorem. Among all possible prefix codes for a given text, Huffman codes give the smallest possible encoded text.

Homework 10

Implement Huffman coding

Think about ADTs and data structures you’ll need
- don’t need to implement new containers from scratch
Measure the size & compression ratio of encoding for different texts

Suggestion: it may be helpful to have 2 representations of Huffman code:

Binary tree (for decoding)
Something else (for encoding)
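
As one hedged possibility for the "something else": a Map from each character to its codeword, filled by a single traversal of the tree (append 0 when going left, 1 when going right). The names CodeTable, build, and encode are hypothetical, and codewords are kept as Strings of '0' and '1' for readability rather than packed into actual bits.

```java
import java.util.HashMap;
import java.util.Map;

class CodeTable {
    static class Node {
        char c;           // 0 if internal node
        Node left, right; // both null if leaf

        Node(char c, Node left, Node right) {
            this.c = c;
            this.left = left;
            this.right = right;
        }
    }

    // Builds character -> codeword by walking the whole tree once.
    static Map<Character, String> build(Node root) {
        Map<Character, String> table = new HashMap<>();
        fill(root, "", table);
        return table;
    }

    private static void fill(Node node, String prefix, Map<Character, String> table) {
        if (node == null) {
            return;
        }
        if (node.left == null && node.right == null) {
            table.put(node.c, prefix); // path from the root is the codeword
            return;
        }
        fill(node.left, prefix + "0", table);
        fill(node.right, prefix + "1", table);
    }

    // Encodes text by concatenating the codeword of each character.
    static String encode(String text, Map<Character, String> table) {
        StringBuilder bits = new StringBuilder();
        for (char ch : text.toCharArray()) {
            String codeword = table.get(ch);
            if (codeword == null) {
                throw new IllegalArgumentException("no codeword for " + ch);
            }
            bits.append(codeword);
        }
        return bits.toString();
    }
}
```

The table makes encoding a constant-time lookup per character, while the tree remains the more convenient structure for decoding.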

Thoughts on Data Compression

Huffman codes are optimal prefix codes

each character gets assigned a codeword

How large would Huffman encoding of text AAAAAAA...A ($n$ As) be?

Could We Do Better?

How else might we encode AAAAAAA...A ($n$ As)?

A Very Compressed File

void printString(int n) {
    // print the letter A exactly n times
    for (int i = 0; i < n; i++) {
        System.out.print("A");
    }
}

More Generally

The Kolmogorov Complexity of a string s is the length of the shortest program that can reproduce s

must use fixed programming language, etc.

Neat Idea. The document is a program

Remarkable Fact. There is no algorithm that, given an arbitrary string, determines its Kolmogorov complexity!!!

Mathematical impossibility result

Graphs

Motivation

So Far. “Highly structured” data structures

programmer specifies ADT/interface
- user will specify sequence of operations
we choose how to store data to implement interface
we pick a representation of the data that is convenient

We controlled the state of the data structure

Real World

We do not get to decide where and how the data is stored

we must navigate the data as presented!

Example 1: Computer Networks

Example 2: Social Networks

Example 3: State Space of a Game

Commonalities of Examples

What do these examples have in common?

Mathematical Formalism: Graphs

A graph $G = (V, E)$ consists of

a set $V$ of vertices (a.k.a. nodes) $V = \{v_1, v_2, \ldots, v_n\}$
a set $E$ of edges, where each edge is a pair of vertices

Example. $V = \{1, 2, 3, 4, 5\}$, $E = \{(1, 2), (2, 3), (3, 4), (4, 5), (5, 1), (1, 3)\}$

Graph Terminology

If $E$ contains $(u, v)$ then $u$ and $v$ are neighbors or adjacent
Two variants:
- undirected graph: if $u$ is $v$’s neighbor, then $v$ is $u$’s neighbor
- directed graph: $(u, v)$ being an edge doesn’t imply $(v, u)$ is an edge
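
To make the two variants concrete, here is a small Java sketch that stores the example edge set as a plain list of pairs and tests adjacency under each reading. The class and method names are illustrative assumptions.

```java
class EdgeListDemo {
    // E = {(1,2), (2,3), (3,4), (4,5), (5,1), (1,3)} from the example.
    static final int[][] EDGES = {
        {1, 2}, {2, 3}, {3, 4}, {4, 5}, {5, 1}, {1, 3}
    };

    // Directed reading: (u, v) must appear in exactly this order.
    static boolean adjacentDirected(int u, int v) {
        for (int[] e : EDGES) {
            if (e[0] == u && e[1] == v) {
                return true;
            }
        }
        return false;
    }

    // Undirected reading: the pair counts in either order.
    static boolean adjacentUndirected(int u, int v) {
        return adjacentDirected(u, v) || adjacentDirected(v, u);
    }

    public static void main(String[] args) {
        System.out.println(adjacentDirected(2, 1));   // false: only (1, 2) is listed
        System.out.println(adjacentUndirected(2, 1)); // true
    }
}
```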

Previous Examples as Graphs

Computer Network

$V = $
$E = $

Social Network

$V = $
$E = $

Tic-Tac-Toe

$V = $
$E = $

Graphs We’ve Seen Already?

Graph ADT and Operations

How to build and use graphs?

boolean adjacent(u, v) return true if u and v are adjacent (i.e., (u, v) is an edge)
neighbors(u) return a List of vertices adjacent to u
addVertex(u) add a vertex to set of vertices, if not already present
removeVertex(u) remove u and edges containing u from graph
addEdge(u, v) add an edge from u to v if not already present
removeEdge(u, v) remove edge from u to v if present

(Can have others too…)

Example

addVertex(1)

addVertex(2)

addVertex(3)

addVertex(4)

addEdge(1, 2)

addEdge(2, 3)

addEdge(2, 4)

addEdge(3, 4)

neighbors(2)

Graph Representation

Question. How could we represent a graph?

What ADTs should/could we use?

Adjacency List Representation

Maintain a Map<E, List<E>>:

Keys are vertices (use datatype E)
Values are List<E>s of adjacent vertices (the key’s neighbors)

Implementation with Java Built-ins

HashMap<K, V>:

containsKey(K x)
get(K x)
put(K x, V y)
remove(K x)

ArrayList<E>:

add(E x)
contains(E x)
remove(E x)
get(int i)
size()

How to…

… boolean adjacent(u, v) return true if u and v are adjacent (i.e., (u, v) is an edge)

How to…

…neighbors(u) return a List of vertices adjacent to u

How to…

… addVertex(u) add a vertex to set of vertices, if not already present

How to…

… removeVertex(u) remove u and edges containing u from graph

How to…

… addEdge(u, v) add an edge from u to v if not already present

How to…

… removeEdge(u, v) remove edge from u to v if present
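
One possible set of answers to the six "How to…" questions, sketched with HashMap and ArrayList. Assumptions: the graph is undirected (each edge is recorded in both endpoints' lists), addEdge creates missing endpoints, and the class name AdjListGraph is made up for illustration. For a directed graph, addEdge and removeEdge would update only u's list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AdjListGraph<E> {
    // Each vertex maps to the list of its neighbors.
    private final Map<E, List<E>> adj = new HashMap<>();

    boolean adjacent(E u, E v) {
        return adj.containsKey(u) && adj.get(u).contains(v);
    }

    List<E> neighbors(E u) {
        // Return a copy so callers cannot modify the internal list.
        if (!adj.containsKey(u)) {
            return new ArrayList<>();
        }
        return new ArrayList<>(adj.get(u));
    }

    void addVertex(E u) {
        if (!adj.containsKey(u)) {
            adj.put(u, new ArrayList<>());
        }
    }

    void removeVertex(E u) {
        adj.remove(u);
        // Also remove u from every other vertex's neighbor list.
        for (List<E> list : adj.values()) {
            list.remove(u);
        }
    }

    void addEdge(E u, E v) {
        addVertex(u);
        addVertex(v);
        if (!adj.get(u).contains(v)) {
            adj.get(u).add(v);
        }
        if (!adj.get(v).contains(u)) {
            adj.get(v).add(u);
        }
    }

    void removeEdge(E u, E v) {
        if (adj.containsKey(u)) {
            adj.get(u).remove(v);
        }
        if (adj.containsKey(v)) {
            adj.get(v).remove(u);
        }
    }

    public static void main(String[] args) {
        AdjListGraph<Integer> g = new AdjListGraph<>();
        for (int i = 1; i <= 4; i++) {
            g.addVertex(i);
        }
        g.addEdge(1, 2);
        g.addEdge(2, 3);
        g.addEdge(2, 4);
        g.addEdge(3, 4);
        System.out.println(g.neighbors(2)); // prints [1, 3, 4]
    }
}
```

The main method replays the earlier addVertex/addEdge example. With this representation, adjacent and removeEdge take time proportional to the length of a neighbor list, and removeVertex scans every list.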

Question for Next Time

Suppose we are given an arbitrary vertex $v$ in a graph. How could we determine if another vertex $u$ is reachable from $v$?
