# Lecture 23: Huffman Codes and Graphs

## Announcement

• HW 10 will be the last required assignment
• Deadline pushed back to May 13th (last day of classes)
• Optional HW 11 will be posted soon

## Overview

1. Huffman Implementation Details
2. Graphs

## Last Time

Goal. Re-encode a (text) file to minimize file size.

Prefix Codes:

• Each (used) char gets assigned a binary codeword
• No codeword is a prefix of any other codeword
a -> 00
b -> 010
c -> 011
d -> 101
e -> 110
f -> 1110
g -> 1111
h -> 100


## Prefix Codes as Trees

a -> 00
b -> 010
c -> 011
d -> 101
e -> 110
f -> 1110
g -> 1111
h -> 100


## Example

Use previous tree to decode 111000011110
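Decoding can also be checked mechanically. The following is a minimal sketch, assuming the codeword table listed above; the class and method names (`PrefixDecoder`, `codeTable`, `decode`) are made up for illustration. It uses a lookup table rather than walking the tree, which works because no codeword is a prefix of another: the first time the buffered bits match a codeword, that match is the only possible one.

```java
import java.util.HashMap;
import java.util.Map;

class PrefixDecoder {
    // Codeword table copied from the prefix code listed above.
    static Map<String, Character> codeTable() {
        Map<String, Character> table = new HashMap<>();
        table.put("00", 'a');
        table.put("010", 'b');
        table.put("011", 'c');
        table.put("101", 'd');
        table.put("110", 'e');
        table.put("1110", 'f');
        table.put("1111", 'g');
        table.put("100", 'h');
        return table;
    }

    // Scan the bits left to right; emit a character as soon as the
    // buffered bits equal a codeword, then start a new buffer.
    static String decode(String bits, Map<String, Character> table) {
        StringBuilder out = new StringBuilder();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < bits.length(); i++) {
            current.append(bits.charAt(i));
            Character ch = table.get(current.toString());
            if (ch != null) {
                out.append(ch);
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            throw new IllegalArgumentException("trailing bits do not form a codeword");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("111000011110", codeTable()));
    }
}
```

Walking the tree bit by bit (left on 0, right on 1, emit at a leaf) does the same job without building strings.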

## Huffman Coding

• each distinct character corresponds to a leaf in encoding tree
• each node gets a weight
• weight of a leaf = frequency count
• weight of internal node = sum of frequencies of descendants

Then: form a tree by “merging” nodes by adding a parent

• pick the two lightest nodes without parents: $u$, $w$
• create a parent $v$ for $u$ and $w$
• continue until all nodes are connected

## Huffman, Illustrated

Build Huffman tree for text ABAAABBAACCBAAADEA

## Huffman More Formally

Node stores:

• a char c (0 if internal Node)
• an int weight
• left (0) and right (1) child (both null if leaf)

## Huffman Procedure

1. Create a Node for each distinct character in text, weight is character frequency
2. Add Nodes to a collection c
3. While c.size() > 1:
• remove 2 nodes from c with smallest weights: u, w
• create new node v
• v’s children are u and w
• v.weight = u.weight + w.weight
• add v to c
4. Set tree root to unique Node in c
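The procedure above can be sketched in Java as follows. This is one possible rendering, not the required homework design: using `java.util.PriorityQueue` as the collection is an assumption (any collection that can remove its two smallest elements would do), and the names `HuffmanSketch` and `build` are hypothetical. The `Node` fields follow the "Node stores" list.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

class HuffmanSketch {
    static class Node {
        char c;            // the character for a leaf, 0 for an internal node
        int weight;        // frequency (leaf) or sum of children's weights
        Node left, right;  // both null for a leaf

        Node(char c, int weight, Node left, Node right) {
            this.c = c;
            this.weight = weight;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() {
            return left == null && right == null;
        }
    }

    // Builds a Huffman tree for a non-empty text and returns its root.
    static Node build(String text) {
        if (text.isEmpty()) {
            throw new IllegalArgumentException("text must be non-empty");
        }
        // Step 1: count character frequencies.
        Map<Character, Integer> freq = new HashMap<>();
        for (char ch : text.toCharArray()) {
            freq.merge(ch, 1, Integer::sum);
        }
        // Step 2: put one leaf per distinct character in the collection.
        PriorityQueue<Node> pq =
            new PriorityQueue<>((a, b) -> Integer.compare(a.weight, b.weight));
        for (Map.Entry<Character, Integer> e : freq.entrySet()) {
            pq.add(new Node(e.getKey(), e.getValue(), null, null));
        }
        // Step 3: repeatedly merge the two lightest nodes under a new parent.
        while (pq.size() > 1) {
            Node u = pq.poll();
            Node w = pq.poll();
            pq.add(new Node((char) 0, u.weight + w.weight, u, w));
        }
        // Step 4: the single remaining node is the root.
        return pq.poll();
    }

    public static void main(String[] args) {
        Node root = build("ABAAABBAACCBAAADEA");
        System.out.println("root weight = " + root.weight);
    }
}
```

The root's weight always equals the length of the text, which makes a quick sanity check. When several nodes share the same weight, different tie-breaking can yield different trees of the same total encoded size.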

## Question

Given a Huffman tree, how do we compute the resulting encoded size?
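One way to answer this: a character at depth $d$ in the tree has a codeword of $d$ bits, so the encoded size is the sum over leaves of weight times depth. The sketch below assumes a stripped-down `Node` (weights and children only) and a hand-built tree for the earlier example text; `EncodedSize`, `exampleTree`, and `encodedBits` are hypothetical names.

```java
class EncodedSize {
    static class Node {
        int weight;
        Node left, right;

        // Leaf with a given frequency count.
        Node(int weight) {
            this.weight = weight;
        }

        // Internal node whose weight is the sum of its children's weights.
        Node(Node left, Node right) {
            this.weight = left.weight + right.weight;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() {
            return left == null && right == null;
        }
    }

    // Sum of (leaf weight * leaf depth) over all leaves below node.
    // Note: a tree consisting of a single leaf yields 0 under this formula.
    static long encodedBits(Node node, int depth) {
        if (node == null) {
            return 0;
        }
        if (node.isLeaf()) {
            return (long) node.weight * depth;
        }
        return encodedBits(node.left, depth + 1) + encodedBits(node.right, depth + 1);
    }

    // Huffman tree for ABAAABBAACCBAAADEA: A=10, B=4, C=2, D=1, E=1.
    static Node exampleTree() {
        Node de = new Node(new Node(1), new Node(1));  // D and E
        Node cde = new Node(new Node(2), de);          // C with (D, E)
        Node bcde = new Node(new Node(4), cde);        // B with (C, D, E)
        return new Node(new Node(10), bcde);           // A with the rest
    }

    public static void main(String[] args) {
        System.out.println(encodedBits(exampleTree(), 0) + " bits");
    }
}
```

For this tree the sum is 10·1 + 4·2 + 2·3 + 1·4 + 1·4 = 32 bits. The same total equals the sum of the weights of all internal nodes (2 + 4 + 8 + 18), since each merge adds one bit to every character beneath it.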

## Remarkable Fact

Theorem. Among all possible prefix codes for a given text, Huffman codes give the smallest possible encoded text.

## Homework 10

Implement Huffman coding

• don’t need to implement new containers from scratch
• Measure the size & compression ratio of encoding for different texts

Suggestion: it may be helpful to have 2 representations of Huffman code:

1. Binary tree (for decoding)
2. Something else (for encoding)

## Thoughts on Data Compression

Huffman codes are optimal prefix codes

• each character gets assigned a codeword

How large would Huffman encoding of text AAAAAAA...A ($n$ As) be?

## Could We Do Better?

How else might we encode AAAAAAA...A ($n$ As)?

## A Very Compressed File

    void printString(int n) {
        for (int i = 0; i < n; i++) {
            System.out.print("A");
        }
    }


## More Generally

The Kolmogorov Complexity of a string s is the length of the shortest program that can reproduce s

• must use fixed programming language, etc.

Neat Idea. The document is a program

Remarkable Fact. There is no algorithm that can determine the Kolmogorov complexity of strings!!!

• Mathematical impossibility result

# Graphs

## Motivation

So Far. “Highly structured” data structures

• user will specify sequence of operations
• we choose how to store data to implement interface
• we pick a representation of the data that is convenient

We controlled the state of the data structure

## Real World

We do not get to decide where and how the data is stored

• we must navigate the data as presented!

## Commonalities of Examples

What do these examples have in common?

## Mathematical Formalism: Graphs

A graph $G = (V, E)$ consists of

• a set $V$ of vertices (a.k.a. nodes) $V = \{v_1, v_2, \ldots, v_n\}$
• a set $E$ of edges, where each edge is a pair of vertices

Example. $V = \{1, 2, 3, 4, 5\}$, $E = \{(1, 2), (2, 3), (3, 4), (4, 5), (5, 1), (1, 3)\}$

## Graph Terminology

1. If $E$ contains $(u, v)$ then $u$ and $v$ are neighbors or adjacent
2. Two variants:
• undirected graph: if $u$ is $v$’s neighbor, then $v$ is $u$’s neighbor
• directed graph: $(u, v)$ being an edge doesn’t imply $(v, u)$ is an edge

## Previous Examples as Graphs

Computer Network

• $V =$

• $E =$

Social Network

• $V =$

• $E =$

Tic-Tac-Toe

• $V =$

• $E =$

How do we build and use graphs?

• boolean adjacent(u, v) return true if u and v are adjacent (i.e., (u, v) is an edge)
• neighbors(u) return a List of vertices adjacent to u
• addVertex(u) add a vertex to set of vertices, if not already present
• removeVertex(u) remove u and edges containing u from graph
• addEdge(u, v) add an edge from u to v if not already present
• removeEdge(u, v) remove edge from u to v if present

(Can have others too…)

## Example

addVertex(1)

addVertex(2)

addVertex(3)

addVertex(4)

addEdge(1, 2)

addEdge(2, 3)

addEdge(2, 4)

addEdge(3, 4)

neighbors(2)

## Graph Representation

Question. How could we represent a graph?

• What ADTs should/could we use?

Maintain a Map<E, List<E>>:

• Keys are vertices (use datatype E)
• Values are List<E>s of vertices (the neighbors of the key vertex)
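A minimal sketch of this map-of-lists representation, covering the six operations listed earlier. Several choices here are assumptions rather than part of the lecture: edges are treated as directed (for an undirected graph, `addEdge` and `removeEdge` would also update `v`'s list), `addEdge` adds missing endpoints implicitly, and the class name `AdjacencyListGraph` is made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AdjacencyListGraph<E> {
    // Each vertex maps to the list of vertices it has an edge to.
    private final Map<E, List<E>> adj = new HashMap<>();

    boolean adjacent(E u, E v) {
        List<E> nbrs = adj.get(u);
        return nbrs != null && nbrs.contains(v);
    }

    // Returns a copy so callers cannot modify the graph's internal lists.
    List<E> neighbors(E u) {
        List<E> nbrs = adj.get(u);
        if (nbrs == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(nbrs);
    }

    void addVertex(E u) {
        adj.putIfAbsent(u, new ArrayList<>());
    }

    // Removes u and every edge that points to u.
    void removeVertex(E u) {
        adj.remove(u);
        for (List<E> nbrs : adj.values()) {
            nbrs.remove(u);
        }
    }

    void addEdge(E u, E v) {
        addVertex(u);  // assumption: missing endpoints are added implicitly
        addVertex(v);
        List<E> nbrs = adj.get(u);
        if (!nbrs.contains(v)) {
            nbrs.add(v);
        }
    }

    void removeEdge(E u, E v) {
        List<E> nbrs = adj.get(u);
        if (nbrs != null) {
            nbrs.remove(v);
        }
    }

    public static void main(String[] args) {
        AdjacencyListGraph<Integer> g = new AdjacencyListGraph<>();
        for (int i = 1; i <= 4; i++) {
            g.addVertex(i);
        }
        g.addEdge(1, 2);
        g.addEdge(2, 3);
        g.addEdge(2, 4);
        g.addEdge(3, 4);
        System.out.println(g.neighbors(2));
    }
}
```

With lists as values, `adjacent`, `addEdge`, and `removeEdge` each scan one neighbor list, and `removeVertex` scans every list. Under the directed reading, `neighbors(2)` contains 3 and 4 but not 1; under an undirected reading it would also contain 1.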

## Implementation with Java Built-ins

HashMap<K, V>:

• containsKey(K x)
• get(K x)
• put(K x, V y)
• remove(K x)

ArrayList:

• add(E x)
• contains(E x)
• remove(E x)
• get(int i)
• size()
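One detail worth knowing before using these methods with integer vertices: `ArrayList` has two `remove` overloads, `remove(int index)` and `remove(Object o)`. On a `List<Integer>`, passing a plain `int` selects the index version. The short sketch below (class and method names are hypothetical) shows the difference.

```java
import java.util.ArrayList;
import java.util.List;

class RemoveDemo {
    static List<Integer> sample() {
        List<Integer> list = new ArrayList<>();
        list.add(2);
        list.add(3);
        list.add(1);
        return list;  // [2, 3, 1]
    }

    // remove(int index): deletes the element at position 1, which is 3.
    static List<Integer> afterRemoveIndex() {
        List<Integer> list = sample();
        list.remove(1);
        return list;
    }

    // remove(Object o): deletes the first element equal to 1.
    static List<Integer> afterRemoveValue() {
        List<Integer> list = sample();
        list.remove(Integer.valueOf(1));
        return list;
    }

    public static void main(String[] args) {
        System.out.println(afterRemoveIndex());
        System.out.println(afterRemoveValue());
    }
}
```

Inside a generic class whose vertices have type `E`, a call like `list.remove(u)` with `u` of type `E` resolves to `remove(Object o)`, so the issue only arises in code written directly against `Integer` lists.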

## How to…

boolean adjacent(u, v) return true if u and v are adjacent (i.e., (u, v) is an edge)

## How to…

neighbors(u) return a List of vertices adjacent to u

## How to…

addVertex(u) add a vertex to set of vertices, if not already present

## How to…

removeVertex(u) remove u and edges containing u from graph

## How to…

addEdge(u, v) add an edge from u to v if not already present

## How to…

removeEdge(u, v) remove edge from u to v if present

## Question for Next Time

Suppose we are given an arbitrary vertex $v$ in a graph. How could we determine if another vertex $u$ is reachable from $v$?