Goal. Re-encode a (text) file to minimize file size.
Prefix Codes: each char gets assigned a binary codeword, and no codeword is a prefix of another codeword
a -> 00
b -> 010
c -> 011
d -> 101
e -> 110
f -> 1110
g -> 1111
h -> 100
Use previous tree to decode 111000011110
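A minimal sketch of how such a decoder could look. It swaps the tree walk for a table lookup, which works for any prefix code: because no codeword is a prefix of another, the first codeword that matches the bits read so far is the only possible match. The class name PrefixCodeDecoder and the method decode are hypothetical names chosen for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

class PrefixCodeDecoder {
    // Codeword table copied from the prefix code listed above.
    static final Map<String, Character> CODE = new HashMap<>();
    static {
        CODE.put("00", 'a');
        CODE.put("010", 'b');
        CODE.put("011", 'c');
        CODE.put("101", 'd');
        CODE.put("110", 'e');
        CODE.put("1110", 'f');
        CODE.put("1111", 'g');
        CODE.put("100", 'h');
    }

    // Reads bits left to right and emits a character as soon as the
    // bits read so far form a complete codeword.
    static String decode(String bits) {
        StringBuilder out = new StringBuilder();
        StringBuilder current = new StringBuilder();
        for (char bit : bits.toCharArray()) {
            current.append(bit);
            Character ch = CODE.get(current.toString());
            if (ch != null) {
                out.append(ch);
                current.setLength(0); // start collecting the next codeword
            }
        }
        if (current.length() > 0) {
            throw new IllegalArgumentException("trailing bits are not a codeword: " + current);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("111000011110"));
    }
}
```

Walking the tree from the root (0 = left, 1 = right) and emitting a character at each leaf performs the same matching without building strings.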
Idea. Start with all characters together with their frequency counts
Then: form a tree by “merging” nodes by adding a parent
Build Huffman tree for text ABAAABBAACCBAAADEA
Node stores:
    char c (0 if internal Node)
    int weight
    left (0) and right (1) child (both null if leaf)
Create a Node for each distinct character in text; weight is character frequency
Add the Nodes to a collection c
While c.size() > 1:
    Remove the two Nodes in c with smallest weights: u, w
    Create a new Node v
    v’s children are u and w
    v.weight = u.weight + w.weight
    Add v to c
Return the remaining Node in c (the root of the tree)
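The merging procedure above can be sketched in Java as follows. This is one possible implementation, not the reference solution: it assumes a java.util.PriorityQueue as the collection c, and the names HuffmanSketch, buildTree, and collectCodes are hypothetical.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

class HuffmanSketch {
    static class Node {
        char c;            // character; 0 if internal Node
        int weight;        // frequency (leaf) or sum of the children's weights
        Node left, right;  // left = 0 branch, right = 1 branch; both null if leaf

        Node(char c, int weight, Node left, Node right) {
            this.c = c;
            this.weight = weight;
            this.left = left;
            this.right = right;
        }
    }

    // Builds the tree by repeatedly merging the two lightest nodes.
    // Returns null for an empty text.
    static Node buildTree(String text) {
        Map<Character, Integer> freq = new HashMap<>();
        for (char ch : text.toCharArray()) {
            freq.merge(ch, 1, Integer::sum);
        }
        // Assumption: a min-heap ordered by weight plays the role of c.
        PriorityQueue<Node> c = new PriorityQueue<>(Comparator.comparingInt((Node n) -> n.weight));
        for (Map.Entry<Character, Integer> e : freq.entrySet()) {
            c.add(new Node(e.getKey(), e.getValue(), null, null));
        }
        while (c.size() > 1) {
            Node u = c.poll();
            Node w = c.poll();
            c.add(new Node((char) 0, u.weight + w.weight, u, w));
        }
        return c.poll();
    }

    // Records the codeword of every leaf: 0 for a left step, 1 for a right step.
    static void collectCodes(Node v, String prefix, Map<Character, String> codes) {
        if (v == null) return;
        if (v.left == null && v.right == null) {
            codes.put(v.c, prefix);
            return;
        }
        collectCodes(v.left, prefix + "0", codes);
        collectCodes(v.right, prefix + "1", codes);
    }

    public static void main(String[] args) {
        Node root = buildTree("ABAAABBAACCBAAADEA");
        Map<Character, String> codes = new HashMap<>();
        collectCodes(root, "", codes);
        System.out.println(codes);
    }
}
```

Ties between equal weights may be broken differently from run to run, so the exact codewords can vary while the codeword lengths for this text stay the same. A text with a single distinct character yields a one-leaf tree and an empty codeword, which a full implementation would need to handle separately.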
Given a Huffman tree, how do we compute the resulting encoded size?
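One way to answer this, written as a formula: every occurrence of a character costs one bit per edge on the path from the root to that character's leaf. The notation $w(\ell)$ (weight of leaf $\ell$) and $d(\ell)$ (depth of leaf $\ell$, counted in edges) is introduced here for illustration and is not from the slides.

$$\text{encoded size in bits} = \sum_{\text{leaves } \ell} w(\ell) \cdot d(\ell)$$

For the example text ABAAABBAACCBAAADEA the frequencies are A: 10, B: 4, C: 2, D: 1, E: 1. Merging the smallest weights first puts A at depth 1, B at depth 2, C at depth 3, and D and E at depth 4, so the encoded size is $10 \cdot 1 + 4 \cdot 2 + 2 \cdot 3 + 1 \cdot 4 + 1 \cdot 4 = 32$ bits. This counts only the encoded text, not the space needed to store the tree itself.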
Theorem. Among all possible prefix codes for a given text, Huffman codes give the smallest possible encoded text.
Implement Huffman coding
Suggestion: it may be helpful to have 2 representations of Huffman code:
Huffman codes are optimal prefix codes
How large would Huffman encoding of text AAAAAAA...A ($n$ As) be?
How else might we encode AAAAAAA...A ($n$ As)?
// n is the number of As to print, written into the program as a constant
void printString() {
    for (int i = 0; i < n; i++) {
        System.out.print("A");
    }
}
The Kolmogorov Complexity of a string s is the length of the shortest program that can reproduce s.
Neat Idea. The document is a program
Remarkable Fact. There is no algorithm that can determine the Kolmogorov complexity of strings!!!
So Far. “Highly structured” data structures
We controlled the state of the data structure
We do not get to decide where and how the data is stored
What do these examples have in common?
A graph $G = (V, E)$ consists of a set $V$ of vertices and a set $E$ of edges, where each edge is a pair of vertices
Example. $V = \{1, 2, 3, 4, 5\}$, $E = \{(1, 2), (2, 3), (3, 4), (4, 5), (5, 1), (1, 3)\}$
Computer Network
$V = $
$E = $
Social Network
$V = $
$E = $
Tic-Tac-Toe
$V = $
$E = $
How to build and use graphs?
boolean adjacent(u, v): return true if u and v are adjacent (i.e., (u, v) is an edge)
neighbors(u): return a List of vertices adjacent to u
addVertex(u): add a vertex to set of vertices, if not already present
removeVertex(u): remove u and edges containing u from graph
addEdge(u, v): add an edge from u to v if not already present
removeEdge(u, v): remove edge from u to v if present
(Can have others too…)
addVertex(1)
addVertex(2)
addVertex(3)
addVertex(4)
addEdge(1, 2)
addEdge(2, 3)
addEdge(2, 4)
addEdge(3, 4)
neighbors(2)
Question. How could we represent a graph?
Maintain a Map<E, List<E>>:
    keys are vertices (of type E)
    values are List<E>s of vertices (each key's neighbors)
HashMap<K, V>:
    containsKey(K x)
    get(K x)
    put(K x, V y)
    remove(K x)
ArrayList:
    add(E x)
    contains(E x)
    remove(E x)
    get(int i)
    size()
… boolean adjacent(u, v): return true if u and v are adjacent (i.e., (u, v) is an edge)
… neighbors(u): return a List of vertices adjacent to u
… addVertex(u): add a vertex to set of vertices, if not already present
… removeVertex(u): remove u and edges containing u from graph
… addEdge(u, v): add an edge from u to v if not already present
… removeEdge(u, v): remove edge from u to v if present
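The operations above could be implemented on top of a Map<E, List<E>> roughly as follows. This is a sketch under two assumptions that the slides do not state: edges are treated as undirected (each edge is stored in both endpoints' lists), and addEdge adds missing endpoints automatically. The class name AdjacencyListGraph is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AdjacencyListGraph<E> {
    // Each vertex maps to the list of its neighbors.
    private final Map<E, List<E>> adj = new HashMap<>();

    boolean adjacent(E u, E v) {
        return adj.containsKey(u) && adj.get(u).contains(v);
    }

    List<E> neighbors(E u) {
        // Return a copy so callers cannot modify the graph's internal list.
        return adj.containsKey(u) ? new ArrayList<>(adj.get(u)) : new ArrayList<>();
    }

    void addVertex(E u) {
        if (!adj.containsKey(u)) adj.put(u, new ArrayList<>());
    }

    void removeVertex(E u) {
        if (!adj.containsKey(u)) return;
        for (E v : adj.get(u)) {
            // Skip a self-loop: u's own list is discarded below anyway.
            if (!v.equals(u)) adj.get(v).remove(u);
        }
        adj.remove(u);
    }

    void addEdge(E u, E v) {
        addVertex(u); // assumption: missing endpoints are added automatically
        addVertex(v);
        if (!adj.get(u).contains(v)) adj.get(u).add(v);
        // Assumption: undirected graph, so store the reverse direction too.
        if (!adj.get(v).contains(u)) adj.get(v).add(u);
    }

    void removeEdge(E u, E v) {
        if (adj.containsKey(u)) adj.get(u).remove(v);
        if (adj.containsKey(v)) adj.get(v).remove(u);
    }

    public static void main(String[] args) {
        AdjacencyListGraph<Integer> g = new AdjacencyListGraph<>();
        g.addVertex(1);
        g.addVertex(2);
        g.addVertex(3);
        g.addVertex(4);
        g.addEdge(1, 2);
        g.addEdge(2, 3);
        g.addEdge(2, 4);
        g.addEdge(3, 4);
        System.out.println(g.neighbors(2)); // prints [1, 3, 4]
    }
}
```

With ArrayList values, adjacent, addEdge, and removeEdge take time proportional to the number of neighbors of the vertices involved, and removeVertex must visit every neighbor's list. For a directed graph, the reverse-direction lines in addEdge and removeEdge would be dropped, and removeVertex would have to scan every list for u.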
Suppose we are given an arbitrary vertex $v$ in a graph. How could we determine if another vertex $u$ is reachable from $v$?
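One possible answer is to explore the graph outward from $v$, marking each vertex the first time it is seen, and to stop when $u$ turns up. The sketch below uses breadth-first search for this; the slides do not say which method is intended, so treat it as one option among several (depth-first search works equally well). The class name Reachability and the adjacency-map parameter are assumptions made for this illustration.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

class Reachability {
    // Returns true if u can be reached from v by following edges in adj.
    static <E> boolean reachable(Map<E, List<E>> adj, E v, E u) {
        Set<E> visited = new HashSet<>();
        Queue<E> frontier = new ArrayDeque<>();
        visited.add(v);
        frontier.add(v);
        while (!frontier.isEmpty()) {
            E x = frontier.remove();
            if (x.equals(u)) return true;
            for (E y : adj.getOrDefault(x, Collections.emptyList())) {
                // Set.add returns false if y was already visited.
                if (visited.add(y)) frontier.add(y);
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Two separate pieces: 1 - 2 - 3 and 4 - 5.
        Map<Integer, List<Integer>> adj = new HashMap<>();
        adj.put(1, Arrays.asList(2));
        adj.put(2, Arrays.asList(1, 3));
        adj.put(3, Arrays.asList(2));
        adj.put(4, Arrays.asList(5));
        adj.put(5, Arrays.asList(4));
        System.out.println(reachable(adj, 1, 3)); // prints true
        System.out.println(reachable(adj, 1, 4)); // prints false
    }
}
```

The visited set is what guarantees termination on graphs with cycles: each vertex enters the queue at most once.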