All efficient ($O(\log n)$ time) data structures have relied on data types being Comparable
Question. Can we achieve similar efficiency without comparability?
Associate a numerical value to every possible element
Idea. Given an object instance obj, compute a numerical value from data stored in obj
Application. Use the numerical value of obj to determine where in the data structure obj should be stored
Goals.
Goal. Implement the unordered set ADT: add, find, remove methods
Assume. We have access to a hash function $h$
Idea. Store elements in an array
To add, find, or remove $x$, look at index $h(x)$
[Figure: array of size $n = 6$ storing the elements green, orange, yellow, blue]
What do we do about collisions???
Idea. Each entry of the array refers to the head of a linked list
arr[i] stores all elements $x$ with hash value $h(x) = i$
The array arr of heads of linked lists is the hash table
h has range 0, 1, ..., n-1 with n = arr.length
How do we add(x)? Example hash values:
red -> 2
orange -> 0
yellow -> 4
green -> 1
blue -> 0
indigo -> 2
violet -> 0
How do we find(x)? (Same hash values as in the add(x) example.)
How do we remove(x)? (Same hash values as in the add(x) example.)
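One way the three operations might look with chaining is sketched below. This is a minimal illustration, not the course's SimpleUSet implementation: the class name ChainedHashSet, the boolean return values, and the choice of Math.floorMod(x.hashCode(), n) as the hash function h are all assumptions.

```java
import java.util.LinkedList;

// Hypothetical chained hash table; names and design are illustrative assumptions.
class ChainedHashSet<E> {
    // arr[i] holds all elements x with h(x) == i
    private final LinkedList<E>[] arr;

    @SuppressWarnings("unchecked")
    ChainedHashSet(int n) {
        arr = new LinkedList[n];
        for (int i = 0; i < n; i++) {
            arr[i] = new LinkedList<>();
        }
    }

    // Assumed hash function: maps x to an index in 0, 1, ..., n-1.
    private int h(E x) {
        return Math.floorMod(x.hashCode(), arr.length);
    }

    // Adds x if it is absent; returns true if the set changed.
    boolean add(E x) {
        LinkedList<E> chain = arr[h(x)];
        if (chain.contains(x)) {
            return false;
        }
        chain.add(x);
        return true;
    }

    // Returns true if x is stored in the set.
    boolean find(E x) {
        return arr[h(x)].contains(x);
    }

    // Removes x if it is present; returns true if the set changed.
    boolean remove(E x) {
        return arr[h(x)].remove(x);
    }
}
```

Each operation only touches the single list arr[h(x)], so its cost is the cost of computing h(x) plus a scan of that one list.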
Assume: computing $h(x)$ takes $O(1)$ time*
* This may not be justified!
Extreme example: h(x) = 0 always!
Array size is fixed, but we keep adding elements
If we resize to a larger array (say, of size $2n$):
we must update the hash function $h$ to have range $0, 1, \ldots, 2n - 1$
this could change the hash values of elements already in the hash table
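To make the resizing issue concrete, here is a small sketch that doubles a table and re-adds every element under the new range. It assumes integer elements and the hash function h(x) = x mod n, which is chosen purely for illustration; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: integer elements, assumed hash function h(x) = x mod n.
class ResizeDemo {
    // Hash value of x for a table with n entries (range 0, 1, ..., n-1).
    static int h(int x, int n) {
        return Math.floorMod(x, n);
    }

    // Creates a table with n empty chains.
    static List<List<Integer>> emptyTable(int n) {
        List<List<Integer>> table = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            table.add(new ArrayList<>());
        }
        return table;
    }

    // Adds x to the chain at index h(x).
    static void add(List<List<Integer>> table, int x) {
        table.get(h(x, table.size())).add(x);
    }

    // Builds a table of twice the size and re-adds every element,
    // because h now has range 0, 1, ..., 2n-1.
    static List<List<Integer>> resize(List<List<Integer>> old) {
        List<List<Integer>> bigger = emptyTable(2 * old.size());
        for (List<Integer> chain : old) {
            for (int x : chain) {
                add(bigger, x);
            }
        }
        return bigger;
    }
}
```

With n = 6, both 7 and 13 land at index 1. After doubling to 12 entries, 7 moves to index 7 while 13 stays at index 1, so elements cannot simply be copied across index by index.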
A hash function $h$ takes an object instance $x$ and returns a value $h(x)$ in a specified range
Hash tables with chaining can support the add/find/remove methods required by the unordered set ADT (SimpleUSet)
The running time of each operation op(x) depends on the occupancy of arr[h(x)]
Recall the (unbalanced) binary search tree:
Question 1. What was typical height for, say, Shakespeare?
Question 2. When did we expect typical height to be small?
Use randomness (somehow)!
[Figure: the elements red, orange, yellow, green, blue, indigo, violet]
Choose hash values randomly:
Will this work?
If we compute $h(x)$ repeatedly, we must get the same value every time!
Choose a hash function $h$ whose output “looks random”
The hash value is determined by the data stored in the instance, so it is not really random
A String s stores an array of char
Each char has an associated ASCII value
Think of each char as a value from 0 to 255
Let n = s.length, and interpret s as a char array
Java computes
s.hashCode() = s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-2]*31 + s[n-1]
where the result is computed as an int
String s = "bake"
As a character array:
s = ['b', 'a', 'k', 'e']
= [98, 97, 107, 101]
Computing hash code:
int h = 98 * 31^3 + 97 * 31^2 + 107 * 31 + 101
= 3016153
You can confirm this with s.hashCode() in Java!
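The same polynomial can be evaluated by hand in code. The sketch below uses Horner's rule (repeatedly multiplying by 31 and adding the next character), which is one convenient way to evaluate the formula and not necessarily how the library does it internally. The class and method names are hypothetical.

```java
// Illustrative re-computation of the String hash polynomial.
class StringHash {
    // Evaluates s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] with Horner's rule.
    // All arithmetic is done in int, so large values wrap around on overflow.
    static int polyHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }
}
```

For "bake", StringHash.polyHash("bake") gives 3016153, matching both the hand computation and "bake".hashCode().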
s.hashCode() returns an int, so the value can be anywhere in the int range!
How could we get a value from a prescribed range, say 0 to n-1?
If we want a value from 0 to n-1, use s.hashCode() % n
Is this good?
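One caveat worth checking: in Java, the result of % takes the sign of its left operand, and hashCode() can return a negative int, so x.hashCode() % n can fall outside 0 to n-1. A hedged sketch of one common workaround uses Math.floorMod; the class and method names here are illustrative assumptions.

```java
// Illustrative helper: reduces a hash code to a valid array index.
class HashIndex {
    // Returns a value in 0, 1, ..., n-1 even when x.hashCode() is negative.
    static int index(Object x, int n) {
        return Math.floorMod(x.hashCode(), n);
    }
}
```

For example, Integer.valueOf(-7).hashCode() is -7, and -7 % 6 evaluates to -1, whereas HashIndex.index(Integer.valueOf(-7), 6) evaluates to 5.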
Using
s.hashCode() = s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-2]*31 + s[n-1]
i = s.hashCode() % n (here n is the array size, not the string length)
find a value of n and many strings s that hash to the same value of i
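One possible answer, as a sketch: take the array size to be 31. Every term of the polynomial except the last is a multiple of 31, so as long as the int computation does not overflow (true for short ASCII strings), the index depends only on the last character. The class and method names below are hypothetical.

```java
// Illustrative collision demo for table size 31.
class CollisionDemo {
    // Index of s in a table with n entries.
    static int index(String s, int n) {
        return Math.floorMod(s.hashCode(), n);
    }

    // Computes the index of every word for a table with n entries.
    static int[] indices(String[] words, int n) {
        int[] result = new int[words.length];
        for (int i = 0; i < words.length; i++) {
            result[i] = index(words[i], n);
        }
        return result;
    }
}
```

Calling CollisionDemo.indices(new String[] {"bake", "cake", "lake", "make"}, 31) yields index 8 for all four words, since each ends in 'e' (value 101) and 101 mod 31 = 8.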
Need to be careful with constructing hash functions!
Balance requirements:
More on hashing!