Hash tables with chaining!
Goal: implement the unordered set ADT: find, add, remove
Assumption: given a hash function $h$
Have an array arr of lists; element x is stored in the list arr[h(x)]
Example
red -> 2
orange -> 0
yellow -> 4
green -> 1
blue -> 0
indigo -> 2
violet -> 0
Each operation scans only the list arr[h(x)], so performance depends on the occupancy (size) of the lists
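A minimal Java sketch of the chaining idea, assuming the hash function is passed in as a `ToIntFunction` that already returns an index in range (as in the color example). The class name `ChainedSet` and the `occupancy` helper are hypothetical, chosen for illustration. Each operation touches only the single list `arr[h(x)]`, which is why list occupancy drives performance.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

// Hypothetical sketch of a hash set with chaining: arr.get(i) holds every
// element whose hash value is i. The caller-supplied h is assumed to return
// values in 0..capacity-1.
class ChainedSet<E> {
    private final List<List<E>> arr = new ArrayList<>();
    private final ToIntFunction<E> h;

    ChainedSet(int capacity, ToIntFunction<E> h) {
        this.h = h;
        for (int i = 0; i < capacity; i++) {
            arr.add(new ArrayList<>());
        }
    }

    // find: scan only the list arr[h(x)]
    boolean find(E x) {
        return arr.get(h.applyAsInt(x)).contains(x);
    }

    // add: append x to its list unless it is already there
    boolean add(E x) {
        List<E> list = arr.get(h.applyAsInt(x));
        if (list.contains(x)) {
            return false;
        }
        list.add(x);
        return true;
    }

    // remove: delete x from its list if present
    boolean remove(E x) {
        return arr.get(h.applyAsInt(x)).remove(x);
    }

    // hypothetical helper: occupancy (size) of list i
    int occupancy(int i) {
        return arr.get(i).size();
    }
}
```

For instance, a `ChainedSet<String>` of capacity 5 whose `h` looks colors up in the example table puts orange, blue, and violet together in `arr[0]`, so finding any of them may scan three elements, while `arr[3]` stays empty.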
Choose a hash function $h$ whose output “looks random”
The hash value is determined by the data stored in the instance, so it is not really random
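A String s stores an array of chars
Each char has an associated ASCII value, so we can treat a char as a number (from 0 to 255 for the characters used here; in general a Java char is a 16-bit value from 0 to 65535)
Let n = s.length, and interpret s as a char array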
Java computes
s.hashCode() = s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-2]*31 + s[n-1]
where the result is computed as an int (so overflow wraps around)
String s = "bake"
As a character array:
s = ['b', 'a', 'k', 'e']
= [98, 97, 107, 101]
Computing hash code:
int h = 98 * 31^3 + 97 * 31^2 + 107 * 31 + 101
= 3016153
You can confirm this with s.hashCode() in Java!
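A sketch for checking the formula by hand: the loop recomputes the same value with Horner's rule (multiply the running total by 31, then add the next char), using int arithmetic so overflow wraps exactly as in `String.hashCode()`. `StringHash` is a hypothetical helper name.

```java
// Hypothetical helper: recompute String.hashCode() with Horner's rule,
// h = (...((s[0])*31 + s[1])*31 + ...)*31 + s[n-1],
// which expands to s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1].
class StringHash {
    static int hash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            // int arithmetic: overflow silently wraps around modulo 2^32
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        String s = "bake";
        System.out.println(hash(s));      // 3016153
        System.out.println(s.hashCode()); // 3016153
    }
}
```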
s.hashCode() returns an int, which can be any value in the int range! How could we get a value from a prescribed range 0 to n-1 (here n is the size of the range, not the string length)?
If we want a value from 0 to n-1, use s.hashCode() % n
Is this good?
Exercise. Using
s.hashCode() = s[0]*31^(len-1) + s[1]*31^(len-2) + ... + s[len-2]*31 + s[len-1] (where len = s.length)
and i = s.hashCode() % n (where n is the table size),
find a value of n and many strings s that all hash to the same value of i
Need to be careful with constructing hash functions!
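For experimenting with the exercise, here is a small hypothetical harness (`BucketCounter`) that counts how many strings land on each index for a chosen n. One deliberate deviation from the slide: it uses `Math.floorMod` instead of `%`, so strings whose hash code is negative still get an index in 0..n-1.

```java
import java.util.List;

// Hypothetical harness: counts[i] = number of words whose index is i,
// where the index is hashCode() reduced into 0..n-1.
class BucketCounter {
    static int[] countPerIndex(List<String> words, int n) {
        int[] counts = new int[n];
        for (String s : words) {
            // floorMod (not %) so negative hash codes still give 0..n-1
            counts[Math.floorMod(s.hashCode(), n)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> words = List.of("bake", "cake", "lake", "make", "bike", "book");
        int n = 10; // try different table sizes here
        int[] counts = countPerIndex(words, n);
        for (int i = 0; i < n; i++) {
            System.out.println(i + ": " + counts[i]);
        }
    }
}
```

Changing `n` and the word list is a quick way to test a guess before trying to explain it with the formula.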
Balance requirements:
The Object class has a method int hashCode()!
hashCode() requirements from the Java API:
If we take an Object x and invoke x.hashCode() multiple times in the same execution, all calls must return the same value (provided no information used by equals on x is modified)
If x.equals(y), then we must have x.hashCode() == y.hashCode()
Not required: x.equals(y) == false implies x.hashCode() != y.hashCode()
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects.
When defining a new class, override hashCode() (together with equals()) so that it has the properties specified in the Java API
The hashCode() method returns an arbitrary int value
Desired: x.hashCode() == y.hashCode() is “unlikely” unless x.equals(y)
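A sketch of a class that follows the contract, using a hypothetical `Point` class: `equals` and `hashCode` are computed from the same fields, and the fields are `final`, so repeated calls return the same value. `java.util.Objects.hash` combines the fields with a 31-based polynomial similar to the one `String` uses.

```java
import java.util.Objects;

// Hypothetical example class: equal points must have equal hash codes,
// so both methods are derived from the same fields (x and y).
final class Point {
    private final int x; // final: the hash code cannot change after construction
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        // combines the fields as 31 * (31 * 1 + x) + y
        return Objects.hash(x, y);
    }
}
```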
For hash tables, we need a hash value in a specified range 0, 1, ..., n-1!
If x.hashCode() == y.hashCode(), then x and y must get the same index
If x.hashCode() != y.hashCode(), then the index of x is “unlikely” to equal the index of y
Implementation: define a method int getIndex(E x) that maps x.hashCode() to an index in 0, 1, ..., arr.length - 1
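int capacity; // size of array
int getIndex(E x) {
    // in Java, % keeps the sign of x.hashCode(), so a negative hash code
    // would give a negative index; Math.floorMod stays in 0..capacity-1
    return Math.floorMod(x.hashCode(), capacity);
}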
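A quick check of a Java detail that matters for `getIndex`: `%` keeps the sign of its left operand, so a negative hash code produces a negative (invalid) index, while `Math.floorMod` always lands in 0..capacity-1 for a positive capacity. `ModDemo` and its methods are hypothetical names.

```java
// Hypothetical demo: naive % versus Math.floorMod on a negative hash code.
class ModDemo {
    static int naiveIndex(int hash, int capacity) {
        return hash % capacity; // sign follows hash: can be negative
    }

    static int safeIndex(int hash, int capacity) {
        return Math.floorMod(hash, capacity); // always 0..capacity-1
    }

    public static void main(String[] args) {
        int hash = -7; // any negative hash code behaves the same way
        System.out.println(naiveIndex(hash, 5)); // -2
        System.out.println(safeIndex(hash, 5));  // 3
    }
}
```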
Goal. Given an arbitrary int h = x.hashCode(), compute an index i from h such that:
i is in the range 0, 1, ..., n-1
if h and g are two different hash codes, then their associated indices i and j are unlikely to be the same
In Java, ints are 32-bit values (here 0 through 5):
00000000000000000000000000000000
00000000000000000000000000000001
00000000000000000000000000000010
00000000000000000000000000000011
00000000000000000000000000000100
00000000000000000000000000000101
The last (right-most) bit is the least significant bit
The first (left-most) bit is the “sign” bit (1 means negative)
Operations in binary are like grade-school arithmetic, but with only 1s and 0s:
  00000000000000000000000000000011   (3)
+ 00000000000000000000000000000110   (6)
= 00000000000000000000000000001001   (9)

  00000000000000000000000000000011   (3)
* 00000000000000000000000000000110   (6)
= 00000000000000000000000000010010   (18)
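Both results can be checked in Java. `Integer.toBinaryString` omits leading zeros, so the hypothetical helper `binary32` pads its output to 32 characters.

```java
// Hypothetical helper: show an int as a full 32-bit binary string.
class BinaryDemo {
    static String binary32(int v) {
        // toBinaryString drops leading zeros, so left-pad with '0' to 32 chars
        return String.format("%32s", Integer.toBinaryString(v)).replace(' ', '0');
    }

    public static void main(String[] args) {
        System.out.println(binary32(3 + 6)); // ends in ...1001 (9)
        System.out.println(binary32(3 * 6)); // ends in ...10010 (18)
        System.out.println(binary32(-1));    // all 32 bits are 1: sign bit set
    }
}
```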
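If we start with a non-zero number h and multiply it by a random odd number r (as ints, so only the low 32 bits of the product are kept), the higher-order bits of h * r tend to be random-ish
The lowest-order bits are not random: since r is odd, h * r ends in the same number of trailing zeros as h, preceded by a 1 (the ...10000 below)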
00000000000000000000100010110000 (h)
* 10111000110110000111000111010101 (r)
= ???????????????????????????10000
Choose r once: a random odd number
Suppose n = 2^k for some value of k, so an index is a k-bit value
To compute index i from hash code h, take i to be the k most significant bits of h * r
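protected int getIndex(E x) {
    // r is a fixed odd number that was randomly
    // chosen when the table was constructed
    // k is the log of the capacity (i.e., capacity = 2^k), assumed k >= 1;
    // with k = 0, Java would mask the shift distance 32 down to 0
    // get the first (most significant) k bits of r * x.hashCode();
    // the int product overflows and wraps, which is intended
    return (r * x.hashCode()) >>> (32 - k);
}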
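A self-contained sketch of the scheme above. Assumptions not taken from the slides: the class name `MultiplicativeHash`, drawing r with `java.util.Random` and forcing it odd with `| 1`, and restricting k to 1..30 so that `32 - k` is a valid shift and `1 << k` fits in a positive int.

```java
import java.util.Random;

// Hypothetical sketch of multiplicative hashing: index = top k bits of r * h.
class MultiplicativeHash {
    private final int r; // random odd multiplier, fixed for the table's lifetime
    private final int k; // capacity = 2^k, assumed 1 <= k <= 30

    MultiplicativeHash(int k, Random rng) {
        if (k < 1 || k > 30) {
            throw new IllegalArgumentException("k must be in 1..30");
        }
        this.k = k;
        this.r = rng.nextInt() | 1; // setting the lowest bit makes r odd
    }

    int capacity() {
        return 1 << k;
    }

    // Top k bits of the 32-bit product; >>> fills with zeros, so the result
    // lies in 0..2^k - 1 even when r * h is negative.
    int getIndex(Object x) {
        return (r * x.hashCode()) >>> (32 - k);
    }
}
```

The multiplier is drawn once in the constructor and reused for every call, so equal hash codes always map to the same index.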
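The operator >>> is the unsigned (logical) right shift operator: it fills the vacated high-order bits with 0s, so the top k bits come out as a value in 0..2^k - 1 (the signed shift >> would copy the sign bit instead)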
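A small sketch contrasting the two right shifts when extracting the top k bits of a negative int. `ShiftDemo` and its methods are hypothetical names.

```java
// Hypothetical demo: signed (>>) vs unsigned (>>>) right shift.
class ShiftDemo {
    // >> copies the sign bit into the vacated positions, so this can be negative
    static int signedTop(int v, int k) {
        return v >> (32 - k);
    }

    // >>> fills the vacated positions with 0s, so this is in 0..2^k - 1
    static int unsignedTop(int v, int k) {
        return v >>> (32 - k);
    }

    public static void main(String[] args) {
        int v = -16; // binary 1111...11110000: the top 4 bits are 1111
        System.out.println(signedTop(v, 4));   // -1
        System.out.println(unsignedTop(v, 4)); // 15
    }
}
```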
Assume:
r is a randomly chosen odd number
x and y are objects with different hash codes
n = 2^k is the size of arr
Then the probability (over the choice of r) that getIndex(x) == getIndex(y) is at most 2/n
Informally: collisions are nearly as unlikely as possible (within a factor of 2 of the ideal 1/n)!
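A rough empirical check, not a proof: fix two different hash codes, draw many random odd multipliers r, and measure how often the top k bits of the two products agree. The class name, the seed, and the sample hash codes are arbitrary choices for illustration. For most pairs the observed rate sits near 1/n; the standard analysis of this multiply-and-shift scheme bounds it by 2/n.

```java
import java.util.Random;

// Hypothetical experiment: estimate Pr[top k bits of r*h1 == top k bits of r*h2]
// over random odd r, for fixed distinct hash codes h1 and h2.
class CollisionExperiment {
    static double collisionRate(int h1, int h2, int k, int trials, long seed) {
        Random rng = new Random(seed);
        int collisions = 0;
        for (int t = 0; t < trials; t++) {
            int r = rng.nextInt() | 1; // random odd multiplier
            int i = (r * h1) >>> (32 - k);
            int j = (r * h2) >>> (32 - k);
            if (i == j) {
                collisions++;
            }
        }
        return (double) collisions / trials;
    }

    public static void main(String[] args) {
        int k = 4; // n = 16
        double rate = collisionRate(2224, 3016153, k, 100_000, 1L);
        System.out.println("observed rate: " + rate + ", 1/n = " + (1.0 / (1 << k)));
    }
}
```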
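To get from x.hashCode() to an index from 0, 1, ..., n-1 with n = 2^k:
map ints to k-bit values by taking the k most significant bits of r * x.hashCode()
choose a random odd r once, and use the same r for every call to getIndex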
A Miracle: hashCode() values are given!
Assignment 08. SimpleUSet
Assignment 09. (Optional)