Hash tables with chaining!
Goal: implement unordered set ADT–find, add, remove
Assumption: given a hash function $h$
Have array arr of lists, element x is stored in list arr[h(x)]
Example
red -> 2
orange -> 0
yellow -> 4
green -> 1
blue -> 0
indigo -> 2
violet -> 0
Performance of operations depends on occupancy (size) of lists
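The structure described above can be sketched in a few lines of Java. This is only a minimal illustration, not a full implementation: the class name ChainingSketch is hypothetical, and the hash function h is a hard-coded lookup of the seven colors from the example rather than a real hash function.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

// Minimal chaining sketch: arr is an array of lists, and x is stored in arr[h(x)].
class ChainingSketch {
    // Hypothetical toy hash function: the values from the color example.
    // It is only defined for these seven strings.
    static final Map<String, Integer> H = Map.of(
        "red", 2, "orange", 0, "yellow", 4, "green", 1,
        "blue", 0, "indigo", 2, "violet", 0);

    final List<List<String>> arr = new ArrayList<>();

    ChainingSketch(int capacity) {
        for (int i = 0; i < capacity; i++) {
            arr.add(new LinkedList<>());
        }
    }

    static int h(String x) {
        return H.get(x);
    }

    // find: search only the one list that x could be in
    boolean find(String x) {
        return arr.get(h(x)).contains(x);
    }

    // add: append to the list arr[h(x)] unless x is already present
    boolean add(String x) {
        if (find(x)) {
            return false;
        }
        arr.get(h(x)).add(x);
        return true;
    }

    // remove: delete x from the list arr[h(x)] if it is there
    boolean remove(String x) {
        return arr.get(h(x)).remove(x);
    }

    public static void main(String[] args) {
        ChainingSketch set = new ChainingSketch(5);
        for (String color : H.keySet()) {
            set.add(color);
        }
        // List 0 ends up holding three colors, list 3 holds none.
        for (int i = 0; i < 5; i++) {
            System.out.println(i + ": " + set.arr.get(i));
        }
    }
}
```

Each operation only touches one list, which is why the list sizes drive the running time.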
Choose a hash function $h$ whose output “looks random”
Hash function value determined by data stored in instance so not really random
String s stores an array of chars
Each char has an associated ASCII value
Think of a char as a value from 0 to 255
Let n = s.length, interpret s as char array
Java computes
s.hashCode() = s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-2]*31 + s[n-1]
where result is computed as an int
String s = "bake"
As a character array:
s = ['b', 'a', 'k', 'e']
= [98, 97, 107, 101]
Computing hash code:
int h = 98 * 31^3 + 97 * 31^2 + 107 * 31 + 101
= 3016153
You can confirm this with s.hashCode() in Java!
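One way to run that check is to evaluate the polynomial by hand and compare it with the built-in method. The sketch below uses Horner's rule (repeatedly multiply by 31 and add the next char); the class and method names are made up for illustration.

```java
class StringHashCheck {
    // Evaluates s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
    // using Horner's rule, with all arithmetic done as int.
    static int manualHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        String s = "bake";
        System.out.println(manualHash(s));  // 3016153
        System.out.println(s.hashCode());   // 3016153
    }
}
```

For longer strings the int arithmetic overflows and wraps around, so the result can be negative.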
s.hashCode() returns an int
Could be any value in the int range!
How could we get a value from a prescribed range, e.g., 0 to n-1?
If we want a value from 0 to n-1, use
s.hashCode() % n
Is this good?
Using
s.hashCode() = s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-2]*31 + s[n-1]
i = s.hashCode() % n
find a value of n and many strings s that hash to the same value of i
Need to be careful with constructing hash functions!
Balance requirements:
Object class has a method int hashCode()!
hashCode() requirements from Java API:
If we have an Object x and invoke x.hashCode() multiple times in the same execution, all calls must return the same value
If x.equals(y), then we must have x.hashCode() == y.hashCode()
It is not required that x.equals(y) == false implies x.hashCode() != y.hashCode()
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects.
When defining a new class, define a hashCode() method with properties specified in Java API
The hashCode() method returns an arbitrary int value
x.hashCode() == y.hashCode() is “unlikely” unless x.equals(y)
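As an illustration of these requirements, here is a sketch of a small class that overrides equals and hashCode together. The class Point is hypothetical, and using Objects.hash to combine the fields is just one common choice, not the only acceptable one.

```java
import java.util.Objects;

// Hypothetical example class: two points are equal when both coordinates match.
final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof Point)) {
            return false;
        }
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    // Built from exactly the fields that equals compares,
    // so equal points always get equal hash codes.
    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }

    public static void main(String[] args) {
        Point a = new Point(1, 2);
        Point b = new Point(1, 2);
        System.out.println(a.equals(b));                  // true
        System.out.println(a.hashCode() == b.hashCode()); // true
    }
}
```

Because the fields are final, repeated calls to hashCode() on the same Point return the same value.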
For hash tables, we need a hash value in a specified range 0, 1,...,n-1!
If x.hashCode() == y.hashCode(), then x and y get the same index
If x.hashCode() != y.hashCode(), then index of x is “unlikely” to equal index of y
Implementation: define method int getIndex(E x)
Computes an index from x.hashCode()
Returns a value in 0, 1,..., arr.length - 1
int capacity; // size of array
int getIndex(E x) {
return x.hashCode() % capacity;
}
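One caveat with this first attempt: hashCode() can be negative, and in Java the % operator then gives a negative result, which is not a valid array index. The sketch below shows the problem and one possible workaround using Math.floorMod; the class and method names are made up for illustration.

```java
class IndexRangeCheck {
    // First attempt: can return a negative value when hashCode() is negative.
    static int naiveIndex(Object x, int capacity) {
        return x.hashCode() % capacity;
    }

    // Possible workaround: Math.floorMod always returns a value in 0..capacity-1
    // when capacity is positive.
    static int safeIndex(Object x, int capacity) {
        return Math.floorMod(x.hashCode(), capacity);
    }

    public static void main(String[] args) {
        Integer x = -7;  // an Integer's hashCode() is its int value
        System.out.println(naiveIndex(x, 4));  // -3
        System.out.println(safeIndex(x, 4));   // 1
    }
}
```

This fixes the range, but it does nothing about hash codes that collide modulo the capacity.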
Goal. Given an arbitrary int h = x.hashCode(), compute an index i from h such that:
i is in range 0, 1,...,n-1
If h != k are two hash codes, then the associated indices i and j are unlikely to be the same
In Java, ints are 32-bit values
00000000000000000000000000000000
00000000000000000000000000000001
00000000000000000000000000000010
00000000000000000000000000000011
00000000000000000000000000000100
00000000000000000000000000000101
Last (right-most) bit is the least significant bit
First (left-most) bit is the “sign” bit
Operations in binary are like grade-school arithmetic, but with only 1s and 0s:
00000000000000000000000000000011
+ 00000000000000000000000000000110
=
00000000000000000000000000000011
* 00000000000000000000000000000110
=
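To check a hand computation like the two above, Java can print the bits directly. This is a small sketch; the helper toBits32 is a made-up name that pads Integer.toBinaryString to 32 digits.

```java
class BinaryArithmeticDemo {
    // Pads the binary representation of v with leading zeros to 32 digits.
    static String toBits32(int v) {
        return String.format("%32s", Integer.toBinaryString(v)).replace(' ', '0');
    }

    public static void main(String[] args) {
        int a = 0b011;  // 3
        int b = 0b110;  // 6
        System.out.println(toBits32(a + b));  // ends in 1001  (decimal 9)
        System.out.println(toBits32(a * b));  // ends in 10010 (decimal 18)
    }
}
```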
If we start with a non-zero number h and multiply it by a random odd number r, the higher-order bits of h * r tend to be random-ish
00000000000000000000100010110000 (h)
* 10111000110110000111000111010101 (r)
= ???????????????????????????10000
Pick a random odd number r
Assume n = 2^k for some value of k
Indices are k bits long
To compute index i from hash code h, take i to be the k most significant bits of h * r
00000000000000000000100010110000 (h)
* 10111000110110000111000111010101 (r)
= ???????????????????????????10000
protected int getIndex(E x) {
// r is a fixed odd number that was randomly
// chosen when table was constructed
// k is the log of the capacity (i.e., capacity = 2^k)
// get the first (most significant) k bits of r * x.hashCode()
return ((r * x.hashCode()) >>> (32 - k));
}
The operator >>> is the unsigned bit shift operator
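The difference from the signed shift >> only shows up for negative values: >> copies the sign bit into the vacated positions, while >>> fills them with zeros. A short sketch (class and method names are made up for illustration):

```java
class ShiftDemo {
    static int signedShift(int v, int s) {
        return v >> s;   // vacated high bits are copies of the sign bit
    }

    static int unsignedShift(int v, int s) {
        return v >>> s;  // vacated high bits are always 0
    }

    public static void main(String[] args) {
        System.out.println(signedShift(-8, 1));     // -4
        System.out.println(unsignedShift(-8, 1));   // 2147483644
        System.out.println(unsignedShift(-8, 28));  // 15, the top 4 bits of -8
    }
}
```

The last line is the pattern getIndex relies on: shifting right by 32 - k with >>> leaves the k most significant bits as a non-negative number.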
Assume
r is a randomly chosen odd number
x and y are objects with different hash codes
n = 2^k is the size of arr
Then the probability (over choice of r) that getIndex(x) == getIndex(y) is at most 2/n
Informally: collisions are about as unlikely as possible!
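A claim like this can be explored empirically. The sketch below is a self-contained version of getIndex that works on raw int hash codes, plus a small experiment that draws many random odd values of r and counts how often two fixed, different hash codes land on the same index. The class name, the method collisionRate, and the sample hash codes are all made up for illustration, and the experiment gives only an estimate, not a proof.

```java
import java.util.Random;

class MultiplicativeHashSketch {
    final int r;  // random odd multiplier, fixed at construction
    final int k;  // log of the capacity (capacity = 2^k), assumed 1 <= k <= 31

    MultiplicativeHashSketch(int k, Random rng) {
        this.k = k;
        this.r = rng.nextInt() | 1;  // force the lowest bit to 1 so r is odd
    }

    // The k most significant bits of r * hashCode (int arithmetic wraps around).
    int getIndex(int hashCode) {
        return (r * hashCode) >>> (32 - k);
    }

    // Fraction of random choices of r for which h1 and h2 get the same index.
    static double collisionRate(int h1, int h2, int k, int trials, long seed) {
        Random rng = new Random(seed);
        int collisions = 0;
        for (int t = 0; t < trials; t++) {
            MultiplicativeHashSketch m = new MultiplicativeHashSketch(k, rng);
            if (m.getIndex(h1) == m.getIndex(h2)) {
                collisions++;
            }
        }
        return (double) collisions / trials;
    }

    public static void main(String[] args) {
        // Two arbitrary, different hash codes and a table of size 2^4 = 16.
        System.out.println(collisionRate(2224, 12345, 4, 100_000, 42L));
    }
}
```

Trying other pairs of hash codes and other values of k is a quick way to see how the collision rate shrinks as the table grows.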
To get from x.hashCode() to index from 0,1,...,n-1 with n = 2^k:
Map 32-bit ints to k-bit values
Pick a random odd number r when the table is constructed
Use the same value (r) for every call to getIndex
A Miracle:
hashCode() values are given!
Assignment 08.
SimpleUSet
Assignment 09. (Optional)