Lecture 18: Heaps and Unordered Sets
Overview
- Announcements
- Finishing Heaps
- Unordered Sets
Next Week
- No coding assignment
- Take home “midterm”
  - similar structure to last time
  - more computational
Last Time: Heaps
Heaps…
- …are complete binary trees
- …elements satisfy heap property
Adding to Heaps
- Add new node at unique location where element can be added
- Bubble up procedure
Removing Min Element
- Copy value from unique removable leaf node to root and remove leaf
- Trickle down procedure
Representing Complete Binary Trees
Previously:
- trees represented as linked nodes
Complete binary trees have much more predictable structure
- can use an array to store complete binary trees efficiently!
Question 1
For an index $i$, what is the index of $i$’s left child? Right child?
Question 2
For an index $i$, what is the index of $i$’s parent?
Question 3
Why didn’t we use arrays to represent (non complete) binary trees?
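One possible answer to Questions 1 and 2, as a minimal sketch. It assumes the nodes are stored level by level with the root at index 0. The class name HeapIndex is hypothetical, and a layout with the root at index 1 would use $2i$, $2i + 1$, and $\lfloor i/2 \rfloor$ instead.

```java
// Index arithmetic for a complete binary tree stored level by level in an array.
// ASSUMPTION: the root lives at index 0.
final class HeapIndex {
    // Index of the left child of the node at index i.
    static int left(int i) { return 2 * i + 1; }

    // Index of the right child of the node at index i.
    static int right(int i) { return 2 * i + 2; }

    // Index of the parent of the node at index i (only meaningful for i > 0).
    static int parent(int i) { return (i - 1) / 2; }
}
```

For Question 3, note that this layout reserves a slot for every possible position in each level. A tree that is not complete leaves gaps, and in the worst case (a single path of $n$ nodes) the array would need on the order of $2^n$ slots.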
For Homework 07
Use an array to represent heap
- implement insert
  - add element to unique location (may need to resize)
  - bubble up
- implement removeMin
  - remove unique “node” and copy value to root
  - trickle down
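The steps above can be sketched as follows. This is an illustration under stated assumptions, not the homework's required interface: the class name IntMinHeapSketch is hypothetical, elements are plain int values, the root is at index 0, and the array doubles when full.

```java
import java.util.Arrays;

// A minimal array-backed min-heap of ints (hypothetical names throughout).
final class IntMinHeapSketch {
    private int[] arr = new int[4]; // backing array; slots 0..size-1 are in use
    private int size = 0;

    int size() { return size; }

    // The minimum is always at the root, index 0.
    int min() {
        if (size == 0) throw new IllegalStateException("empty heap");
        return arr[0];
    }

    // Place x at the first free slot (resizing if needed), then bubble up.
    void insert(int x) {
        if (size == arr.length) arr = Arrays.copyOf(arr, 2 * arr.length);
        arr[size] = x;
        bubbleUp(size);
        size++;
    }

    // Copy the last leaf's value to the root, drop that leaf, then trickle down.
    int removeMin() {
        int min = min();
        size--;
        arr[0] = arr[size];
        trickleDown(0);
        return min;
    }

    // Swap with the parent while the heap property is violated.
    private void bubbleUp(int i) {
        while (i > 0 && arr[i] < arr[(i - 1) / 2]) {
            swap(i, (i - 1) / 2);
            i = (i - 1) / 2;
        }
    }

    // Swap with the smaller child while the heap property is violated.
    private void trickleDown(int i) {
        while (true) {
            int l = 2 * i + 1, r = 2 * i + 2, smallest = i;
            if (l < size && arr[l] < arr[smallest]) smallest = l;
            if (r < size && arr[r] < arr[smallest]) smallest = r;
            if (smallest == i) return;
            swap(i, smallest);
            i = smallest;
        }
    }

    private void swap(int i, int j) { int t = arr[i]; arr[i] = arr[j]; arr[j] = t; }
}
```

Both loops move one level per iteration, so insert and removeMin do $O(\log n)$ work apart from the occasional resize.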
Big Picture so Far
Data Structures:
- Arrays
  - usual arrays
  - circular arrays
- Linked Lists
  - singly linked lists
  - doubly linked lists
- Trees
  - binary (search) trees
  - AVL trees
  - complete binary trees
  - heaps
ADTs
- Stack, Queue, Deque
- List
- Sets
  - unordered sets
  - sorted sets
- Priority Queue
Implementation Efficiency I
- Stack, Queue, Deque:
  - (doubly) linked list: all ops in $O(1)$ time
  - (circular) array: all ops in $O(1)$ time (amortized)
- List:
  - all ops in $O(n)$ time (array & linked list)
  - get in $O(1)$ time for array
Implementation Efficiency II
- Set:
  - Unordered sets: all ops in $O(n)$ time (array & linked list)
  - Sorted sets
    - find in $O(\log n)$ time for array (binary search), others in $O(n)$
    - all ops in $O(\log n)$ time for AVL tree
- Priority queue
  - Binary heap: min in $O(1)$, insert & removeMin in $O(\log n)$
Sortedness
Note. For sets, efficient implementations so far have crucially relied on comparability
- comparability allows us to prescribe where a given element should be
- do not need to examine all elements to determine if a given item is (not) present
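To illustrate how comparability tells us where an element should be, here is a minimal binary search sketch over a sorted array. The class name SortedArrayFind is hypothetical, and the array is assumed to already be sorted in increasing order.

```java
// Binary search on a sorted array: each comparison rules out half of the
// remaining candidates, so find examines only O(log n) elements.
final class SortedArrayFind {
    // ASSUMPTION: sorted is in increasing order according to compareTo.
    static <T extends Comparable<T>> boolean find(T[] sorted, T target) {
        int lo = 0, hi = sorted.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            int c = target.compareTo(sorted[mid]);
            if (c == 0) return true;   // found it
            if (c < 0) hi = mid - 1;   // target can only be to the left
            else lo = mid + 1;         // target can only be to the right
        }
        return false; // every position where target could be has been ruled out
    }
}
```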
Question
How might we find/add/remove when elements are not Comparable?
One Idea
Associate a numerical value to every possible element
- numbers are comparable, so just do comparison by number
Two Issues
- How do we compute the numerical value consistently?
- What do we do about collisions?
Hashing
Idea. Given an object instance obj, compute a numerical value from data stored in obj
- value is called a hash value or hash code
Application.
- use hash value of obj to determine where in data structure obj should be stored
Goals.
- Different objects should be unlikely to have same hash value
- Should be able to specify range of possible values
- Semantically equivalent objects should have same hash value
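A minimal sketch of how these goals might look in Java. The GridPoint class and the HashRange.index helper are hypothetical names introduced here; Objects.hash and Math.floorMod are standard library methods. Reducing the hash code modulo $n$ is one simple way to specify the range, assumed here for illustration.

```java
import java.util.Objects;

// Hypothetical example class: two GridPoints with the same coordinates are
// semantically equivalent, so they must report the same hash code.
final class GridPoint {
    final int x, y;

    GridPoint(int x, int y) { this.x = x; this.y = y; }

    @Override public boolean equals(Object o) {
        if (!(o instanceof GridPoint)) return false;
        GridPoint p = (GridPoint) o;
        return x == p.x && y == p.y;
    }

    // Computed only from the data that equals compares.
    @Override public int hashCode() { return Objects.hash(x, y); }
}

final class HashRange {
    // Map any object's hash code into the range 0, 1, ..., n-1.
    // floorMod (unlike %) never returns a negative value for positive n.
    static int index(Object obj, int n) {
        return Math.floorMod(obj.hashCode(), n);
    }
}
```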
Application: Hash Sets I
Goal. Implement unordered set ADT
- add, find, remove methods
Assume. We have access to a hash function $h$
- for any object $x$, $h(x)$ is the hash code of $x$
- the range of values for $h$ can be specified
Application: Hash Sets II
Idea. Store elements in an array
- choose range of hash values to be $0, 1, \ldots, n-1$
- to add, find, remove $x$, look at index $h(x)$
Example: Hashing Colors
$n = 6$
Uh Oh!
What do we do about collisions???
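Collisions are unavoidable once there are more elements than array slots. A small sketch of this, assuming a hypothetical list of seven color names (not necessarily the ones from the example) and String.hashCode reduced modulo $n = 6$:

```java
// Counts how many items land in each of the n slots.
// ASSUMPTION: the slot for s is floorMod(s.hashCode(), n).
final class CollisionDemo {
    static int[] slotCounts(String[] items, int n) {
        int[] counts = new int[n];
        for (String s : items) {
            counts[Math.floorMod(s.hashCode(), n)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical inputs: 7 colors, 6 slots.
        String[] colors = {"red", "orange", "yellow", "green", "blue", "indigo", "violet"};
        int[] counts = slotCounts(colors, 6);
        for (int i = 0; i < counts.length; i++) {
            System.out.println("slot " + i + ": " + counts[i] + " element(s)");
        }
    }
}
```

With 7 elements and 6 slots, at least one slot must receive two or more elements, no matter which hash function is used.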
Chaining
Idea. Each entry of the array refers to the head of a linked list
- linked list at arr[i] stores all elements $x$ with hash value $h(x) = i$
Hash Set with Chaining
- store an array arr of heads of linked lists (the hash table)
- assume hash function h has range $0, 1, \ldots, n-1$ with n = arr.length
Running Time of operations?
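A minimal sketch of a hash set with chaining, to make the running-time question concrete. Assumptions: the class name ChainedHashSetSketch is hypothetical, java.util.LinkedList stands in for hand-written linked nodes, the array size is fixed at construction, and h reduces hashCode modulo arr.length.

```java
import java.util.LinkedList;

// Hash set with chaining: arr[i] holds every element x with h(x) == i.
final class ChainedHashSetSketch<E> {
    private final LinkedList<E>[] arr; // the hash table
    private int size = 0;

    @SuppressWarnings("unchecked")
    ChainedHashSetSketch(int n) {
        arr = (LinkedList<E>[]) new LinkedList[n];
        for (int i = 0; i < n; i++) arr[i] = new LinkedList<>();
    }

    // ASSUMED hash function with range 0, 1, ..., arr.length - 1.
    private int h(E x) { return Math.floorMod(x.hashCode(), arr.length); }

    // Only the chain at index h(x) is searched.
    boolean find(E x) { return arr[h(x)].contains(x); }

    // Returns false if x was already present.
    boolean add(E x) {
        LinkedList<E> chain = arr[h(x)];
        if (chain.contains(x)) return false;
        chain.addFirst(x);
        size++;
        return true;
    }

    // Returns true if x was present and has been removed.
    boolean remove(E x) {
        boolean removed = arr[h(x)].remove(x);
        if (removed) size--;
        return removed;
    }

    int size() { return size; }
}
```

In this sketch each operation costs one hash computation plus a scan of a single chain, so its running time is proportional to the length of the chain at index $h(x)$.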
Bad Hash Functions
Extreme example: $h(x) = 0$ always!
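To see the effect of a bad hash function, the sketch below measures the longest chain produced by a given hash function. The class and method names are hypothetical, and the hash function is passed in as a parameter so different choices can be compared.

```java
import java.util.function.ToIntFunction;

final class ChainLengths {
    // Length of the longest chain if items are placed in n chains using h.
    // ASSUMPTION: h returns values in the range 0, 1, ..., n-1.
    static <E> int longestChain(Iterable<E> items, ToIntFunction<E> h, int n) {
        int[] counts = new int[n];
        int longest = 0;
        for (E x : items) {
            int i = h.applyAsInt(x);
            counts[i]++;
            longest = Math.max(longest, counts[i]);
        }
        return longest;
    }
}
```

With the constant function $h(x) = 0$, every element lands in chain 0, so the longest chain has length equal to the number of elements and the hash set behaves like a single linked list with $O(n)$ operations.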
Too Many Elements!
Array size is fixed, but we keep adding elements
- What is the running time?
Resize Challenge
If we resize to a larger array, say of size $2n$:
- must update hash function $h$ to have range $0, 1, \ldots, 2n - 1$
- this could change hash values of elements already in hash table
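One way to handle this, sketched under assumptions: when the table doubles, every stored element is re-inserted using the new range. The names below are hypothetical, the hash function is assumed to be hashCode reduced modulo the table size, and lists of lists stand in for the array of chains.

```java
import java.util.ArrayList;
import java.util.List;

final class ResizeSketch {
    // ASSUMED hash function with range 0, 1, ..., n-1.
    static int index(Object x, int n) { return Math.floorMod(x.hashCode(), n); }

    // Build a table with twice as many chains and re-insert every element,
    // because index(x, 2n) may differ from index(x, n).
    static <E> List<List<E>> rehash(List<List<E>> old) {
        int newN = 2 * old.size();
        List<List<E>> bigger = new ArrayList<>();
        for (int i = 0; i < newN; i++) bigger.add(new ArrayList<>());
        for (List<E> chain : old) {
            for (E x : chain) {
                bigger.get(index(x, newN)).add(x);
            }
        }
        return bigger;
    }
}
```

For example, an Integer's hashCode is its own value, so with $n = 4$ both 5 and 9 sit at index 1, while with $2n = 8$ the value 5 moves to index 5 and 9 stays at index 1. Rehashing touches every element, so a single resize takes time proportional to the number of stored elements plus the new table size.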
Next Time
- Randomness and the Art of Hashing
- Empirical Investigation