Lecture 18: Heaps and Unordered Sets

Overview

Announcements
Finishing Heaps
Unordered Sets

Homework Remark

ArrayBinaryHeap vs HeapPriorityQueue

Next Week

No coding assignment
Take-home “midterm”
- similar structure to last time
- more computational

Last Time: Heaps

Heaps…

…are complete binary trees
…elements satisfy heap property

Adding to Heaps

Add new node at unique location where element can be added
- maintains CBT property
Bubble up procedure
- restores heap property

Removing Min Element

Copy value from unique removable leaf node to root and remove leaf
- maintains CBT property
Trickle down procedure
- restores heap property

Representing Complete Binary Trees

Previously:

trees represented as linked nodes

Complete binary trees have much more predictable structure

can use an array to store complete binary trees efficiently!

Nodes vs Array Index

Question 1

For an index $i$, what is the index of $i$’s left child? Right child?

Question 2

For an index $i$, what is the index of $i$’s parent?

Question 3

Why didn’t we use arrays to represent (non-complete) binary trees?
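
As a reference for Questions 1 and 2, here is a minimal sketch of one common indexing convention. It assumes the nodes are stored in level order with the root at index 0; that layout and the class name `HeapIndex` are assumptions for illustration, and a layout with the root at index 1 would shift the formulas.

```java
// Index arithmetic for a complete binary tree stored in an array.
// Assumption: level-order layout with the root at index 0.
final class HeapIndex {
    static int left(int i)   { return 2 * i + 1; }
    static int right(int i)  { return 2 * i + 2; }
    static int parent(int i) { return (i - 1) / 2; } // only meaningful for i > 0

    public static void main(String[] args) {
        // Index 0 is the root, 1 and 2 are its children, 3..6 form the next level.
        System.out.println(left(1) + " " + right(1) + " " + parent(4)); // prints "3 4 1"
    }
}
```

These formulas rely on the tree having no gaps in level order, which is what completeness guarantees.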

For Homework 07

Use an array to represent heap

implement insert
1. add element to unique location (may need to resize)
2. bubble up
implement removeMin
1. remove unique “node” and copy value to root
2. trickle down
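
The two-step outlines above can be sketched for a heap of `int` values as follows. This is an illustration under stated assumptions, not the homework's required design: the class name `IntMinHeap`, the root-at-index-0 layout, the initial capacity of 4, and the doubling resize are all choices made for this example.

```java
import java.util.Arrays;
import java.util.NoSuchElementException;

// Hypothetical int-only min-heap stored in an array (root at index 0 is an assumption).
final class IntMinHeap {
    private int[] data = new int[4];
    private int size = 0;

    int size() { return size; }

    void insert(int x) {
        if (size == data.length) {
            data = Arrays.copyOf(data, 2 * data.length); // resize when the array is full
        }
        data[size] = x;   // step 1: the next free slot keeps the tree complete
        bubbleUp(size);   // step 2: restore the heap property
        size++;
    }

    int removeMin() {
        if (size == 0) throw new NoSuchElementException("heap is empty");
        int min = data[0];
        size--;
        data[0] = data[size]; // step 1: copy the last leaf to the root and drop the leaf
        trickleDown(0);       // step 2: restore the heap property
        return min;
    }

    // Swap the element at i with its parent until the parent is no larger.
    private void bubbleUp(int i) {
        while (i > 0) {
            int p = (i - 1) / 2;
            if (data[p] <= data[i]) break;
            swap(i, p);
            i = p;
        }
    }

    // Swap the element at i with its smaller child until neither child is smaller.
    private void trickleDown(int i) {
        while (true) {
            int l = 2 * i + 1, r = l + 1, smallest = i;
            if (l < size && data[l] < data[smallest]) smallest = l;
            if (r < size && data[r] < data[smallest]) smallest = r;
            if (smallest == i) return;
            swap(i, smallest);
            i = smallest;
        }
    }

    private void swap(int a, int b) { int t = data[a]; data[a] = data[b]; data[b] = t; }
}
```

Both loops move along a single root-to-leaf path, which is where the $O(\log n)$ bound for insert and removeMin comes from.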

Unordered Sets

Big Picture so Far

Data Structures:

Arrays
- usual arrays
- circular arrays
Linked Lists
- singly linked lists
- doubly linked lists
Trees
- binary (search) trees
- AVL trees
- complete binary trees
- heaps

ADTs

Stack, Queue, Deque
List
Sets
- unordered sets
- sorted sets
Priority Queue

Implementation Efficiency I

Stack, Queue, Deque:
- (doubly) linked list: all ops in $O(1)$ time
- (circular) array: all ops in $O(1)$ time (amortized)
List:
- all ops in $O(n)$ time (array & linked list)
  - get in $O(1)$ time for array

Implementation Efficiency II

Set:
- Unordered sets: all ops in $O(n)$ time (array & linked list)
- Sorted Sets
  - find in $O(\log n)$ time for array (binary search), others in $O(n)$
  - all ops in $O(\log n)$ time for AVL tree
Priority queue
- Binary heap: min in $O(1)$, insert & removeMin in $O(\log n)$

Sortedness

Note. For sets, efficient implementations so far have crucially relied on comparability

comparability allows us to prescribe where a given element should be
do not need to examine all elements to determine if a given item is (not) present
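
To make the point concrete, here is a small sketch of a membership test on a sorted array using the standard library's `Arrays.binarySearch`. The class name `SortedFind` is made up for this example, and the array is assumed to be sorted already.

```java
import java.util.Arrays;

// Hypothetical helper: membership test that exploits sortedness.
final class SortedFind {
    // Assumes `sorted` is in ascending order; otherwise the result is undefined.
    static boolean contains(int[] sorted, int x) {
        // binarySearch returns a non-negative index exactly when x is present.
        return Arrays.binarySearch(sorted, x) >= 0;
    }
}
```

The search inspects only $O(\log n)$ positions because each comparison tells it which half of the array can be ignored.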

Question

How might we find/add/remove when elements are not Comparable?

One Idea

Associate a numerical value to every possible element

numbers are comparable, so just do comparison by number

Two Issues

How do we compute the numerical value consistently?
What do we do about collisions?

Hashing

Idea. Given an object instance obj, compute a numerical value from data stored in obj

value is called a hash value or hash code

Application.

use hash value of obj to determine where in data structure obj should be stored

Goals.

Different objects should be unlikely to have same hash value
Should be able to specify range of possible values
Semantically equivalent objects should have same hash value
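
As an illustration of the third goal, the sketch below computes a hash code from the same fields that `equals` compares, so two equal objects always get the same hash value. The class `Point` is hypothetical, and `Objects.hash` is just one convenient way to combine fields.

```java
import java.util.Objects;

// Hypothetical value class used only for illustration.
final class Point {
    private final int x, y;

    Point(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    // Uses exactly the fields that equals compares, so equal points share a hash code.
    @Override
    public int hashCode() { return Objects.hash(x, y); }
}
```

Distinct points may still share a hash code; the first goal only asks that this be unlikely.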

Application: Hash Sets I

Goal. Implement unordered set ADT

add, find, remove methods

Assume. We have access to a hash function $h$

for any object $x$, $h(x)$ is the hash code of $x$
the range of values for $h$ can be specified

Application: Hash Sets II

Idea. Store elements in an array

choose range of hash values to be $0, 1, \ldots, n-1$
- $n$ is array size
to add, find, remove $x$, look at index $h(x)$
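
One way to obtain a hash value in the range $0, 1, \ldots, n-1$ is to reduce Java's built-in `hashCode()` modulo the array size. This is a sketch under that assumption; the class name `HashRange` is made up, and the lecture's $h$ need not be defined this way.

```java
// Hypothetical helper: squeeze an arbitrary hash code into 0..n-1.
final class HashRange {
    static int index(Object x, int n) {
        // hashCode() may be negative; floorMod always returns a value in 0..n-1 for n > 0.
        return Math.floorMod(x.hashCode(), n);
    }
}
```

Plain `%` would not work here, since in Java it can return a negative result for a negative hash code.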

Example: Hashing Colors

$n = 6$

red
orange
yellow
blue

Uh Oh!

What do we do about collisions???

Chaining

Idea. Each entry of the array refers to the head of a linked list

linked list at arr[i] stores all elements $x$ with hash values $h(x) = i$

Hash Set with Chaining

store an array arr of heads of linked lists (the hash table)
assume hash function h has range 0, 1, ..., n-1 with n = arr.length

How To `add(x)`?

How To `find(x)`?

How To `remove(x)`?
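
One possible answer to the three questions above is sketched below. It is a minimal fixed-size version under stated assumptions: the class name `ChainedHashSet` is made up, `java.util.LinkedList` stands in for a hand-written linked list, $h$ is taken to be `hashCode()` reduced modulo `arr.length`, and `null` elements are not supported.

```java
import java.util.LinkedList;

// Hypothetical fixed-capacity hash set that resolves collisions by chaining.
final class ChainedHashSet<E> {
    private final LinkedList<E>[] arr; // arr[i] holds every element x with h(x) == i

    @SuppressWarnings("unchecked")
    ChainedHashSet(int n) {
        arr = (LinkedList<E>[]) new LinkedList[n];
        for (int i = 0; i < n; i++) arr[i] = new LinkedList<>();
    }

    // Assumed hash function: hashCode() reduced to the range 0..arr.length-1.
    private int h(E x) { return Math.floorMod(x.hashCode(), arr.length); }

    // Only the chain at index h(x) needs to be searched.
    boolean find(E x) { return arr[h(x)].contains(x); }

    // Returns false if x was already present, so the set never stores duplicates.
    boolean add(E x) {
        LinkedList<E> chain = arr[h(x)];
        if (chain.contains(x)) return false;
        chain.addFirst(x);
        return true;
    }

    // Returns true if x was present and has been removed.
    boolean remove(E x) { return arr[h(x)].remove(x); }
}
```

Each operation costs time proportional to the length of one chain, so the running time depends on how evenly $h$ spreads the elements.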

Running Time of operations?

What can Go Wrong?

Bad Hash Functions

Extreme example: h(x) = 0 always!

Too Many Elements!

Array size is fixed, but keep adding elements

What is the running time?

How to Resize?

Resize Challenge

If we resize to a larger array, say of size $2n$:

must update hash function $h$ to have range $0, 1, \ldots, 2n -1$
this could change hash values of elements already in hash table

Resize Method
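
One way to handle the change in hash values is to rehash: build the larger table and re-add every stored element using the new range. The sketch below illustrates that idea under assumptions; the class name `Rehash` is made up, a `List` of `List`s stands in for the array of linked lists, and $h$ is taken to be `hashCode()` reduced modulo the table size.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: rebuild a chained hash table at a new size.
final class Rehash {
    static <E> List<List<E>> resize(List<List<E>> old, int newSize) {
        // Start with newSize empty chains.
        List<List<E>> table = new ArrayList<>(newSize);
        for (int i = 0; i < newSize; i++) table.add(new ArrayList<>());

        // Every element is placed again, because its index may differ under the new range.
        for (List<E> chain : old) {
            for (E x : chain) {
                table.get(Math.floorMod(x.hashCode(), newSize)).add(x);
            }
        }
        return table;
    }
}
```

Rehashing touches every element once, so a single resize takes time linear in the number of stored elements plus the new table size.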

Next Time

Randomness and the Art of Hashing
Empirical Investigation