# Lecture 18: Heaps and Unordered Sets

## Overview

1. Announcements
2. Finishing Heaps
3. Unordered Sets

## Homework Remark

ArrayBinaryHeap vs HeapPriorityQueue

## Next Week

• No coding assignment
• Take home “midterm”
• similar structure to last time
• more computational

## Last Time: Heaps

Heaps…

1. …are complete binary trees
2. …elements satisfy heap property

Adding an element:

1. Add new node at the unique location where an element can be added
• maintains CBT property
2. Bubble up procedure
• restores heap property

## Removing Min Element

1. Copy value from unique removable leaf node to root and remove leaf
• maintains CBT property
2. Trickle down procedure
• restores heap property

## Representing Complete Binary Trees

Previously:

• trees represented as linked nodes

Complete binary trees have much more predictable structure

• can use an array to store complete binary trees efficiently!

## Question 1

For an index $i$, what is the index of $i$’s left child? Right child?

## Question 2

For an index $i$, what is the index of $i$’s parent?
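One common convention for Questions 1 and 2 can be sketched as follows. It assumes the root is stored at index 0 and the nodes are stored level by level, left to right; a convention with the root at index 1 gives slightly different formulas. The class and method names are hypothetical.

```java
// Index arithmetic for a complete binary tree stored in an array.
// Assumption: root at index 0, nodes stored level by level, left to right.
final class HeapIndex {
    static int left(int i)   { return 2 * i + 1; }
    static int right(int i)  { return 2 * i + 2; }

    // Only meaningful for i > 0; the root has no parent.
    static int parent(int i) { return (i - 1) / 2; }

    public static void main(String[] args) {
        for (int i = 0; i < 4; i++) {
            System.out.println(i + ": left=" + left(i) + ", right=" + right(i));
        }
        System.out.println("parent of 5 is " + parent(5));
    }
}
```

Under this layout, both children of index $i$ map back to $i$ through integer division, so no child or parent pointers need to be stored.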

## Question 3

Why didn’t we use arrays to represent (non-complete) binary trees?

## For Homework 07

Use an array to represent heap

• implement insert
1. add element to unique location (may need to resize)
2. bubble up
• implement removeMin
1. remove unique “node” and copy value to root
2. trickle down
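The two operations above might look roughly like the following minimal sketch. It is not the homework's required interface: the class name `IntMinHeap`, the use of `int` elements, the initial capacity of 4, and the root-at-index-0 layout are all assumptions made for illustration.

```java
import java.util.Arrays;

// Minimal array-backed min-heap of ints (hypothetical class, root at index 0).
final class IntMinHeap {
    private int[] a = new int[4];
    private int size = 0;

    int size() { return size; }

    void insert(int x) {
        if (size == a.length) a = Arrays.copyOf(a, 2 * a.length); // resize when full
        a[size] = x;        // next free index is the unique location that keeps the tree complete
        bubbleUp(size);     // restore the heap property
        size++;
    }

    int removeMin() {
        if (size == 0) throw new IllegalStateException("heap is empty");
        int min = a[0];
        size--;
        a[0] = a[size];     // copy the last leaf's value to the root and drop the leaf
        trickleDown(0);     // restore the heap property
        return min;
    }

    // Swap with the parent while the element is smaller than its parent.
    private void bubbleUp(int i) {
        while (i > 0 && a[i] < a[(i - 1) / 2]) {
            swap(i, (i - 1) / 2);
            i = (i - 1) / 2;
        }
    }

    // Swap with the smaller child while some child is smaller than the element.
    private void trickleDown(int i) {
        while (true) {
            int l = 2 * i + 1, r = 2 * i + 2, smallest = i;
            if (l < size && a[l] < a[smallest]) smallest = l;
            if (r < size && a[r] < a[smallest]) smallest = r;
            if (smallest == i) return;
            swap(i, smallest);
            i = smallest;
        }
    }

    private void swap(int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }
}
```

Repeatedly calling `removeMin` on such a heap returns the stored values in nondecreasing order, which is a convenient sanity check when testing an implementation.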

# Unordered Sets

## Big Picture so Far

Data Structures:

1. Arrays
• usual arrays
• circular arrays
2. Linked lists
3. Trees
• binary (search) trees
• AVL trees
• complete binary trees
• heaps

Abstract Data Types:

1. Stack, Queue, Deque
2. List
3. Sets
• unordered sets
• sorted sets
4. Priority Queue

## Implementation Efficiency I

1. Stack, Queue, Deque:
• (doubly) linked list: all ops in $O(1)$ time
• (circular) array: all ops in $O(1)$ time (amortized)
2. List:
• all ops in $O(n)$ time (array & linked list)
• get in $O(1)$ time for array

## Implementation Efficiency II

1. Set:
• Unordered sets: all ops in $O(n)$ time (array & linked list)
• Sorted sets:
• find in $O(\log n)$ time for array (binary search), others in $O(n)$
• all ops in $O(\log n)$ time for AVL tree
2. Priority queue
• Binary heap: min in $O(1)$, insert & removeMin in $O(\log n)$

## Sortedness

Note. For sets, efficient implementations so far have crucially relied on comparability

• comparability allows us to prescribe where a given element should be
• do not need to examine all elements to determine if a given item is (not) present
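Binary search on a sorted array is one concrete instance of this idea: each comparison tells us which half of the array could contain the item, so only $O(\log n)$ entries are examined. The sketch below uses `int` elements and a hypothetical class name for brevity.

```java
// Membership test in a sorted array using comparisons to discard half the range each step.
final class SortedFind {
    // Returns true if x occurs in the sorted array; examines O(log n) entries.
    static boolean contains(int[] sorted, int x) {
        int lo = 0, hi = sorted.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;      // midpoint, written to avoid int overflow
            if (sorted[mid] == x) return true;
            if (sorted[mid] < x) lo = mid + 1; // x can only be to the right of mid
            else hi = mid - 1;                 // x can only be to the left of mid
        }
        return false;                          // range is empty, so x is not present
    }
}
```

Without an ordering on the elements, no single comparison rules out any other position, which is why the unordered implementations so far take $O(n)$ time.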

## Question

How might we find/add/remove when elements are not Comparable?

## One Idea

Associate a numerical value to every possible element

• numbers are comparable, so just do comparison by number

## Two Issues

1. How do we compute the numerical value consistently?
2. What do we do about collisions?

## Hashing

Idea. Given an object instance obj, compute a numerical value from the data stored in obj

• value is called a hash value or hash code

Application.

• use hash value of obj to determine where in data structure obj should be stored

Goals.

1. Different objects should be unlikely to have same hash value
2. Should be able to specify range of possible values
3. Semantically equivalent objects should have same hash value
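In Java, goals 2 and 3 are often approached as in the sketch below. The `Point` class and the `index` helper are hypothetical; `Objects.hash` and `Math.floorMod` are standard library methods. `Math.floorMod` is used instead of `%` because `hashCode()` may return a negative number.

```java
import java.util.Objects;

// Hypothetical value class: two Points with equal coordinates are semantically equivalent.
final class Point {
    private final int x, y;

    Point(int x, int y) { this.x = x; this.y = y; }

    @Override public boolean equals(Object o) {
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    // Goal 3: computed only from the fields that equals() compares,
    // so equal Points always get the same hash code.
    @Override public int hashCode() { return Objects.hash(x, y); }

    // Goal 2: compress any int hash code into the range 0, 1, ..., n - 1.
    static int index(Object obj, int n) { return Math.floorMod(obj.hashCode(), n); }
}
```

Goal 1 is not guaranteed by this sketch: distinct objects can still share a hash code, and compressing to a small range makes that more likely.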

## Application: Hash Sets I

• add, find, remove methods

Assume. We have access to a hash function $h$

• for any object $x$, $h(x)$ is the hash code of $x$
• the range of values for $h$ can be specified

## Application: Hash Sets II

Idea. Store elements in an array

• choose range of hash values to be $0, 1, \ldots, n-1$
• $n$ is array size
• to add, find, remove $x$, look at index $h(x)$

## Example: Hashing Colors

$n = 6$

• red
• orange
• yellow
• blue
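The example can be tried out with the sketch below, which places each color at index $h(x)$ in an array of size 6 and reports any element whose slot is already occupied. It assumes $h$ is Java's `String.hashCode()` compressed with `Math.floorMod`, so the indices need not match the ones used in lecture; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ColorSlots {
    // Assumption: h(x) is String.hashCode() compressed into 0, 1, ..., n - 1.
    static int h(String x, int n) { return Math.floorMod(x.hashCode(), n); }

    // Stores each element at index h(x); returns the elements whose slot was
    // already taken by a different element (these are not stored).
    static List<String> place(String[] table, String... elements) {
        List<String> collided = new ArrayList<>();
        for (String x : elements) {
            int i = h(x, table.length);
            if (table[i] == null) table[i] = x;
            else if (!table[i].equals(x)) collided.add(x);
        }
        return collided;
    }

    public static void main(String[] args) {
        String[] table = new String[6];
        List<String> collided = place(table, "red", "orange", "yellow", "blue");
        System.out.println(Arrays.toString(table));
        System.out.println("collisions: " + collided);
    }
}
```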

## Uh Oh!

What do we do about collisions???

## Chaining

Idea. Each entry of the array refers to the head of a linked list

• linked list at arr[i] stores all elements $x$ with hash value $h(x) = i$

## Hash Set with Chaining

• store an array arr of heads of linked lists (the hash table)
• assume hash function $h$ has range $0, 1, \ldots, n-1$ with $n$ = arr.length
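A minimal sketch of such a hash set is given below. It assumes elements are non-null, takes $h(x)$ to be `hashCode()` compressed with `Math.floorMod`, uses `java.util.LinkedList` for the chains rather than hand-written nodes, and never resizes the table. The class name `ChainedHashSet` is hypothetical.

```java
import java.util.LinkedList;

// Sketch of a hash set with chaining; fixed table size, no resizing.
final class ChainedHashSet<T> {
    private final LinkedList<T>[] arr;   // the hash table: one chain per index
    private int size = 0;

    @SuppressWarnings("unchecked")
    ChainedHashSet(int n) {
        arr = new LinkedList[n];
        for (int i = 0; i < n; i++) arr[i] = new LinkedList<>();
    }

    // Assumption: h(x) = hashCode() reduced to 0, 1, ..., n - 1 with n = arr.length.
    private int h(Object x) { return Math.floorMod(x.hashCode(), arr.length); }

    // Only the chain at index h(x) is searched.
    boolean find(T x) { return arr[h(x)].contains(x); }

    boolean add(T x) {
        if (find(x)) return false;       // sets hold no duplicates
        arr[h(x)].add(x);
        size++;
        return true;
    }

    boolean remove(T x) {
        boolean removed = arr[h(x)].remove(x);
        if (removed) size--;
        return removed;
    }

    int size() { return size; }
}
```

Each operation costs time proportional to the length of one chain, so performance depends on how evenly $h$ spreads the elements across the array.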

## What can Go Wrong?

Extreme example: h(x) = 0 always!

## Too Many Elements!

Array size is fixed, but keep adding elements

• What is the running time?

## Resize Challenge

If we resize to a larger array, say of size $2n$

• must update hash function $h$ to have range $0, 1, \ldots, 2n -1$

• this could change hash values of elements already in hash table
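To make the issue concrete, suppose $h(x)$ is a hash code reduced modulo the array size (an assumption; the lecture has not fixed a particular $h$). An element with hash code 7 sits at index $7 \bmod 6 = 1$ when $n = 6$, but at index $7 \bmod 12 = 7$ after doubling. One way to handle this, sketched below with hypothetical names, is to re-insert every element into the new table using the updated hash function.

```java
import java.util.ArrayList;
import java.util.List;

final class Rehash {
    // Assumption: h(x) is hashCode() compressed into 0, 1, ..., n - 1.
    static int h(Object x, int n) { return Math.floorMod(x.hashCode(), n); }

    // Builds a table with newSize chains and re-inserts every element
    // at the index given by the updated hash function.
    static <T> List<List<T>> resize(List<List<T>> old, int newSize) {
        List<List<T>> table = new ArrayList<>();
        for (int i = 0; i < newSize; i++) table.add(new ArrayList<>());
        for (List<T> chain : old) {
            for (T x : chain) {
                table.get(h(x, newSize)).add(x);
            }
        }
        return table;
    }
}
```

This rebuild visits every stored element once, so a single resize takes time linear in the number of elements plus the new array size.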

## Next Time

• Randomness and the Art of Hashing
• Empirical Investigation