Lecture 18: Heaps and Unordered Sets

Overview

  1. Announcements
  2. Finishing Heaps
  3. Unordered Sets

Homework Remark

ArrayBinaryHeap vs HeapPriorityQueue

Next Week

  • No coding assignment
  • Take home “midterm”
    • similar structure to last time
    • more computational

Last Time: Heaps

Heaps…

  1. …are complete binary trees
  2. …elements satisfy heap property

Adding to Heaps

  1. Add new node at unique location where element can be added
    • maintains CBT property
  2. Bubble up procedure
    • restores heap property

Removing Min Element

  1. Copy value from unique removable leaf node to root and remove leaf
    • maintains CBT property
  2. Trickle down procedure
    • restores heap property

Representing Complete Binary Trees

Previously:

  • trees represented as linked nodes

Complete binary trees have much more predictable structure

  • can use an array to store complete binary trees efficiently!

Nodes vs Array Index

Question 1

For an index $i$, what is the index of $i$’s left child? Right child?

Question 2

For an index $i$, what is the index of $i$’s parent?
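
One common convention stores the root at index 0 and fills the array level by level. That convention is an assumption of this sketch (the lecture's figure may number nodes differently), and the class name HeapIndex is hypothetical. Under it, the index arithmetic for Questions 1 and 2 might look like this:

```java
// Index arithmetic for a complete binary tree stored level by level in an
// array. Assumption for this sketch: the root lives at index 0.
final class HeapIndex {
    static int left(int i)   { return 2 * i + 1; }
    static int right(int i)  { return 2 * i + 2; }

    // Only meaningful for i > 0, since the root has no parent.
    static int parent(int i) { return (i - 1) / 2; }
}
```

If the root were stored at index 1 instead, the formulas would become 2i, 2i + 1, and i / 2.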

Question 3

Why didn’t we use arrays to represent (non-complete) binary trees?

For Homework 07

Use an array to represent heap

  • implement insert
    1. add element to unique location (may need to resize)
    2. bubble up
  • implement removeMin
    1. remove unique “node” and copy value to root
    2. trickle down
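
The two procedures above can be sketched for a heap of ints as follows. This is a minimal illustration, not the homework's required API: the class name IntMinHeap, the initial capacity of 4, the doubling resize, and the root-at-index-0 layout are all assumptions.

```java
import java.util.Arrays;

// A minimal array-backed min-heap of ints (root at index 0).
// Names and resizing policy are hypothetical choices for this sketch.
final class IntMinHeap {
    private int[] arr = new int[4];
    private int size = 0;

    int size() { return size; }

    int min() {
        if (size == 0) throw new IllegalStateException("empty heap");
        return arr[0];
    }

    void insert(int x) {
        // Resize by doubling when the array is full.
        if (size == arr.length) arr = Arrays.copyOf(arr, 2 * arr.length);
        arr[size] = x;        // the unique next location in the last level
        bubbleUp(size);       // restore the heap property
        size++;
    }

    int removeMin() {
        if (size == 0) throw new IllegalStateException("empty heap");
        int min = arr[0];
        size--;
        arr[0] = arr[size];   // copy the last leaf's value to the root
        trickleDown(0);       // restore the heap property
        return min;
    }

    // Swap upward while the element is smaller than its parent.
    private void bubbleUp(int i) {
        while (i > 0 && arr[i] < arr[(i - 1) / 2]) {
            swap(i, (i - 1) / 2);
            i = (i - 1) / 2;
        }
    }

    // Swap downward with the smaller child until neither child is smaller.
    private void trickleDown(int i) {
        while (true) {
            int l = 2 * i + 1, r = 2 * i + 2, smallest = i;
            if (l < size && arr[l] < arr[smallest]) smallest = l;
            if (r < size && arr[r] < arr[smallest]) smallest = r;
            if (smallest == i) return;
            swap(i, smallest);
            i = smallest;
        }
    }

    private void swap(int i, int j) {
        int t = arr[i];
        arr[i] = arr[j];
        arr[j] = t;
    }
}
```

Inserting n values and then calling removeMin n times returns the values in sorted order, which is a convenient sanity check for an implementation.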

Unordered Sets

Big Picture so Far

Data Structures:

  1. Arrays
    • usual arrays
    • circular arrays
  2. Linked Lists
    • singly linked lists
    • doubly linked lists
  3. Trees
    • binary (search) trees
    • AVL trees
    • complete binary trees
    • heaps

ADTs

  1. Stack, Queue, Deque
  2. List
  3. Sets
    • unordered sets
    • sorted sets
  4. Priority Queue

Implementation Efficiency I

  1. Stack, Queue, Deque:
    • (doubly) linked list: all ops in $O(1)$ time
    • (circular) array: all ops in $O(1)$ time (amortized)
  2. List:
    • all ops in $O(n)$ time (array & linked list)
      • get in $O(1)$ time for array

Implementation Efficiency II

  1. Set:
    • Unordered sets: all ops in $O(n)$ time (array & linked list)
    • Sorted Sets
      • find in $O(\log n)$ time for array (binary search), others in $O(n)$
      • all ops in $O(\log n)$ time for AVL tree
  2. Priority queue
    • Binary heap: min in $O(1)$, insert & removeMin in $O(\log n)$

Sortedness

Note. For sets, efficient implementations so far have crucially relied on comparability

  • comparability allows us to prescribe where a given element should be
  • do not need to examine all elements to determine if a given item is (not) present
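
Binary search on a sorted array is one illustration of this point: each comparison tells us which half can be ignored. The sketch below uses ints for simplicity; the class name SortedFind is hypothetical.

```java
// Binary search over a sorted int array. Each comparison discards half of
// the remaining range, so only about log2(n) elements are examined.
final class SortedFind {
    static boolean contains(int[] sorted, int target) {
        int lo = 0, hi = sorted.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;   // avoids overflow of lo + hi
            if (sorted[mid] == target) return true;
            if (sorted[mid] < target) lo = mid + 1;   // target can only be to the right
            else hi = mid - 1;                        // target can only be to the left
        }
        return false;
    }
}
```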

Question

How might we find/add/remove when elements are not Comparable?

One Idea

Associate a numerical value to every possible element

  • numbers are comparable, so just do comparison by number

Two Issues

  1. How do we compute the numerical value consistently?
  2. What do we do about collisions?

Hashing

Idea. Given an object instance obj, compute a numerical value from data stored in obj

  • value is called a hash value or hash code

Application.

  • use hash value of obj to determine where in data structure obj should be stored

Goals.

  1. Different objects should be unlikely to have same hash value
  2. Should be able to specify range of possible values
  3. Semantically equivalent objects should have same hash value
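
In Java, goals 2 and 3 might be approached as sketched below. The classes GridPoint and HashRange are hypothetical examples. Objects.hash and Math.floorMod are standard library calls, but using them this way is just one possible choice.

```java
import java.util.Objects;

// Goal 3: two GridPoints with the same coordinates are semantically
// equivalent, so equals and hashCode are both computed from (x, y).
final class GridPoint {
    final int x, y;

    GridPoint(int x, int y) { this.x = x; this.y = y; }

    @Override public boolean equals(Object o) {
        if (!(o instanceof GridPoint)) return false;
        GridPoint p = (GridPoint) o;
        return x == p.x && y == p.y;
    }

    @Override public int hashCode() { return Objects.hash(x, y); }
}

// Goal 2: squeeze an arbitrary int hash code into the range 0, 1, ..., n-1.
final class HashRange {
    static int index(Object obj, int n) {
        // floorMod never returns a negative value for positive n,
        // unlike the % operator applied to a negative hash code.
        return Math.floorMod(obj.hashCode(), n);
    }
}
```

Goal 1 depends on how well the hash code spreads typical inputs, which this sketch does not attempt to measure.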

Application: Hash Sets I

Goal. Implement unordered set ADT

  • add, find, remove methods

Assume. We have access to a hash function $h$

  • for any object $x$, $h(x)$ is the hash code of $x$
  • the range of values for $h$ can be specified

Application: Hash Sets II

Idea. Store elements in an array

  • choose range of hash values to be $0, 1, \ldots, n-1$
    • $n$ is array size
  • to add, find, remove $x$, look at index $h(x)$
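
A minimal sketch of this idea, with one slot per hash value and no collision handling at all. The class name NaiveHashSet is hypothetical, and using hashCode() reduced modulo the array size as $h$ is an assumption.

```java
// One array slot per hash value. Deliberately ignores collisions, so it is
// a sketch of the indexing idea only, not a usable set.
final class NaiveHashSet {
    private final Object[] arr;

    NaiveHashSet(int n) { arr = new Object[n]; }

    // Assumed hash function h with range 0, 1, ..., n-1.
    private int h(Object x) { return Math.floorMod(x.hashCode(), arr.length); }

    // Returns false if slot h(x) is already occupied, whether by x itself
    // or by a different element with the same hash value.
    boolean add(Object x) {
        int i = h(x);
        if (arr[i] != null) return false;
        arr[i] = x;
        return true;
    }

    boolean find(Object x) { return x.equals(arr[h(x)]); }

    boolean remove(Object x) {
        int i = h(x);
        if (!x.equals(arr[i])) return false;
        arr[i] = null;
        return true;
    }
}
```

With n = 6 and Integer elements (whose hashCode is the integer itself), 1 and 7 both map to index 1, so the second add fails.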

Example: Hashing Colors

$n = 6$

  • red
  • orange
  • yellow
  • blue

Uh Oh!

What do we do about collisions???

Chaining

Idea. Each entry of the array refers to the head of a linked list

  • linked list at arr[i] stores all elements $x$ with hash values $h(x) = i$

Hash Set with Chaining

  • store an array arr of heads of linked lists (the hash table)
  • assume hash function h has range 0, 1,..., n-1 w/ n = arr.length

How To add(x)?

How To find(x)?

How To remove(x)?
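
One possible answer to the three questions above, as a sketch. It uses java.util.LinkedList for the chains rather than hand-written nodes, and takes hashCode() modulo arr.length as the hash function h. Both are assumptions, as is the class name ChainedHashSet.

```java
import java.util.LinkedList;

// Hash set with chaining: arr[i] holds every element x with h(x) == i.
final class ChainedHashSet<T> {
    private final LinkedList<T>[] arr;
    private int size = 0;

    @SuppressWarnings("unchecked")
    ChainedHashSet(int n) {
        arr = new LinkedList[n];
        for (int i = 0; i < n; i++) arr[i] = new LinkedList<>();
    }

    // Assumed hash function with range 0, 1, ..., arr.length - 1.
    private int h(Object x) { return Math.floorMod(x.hashCode(), arr.length); }

    int size() { return size; }

    // Search only the chain at index h(x).
    boolean find(T x) { return arr[h(x)].contains(x); }

    // Add x to the front of its chain unless it is already present.
    boolean add(T x) {
        if (find(x)) return false;
        arr[h(x)].addFirst(x);
        size++;
        return true;
    }

    // Remove x from its chain if present.
    boolean remove(T x) {
        // The cast makes explicit that we remove by value, not by position.
        boolean removed = arr[h(x)].remove((Object) x);
        if (removed) size--;
        return removed;
    }
}
```

Each operation touches a single chain, so its cost is proportional to the length of that chain.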

Running Time of operations?

What can Go Wrong?

Bad Hash Functions

Extreme example: h(x) = 0 always!
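
To see the effect, the sketch below counts how long the longest chain becomes when the keys 0, 1, ..., count-1 are placed into n buckets by a given hash function. The helper ChainLengths and its parameters are hypothetical.

```java
import java.util.function.IntUnaryOperator;

final class ChainLengths {
    // Places keys 0..count-1 into n buckets using h and returns the length
    // of the longest resulting chain. Assumes h maps into 0..n-1.
    static int longestChain(int count, int n, IntUnaryOperator h) {
        int[] chainLength = new int[n];
        int longest = 0;
        for (int key = 0; key < count; key++) {
            int i = h.applyAsInt(key);
            chainLength[i]++;
            longest = Math.max(longest, chainLength[i]);
        }
        return longest;
    }
}
```

For example, longestChain(60, 6, k -> 0) puts all 60 keys in one chain, while longestChain(60, 6, k -> k % 6) spreads them evenly into chains of length 10.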

Too Many Elements!

Array size is fixed, but keep adding elements

  • What is the running time?

How to Resize?

Resize Challenge

If we resize to a larger array, say of size $2n$

  • must update hash function $h$ to have range $0, 1, \ldots, 2n -1$

  • this could change hash values of elements already in hash table

Resize Method
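
One way to handle the challenge is to rehash: build the larger array, then re-insert every stored element at its index under the new range. The sketch below is an assumption-laden illustration, not necessarily the method presented in lecture. The class name ResizingHashSet, the initial size 4, and the rule "double when the number of elements equals the array length" are all hypothetical choices.

```java
import java.util.LinkedList;

// Chained hash set that doubles its array and rehashes when it fills up.
final class ResizingHashSet<T> {
    private LinkedList<T>[] arr = newTable(4);
    private int size = 0;

    @SuppressWarnings("unchecked")
    private static <E> LinkedList<E>[] newTable(int n) {
        LinkedList<E>[] table = new LinkedList[n];
        for (int i = 0; i < n; i++) table[i] = new LinkedList<>();
        return table;
    }

    // Hash function whose range 0..n-1 follows the current array size n.
    private static int index(Object x, int n) {
        return Math.floorMod(x.hashCode(), n);
    }

    int capacity() { return arr.length; }
    int size() { return size; }

    boolean find(T x) { return arr[index(x, arr.length)].contains(x); }

    boolean add(T x) {
        if (find(x)) return false;
        if (size == arr.length) resize();   // assumed threshold
        arr[index(x, arr.length)].addFirst(x);
        size++;
        return true;
    }

    // Every element may land at a different index under the new range,
    // so each one is re-inserted into the larger table.
    private void resize() {
        LinkedList<T>[] bigger = newTable(2 * arr.length);
        for (LinkedList<T> chain : arr)
            for (T y : chain)
                bigger[index(y, bigger.length)].addFirst(y);
        arr = bigger;
    }
}
```

A single resize takes time proportional to the number of stored elements, but with doubling it happens rarely.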

Next Time

  • Randomness and the Art of Hashing
  • Empirical Investigation