Assignment 07

Hash sets and maps

Overview

In class, we considered a hash table data structure (cf. ODS Chapter 5). For this assignment, you will complete an implementation of a hash table in order to implement the SimpleUSet interface. Additionally, you will use your hash table implementation to implement another ADT, the map ADT, as specified by the SimpleMap interface.

Background

Hash tables

A hash table stores a collection of elements in an array, where the index of an element x is determined according to a hash function applied to x. More formally, given an object instance x and an array arr of size n, a hash function associates a number in the range [0, 1,..., n-1] to the object x. The hash function must satisfy the property that if x and y are semantically equivalent (i.e., x.equals(y) evalues to true), then x and y hash to the same value. Moreover, this value is the same throughout the execution of the program. Good hash functions will also have the property that they “appear random:” the hash values of objects should be distributed uniformly over the range of possible values, and if x and y are not semantically equivalent, they should be unlikely to hash to the same value.

Even though any two particular objects x and y will be unlikely to collide (i.e., hash to the same value), collisions will occur. There are many ways in which one could deal with these collisions. One common method, called chaining, is to store elements associated with a given index in a linked list. That is, each index entry in the array arr stores a reference to a Node that is the head of a linked list. In order to find, add, or remove an element x, the corresponding operation is performed on the linked list whose head is at index getIndex(x) in arr.

The Map ADT

The map ADT is an abstract data type representing a mathematical function—a relation that assigns a single value to each key in a set. For example, a phone book can be thought of a map that associates a phone number (value) to each person (key) in a town. Formally, a map can be represented as a set of key-value pairs, \(S = \{(k_1, v_1), (k_2, v_2),\ldots,(k_n, v_n)\}\). Here each \(k_i\) is a key, and \(v_i\) its associated value. We require that the keys \(k_i\) are pairwise distict: for each \(i \neq j\), we have \(k_i \neq k_j\).

A map supports the following operations:

  • \(\mathrm{size}()\) returns the number of key-value pairs stored in the map.

  • \(\mathrm{isEmpty}()\) returns true if and only if the size of the map is \(0\).

  • \(\mathrm{get}(k)\) returns the value associated with key \(k\), or null if the map doesn’t contain \(k\).

  • \(\mathrm{put}(k, v)\) adds the value key-value pair \((k, v)\) to the map. If the map already contained a pair \((k, v')\) (i.e., with the same key), then then \((k, v')\) is removed from the map and the value \(v'\) is returned. Otherwise (if \(k\) was not previously in the map), the method returns null. If \(k\) is already present in the map with associated value \(v'\), this method can be thought of as reassigning \(k\)’s associated value from \(v'\) to \(v\).

  • \(\mathrm{remove}(k)\) removes the (unique) key-value pair, \((k, v)\), with key \(k\) from the map (if present), and returns the associated value \(v\). If the key \(k\) was not contained in the map, null is returned.

  • \(\mathrm{contains}(k)\) returns true if an only if there is pair \((k, v)\) contained in the map—i.e., \(k\) has an associated value in the map.

The map ADT is specified as the interface SimpleMap, linked to below.

Your Task

For this assignment, you will write two programs.

  1. Complete HashUSet.java, an implementation of the SimpleUSet interface that represents a set as a hash table with chaining. The file HashUSet.java provides a partial implementation, including a constructor, a getIndex method that defines an appropriate hash function, and a method increaseCapacity that increases the capacity of the table. The program additionally defines a Node class to be used for the linked list implementation of chaining.

  2. Write a class HashSimpleMap.java that implements the SimpleMap interface.

Below, you are provided with tester programs for the SimpleUSet and SimpleMap interfaces so you can test your code.

Suggestions

  1. For HashUSet, note that each entry in the array table is initialized to be a sentinel node (head) for its associated linked list. Using sentinel nodes in this way should simplify the logic of your add, remove, and find methods somewhat.

  2. For the HashSimpleMap, you should use a HashUSet to store key-value pairs. To this end, you will probably want to define an auxiliary class, Pair<K, V> that represents a pair consisting of a key (of type K) and a value (of type V). Note that if the HashUSet contains Pairs, you will need to figure out how to search the set for a value and return the corresponding key, if any. In particular, this will require your Pair class to have the property that if two Pairs have the same (semantically equivalent) keys, then they should be hashed to the same index. Similarly, the HashUSet should treat two Pairs as equal if they have the same key. With an appropriately defined Pair class, the implementations of the required methods for HashSimpleMap can be very short!

Code

Here is a zip archive of all the code below.

What to Submit

  1. Your completed HashUSet.java.
  2. Your implementation HashSimpleMap.java

Evaluation

Your submission will receive a score out of 10 points, broken down as follows:

  • 5 points for HashUSet.java
    • Code compiles, runs, and passes tests in SimpleUSetTester.java.
    • Code is sensibly organized with helpful comments.
  • 5 points for HashSimpleMap.java
    • Code compiles, runs, and passes tests in SimpleMapTester.java.
    • Code is sensibly organized with helpful comments.