Assignment 06: Binary Search Tree

Implementing a binary search tree

Overview

In this assignment you will provide an implementation of the SimpleSSet interface using a binary search tree (BST) as the underlying data structure.

Note. I am well aware that many BST implementations in Java exist in many textbooks (including ours) and online. You are nonetheless expected to write your own implementation that gives the functionality required by SimpleSSet.java.

Background

A sorted set represents a collection of elements that can be (pair-wise) compared. That is, given any two elements \(x\) and \(y\), we can say that one is “larger than” the other or they are equal: \(x < y\), \(y < x\), or \(x = y\). Like the unsorted sets you implemented in the previous assignment, a sorted set supports basic operations for finding, adding, and removing elements. Additionally, a sorted set provides access to the largest and smallest elements in the set. You can find a more formal description of set ADTs and variants in Simple Set ADTs.

In this assignment, you will implement the the SimpleSSet ADT/interface using a Binary Search Tree:

Note that you will need both SimpleUSet.java and SimpleSSet.java to implement SimpleSSet, as the latter interface extends the former.

In Java, an ordering on elements of a class can be specified by implementing the Comparable interface. See the Comparable documentation here.. Note that many familiar classes already implement the Comparable interface: Integer, String, Double, etc., all have a natural notion of comparison. For two elements E x and E y of a class E that implements Comparable<E>, we can compare the elements with the following interpretations:

  • x.compareTo(y) < 0 indicates that x is “less than” y;
  • x.compareTo(y) > 0 indicates that x is “greater than” y;
  • x.compareTo(y) == 0 should be equivalent to x.equals(y).

As discussed in class, a binary search tree represents a sorted set as a collection of nodes, where each node stores the value of an element in the set. Additionally, each node has a parent, a left child, and a right child. A BST stores a root node, and all nodes in the tree are descendants of the root. Additionally, a BST has the following property: if a node v stores a value y then:

  1. if a node u stores a value x and u is a left descendant of v, then x < y, and
  2. if a node w stores a value z and w is a right descendant of v, then y < z.

Your Task

For this assignment, you will write your own implementation of a SimpleSSet using a binary search tree to store the elements. Specifically, you should write a class BinarySearchTree with the following declaration:

1
2
3
public class BinarySearchTree<E extends Comparable<E>> implements SimpleSSet<E> {
    ...
}

In addition to the methods required by the SimpleSSet interface, you should also include a method

1
2
3
public int height() { 
    ...
}

that returns the height of the BST—i.e., the length of the longest path from the root to a leaf node.

As in the previous assignment, you will then apply your SimpleSSet implementations to write a program that reads all of the words in a given (presumably large) text file, and reports the number of distinct words in the file. To this end, your program should maintain a SimpleSSet of the words it has read so far. Whenever a word is read from the file, it should be added to the SimpleSSet. Since the add method for SimpleSSet only adds the word if it is not already contained in the set, after adding every word from the text, the set will only contain the distinct words from the text—i.e., the size of the vocabulary in the text.

Additionally, you will use your BinarySearchTree to see how the height of a randomly generated binary search tree grows with the size of the set it stores. You will also compare the performance of your BinarySearchTree to another SimpleSSet implementation—ArraySimpleSSet (provided)—for the task of storing the distinct words in an English text.

Code

Questions

  1. Using your BinarySearchTree implementation, run the provided RandomBSTBuilder program. This program builds a binary search tree by inserting random values into the tree and records the tree’s height in a file tree-heights.csv. Generate a plot of this output. How quickly does the height of a random BST seem to grow with its size? Is this growth what you expected? Why or why not?

  2. In Assignment 04, you wrote a program that counted the number of distinct uncommon words in a text file using a SimpleUSet implementation. Since words are stored as Strings—which implement the Comparable interface—the words can be stored in a SimpleSSet as well. Modify your word-reading program from Assignment 04 to compare the performance of your BinarySearchTree implementation to the provided ArraySimpleSSet implementation of SimpleSSet. Note that ArraySimpleSSet stores the words in an array, and uses binary search for all finding operations. Which implementation, BinarySearchTree or ArraySimpleSSet is faster for reading the words from a few provided texts? Is the difference significant? Why might you expect one or the other implementation to be faster?

What to Submit

  • BinarySearchTree.java containing the simple SimpleSSet implementation described above.
  • responses.pdf containing your responses to the questions above.

Evaluation

Your submission will receive a score out of 10 points, broken down as follows:

  • 5 points for BinarySearchTree.java.
    • code compiles, runs, and passes all tests in BSTTester.java
    • code is well organized and documented with comments
  • 5 points for responses to the questions above