Assignment 03
Implementing a binary search tree
Overview
In this assignment you will provide an implementation of the SimpleSSet
interface using a binary search tree (BST) as the underlying data structure.
Note. I am well aware that many BST implementations in Java exist in many textbooks (including ours) and online. You are nonetheless expected to write your own implementation that gives the functionality required by SimpleSSet.java
.
Background
A sorted set represents a collection of elements that can be (pair-wise) compared. That is, given any two elements \(x\) and \(y\), we can say that one is “larger than” the other or they are equal: \(x < y\), \(y < x\), or \(x = y\). Like the unsorted sets you implemented in the previous assignment, a sorted set supports basic operations for finding, adding, and removing elements. Additionally, a sorted set provides access to the largest and smallest elements in the set. You can find a more formal description of set ADTs and variants in Simple Set ADTs.
In this assignment, you will implement the the SimpleSSet
ADT/interface using a Binary Search Tree:
Note that you will need both SimpleUSet.java
and SimpleSSet.java
to implement SimpleSSet
, as the latter interface extends the former.
In Java, an ordering on elements of a class can be specified by implementing the Comparable
interface. See the Comparable
documentation here.. Note that many familiar classes already implement the Comparable
interface: Integer
, String
, Double
, etc., all have a natural notion of comparison. For two elements E x
and E y
of a class E
that implements Comparable<E>
, we can compare the elements with the following interpretations:
x.compareTo(y) < 0
indicates thatx
is “less than”y
;x.compareTo(y) > 0
indicates thatx
is “greater than”y
;x.compareTo(y) == 0
should be equivalent tox.equals(y)
.
As discussed in class, a binary search tree represents a sorted set as a collection of nodes, where each node stores the value of an element in the set. Additionally, each node has a parent, a left child, and a right child. A BST stores a root node, and all nodes in the tree are descendants of the root. Additionally, a BST has the following property: if a node v
stores a value y
then:
- if a node
u
stores a valuex
andu
is a left descendant ofv
, thenx < y
, and - if a node
w
stores a valuez
andw
is a right descendant ofv
, theny < z
.
Your Task
For this assignment, you will write your own implementation of a SimpleSSet
using a binary search tree to store the elements. Specifically, you should write a class BinarySearchTree
with the following declaration:
1
2
3
public class BinarySearchTree<E extends Comparable<E>> implements SimpleSSet<E> {
...
}
In addition to the methods required by the SimpleSSet
interface, you should also include a method
1
2
3
public int height() {
...
}
that returns the height of the BST—i.e., the length of the longest path from the root to a leaf node.
As in the previous assignment, you will then apply your SimpleSSet
implementations to write a program that reads all of the words in a given (presumably large) text file, and reports the number of distinct words in the file. To this end, your program should maintain a SimpleSSet
of the words it has read so far. Whenever a word is read from the file, it should be added to the SimpleSSet
. Since the add
method for SimpleSSet
only adds the word if it is not already contained in the set, after add
ing every word from the text, the set will only contain the distinct words from the text—i.e., the size of the vocabulary in the text.
Additionally, you will use your BinarySearchTree
to see how the height of a randomly generated binary search tree grows with the size of the set it stores. You will also compare the performance of your BinarySearchTree
to another SimpleSSet
implementation—ArraySimpleSSet
(provided)—for the task of storing the distinct words in an English text.
Code
SimpleSSet.java
SimpleUSet.java
ArraySimpleSSet.java
An array based implementation of theSimpleSSet
interface.BSTTester.java
RandomBSTBuilder.java
Questions
-
Using your
BinarySearchTree
implementation, run the providedRandomBSTBuilder
program. This program builds a binary search tree by inserting random values into the tree and records the tree’s height in a filetree-heights.csv
. Generate a plot of this output. How quickly does the height of a random BST seem to grow with its size? Is this growth what you expected? Why or why not? -
In Assignment 02, you wrote a program that counted the number of distinct words in a text file using a
SimpleUSet
implementation. Since words are stored asString
s—which implement theComparable
interface—the words can be stored in aSimpleSSet
as well. Modify your word-reading program from Assignment 02 to compare the performance of yourBinarySearchTree
implementation to the providedArraySimpleSSet
implementation ofSimpleSSet
. Note thatArraySimpleSSet
stores the words in an array, and uses binary search for all finding operations. Which implementation,BinarySearchTree
orArraySimpleSSet
is faster for reading the words from a few provided texts? Is the difference significant? Why might you expect one or the other implementation to be faster?
What to Submit
BinarySearchTree.java
containing the simpleSimpleSSet
implementation described above.responses.md
containing your responses to the questions above.
Evaluation
Your submission will receive a score out of 10 points, broken down as follows:
- 5 points for
BinarySearchTree.java
.- code compiles, runs, and passes all tests in
BSTTester.java
- code is well organized and documented with comments
- code compiles, runs, and passes all tests in
- 5 points for responses to the questions above