Will Rosenbaum | Simple Set ADTs

Mathematical Sets and Notation

A set represents a collection of distinct items, referred to as elements. We can explicitly define a set by listing its elements surrounded by curly braces. For example,

\[S = \{\texttt{"red"}, \texttt{"blue"}, \texttt{"yellow"}\}\]

is the set of (names of) primary colors. Sets do not depend on the order in which their elements are written. For example,

\[\{\texttt{"red"}, \texttt{"blue"}, \texttt{"yellow"}\} = \{\texttt{"yellow"}, \texttt{"blue"}, \texttt{"red"}\} = \cdots.\]

Further, sets do not contain duplicate elements; each element in a set is unique.

We denote the relation “is an element of” (or “contains”) using the symbol \(\in\), and its negation, “is not an element of” (or “does not contain”), using \(\notin\). Thus, using \(S = \{\texttt{"red"}, \texttt{"blue"}, \texttt{"yellow"}\}\) as before, we have \(\texttt{"blue"} \in S\) (read “blue is in \(S\)”) and \(\texttt{"green"} \notin S\) (read “green is not in \(S\)”).
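To connect the notation to code, here is a small sketch using Java's built-in java.util.Set (not the ADTs defined later in these notes). It checks membership (\(\in\) and \(\notin\)), shows that two sets written in different orders are equal, and shows that adding an element already present changes nothing. The class name SetNotationDemo is only for illustration.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SetNotationDemo {
    public static void main(String[] args) {
        // S = {"red", "blue", "yellow"}; Set.of creates an immutable set
        Set<String> s = Set.of("red", "blue", "yellow");

        // membership: "blue" is in S, "green" is not
        System.out.println(s.contains("blue"));   // true
        System.out.println(s.contains("green"));  // false

        // order does not matter: sets with the same elements are equal
        Set<String> t = Set.of("yellow", "blue", "red");
        System.out.println(s.equals(t));          // true

        // no duplicates: adding an element that is already present has no effect
        Set<String> u = new HashSet<>(List.of("red", "blue"));
        System.out.println(u.add("red"));         // false
        System.out.println(u.size());             // 2
    }
}
```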

Semantic Equivalence and Containment

In Java (as in many programming languages), there is a distinction between the strict equality of objects (tested with ==) and semantic equivalence (tested with the equals method; see the documentation of Object.equals). Given two variables var1 and var2 referring to instances of some class T, the expression var1 == var2 evaluates to true if and only if var1 and var2 refer to the same object instance. On the other hand, var1.equals(var2) returns true when the two instances are “semantically equivalent,” meaning that they represent the same value in a sense defined by the class T.

To make things more concrete, consider the following declarations:

// Integer(int) is deprecated since Java 9, but new always creates a distinct object
Integer var1 = new Integer(2);
Integer var2 = new Integer(2);

The variables var1 and var2 refer to different Integer instances, as the code above creates two Integer objects using the keyword new. Thus, the expression var1 == var2 evaluates to false: the two variables do not refer to the same object instance. On the other hand, var1 and var2 are semantically equivalent in the sense that the values represented by the two Integer instances are both \(2\). Therefore, we would (correctly) expect var1.equals(var2) to return true.

Going forward, whenever we discuss containment of sets, we will interpret \(x \in S\) to mean “\(S\) contains an element semantically equivalent to \(x\).” Continuing the Integer example above, if we added var1 to a set \(S\), then asked if var2 is contained in \(S\), we would expect the result to be “yes”: \(S\) does indeed contain an Integer whose value is \(2\).
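The following runnable sketch puts the Integer declarations into a complete program and illustrates the containment convention with java.util.HashSet, whose contains method relies on equals (and hashCode) rather than ==. The class name EqualsDemo is hypothetical. The deprecated Integer(int) constructor is kept on purpose: Integer.valueOf(2) is guaranteed to return a cached instance for small values, so two calls would yield the same object and == would be true.

```java
import java.util.HashSet;
import java.util.Set;

class EqualsDemo {
    // Integer(int) is deprecated; it is used only to force two distinct objects
    @SuppressWarnings({"deprecation", "removal"})
    public static void main(String[] args) {
        Integer var1 = new Integer(2);
        Integer var2 = new Integer(2);

        System.out.println(var1 == var2);      // false: two different object instances
        System.out.println(var1.equals(var2)); // true: both represent the value 2

        // containment in a java.util.Set is decided by equals (and hashCode), not ==
        Set<Integer> s = new HashSet<>();
        s.add(var1);
        System.out.println(s.contains(var2));  // true
    }
}
```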

Implementation note. When defining a new datatype (i.e., class) in Java, the new class inherits the equals() method from the Object class. The default behavior of the equals() method is equivalent to ==. In order to define an appropriate notion of semantic equivalence for a new class, you can override the default equals method as follows:

class MyClass {
    @Override
    public boolean equals(Object o) {
        // return true if and only if o is semantically equivalent to this
        ...
    }
}

Note that the argument to the equals method is an Object, not a MyClass. (The override must also be declared public, since Object.equals is public.) You can use the instanceof operator to first check that the argument is an instance of MyClass and not of some other class:

// this is not equal to o if o is not an instance of MyClass
// (instanceof also evaluates to false when o is null)
if (!(o instanceof MyClass)) {
    return false;
}
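For a complete picture, here is a sketch of an equals override for a hypothetical class Point (not part of these notes) whose instances are equivalent when their coordinates agree. It also overrides hashCode: the Object documentation requires that equal objects have equal hash codes, and hash-based collections such as java.util.HashSet depend on this when testing containment.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Point is a hypothetical example class
class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        // this is not equal to o if o is not an instance of Point (covers o == null too)
        if (!(o instanceof Point)) {
            return false;
        }
        Point other = (Point) o;
        // two points are semantically equivalent when their coordinates agree
        return x == other.x && y == other.y;
    }

    @Override
    public int hashCode() {
        // equal points must produce equal hash codes
        return Objects.hash(x, y);
    }
}

class PointDemo {
    public static void main(String[] args) {
        Point p = new Point(1, 2);
        Point q = new Point(1, 2);
        System.out.println(p == q);        // false
        System.out.println(p.equals(q));   // true

        Set<Point> s = new HashSet<>();
        s.add(p);
        System.out.println(s.contains(q)); // true
    }
}
```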

Simple Set ADTs

In order to define objects representing sets, we describe two simple abstract data types (ADTs) specifying some desired functionality. It is possible to define many more operations on sets, but for now we consider only very basic operations. The first ADT, SimpleUSet (simple unordered set), makes no assumptions about the types of elements being stored in a set. It provides basic functionality to (1) test whether an element is contained in a set, (2) add an element to the set (if not already present), and (3) remove an element from the set (if present). Additionally, a SimpleUSet can report its size (i.e., the number of elements it contains) and whether or not it is empty.

The second ADT, SimpleSSet (simple sorted set), assumes that elements stored in the set can be ordered. That is, for any pair of distinct elements \(x, y \in S\), we have \(x < y\) or \(y < x\). We will see that this additional assumption on the nature of elements stored in a SimpleSSet can make some implementations of the ADT more efficient. Further, SimpleSSet provides additional functionality allowing one to access the smallest and largest elements (according to \(<\)).

Unsorted Sets

Here we formally describe the operations and effects of the SimpleUSet ADT. The state \(S\) of a SimpleUSet is the set of elements it contains: \(S = \{x_1, x_2, \ldots, x_n\}\). Below is a specification of the SimpleUSet operations:

\(\mathrm{size}()\):
- Return the number of elements (\(n\)) contained in the set.
\(\mathrm{isEmpty}()\):
- Return true if \(\mathrm{size}()\) is \(0\) and false otherwise.
\(\mathrm{find}(y)\):
- If \(y \in S\), then return \(x_i \in S\) satisfying \(x_i = y\) (where we use \(=\) to denote semantic equivalence as discussed above); otherwise return null.
\(\mathrm{add}(y)\):
- If \(y \in S\) (i.e., there is an element \(x_i\) in \(S\) that is semantically equivalent to \(y\)), then return false. Otherwise, update the state of \(S\) to \(\{x_1, x_2, \ldots, x_n, y\}\) and return true.
\(\mathrm{remove}(y)\):
- If \(y \in S\)—say \(y = x_i\)—then return \(x_i\) and update the state to \(\{x_1, x_2, \ldots, x_{i-1}, x_{i+1}, \ldots, x_{n}\}\). If \(y \notin S\) then return null.
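The specification above can be written down as a Java interface. The sketch below is one possible rendering (the interface and method names follow the ADT, but the exact Java signatures are an assumption), together with a hypothetical implementation ListUSet backed by java.util.ArrayList. Exercise 1 asks the analogous question for the course's SimpleList, so treat this only as an illustration of the required behavior. It assumes elements are non-null.

```java
import java.util.ArrayList;

// One possible Java rendering of the SimpleUSet ADT
interface SimpleUSet<E> {
    int size();
    boolean isEmpty();
    E find(E y);
    boolean add(E y);
    E remove(E y);
}

// Hypothetical implementation storing elements in a java.util.ArrayList (non-null elements assumed)
class ListUSet<E> implements SimpleUSet<E> {
    private final ArrayList<E> elements = new ArrayList<>();

    @Override
    public int size() {
        return elements.size();
    }

    @Override
    public boolean isEmpty() {
        return size() == 0;
    }

    @Override
    public E find(E y) {
        // linear scan for a stored element semantically equivalent to y
        for (E x : elements) {
            if (x.equals(y)) {
                return x;
            }
        }
        return null;
    }

    @Override
    public boolean add(E y) {
        // reject y if an equivalent element is already present
        if (find(y) != null) {
            return false;
        }
        elements.add(y);
        return true;
    }

    @Override
    public E remove(E y) {
        for (int i = 0; i < elements.size(); i++) {
            if (elements.get(i).equals(y)) {
                // ArrayList.remove(int) removes and returns the stored element x_i
                return elements.remove(i);
            }
        }
        return null;
    }
}
```

Each operation scans the list, so find, add, and remove all take time proportional to \(n\) in the worst case.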

Exercise 1. While sets are distinct from lists, it is possible to implement the functionality of a set using a list. How could you represent a SimpleUSet as a SimpleList? How would you implement the operations \(\mathrm{find}\), \(\mathrm{add}\), and \(\mathrm{remove}\) using the list operations?

Sorted Sets

We now consider sets of elements that can be compared according to a “natural order” on the elements. That is, there is a notion of “less than” (formally, a binary relation), denoted \(<\), such that for any two elements \(x, y\), precisely one of the following holds:

\(x < y\),
\(y < x\),
\(x = y\) (semantic equivalence).

We also assume transitivity of \(<\): if \(x < y\) and \(y < z\), then \(x < z\). Given a set \(S\) of comparable elements, we can write \(S = \{x_1, x_2, \ldots, x_n\}\) where \(x_1 < x_2 < \cdots < x_n\).
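Java's java.util.TreeSet (a library class, not one of the ADTs in these notes) keeps its elements in their natural order, so iterating over it visits \(x_1 < x_2 < \cdots < x_n\). The small sketch below, with the hypothetical class name SortedOrderDemo, builds such a set from unsorted input containing a duplicate.

```java
import java.util.List;
import java.util.TreeSet;

class SortedOrderDemo {
    public static void main(String[] args) {
        // TreeSet orders its elements using compareTo (their natural ordering)
        TreeSet<Integer> s = new TreeSet<>(List.of(5, 1, 3, 1));
        // iteration visits x_1 < x_2 < ... < x_n; the duplicate 1 is stored only once
        System.out.println(s); // [1, 3, 5]
    }
}
```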

The SimpleSSet ADT (simple sorted set) extends the SimpleUSet ADT, under the assumption that all elements in a SimpleSSet can be compared according to some ordering \(<\). The \(\mathrm{size}()\), \(\mathrm{isEmpty}()\), and \(\mathrm{remove}()\) methods are precisely the same as in SimpleUSet. The \(\mathrm{add}(y)\) method also has the same effect in SimpleSSet as in SimpleUSet, except that the element \(y\) is added “in sorted order” (if \(y \notin S\) before the operation). The specification of \(\mathrm{find}(y)\) in SimpleSSet differs from its specification in SimpleUSet. SimpleSSet also has two additional methods, \(\mathrm{findMin}()\) and \(\mathrm{findMax}()\), specified below. Again, the state of a SimpleSSet is an ordered set \(S = \{x_1, x_2, \ldots, x_n\}\) where \(x_1 < x_2 < \cdots < x_n\).

\(\mathrm{find}(y)\):
- If \(S\) is empty or \(x_n < y\), then return null. Otherwise, return the smallest \(x_i\) satisfying \(y \leq x_i\) (i.e., \(y = x_i\) or \(y < x_i\)).
\(\mathrm{findMin}()\):
- Return \(x_1\) or null if the set is empty.
\(\mathrm{findMax}()\):
- Return \(x_n\) or null if the set is empty.
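The three queries above correspond closely to methods of java.util.TreeSet: ceiling(y) returns the least element greater than or equal to y (or null), while first() and last() return the extremes but throw NoSuchElementException on an empty set. The sketch below wraps them in hypothetical static helpers named after the ADT operations so they return null as specified.

```java
import java.util.List;
import java.util.TreeSet;

// Hypothetical helpers expressing the SimpleSSet queries with java.util.TreeSet
class SortedQueries {
    // find(y): smallest x_i with y <= x_i, or null if no such element exists
    static <E extends Comparable<E>> E find(TreeSet<E> s, E y) {
        return s.ceiling(y);
    }

    // findMin(): x_1, or null if empty (TreeSet.first() would throw instead)
    static <E extends Comparable<E>> E findMin(TreeSet<E> s) {
        return s.isEmpty() ? null : s.first();
    }

    // findMax(): x_n, or null if empty (TreeSet.last() would throw instead)
    static <E extends Comparable<E>> E findMax(TreeSet<E> s) {
        return s.isEmpty() ? null : s.last();
    }

    public static void main(String[] args) {
        TreeSet<Integer> s = new TreeSet<>(List.of(2, 4, 8));
        System.out.println(find(s, 4));  // 4: an equivalent element is present
        System.out.println(find(s, 5));  // 8: smallest element greater than 5
        System.out.println(find(s, 9));  // null: x_n = 8 < 9
        System.out.println(findMin(s));  // 2
        System.out.println(findMax(new TreeSet<Integer>())); // null: empty set
    }
}
```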

Implementation note. In Java, we specify an ordering on elements of a class by implementing the Comparable interface (see the Java API documentation for Comparable for complete details). The Comparable<T> interface requires that we implement a single method, int compareTo(T o), interpreted as follows:

x.compareTo(y) returning a negative number is interpreted as \(x < y\).
x.compareTo(y) returning a positive number is interpreted as \(y < x\).
x.compareTo(y) returning 0 is interpreted as \(x = y\) (semantic equivalence).
- The compareTo method should always be implemented in such a way that x.compareTo(y) returns 0 if and only if x.equals(y).
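As an illustration of these rules, the sketch below defines a hypothetical class Version (not from these notes) ordered first by major number and then by minor number, with equals written so that it agrees with compareTo. It uses Integer.compare rather than subtraction, since an expression like major - other.major can overflow for large values.

```java
import java.util.TreeSet;

// Version is a hypothetical class ordered by major number, then by minor number
class Version implements Comparable<Version> {
    private final int major;
    private final int minor;

    Version(int major, int minor) {
        this.major = major;
        this.minor = minor;
    }

    @Override
    public int compareTo(Version other) {
        // negative: this < other; positive: other < this; zero: semantically equivalent
        if (major != other.major) {
            return Integer.compare(major, other.major);
        }
        return Integer.compare(minor, other.minor);
    }

    @Override
    public boolean equals(Object o) {
        // consistent with compareTo: equal exactly when compareTo returns 0
        if (!(o instanceof Version)) {
            return false;
        }
        return compareTo((Version) o) == 0;
    }

    @Override
    public int hashCode() {
        return 31 * major + minor;
    }

    @Override
    public String toString() {
        return major + "." + minor;
    }
}

class VersionDemo {
    public static void main(String[] args) {
        TreeSet<Version> vs = new TreeSet<>();
        vs.add(new Version(1, 10));
        vs.add(new Version(2, 0));
        vs.add(new Version(1, 2));
        System.out.println(vs); // [1.2, 1.10, 2.0]
    }
}
```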

The Comparable documentation lists many built-in Java classes that already implement Comparable. Notably, the numeric wrapper classes (Integer, Double, etc.) and String all implement the interface.
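A quick sketch of these built-in orderings (the class name BuiltInComparableDemo is only for illustration): numeric wrappers compare by value, while String compares lexicographically by character code, so every uppercase ASCII letter sorts before every lowercase one.

```java
class BuiltInComparableDemo {
    public static void main(String[] args) {
        // Integer compares numerically (7 is autoboxed to an Integer)
        System.out.println(Integer.valueOf(3).compareTo(7) < 0);     // true: 3 < 7
        // Double also compares numerically
        System.out.println(Double.valueOf(2.5).compareTo(2.5) == 0); // true
        // String compares lexicographically by character code
        System.out.println("apple".compareTo("banana") < 0);         // true
        // 'Z' (code 90) comes before 'a' (code 97), so case matters
        System.out.println("Zebra".compareTo("apple") < 0);          // true
    }
}
```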

In writing a Java interface that specifies the SimpleSSet ADT, we would like to use generic types, such as

public interface SimpleSSet<E> { ... }

However, it must be the case that type E supports comparisons using the compareTo method. That is, E must implement the Comparable interface. In order to enforce this requirement, we declare:

public interface SimpleSSet<E extends Comparable<E>> { ... }

With this declaration, we can use our SimpleSSet to store a set of elements of any type E, so long as E implements Comparable<E>.
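To see the bound in action, the sketch below gives one possible SimpleSSet interface (the method set follows the ADT; the exact signatures are an assumption) and a hypothetical implementation SortedArraySSet that keeps its elements in a sorted java.util.ArrayList. Because E extends Comparable<E>, the code can call compareTo on elements, which is what makes binary search possible; this is one way the ordering assumption can make an implementation more efficient.

```java
import java.util.ArrayList;

// One possible Java declaration of the SimpleSSet ADT
interface SimpleSSet<E extends Comparable<E>> {
    int size();
    boolean isEmpty();
    E find(E y);      // smallest x_i with y <= x_i, or null
    boolean add(E y);
    E remove(E y);
    E findMin();
    E findMax();
}

// Hypothetical implementation keeping elements sorted in a java.util.ArrayList
class SortedArraySSet<E extends Comparable<E>> implements SimpleSSet<E> {
    private final ArrayList<E> elements = new ArrayList<>();

    // binary search: index of the first x_i with y <= x_i, or size() if none exists
    private int lowerBound(E y) {
        int lo = 0;
        int hi = elements.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (elements.get(mid).compareTo(y) < 0) {
                lo = mid + 1; // x_mid < y, so the answer lies to the right
            } else {
                hi = mid;     // y <= x_mid, so mid is still a candidate
            }
        }
        return lo;
    }

    public int size() { return elements.size(); }

    public boolean isEmpty() { return elements.isEmpty(); }

    public E find(E y) {
        int i = lowerBound(y);
        return i < elements.size() ? elements.get(i) : null;
    }

    public boolean add(E y) {
        int i = lowerBound(y);
        if (i < elements.size() && elements.get(i).compareTo(y) == 0) {
            return false; // an equivalent element is already present
        }
        elements.add(i, y); // insert at its sorted position
        return true;
    }

    public E remove(E y) {
        int i = lowerBound(y);
        if (i < elements.size() && elements.get(i).compareTo(y) == 0) {
            return elements.remove(i); // removes and returns the stored x_i
        }
        return null;
    }

    public E findMin() { return isEmpty() ? null : elements.get(0); }

    public E findMax() { return isEmpty() ? null : elements.get(elements.size() - 1); }
}
```

Here find uses only about \(\log_2 n\) comparisons, although add and remove still shift up to \(n\) array entries. A declaration such as SortedArraySSet<Object> is rejected at compile time, because Object does not implement Comparable<Object>.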