Simple Set ADTs
Introducing Set ADTs and Interfaces
Mathematical Sets and Notation
A set represents a collection of distinct items, referred to as elements. We can explicitly define a set by listing its elements surrounded by curly braces. For example,
\[S = \{\texttt{"red"}, \texttt{"blue"}, \texttt{"yellow"}\}\]
is the set of (names of) primary colors. Sets do not depend on the order in which their elements are written. For example,
\[\{\texttt{"red"}, \texttt{"blue"}, \texttt{"yellow"}\} = \{\texttt{"yellow"}, \texttt{"blue"}, \texttt{"red"}\} = \cdots.\]
Further, sets do not contain duplicate elements; each element in a set is unique.
We denote the relation “is an element of” (or “is contained in”) using the symbol \(\in\). Its negation, “is not an element of,” is denoted \(\notin\). Thus, using \(S = \{\texttt{"red"}, \texttt{"blue"}, \texttt{"yellow"}\}\) as before, we have \(\texttt{"blue"} \in S\) (read “blue is in \(S\)”) and \(\texttt{"green"} \notin S\) (read “green is not in \(S\)”).
Semantic Equivalence and Containment
In Java (as in many programming languages), there is a distinction between the strict equality of objects (tested with ==) and semantic equivalence (tested with the equals method; see the documentation for Object.equals). Given two variables var1 and var2 referring to instances of some class T, the expression var1 == var2 evaluates to true if and only if var1 and var2 refer to the same object instance. On the other hand, var1.equals(var2) returns true when the two instances are “semantically equivalent,” in that they represent the same value in a sense defined for the class T.
To make things more concrete, consider the following declarations:
Integer var1 = new Integer(2);
Integer var2 = new Integer(2);
The variables var1 and var2 refer to different Integer instances, because the code above creates two Integer objects using the keyword new. (The Integer constructor is deprecated in recent versions of Java; it is used here only because new always creates a fresh object.) Thus, the expression var1 == var2 evaluates to false: the two variables do not refer to the same object instance. On the other hand, var1 and var2 are semantically equivalent in the sense that the values represented by the two Integer instances are both \(2\). Therefore, we would (correctly) expect var1.equals(var2) to return true.
Going forward, whenever we discuss containment of sets, we will interpret \(x \in S\) to mean “\(S\) contains an element semantically equivalent to \(x\).” Continuing the Integer example above, if we added var1 to a set \(S\), then asked if var2 is contained in \(S\), we would expect the result to be “yes”: \(S\) does indeed contain an Integer whose value is \(2\).
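The difference between the two notions of equality, and the way containment follows equals rather than ==, can be seen in a short sketch. It uses String instead of Integer to avoid the deprecated constructor, and it uses java.util.HashSet as a stand-in for a set; the class name EqualityDemo and the helper sameObject are made up for this illustration.

```java
import java.util.HashSet;
import java.util.Set;

class EqualityDemo {
    // Strict equality: true only when both references point to the same object.
    static boolean sameObject(Object a, Object b) {
        return a == b;
    }

    public static void main(String[] args) {
        // new String(...) always allocates a fresh object, so a and b are
        // two distinct instances with the same contents.
        String a = new String("blue");
        String b = new String("blue");

        System.out.println(sameObject(a, b)); // false: different instances
        System.out.println(a.equals(b));      // true: same value

        Set<String> s = new HashSet<>();
        s.add(a);
        System.out.println(s.contains(b));    // true: containment uses equals
        System.out.println(s.add(b));         // false: an equivalent element is already present
    }
}
```

Only a was ever added to the set, yet asking about b gives “yes,” which matches the interpretation of \(x \in S\) adopted above.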
Implementation note. When defining a new datatype (i.e., class) in Java, the new class inherits the equals() method from the Object class. The default behavior of the equals() method is the same as ==. To define an appropriate notion of semantic equivalence for a new class, you can override the default equals method as follows:
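class MyClass {
    // equals must be declared public: an override cannot be less
    // accessible than the Object.equals method it replaces
    @Override
    public boolean equals(Object o) {
        ...
    }
}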
Note that the argument to the equals method is an Object, and not a MyClass. You can use the instanceof operator to first check that the parameter is a MyClass and not an instance of some other class:
// this is not equal to o if o is not an instance of MyClass
// (instanceof is also false when o is null)
if (!(o instanceof MyClass)) {
    return false;
}
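Putting the pieces together, a complete override might look like the sketch below. The class Point and its fields are hypothetical and chosen only for illustration. The sketch also overrides hashCode, which the text above does not discuss: Java's hash-based collections (such as HashSet) expect equal objects to have equal hash codes.

```java
import java.util.Objects;

// Hypothetical example class: two points are semantically
// equivalent when their coordinates agree.
class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true; // same instance, trivially equivalent
        }
        if (!(o instanceof Point)) {
            return false; // also covers o == null
        }
        Point other = (Point) o; // safe after the instanceof check
        return x == other.x && y == other.y;
    }

    @Override
    public int hashCode() {
        // Equal points must produce equal hash codes.
        return Objects.hash(x, y);
    }
}
```

With this definition, new Point(1, 2).equals(new Point(1, 2)) returns true even though the two expressions create distinct objects.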
Simple Set ADTs
In order to define objects representing sets, we describe two simple abstract data types (ADTs) specifying some desired functionality. It is possible to define many more operations on sets, but for now, we consider only very basic operations. The first ADT, SimpleUSet (simple unordered set), makes no assumptions about the types of elements being stored in a set. Basic functionality is provided to (1) test whether an element is contained in a set, (2) add an element to the set (if not already present), and (3) remove an element from the set (if present). Additionally, a SimpleUSet can report its size (i.e., the number of elements it contains) and whether or not it is empty.
The second ADT, SimpleSSet (simple sorted set), assumes that elements stored in the set can be ordered. That is, for any pair of distinct elements \(x, y \in S\), we have \(x < y\) or \(y < x\). We will see that this additional assumption on the nature of elements stored in a SimpleSSet can make some implementations of the ADT more efficient. Further, SimpleSSet provides additional functionality allowing one to access the smallest and largest elements (according to \(<\)).
Unsorted Sets
Here we formally describe the operations and effects of the SimpleUSet ADT. The state \(S\) of a SimpleUSet is the set of elements it contains: \(S = \{x_1, x_2, \ldots, x_n\}\). Below is a specification of the SimpleUSet operations:
- \(\mathrm{size}()\):
  - Return the number of elements (\(n\)) contained in the set.
- \(\mathrm{isEmpty}()\):
  - Return true if \(\mathrm{size}()\) is \(0\) and false otherwise.
- \(\mathrm{find}(y)\):
  - If \(y \in S\), then return \(x_i \in S\) satisfying \(x_i = y\) (where we use \(=\) to denote semantic equivalence as discussed above); otherwise return null.
- \(\mathrm{add}(y)\):
  - If \(y \in S\) (i.e., there is an element \(x_i\) in \(S\) that is semantically equivalent to \(y\)), then return false. Otherwise, update the state of \(S\) to \(\{x_1, x_2, \ldots, x_n, y\}\) and return true.
- \(\mathrm{remove}(y)\):
  - If \(y \in S\), say \(y = x_i\), then return \(x_i\) and update the state to \(\{x_1, x_2, \ldots, x_{i-1}, x_{i+1}, \ldots, x_{n}\}\). If \(y \notin S\), then return null.
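One possible way to write this specification down in Java is sketched below. The method signatures are an assumption (the specification above names only the operations), and MapBackedUSet is a hypothetical class that leans on java.util.HashMap purely to make the return values concrete; it is not the implementation developed in these notes. The sketch assumes elements are non-null, since null doubles as the “not found” result.

```java
import java.util.HashMap;
import java.util.Map;

// Assumed Java rendering of the SimpleUSet operations.
interface SimpleUSet<E> {
    int size();
    boolean isEmpty();
    E find(E y);     // the stored element equivalent to y, or null
    boolean add(E y); // false if an equivalent element is already present
    E remove(E y);   // the removed element, or null if none was present
}

// Hypothetical implementation: each element is stored as both key and
// value, so find and remove can hand back the stored x_i rather than y.
class MapBackedUSet<E> implements SimpleUSet<E> {
    private final Map<E, E> elements = new HashMap<>();

    @Override
    public int size() {
        return elements.size();
    }

    @Override
    public boolean isEmpty() {
        return size() == 0;
    }

    @Override
    public E find(E y) {
        return elements.get(y); // null when no equivalent key exists
    }

    @Override
    public boolean add(E y) {
        if (elements.containsKey(y)) {
            return false; // y is already in S; state unchanged
        }
        elements.put(y, y);
        return true;
    }

    @Override
    public E remove(E y) {
        return elements.remove(y); // previous value, or null if absent
    }
}
```

Note that find and remove return the element that was stored, which may be a different object from the argument even though the two are semantically equivalent.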
Exercise 1. While sets are distinct from lists, it is possible to implement the functionality of a set using a list. How could you represent a SimpleUSet as a SimpleList? How would you implement the operations \(\mathrm{find}\), \(\mathrm{add}\), and \(\mathrm{remove}\) using the list operations?
Sorted Sets
We now consider sets of elements that can be compared according to a “natural order” on the elements. That is, there is a notion of “less than” (formally, a binary relation), denoted \(<\), such that for any two elements \(x\) and \(y\), precisely one of the following holds:
- \(x < y\),
- \(y < x\),
- \(x = y\) (semantic equivalence).
We also assume transitivity of \(<\): if \(x < y\) and \(y < z\), then \(x < z\). Given a set \(S\) of comparable elements, we can write \(S = \{x_1, x_2, \ldots, x_n\}\) where \(x_1 < x_2 < \cdots < x_n\).
The SimpleSSet ADT (simple sorted set) extends the SimpleUSet ADT, under the assumption that all elements in a SimpleSSet can be compared according to some ordering \(<\). The \(\mathrm{size}()\), \(\mathrm{isEmpty}()\), and \(\mathrm{remove}()\) methods are precisely the same as in SimpleUSet. The \(\mathrm{add}(y)\) method has the same effect in SimpleSSet as in SimpleUSet as well, except that the element \(y\) is added “in sorted order” (if \(y \notin S\) before the operation). The specification of \(\mathrm{find}(y)\) in SimpleSSet differs from the one for SimpleUSet. SimpleSSet also has two additional methods, \(\mathrm{findMin}()\) and \(\mathrm{findMax}()\), specified below. Again, the state of a SimpleSSet is an ordered set \(S = \{x_1, x_2, \ldots, x_n\}\) where we have \(x_1 < x_2 < \cdots < x_n\).
- \(\mathrm{find}(y)\):
  - If the set is empty or \(x_n < y\), then return null. Otherwise return the smallest \(x_i\) satisfying \(y \leq x_i\) (i.e., \(y = x_i\) or \(y < x_i\)).
- \(\mathrm{findMin}()\):
  - Return \(x_1\), or null if the set is empty.
- \(\mathrm{findMax}()\):
  - Return \(x_n\), or null if the set is empty.
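To get a feel for these three operations, the sketch below mimics them with Java's built-in java.util.TreeSet. This is only an analogy: TreeSet is not a SimpleSSet, but its ceiling method happens to follow the same rule as \(\mathrm{find}\) (smallest element greater than or equal to the argument, or null). The class name SortedFindDemo and its helper methods are made up for this example.

```java
import java.util.List;
import java.util.TreeSet;

class SortedFindDemo {
    // find(y): the smallest element >= y, or null if there is none.
    static Integer find(TreeSet<Integer> s, int y) {
        return s.ceiling(y);
    }

    // findMin(): TreeSet.first() throws on an empty set,
    // so emptiness is checked first to return null instead.
    static Integer findMin(TreeSet<Integer> s) {
        return s.isEmpty() ? null : s.first();
    }

    // findMax(): same idea, using TreeSet.last().
    static Integer findMax(TreeSet<Integer> s) {
        return s.isEmpty() ? null : s.last();
    }

    public static void main(String[] args) {
        TreeSet<Integer> s = new TreeSet<>(List.of(2, 3, 5, 7));
        System.out.println(find(s, 3)); // 3: an equivalent element is present
        System.out.println(find(s, 4)); // 5: smallest element above 4
        System.out.println(find(s, 8)); // null: 8 exceeds the maximum
        System.out.println(findMin(s)); // 2
        System.out.println(findMax(s)); // 7
    }
}
```

Unlike \(\mathrm{find}\) in SimpleUSet, the sorted version can return an element that is not equivalent to its argument, as the call find(s, 4) shows.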
Implementation note. In Java, we specify an ordering on elements of a class by implementing the Comparable interface. See the Comparable documentation for complete details. The Comparable<T> interface requires that we implement a single method int compareTo(T o). The interpretation is as follows:
- x.compareTo(y) returning a negative number is interpreted as \(x < y\).
- x.compareTo(y) returning a positive number is interpreted as \(y < x\).
- x.compareTo(y) returning 0 is interpreted as \(x = y\) (semantic equivalence).
  - The compareTo method should always be implemented in such a way that x.compareTo(y) returns 0 if and only if x.equals(y).
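As an illustration of these rules, here is a minimal sketch of a class that implements Comparable. The class Version and its fields are hypothetical. Its equals method is defined in terms of compareTo so that the two notions of equivalence cannot drift apart.

```java
import java.util.Objects;

// Hypothetical example class: versions are ordered by major number,
// with ties broken by minor number.
class Version implements Comparable<Version> {
    private final int major;
    private final int minor;

    Version(int major, int minor) {
        this.major = major;
        this.minor = minor;
    }

    @Override
    public int compareTo(Version other) {
        if (major != other.major) {
            // negative if this.major < other.major, positive otherwise
            return Integer.compare(major, other.major);
        }
        return Integer.compare(minor, other.minor);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Version)) {
            return false;
        }
        // Consistent with compareTo: equal exactly when compareTo returns 0.
        return compareTo((Version) o) == 0;
    }

    @Override
    public int hashCode() {
        return Objects.hash(major, minor);
    }
}
```

For example, new Version(1, 2).compareTo(new Version(1, 10)) is negative, so version 1.2 is considered less than version 1.10 under this ordering.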
The Comparable documentation shows that many built-in classes in Java already implement Comparable. Notably, the numeric wrapper classes (Integer, Double, etc.) and String already implement the interface.
In writing a Java interface that specifies the SimpleSSet ADT, we would like to use generic types, such as
public interface SimpleSSet<E> { ... }
However, it must be the case that type E supports comparisons using the compareTo method. That is, E must implement the Comparable interface. In order to enforce this requirement, we declare:
public interface SimpleSSet<E extends Comparable<E>> { ... }
With this declaration, we can use our SimpleSSet to store a set of elements of any type E, so long as E implements Comparable<E>.
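The practical effect of the bound is that generic code may call compareTo on values of type E. The sketch below shows this with a hypothetical class SortedListSketch that keeps its elements in a sorted ArrayList and implements only add and the sorted-set version of find. It is meant as an illustration of the type bound, not as the SimpleSSet implementation from these notes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a sorted list of comparable elements.
class SortedListSketch<E extends Comparable<E>> {
    private final List<E> items = new ArrayList<>(); // kept in increasing order

    // Index of the first element >= y (items.size() if there is none),
    // located by binary search. The compareTo call compiles only
    // because E is declared to extend Comparable<E>.
    private int lowerBound(E y) {
        int lo = 0;
        int hi = items.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (items.get(mid).compareTo(y) < 0) {
                lo = mid + 1; // items[mid] < y, so the answer lies to the right
            } else {
                hi = mid;     // items[mid] >= y, so the answer is mid or to the left
            }
        }
        return lo;
    }

    // Insert y in sorted position unless an equivalent element is present.
    boolean add(E y) {
        int i = lowerBound(y);
        if (i < items.size() && items.get(i).compareTo(y) == 0) {
            return false;
        }
        items.add(i, y);
        return true;
    }

    // Smallest element >= y, or null if every element is less than y.
    E find(E y) {
        int i = lowerBound(y);
        return i < items.size() ? items.get(i) : null;
    }
}
```

A declaration such as SortedListSketch<String> compiles because String implements Comparable<String>, whereas SortedListSketch<Object> is rejected by the compiler because Object does not implement Comparable<Object>.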