Consider a sequence \(S = (s_1, s_2, \ldots, s_n)\) of elements from the set \([m] = {1, 2, \ldots, m}\). A *subsequence* \(\sigma\) of \(S\) is a sequence

\[

(1) \qquad \sigma = (s_{i_1}, s_{i_2}, \ldots, s_{i_{n'}})

\]

where \(i_1 < i_2 < \cdots < i_{n’}\). The subsequence is (weakly) *increasing* if

\[

(2) \qquad

s_{i_1} \leq s_{i_2} \leq \cdots \leq s_{i_{n’}}.

\]

Similarly, the sequence is strictly increasing if we replace the inequalities above with strict inequalities. The (length of the) *longest increasing subsequence* (LIS) of the sequence \(S\) (denoted \(\ell(S)\)) is the length of the longest subsequence that is increasing. That is the LIS is largest number \(n’\) such that there exist indices \(i_1 < i_2 < \cdots < i_{n’}\) such that (2) holds.

The LIS gives a natural measure of the “sortedness” of a sequence. If \(\ell(S) = n = |S|\), then the sequence is sorted (in increasing order). In general \(n – \ell(S)\) is number of deletions/insertions needed to sort the sequence. To see that \(n – \ell(S)\) deletions/insertions suffice, consider the following sorting algorithm: fix an increasing subsequence \(\sigma\) of length \(\ell(S)\). For each \(s \in S \setminus \sigma\), delete \(s\) from the sequence and insert it back into the sequence so that it is consistent with \(\sigma\). That is, find \(j \leq \ell(S)\) such that \(s_{i_j} \leq s \leq s_{i_{j+1}}\) and insert \(s\) into the sequence between \(i_j\) and \(i_{j+1}\). The resulting sequence \(S’\) satisfies \(\ell(S’) = \ell(S) + 1\) as the subsequence \(\sigma\) with \(s\) inserted is an increasing subsequence of length \(\ell(S) + 1\). Repeating this process \(n – \ell(S)\) times results in a sorted sequence. On the other hand, \(n – \ell(S)\) deletions/insertions are necessary because removing and re-inserting a single element can only change the length of the longest increasing subsequence by at most \(1\).

Given a sequence \(S\), \(\ell(S)\) can be efficiently computed by **patience sorting**. The algorithm was devised as a way to sort a deck of cards, so imagine that the sequence \(S\) is represented by an ordered deck of cards; i.e., \(s_1\) is the top card on the deck, \(s_2\) is second, and so on. When we are dealt the first card, we start a pile with that card face up. This first pile is the left-most pile — as we add new piles they will all be added to the right of any existing piles. In all piles, only the top card is visible. When we are shown subsequent cards, we place them (face up) according to the following rules:

- A card cannot be placed on top of a card of smaller or equal value;
- When a card is dealt, it is placed on top of the left-most pile whose top card is larger than the card that was dealt;
- If no such pile exists (that is, the dealt card is larger than the top cards of all piles) then the card is placed in a new pile to the right of the existing piles.

It is easily verified that the piles have the following properties: First each pile is increasing from top to bottom: the smallest card is on top, followed by the second smallest, and so on. Second, the top cards of the piles form an increasing sequence from left to right. Notice that the top cards do not necessarily form an increasing *sub*sequence of the original sequence, the top cards of the piles need not have been dealt in order from left to right.

Once all of the cards have been dealt, we claim that the number of piles is the length of the longest increasing subsequence \(\ell(S)\). Let \(p\) be the number of piles at the end of the algorithm. To show that \(p = \ell(S)\), we will prove the following facts:

- If \(S\) contains an increasing subsequence of length \(k\) then there will be at least \(k\) piles at the end of the algorithm (hence \(p \geq \ell(S)\))
- \(S\) contains a subsequence of length \(p\) (hence \(p \leq \ell(S)\)).

Combining 1 and 2 gives \(p = \ell(S)\) as desired.

To prove 1, suppose \(s_{i_1} \leq s_{i_2} \leq \cdots \leq s_{i_k}\) is an increasing subsequence. When we are dealt \(s_{i_1}\) it gets put on top of some pile. Since only smaller smaller cards can subsequently be put on top of \(s_{i_1}\), when we are dealt \(s_{i_2} \geq s_{i_1}\), it must be placed on a pile to the right of the pile containing \(s_{i_1}\). Similarly, each \(s_{i_{j+1}}\) must be put in a pile to the right of \(s_{i_j}\) so that at the end of the algorithm, the elements \(s_{i_1}, s_{i_2}, \ldots, s_{i_k}\) reside in different piles. In particular, there must be at least \(k\) piles.

To prove 2, we argue by induction on \(p\). If \(p = 1\), then \(S\) is nonempty, so it trivially contains an increasing subsequence of length \(1\); the top card on the pile \(s_{i_1}\) is itself an increasing subsequence of length \(1\). For the inductive step let \(s_{i_p}\) be the top card of the right-most pile. We claim that \(S\) contains an increasing subsequence of length \(p\) ending with \(s_{i_p}\). To see this, recall that in order for \(s_{i_p}\) to have been placed in the \(p\)-th pile (i.e., the right-most pile), there must have been a smaller on top of the \((p-1)\)-st pile when \(s_{i_p}\) was dealt. Call this card \(s_{i_{p-1}}\). By induction, when \(s_{i_{p-1}}\) was dealt, there was an increasing subsequence of length \(p-1\) ending with \(s_{i_{p-1}}\). Since \(s_{i_p} \geq s_{i_{p-1}}\) and \(s_{i_p}\) was seen after \(s_{i_{p-1}}\), appending \(s_{i_p}\) to the increasing subsequence yields an increasing subsequence of length \(p\).

Patience sorting runs in nearly linear time and uses nearly linear space. It turns out that for exact computation of the (length of the) LIS, patience sorting is essentially time and space optimal. Further, patience sorting is a streaming algorithm: the sequence is processed as the cards are dealt, and we do not require oracle access to the sequence. If we we only wish to approximate \(\ell(S)\), we can do better than the linear patience sorting.

If we grant oracle access to the sequence \(S\), there is a randomized algorithm that runs in time \(\log^c n\) that gives an approximation of \(\ell(S)\) up to an additive error of \(\delta n\) for any fixed \(\delta > 0\). The error of \(\delta n\) likely could not be improved for such a fast algorithm, as it seems that no algorithm could distinguish \(S\) with \(\ell(S) = 1\) from \(\ell(S) = n^{1 – \epsilon}\) for \(\epsilon > 0\) using only \(\log^c n\) samples from \(S\).

In the context of streaming algorithms, a modification of patience sorting using \(O(\sqrt{n})\) space gives a \((1 + \epsilon)\) multiplicative approximation of \(\ell(S)\). The algorithm is a deterministic memory-limited version of patience sorting. Further, it was proven that \(O(\sqrt{n})\) space is optimal for deterministic approximation streaming algorithms. However, the proof of the lower bound for space complexity does not apply to *randomized* streaming algorithms. It seems plausible that a streaming algorithm using polylogarithmic space could give a good approximation of \(\ell(S)\), but to my knowledge no one has yet devised such an algorithm (or shown a lower bound to the contrary). So this is what I have been working on lately.