# Shorter Certificates for Set Disjointness

Suppose two players, Alice and Bob, each hold equal sized subsets of $$[n] = {1, 2, \ldots, n}$$. A third party, Carole, wishes to convince Alice and Bob that their subsets are disjoint, i.e., their sets have no elements in common. How efficiently can Carole prove to Alice and Bob that their sets are disjoint?

To formalize the situation, suppose $$k$$ is an integer with $$1 \leq k \leq n$$, and let $$A$$ and $$B$$ be, respectively, Alice and Bob's subsets of $$[n]$$, both of size $$k$$.

1. Carole produces a certificate or proof that $$A \cap B = \emptyset$$ and sends this certificate to Alice and Bob.
2. Alice and Bob individually verify that Carole’s certificate is valid with respect to their individual inputs. Alice and Bob then send each other (very short) messages saying whether or not they accept Carole’s proof. If they both accept the proof, they can be certain that their sets are disjoint
In order for the proof system described above to be valid, it must satisfy the following two properties:

Completeness If $$A \cap B = \emptyset$$ then Carole can send a certificate that Alice and Bob both accept.

Soundness If $$A \cap B \neq \emptyset$$ then any certificate that Carole sends must be rejected by at least one of Alice and Bob.

Before giving a “clever” solution to this communication problem, we describe a naive solution. Since Carole sees $$A$$ and $$B$$, her proof of their disjointness could simply be to send Alice and Bob the (disjoint) pair $$C = (A, B)$$. Then Alice verifies the validity of the certificate $$C$$ by checking that her input $$A$$ is equal to the first term in $$C$$, and similarly Bob checks that $$B$$ is equal to the second term. Clearly, if $$A$$ and $$B$$ are disjoint, Alice and Bob will both accept $$C$$, while if $$A$$ and $$B$$ intersect, Alice or Bob will reject every certificate that Carole could send.

Let us quickly analyze the efficiency of this protocol. The certificate that Carole sends consists of a pair of $$k$$-subsets of $$[n]$$. The naive encoding of simply listing the elements of $$A$$ and $$B$$ requires $$2 k \log n$$ bits — each $$i \in A \cup B$$ requires $$\log n$$ bits, and there are $$2 k$$ such indices in the list. In fact, even if Carole is much cleverer in her encoding of the sets $$A$$ and $$B$$, she cannot compress the proof significantly for information theoretic reasons. Indeed, there are
$${n \choose k}{n-k \choose k}$$
distinct certificates that Carole must be able to send, hence her message must be of length at least
$$\log {n \choose k} \geq \log ((n/k)^k) = k \log n – k \log k.$$
Is it possible for Carole, Alice, and Bob to devise a more efficient proof system for disjointness?

The key observation in our more efficient protocol for set disjointness is the following: if $$A$$ and $$B$$ are disjoint if and only if there exists some $$S \subseteq [n]$$ such that $$A \subseteq S$$ and $$B \subset \bar{S}$$, where we use $$\bar{S}$$ to denote the complement of $$S$$. If $$k$$ is relatively small, say $$k = O(\log n)$$, there are many — in fact $$2^{n – 2k}$$–such sets $$S$$. So our strategy is to try to find a relatively small family $$\mathcal{F}$$ of subsets such that for every disjoint pair $$(A, B)$$, $$\mathcal{F}$$ contains a witness $$S$$ to the disjointness of $$A$$ and $$B$$ in the sense that $$A \subseteq S$$ and $$B \subseteq \bar{S}$$. Finding an explicit family $$\mathcal{F}$$ seems daunting, so we apply the probabilistic method. Specifically, we choose sets $$S \in \mathcal{F}$$ uniformly at random, and show that with positive probability, a relatively small choice of $$\mathcal{F}$$ will have the desired property.

Theorem. There exists a family $$\mathcal{F}$$ of size
$$|\mathcal{F}| \leq 2^{O(k + \log \log n)}$$
such that for every disjoint pair $$(A, B)$$ of $$k$$-subsets of $$[n]$$, there exists $$S \in \mathcal{F}$$ such that $$A \subseteq S$$ and $$B \subseteq \bar{S}$$.

The theorem implies that Carole can produce a certificate for the disjointness of $$A$$ and $$B$$ of length $$O(k + \log \log n)$$. Indeed, it takes Carole only $$\log |\mathcal{F}| = O(k + \log \log n)$$ bits to specify the suitable set $$S \in \mathcal{F}$$. Once Alice and Bob are told $$S$$, Alice can easily verify that $$A \subseteq S$$ and Bob that $$B \subseteq \bar{S}$$. In the case where $$k$$ is a constant independent of $$n$$, this certificate is exponentially shorter than the certificate in the naive protocol described above.

Proof. We employ the probabilistic method to give a randomized “construction” of such a family $$\mathcal{F}$$. Choose $$N$$ sets
$$S_1, S_2, \ldots, S_N \subseteq[n]$$
independently, uniformly at random from $$\mathcal{P}([n])$$. For any fixed disjoint pair $$(A, B)$$, we compute
\begin{align} \mathrm{Pr}{S_1, \ldots, S_N} (\text{for all } i, A \subseteq!!!!!/\ S\text{ or } B \subseteq!!!!!/\ \bar{S}) &= (\mathrm{Pr}{S \subseteq [n]} (A \subseteq!!!!!/\ \text{ or } B \subseteq!!!!!/\ \bar{S}))^N\ &= \left(1 – \frac{1}{4^k}\right)^N\ &\leq e^{-N / 4^k}. \end{align}
The first equality holds by the independence of choice of $$S_1, \ldots, S_N$$. The second equality holds because for each $$i \in A$$, $$\mathrm{Pr}(i \in S) = \frac 1 2$$, and similarly for $$j \in B$$, $$\mathrm{Pr}(j \in \bar S) = \frac 1 2$$. The inequality follows from the fact that for all $$x > 1$$, \((1 – 1/x)^x simultaneously. Thus, we apply the union bound to estimate

\begin{aligned}
\mathrm{Pr}_{S_1, \ldots, S_N} \left((\exists \text{ disjoint }(A, B))(\forall i \in [n])[A \subseteq!!!!!/\ S_i \text{ or } B \subseteq!!!!!/\ \bar{S_i}]\right)
&