The Geometric Distribution

An introduction to an important probability distribution

Consider the following random process: a coin is repeatedly flipped until the first “heads” appears. How many coin flips do we expect to perform until we see the first heads?

An analogue of this process shows up frequently in the analysis of randomized algorithms. For example, we might consider a randomized sub-routine that succeeds in performing some task with some probability \(p\) (\(0 < p < 1\)) and fails with probability \(1 - p\). We then repeat the sub-routine independently until it succeeds. In the coin-flipping example, flipping tails is a failure, while flipping heads is a success. If the coin is fair, we have \(p = 1/2\).

We can associate a random variable \(X\) with the process described above: \(X\) is the number of trials until the first success. We can describe the probability distribution of \(X\) as follows. Since each trial succeeds with probability \(p\), the first trial will be a success with probability \(p\). Therefore \(P(X = 1) = p\). More generally, we can compute the probability that we witness our first success on the \(k\)th trial: the probability of a particular trial failing is \(1 - p\). Since the trials are independent, the probability of getting \(k - 1\) consecutive failures is \((1 - p)^{k-1}\). The probability of succeeding on the \(k\)th trial after these failures is then \(p\), so the overall probability of getting \(k - 1\) failures followed by a success is \((1 - p)^{k-1} p\). Thus, we arrive at the following probability distribution:

\[P(X = k) = (1 - p)^{k-1} p \quad\text{ for }\quad k = 1, 2, 3, \ldots.\]

This probability distribution is known as the geometric distribution with parameter \(p\).
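To make the formula concrete, the probabilities \(P(X = k)\) can be tabulated directly. The following sketch (the class and method names are ours, chosen for illustration) prints the distribution for \(p = 1/2\):

```java
public class GeometricPmf {
    // P(X = k) = (1 - p)^{k-1} * p, for k = 1, 2, 3, ...
    static double pmf(int k, double p) {
        return Math.pow(1 - p, k - 1) * p;
    }

    public static void main(String[] args) {
        double p = 0.5;
        double total = 0;
        for (int k = 1; k <= 10; k++) {
            System.out.printf("P(X = %d) = %f%n", k, pmf(k, p));
            total += pmf(k, p);
        }
        // the remaining probability is spread over k > 10
        System.out.println("sum of first 10 terms: " + total);
    }
}
```

For \(p = 1/2\), the first ten terms sum to \(1 - (1/2)^{10} \approx 0.999\); the probabilities over all \(k\) sum to exactly 1, as any probability distribution must.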

Note. There is some debate about which distribution should be called the geometric distribution. Some sources take it to be the distribution of the number of failures before the first success, rather than the total number of trials including the first success (as we do above). With that convention, the geometric random variable is \(Y = X - 1\), where \(X\) is the random variable defined above. Nonetheless, we will stick to our convention of calling the distribution of \(X\) above the geometric distribution (with parameter \(p\)).

With the random variable \(X\) defined above, we can compute its expected value, \(E(X)\)—i.e., the number of trials we’d expect to perform, on average, until seeing the first success. Recall that the expected value of a random variable taking on positive integer values is defined by the formula

\[E(X) = 1 \cdot P(X = 1) + 2 \cdot P(X = 2) + 3 \cdot P(X = 3) + \cdots.\]

This expression can be written more succinctly using summation notation:

\[E(X) = \sum_{k = 1}^\infty k P(X = k).\]
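Before deriving a closed form, it can be instructive to evaluate partial sums of this series numerically for the geometric distribution. A minimal sketch (the names are ours, for illustration):

```java
public class ExpectationPartialSums {
    // partial sum of sum_{k=1}^{n} k * P(X = k), where
    // P(X = k) = (1 - p)^{k-1} * p is the geometric distribution
    static double partialSum(int n, double p) {
        double sum = 0;
        for (int k = 1; k <= n; k++) {
            sum += k * Math.pow(1 - p, k - 1) * p;
        }
        return sum;
    }

    public static void main(String[] args) {
        double p = 0.5;
        for (int n : new int[] {5, 10, 20, 50}) {
            System.out.printf("n = %2d: %f%n", n, partialSum(n, p));
        }
        // the partial sums increase and settle rapidly toward a limit
    }
}
```

For \(p = 1/2\) the partial sums converge quickly, which previews the closed-form value established next.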

Proposition 1. Let \(X\) be the geometric random variable whose distribution is defined above. Then \(E(X) = 1 / p\).

For completeness, we give a proof of the proposition below. Note that for our case of flipping coins, we have \(p = 1/2\), so that \(E(X) = 2\). Thus, we expect to need to flip a coin twice before seeing our first heads. This fact may seem somewhat counterintuitive, as there is no bound on the number of coin flips before we see the first heads. Although it is unlikely, it is still possible that we flip many tails before seeing our first heads.
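The proposition can also be checked empirically by simulation: average many independent samples of \(X\) and compare the result to \(1/p\). The sketch below (with a fixed seed for reproducibility; the names are ours) does exactly this:

```java
import java.util.Random;

public class GeometricMeanCheck {
    // draw one sample of X: trials until (and including) the first success
    static int sample(Random r, double p) {
        int count = 1;
        while (r.nextDouble() >= p) { // the trial fails with probability 1 - p
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        Random r = new Random(42); // fixed seed for reproducibility
        double p = 0.5;
        int trials = 1_000_000;
        long total = 0;
        for (int i = 0; i < trials; i++) {
            total += sample(r, p);
        }
        // the empirical mean should be close to 1 / p = 2
        System.out.println((double) total / trials);
    }
}
```

With a million samples, the empirical mean typically lands within a few thousandths of \(2\), in line with the proposition.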

Generating Geometric Random Variables, \(p = 1/2\)

A natural method of generating a number from a geometric distribution with \(p = 1/2\) is to simulate the coin-flipping process described at the beginning of the note: flip coins until a toss comes up heads, then return the total number of coins tossed.

To simulate coin flips, we can employ Java’s Random object. Specifically, the nextBoolean method will give us a (pseudo)random boolean value that we can interpret as the outcome of a coin flip by associating, say, true with heads and false with tails. For example, the following method will return a geometrically distributed random value with \(p = 1/2\):

int geometric() {
    Random r = new Random(); // requires java.util.Random
    int count = 0; // number of coin flips
    do {
        count++; // flip a coin
    } while (!r.nextBoolean()); // stop after the first heads
    return count;
}
If program performance is important, the method above is not especially efficient because of the way that Java implements the nextBoolean() method. It will generally be more efficient to generate a random int (using the nextInt() method), then to read the individual bits of the int, interpreting, say, 1 as heads and 0 as tails. This method is employed in the pickHeight method for skiplists in ODS Section 4.2.
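One way the bit-reading idea might look in code is sketched below. This is our illustration of the general technique, not the exact ODS implementation: each bit of a random int is treated as one coin flip, and Integer.numberOfTrailingZeros counts the run of tails before the first heads.

```java
import java.util.Random;

public class GeometricFromBits {
    // sample a geometric(p = 1/2) value by reading bits of a random int:
    // a 0 bit is tails, a 1 bit is heads
    static int geometric(Random r) {
        int flipped = 0; // tails seen so far
        while (true) {
            int bits = r.nextInt();
            if (bits != 0) {
                // the trailing 0 bits are tails; the next 1 bit is the first heads
                return flipped + Integer.numberOfTrailingZeros(bits) + 1;
            }
            flipped += 32; // all 32 bits were tails; draw another int
        }
    }

    public static void main(String[] args) {
        Random r = new Random();
        System.out.println(geometric(r)); // total flips, including the heads
    }
}
```

This replaces up to 32 calls to nextBoolean with a single call to nextInt; since each nextBoolean call advances the generator a full step but uses only one bit of its output, the saving can be substantial.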


Proof of Proposition 1. From the definition of expected value and the definition of \(X\), we have

\[E(X) = \sum_{k = 1}^\infty k P(X = k) = \sum_{k = 1}^\infty k (1 - p)^{k-1} p.\]

To simplify notation, we will denote \(q = 1 - p\), so that the expression above becomes

\[E(X) = \sum_{k = 1}^\infty k q^{k-1} p = p \sum_{k = 1}^\infty k q^{k-1}.\]

We can rewrite the final expression as

\[\begin{align*} E(X) &= p \sum_{k = 1}^\infty k q^{k-1}\\ &= p \sum_{k = 1}^\infty \sum_{j = k}^{\infty} q^{j-1}\\ &= p \sum_{k = 1}^\infty q^{k-1} \sum_{j = 0}^\infty q^j\\ &= p \sum_{k = 1}^\infty q^{k-1} \frac{1}{1 - q}\\ &= \frac{p}{1 - q} \sum_{k = 1}^\infty q^{k-1}\\ &= \frac{p}{1 - q} \sum_{k = 0}^\infty q^k\\ &= \frac{p}{1 - q} \frac{1}{1 - q}\\ &= \frac{p}{(1 - q)^2}\\ &= \frac 1 p. \end{align*}\]

The second equality above holds by rewriting

\[k q^{k-1} = \underset{k \text{ times}}{\underbrace{q^{k-1} + q^{k - 1} + \cdots + q^{k - 1}}}\]

then reorganizing the terms. More specifically, we can write the sum

\[\begin{align*} \sum_{k = 1}^\infty k q^{k-1} &= \underbrace{q^0} + \underbrace{q^1 + q^1} + \underbrace{q^2 + q^2 + q^2} + \cdots\\ &= (q^0 + q^1 + q^2 + \cdots) + (q^1 + q^2 + \cdots) + (q^2 + \cdots) + \cdots\\ &= \left(\sum_{j = 1}^\infty q^{j-1} \right) + \left(\sum_{j = 2}^\infty q^{j-1} \right) + \left(\sum_{j = 3}^\infty q^{j-1}\right) + \cdots\\ &= \sum_{k = 1}^\infty \left(\sum_{j = k}^\infty q^{j-1}\right). \end{align*}\]

The third equality follows from factoring out the smallest power of \(q\) from each inner sum:

\[\begin{align*} \sum_{j = k}^\infty q^{j-1} &= q^{k-1} + q^k + q^{k+1} + \cdots\\ &= q^{k-1} (1 + q + q^2 + \cdots)\\ &= q^{k-1} \sum_{j = 0}^\infty q^j. \end{align*}\]

In the fourth and seventh lines of the long derivation above, we have used the fact that \(\sum_{j = 0}^\infty q^j = \frac{1}{1 - q}\) for any \(q\) satisfying \(0 < q < 1\). The final equality holds because we defined \(q = 1 - p\). \(\Box\)