Introduction to Probability

Introducing the basic concepts and vocabulary of probability

A computer program specifies a sequence of operations to be performed in order to accomplish some task. Often, the operations are deterministic: given a fixed input, the behavior of the program should always be the same for that input. Yet there are many problems for which an efficient and/or elegant solution can be obtained using randomization.

In a randomized program, some operations may be chosen randomly rather than deterministically according to the program’s input. Thus, different executions of the same program with the same input may give different behavior. While the potential unpredictability of randomized programs may seem at odds with program correctness and efficiency, we will see that for many randomized procedures it is possible to argue that undesirable outcomes are incredibly unlikely. In some cases, a randomized solution is so much simpler than a deterministic one that the slight chance of an undesirable outcome is overshadowed by the efficiency and simplicity of the randomized procedure.

The goal of this note is to introduce some of the basic concepts, vocabulary, and notation of probability. This material will serve as the foundation to reason about randomized procedures we will encounter going forward. For simplicity, we focus on discrete (finite) probability.

Probability Spaces

Perhaps the most familiar examples of randomness we encounter in our everyday lives occur in games of chance. In many games, randomness is achieved by rolling dice, flipping coins, or dealing cards from a shuffled deck. Randomness affords a game a level of unpredictability. Yet patterns emerge from these random processes: a coin flipped repeatedly will typically yield an approximately equal number of heads and tails; a poker player is almost never dealt a royal flush in a game of five-card stud. Probability is the quantitative study of random processes. That is, probability seeks to quantify the likelihood of different outcomes of a random process.

The basic object of study in probability is a probability space. A (finite) probability space consists of:

  1. a sample space \(\Omega = \{\omega_1, \omega_2, \ldots, \omega_n\}\) whose elements are called outcomes, and

  2. a probability measure \(P\) that associates a real number \(P(\omega_i)\), called the probability of \(\omega_i\), to each outcome \(\omega_i\) in \(\Omega\), satisfying:

    • \(0 \leq P(\omega_i) \leq 1\) for all outcomes \(\omega_i\), and
    • \(P(\omega_1) + P(\omega_2) + \cdots + P(\omega_n) = 1\).

Example 1. We model the randomness of tossing a (fair) coin. In this case, there are two possible outcomes of a coin toss: heads and tails. We take our sample space to be \(\Omega = \{\mathrm{H},\mathrm{T}\}\). Since the result of a coin flip is equally likely to be heads or tails, we have \(P(\mathrm{H})=P(\mathrm{T})=1/2\).

Example 2. Rolling a standard (six-sided) die has six outcomes which are equally likely. Thus we take \(\Omega=\{1,2,3,4,5,6\}\), and \(P(i) = 1/6\) for \(i = 1,2,\ldots,6\).
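
To make the definition concrete, here is a minimal sketch in Python representing a finite probability space as a dictionary mapping each outcome to its probability. The representation and the helper name `is_probability_space` are ours, introduced only for illustration.

```python
# A finite probability space as a dictionary: outcome -> probability.
coin = {"H": 1 / 2, "T": 1 / 2}        # Example 1
die = {i: 1 / 6 for i in range(1, 7)}  # Example 2

def is_probability_space(space):
    """Check the two defining conditions: every probability lies in
    [0, 1], and the probabilities sum to 1 (up to rounding error)."""
    probs = space.values()
    return all(0 <= p <= 1 for p in probs) and abs(sum(probs) - 1) < 1e-9

assert is_probability_space(coin) and is_probability_space(die)
```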

Events

An event \(A\) is a set of outcomes. We can compute the probability that event \(A\) occurs—denoted \(P(A)\)—by summing the probabilities of the different outcomes in \(A\).

Example 3. Consider rolling a six-sided die, and let \(A\) be the event that the outcome is at least \(4\). That is, \(A = \{4, 5, 6\}\). We can compute \(P(A)\) to be the sum of the probabilities of these outcomes. Since there are 3 possible outcomes in \(A\) and each occurs with probability \(1/6\), we have \(P(A) = 1/6 + 1/6 + 1/6 = 1/2\).

Example 4. Consider the process of rolling two dice. The first die can take any value from 1 to 6, and similarly for the second. Thus, we can represent the outcome of a roll as a pair \((i, j)\) where \(i\) is the outcome of the first die and \(j\) is the outcome of the second, giving \(\Omega = \{(1, 1), (1,2), \ldots, (6, 6)\}\). Note that there are 36 possible outcomes, and all are equally likely, so \(P((i, j)) = 1 / 36\) for all outcomes. Consider the event \(A\) that the values of the dice sum to \(7\). Notice that \(A\) contains \(6\) outcomes: \(A = \{(1, 6), (2, 5), \ldots, (6, 1)\}\). Therefore, we compute \(P(A) = 6 \cdot (1 / 36) = 1 / 6\). Now consider the event \(B\) that the values of the dice sum to \(8\). Thus, \(B = \{(2, 6), (3, 5), (4, 4), (5, 3), (6, 2)\}\). Since \(B\) contains \(5\) outcomes, each occurring with probability \(1 / 36\), we have \(P(B) = 5 / 36\).
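
Because an event is just a set of outcomes, computing \(P(A)\) is mechanical: sum the probabilities of the outcomes in \(A\). A sketch for Example 4, in the dictionary representation above (the helper name `event_probability` is ours):

```python
# The two-dice sample space: 36 equally likely pairs (i, j).
two_dice = {(i, j): 1 / 36 for i in range(1, 7) for j in range(1, 7)}

def event_probability(space, event):
    """P(A): the sum of the probabilities of the outcomes in A."""
    return sum(space[outcome] for outcome in event)

A = {(i, j) for (i, j) in two_dice if i + j == 7}  # dice sum to 7
B = {(i, j) for (i, j) in two_dice if i + j == 8}  # dice sum to 8
print(event_probability(two_dice, A))  # ≈ 1/6
print(event_probability(two_dice, B))  # ≈ 5/36
```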

Exercise 1. Consider the scenario described in Example 4. Consider the event \(C\) that the values of the dice sum to at least \(10\). What is \(P(C)\)?

Random Variables

Given a probability space \((\Omega, P)\), a random variable \(X\) is a function that associates a (real) numerical value to each outcome \(\omega\) in \(\Omega\).

Example 5. Going back to the coin toss example above, we can define a random variable \(X\) on the coin toss probability space by \(X(\mathrm{H}) = 1\) and \(X(\mathrm{T}) = -1\). This random variable may arise from a simple game: two players toss a coin and bet on the outcome. If the outcome is heads, player 1 wins one dollar, while if the outcome is tails, player 1 loses one dollar.

Example 6. For the die rolling example, we can define the random variable \(Y\) by \(Y(i) = i\). The value of \(Y\) is simply the value showing on the die after it is rolled.
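
Since a random variable is literally a function on outcomes, Examples 5 and 6 translate directly into code. A sketch, continuing with the dictionary spaces from earlier:

```python
def X(outcome):
    """Example 5: player 1's winnings on a coin toss."""
    return 1 if outcome == "H" else -1

def Y(outcome):
    """Example 6: the value showing on the die."""
    return outcome
```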

Expected Values

Given a random variable \(X\) on a probability space \((\Omega, P)\), the expected value or average of \(X\) is defined to be

\[E(X) = X(\omega_1) P(\omega_1) + X(\omega_2) P(\omega_2) + \cdots + X(\omega_n) P(\omega_n).\]

We can denote the expression above more succinctly using summation notation as

\[E(X) = \sum_{i = 1}^n X(\omega_i) P(\omega_i).\]

The expected value of a random variable quantifies the average value of \(X\) we expect to observe if we repeat the underlying experiment (e.g., a coin flip or die roll) many times over.

Example 7. Going back to the coin flip example, we can compute

\[E(X) = X(\mathrm{H}) P(\mathrm{H}) + X(\mathrm{T}) P(\mathrm{T}) = (1) \cdot \frac{1}{2} + (-1) \cdot \frac{1}{2} = 0.\]

This tells us that if we repeatedly play the coin flip betting game described above, neither player has an advantage; the players expect to win about as much as they lose.

Example 8. For the die rolling example, we compute

\[E(Y) = Y(1) P(1) + \cdots + Y(6) P(6) = 1 \cdot \frac{1}{6} + \cdots + 6 \cdot \frac{1}{6} = 3.5.\]

Thus an “average” die roll is \(3.5\) (even though \(3.5\) cannot be the outcome of a single die roll).
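
The defining sum for \(E(X)\) translates directly into a loop over outcomes. A sketch reproducing Examples 7 and 8, assuming the `coin` and `die` dictionaries and the functions `X` and `Y` from the earlier sketches:

```python
def expected_value(space, rv):
    """E(rv): the sum of rv(outcome) * P(outcome) over all outcomes."""
    return sum(rv(outcome) * p for outcome, p in space.items())

print(expected_value(coin, X))  # 0.0, as in Example 7
print(expected_value(die, Y))   # ≈ 3.5, as in Example 8
```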

Probability Distributions

Often, when speaking about random variables we omit reference to the underlying probability space. In this case, we speak only of the probability that a random variable \(X\) takes on various values. We refer to the set of possible values attained by \(X\) together with their respective probabilities as the probability distribution of \(X\). More formally, we can define the probability distribution of \(X\) by

\[f(x) = P(X = x).\]

The function \(f\) is known as the probability density function (or PDF) of \(X\).

For the coin flipping example above, \(X\) can be defined by the probability distribution

\[P(X = 1) = P(X = -1) = 1/2,\]

which gives rise to the PDF \(f(1) = f(-1) = 1 / 2\).
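
Extracting a PDF from a random variable and its underlying space is also mechanical: group the outcomes by the value the variable assigns them and add up their probabilities. A sketch (the helper name `distribution` is ours):

```python
from collections import defaultdict

def distribution(space, rv):
    """The PDF of rv: a dictionary mapping each value x to P(rv = x)."""
    pdf = defaultdict(float)
    for outcome, p in space.items():
        pdf[rv(outcome)] += p
    return dict(pdf)

print(distribution(coin, X))  # {1: 0.5, -1: 0.5}
```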

Note that the PDF does not reference the underlying sample space \(\Omega\). The danger in this view is that, without explicitly defining \(X\) as a function on some probability space, comparing random variables can be difficult. To see an example of this, consider the random variable \(S\) defined on the die roll sample space, \(\Omega = \{1,\ldots, 6\}\), by

\[S(i) = \begin{cases} 1 &\text{ if } i \text{ is even}\\ -1 &\text{ if } i \text{ is odd}. \end{cases}\]

Notice that, like our variable \(X\) defined for coin flips, we have \(P(S = 1) = P(S = -1) = 1/2\), so in some sense \(X\) and \(S\) are “the same.” However, they are defined on different sample spaces: \(X\) is defined on the sample space of coin flips, while \(S\) is defined on the sample space of die rolls.

Consider a game where the play is determined by a coin flip and a die roll. For the examples above, the random variable \(X\) depends only on the outcome of the coin flip, while \(Y\) and \(S\) depend only on the outcome of the die roll. Since the outcome of the coin flip has no effect on the outcome of the die roll, the variables \(X\) and \(Y\) are independent of one another, as are \(X\) and \(S\). However, \(Y\) and \(S\) depend on the same outcome (the die roll) so their values may depend on each other. In fact, the value of \(S\) is completely determined by the value of \(Y\)! So knowing the value of \(Y\) allows us to determine the value of \(S\), and knowing the value of \(S\) tells us something about the value of \(Y\) (namely whether \(Y\) is even or odd).

Independence

Definition. Suppose \(X\) and \(Y\) are random variables (or more generally, functions) defined on the same probability space, \((\Omega, P)\). We say that \(X\) and \(Y\) are independent if for all possible values \(x\) of \(X\) and \(y\) of \(Y\) we have

\[P(X = x \text{ and } Y = y) = P(X = x) P(Y = y).\]

For our examples above with the coin flip and the die roll, \(X\) and \(Y\) cannot be said to be independent because they are defined on different probability spaces. The variables \(Y\) and \(S\) are both defined on the die roll sample space, so they can be compared. However, they are not independent. For example, we have \(P(Y = 1) = 1/6\) and \(P(S = 1) = 1/2\). Since \(S(i) = 1\) only when \(i\) is even, we have

\[P(Y = 1 \text{ and } S = 1) = 0 \neq \frac{1}{6} \cdot \frac{1}{2}.\]

Let \(W\) be the random variable on the die roll sample space defined by

\[W(i) = \begin{cases} 1 & \text{ if } i = 1, 4\\ 2 & \text{ if } i = 2, 5\\ 3 & \text{ if } i = 3, 6. \end{cases}\]

We claim that \(W\) and \(S\) are independent. This can be verified by brute force calculation. For example, note that we have \(S = 1\) and \(W = 1\) only when the outcome of the die roll is 4. Therefore,

\[P(S = 1 \text{ and } W = 1) = \frac{1}{6} = P(S = 1) P(W = 1).\]

Similar calculations show that the corresponding equalities hold for all possible values of \(S\) and \(W\), hence these random variables are independent.
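
The brute-force verification is easy to automate: compare \(P(X = x \text{ and } Y = y)\) with \(P(X = x) P(Y = y)\) for every pair of values. A sketch, assuming the `die` dictionary and the `distribution` helper from earlier, with \(S\) and \(W\) written as functions:

```python
def S(i):
    """+1 on an even roll, -1 on an odd roll."""
    return 1 if i % 2 == 0 else -1

def W(i):
    """Maps 1,4 -> 1; 2,5 -> 2; 3,6 -> 3."""
    return (i - 1) % 3 + 1

def is_independent(space, rv1, rv2):
    """Check P(rv1 = x and rv2 = y) = P(rv1 = x) P(rv2 = y) for all x, y."""
    f1, f2 = distribution(space, rv1), distribution(space, rv2)
    joint = distribution(space, lambda w: (rv1(w), rv2(w)))
    return all(abs(joint.get((x, y), 0.0) - f1[x] * f2[y]) < 1e-9
               for x in f1 for y in f2)

print(is_independent(die, Y, S))  # False
print(is_independent(die, W, S))  # True
```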

Linearity of Expectation

Given two random variables \(X\) and \(Y\) defined on the same probability space, we can define their sum \(X + Y\) and product \(X Y\) pointwise: \((X + Y)(\omega) = X(\omega) + Y(\omega)\) and \((X Y)(\omega) = X(\omega) Y(\omega)\) for each outcome \(\omega\).

Proposition. Suppose \(X\) and \(Y\) are independent random variables on the probability space \((\Omega, P)\). Then

\[E(X + Y) = E(X) + E(Y) \quad\text{and}\quad E(X Y) = E(X) E(Y).\]

Proof. For the first equality, grouping the outcomes \(\omega\) according to the pair of values \((x, y) = (X(\omega), Y(\omega))\), we compute

\[E(X + Y) = \sum_{x, y} P(X = x \text{ and } Y = y) (x + y).\]

Using the fact that \(X\) and \(Y\) are independent, we find

\[\begin{align*} E(X + Y) &= \sum_{x, y} P(X = x \text{ and } Y = y) (x + y)\\ &= \sum_{x, y} P(X = x) P(Y = y) (x + y)\\ &= \sum_{x} P(X = x) x \sum_y P(Y = y) + \sum_y P(Y = y) y \sum_x P(X = x)\\ &= \sum_x P(X = x) x + \sum_y P(Y = y) y\\ &= E(X) + E(Y). \end{align*}\]

The fourth equality holds because \(P\) satisfies \(\sum_x P(X = x) = 1\) and \(\sum_y P(Y = y) = 1\). Similarly, we compute

\[\begin{align*} E(X Y) &= \sum_{x, y} P(X = x \text{ and } Y = y) x y\\ &= \sum_{x, y} P(X = x) P(Y = y) x y\\ &= \left(\sum_{x} P(X = x) x\right) \left(\sum_{y} P(Y = y) y\right)\\ &= E(X) E(Y). \end{align*}\]

These equations give the desired results. \(\Box\)

The equation \(E(X + Y) = E(X) + E(Y)\) is satisfied even if \(X\) and \(Y\) are not independent. This fundamental fact about probability is known as the linearity of expectation. The proof of \(E(X Y) = E(X) E(Y)\), however, genuinely uses independence: without it, the equality can fail.
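
Both claims can be checked numerically with the helpers sketched above: expectation is additive even for the dependent pair \(Y\) and \(S\), while the product rule is confirmed for the independent pair \(S\) and \(W\).

```python
# Linearity of expectation holds even though Y and S are not independent.
print(expected_value(die, lambda i: Y(i) + S(i)))       # ≈ 3.5
print(expected_value(die, Y) + expected_value(die, S))  # ≈ 3.5

# The product rule holds for the independent pair S and W.
print(expected_value(die, lambda i: S(i) * W(i)))       # ≈ 0.0
print(expected_value(die, S) * expected_value(die, W))  # ≈ 0.0
```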

Exercise 2. Prove that \(E(X + Y) = E(X) + E(Y)\) without assuming that \(X\) and \(Y\) are independent.

Exercise 3. Give an example of random variables \(X\) and \(Y\) for which \(E(X Y) \neq E(X) E(Y)\). (Note that \(X\) and \(Y\) cannot be independent.)