Conditional Probability


When New Information Changes Everything You Thought You Knew

Conditional probability answers: "Given that B happened, what is the probability of A?" It restricts the sample space and recalculates within that reduced world.

Sample space reduction — Condition on B and ignore everything outside it
Bayesian updating — Use new evidence to revise your beliefs systematically
Probability trees — Visualize conditional paths through multi-stage experiments
Independence test — A and B are independent if P(A given B) equals P(A)

Conditional probability is the engine of learning from data. It is how evidence changes beliefs.

What is Conditional Probability?

Definition

The conditional probability of event $A$ given that event $B$ has occurred is defined as:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) > 0$$

This formula restricts the sample space to $B$ and recalculates probabilities within that reduced space.

The Sample Space Reduction

When we condition on $B$, we are saying: "Ignore everything outside $B$." The event $A \cap B$ represents the part of $A$ that survives this restriction. Dividing by $P(B)$ normalizes the result so that $P(B \mid B) = 1$: once $B$ is known to have occurred, $B$ is certain.
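The restriction-and-renormalization idea can be sketched in a few lines of Python. This is a minimal illustration, assuming a finite sample space with equally likely outcomes; the helper names `prob` and `cond_prob` and the single-die example are hypothetical choices made here, not part of the original text.

```python
from fractions import Fraction

def prob(event, omega):
    """Probability of `event` when all outcomes in `omega` are equally likely."""
    return Fraction(len(event & omega), len(omega))

def cond_prob(a, b, omega):
    """P(A | B) = P(A and B) / P(B); undefined when P(B) = 0."""
    p_b = prob(b, omega)
    if p_b == 0:
        raise ValueError("P(B) must be positive")
    return prob(a & b, omega) / p_b

omega = set(range(1, 7))        # one fair six-sided die (assumed example)
even = {2, 4, 6}                # event A
greater_than_3 = {4, 5, 6}      # event B

p_a = prob(even, omega)                                   # unconditional P(A)
p_a_given_b = cond_prob(even, greater_than_3, omega)      # P(A | B)
p_b_given_b = cond_prob(greater_than_3, greater_than_3, omega)

print(p_a, p_a_given_b, p_b_given_b)
```

Here conditioning on "greater than 3" shrinks the space to three outcomes, two of which are even, so the probability of an even roll rises from 1/2 to 2/3.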

Formal Properties

Theorem: Conditional Probability is a Valid Probability Measure

For fixed $B$ with $P(B) > 0$, the function $P(\cdot \mid B)$ satisfies all of the Kolmogorov axioms:

$P(A \mid B) \geq 0$ for all $A$
$P(\Omega \mid B) = 1$
$P(A_1 \cup A_2 \cup \cdots \mid B) = P(A_1 \mid B) + P(A_2 \mid B) + \cdots$ for disjoint $A_i$

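Therefore, every result derived from the axioms also holds for conditional probabilities, as long as the conditioning event $B$ is held fixed. For example, the complement rule becomes $P(A^c \mid B) = 1 - P(A \mid B)$.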

The Multiplication Rule

Multiplication Rule

$$P(A \cap B) = P(A \mid B) \cdot P(B) = P(B \mid A) \cdot P(A)$$

Here,

$P(A \cap B)$ = joint probability of $A$ and $B$
$P(A \mid B) \cdot P(B)$ = one way to compute it
$P(B \mid A) \cdot P(A)$ = an equivalent computation
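As an illustration of the multiplication rule, the sketch below computes the probability of drawing two aces in a row from a standard 52-card deck without replacement, and cross-checks the result by brute-force enumeration. The card example and all variable names are assumptions introduced here for illustration.

```python
from fractions import Fraction
from itertools import permutations

# Multiplication rule:
# P(first ace and second ace) = P(first ace) * P(second ace | first ace)
p_first_ace = Fraction(4, 52)
p_second_ace_given_first = Fraction(3, 51)   # one ace has already been removed
p_both_rule = p_first_ace * p_second_ace_given_first

# Cross-check: enumerate every ordered draw of two distinct cards.
deck = ["ace"] * 4 + ["other"] * 48
draws = list(permutations(range(52), 2))
both_aces = sum(1 for i, j in draws if deck[i] == "ace" and deck[j] == "ace")
p_both_enum = Fraction(both_aces, len(draws))

print(p_both_rule, p_both_enum)
```

Both routes give the same joint probability, which is the point of the rule: a joint probability can be built one conditional step at a time.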

Law of Total Probability

Theorem: Law of Total Probability

If $B_1, B_2, \ldots, B_k$ partition the sample space (pairwise disjoint, $\bigcup_i B_i = \Omega$) and each $P(B_i) > 0$, then for any event $A$:

$$P(A) = \sum_{i=1}^k P(A \cap B_i) = \sum_{i=1}^k P(A \mid B_i) \, P(B_i)$$
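A small numerical sketch of the law of total probability follows. The scenario is invented purely for illustration: three machines produce 50%, 30%, and 20% of a factory's output, with assumed defect rates of 1%, 2%, and 3%. The function name `total_probability` is likewise hypothetical.

```python
from fractions import Fraction

def total_probability(priors, likelihoods):
    """P(A) = sum_i P(A | B_i) * P(B_i) for a partition B_1, ..., B_k."""
    if len(priors) != len(likelihoods):
        raise ValueError("need one likelihood per partition cell")
    if sum(priors) != 1:
        raise ValueError("the P(B_i) must sum to 1 for a partition")
    return sum(l * p for l, p in zip(likelihoods, priors))

# Assumed numbers: share of output per machine, P(B_i)
priors = [Fraction(5, 10), Fraction(3, 10), Fraction(2, 10)]
# Assumed numbers: defect rate per machine, P(defect | B_i)
likelihoods = [Fraction(1, 100), Fraction(2, 100), Fraction(3, 100)]

p_defect = total_probability(priors, likelihoods)
print(p_defect)
```

Each term in the sum is the probability of one route to the event (a defect from a particular machine); because the machines partition the output, the routes do not overlap and their probabilities add.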

Conditional Independence

Definition: Conditional Independence

Events $A$ and $B$ are conditionally independent given $C$ if:

$$P(A \cap B \mid C) = P(A \mid C) \cdot P(B \mid C)$$

Equivalently, whenever $P(B \cap C) > 0$: $P(A \mid B, C) = P(A \mid C)$. Knowing $B$ in addition to $C$ provides no further information about $A$.

Conditional Independence ≠ Marginal Independence

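$A$ and $B$ can be marginally independent but conditionally dependent, and vice versa. A closely related phenomenon is Simpson's paradox, in which a relationship seen in aggregated data changes or even reverses once the data are stratified by a third variable; it has profound implications for causal inference. For example, two medical treatments may appear equally effective overall, but when patients are stratified by age, one treatment may be clearly superior within each age group.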
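The gap between marginal and conditional independence can be checked on a tiny example. The sketch below assumes two fair coin flips, with $A$ = "first flip is heads", $B$ = "second flip is heads", and $C$ = "exactly one head"; these events and the helper `prob` are illustrative choices, not taken from the original text.

```python
from fractions import Fraction
from itertools import product

omega = set(product("HT", repeat=2))   # 4 equally likely outcomes

def prob(event, given=None):
    """P(event | given) on the uniform space; `given` defaults to all of omega."""
    given = omega if given is None else given
    return Fraction(len(event & given), len(given))

a = {w for w in omega if w[0] == "H"}          # first flip is heads
b = {w for w in omega if w[1] == "H"}          # second flip is heads
c = {w for w in omega if w.count("H") == 1}    # exactly one head

# Marginal check: P(A and B) versus P(A) * P(B)
marginally_independent = prob(a & b) == prob(a) * prob(b)

# Conditional check: P(A and B | C) versus P(A | C) * P(B | C)
conditionally_independent = prob(a & b, c) == prob(a, c) * prob(b, c)

print(marginally_independent, conditionally_independent)
```

The two flips are independent on their own, but once we learn that exactly one of them was heads, knowing the first flip determines the second, so independence fails given $C$.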

The Base Rate Fallacy

Medical Testing: Why Intuition Fails

Problem: A disease affects 1% of the population. A test is 99% sensitive (true positive rate) and 95% specific (true negative rate). If a patient tests positive, what is the probability they have the disease?

Intuitive (wrong) answer: "The test is 99% accurate, so the probability is about 99%."

Correct calculation using Bayes' theorem:

$$P(D \mid +) = \frac{P(+ \mid D) \, P(D)}{P(+)} = \frac{0.99 \times 0.01}{0.99 \times 0.01 + 0.05 \times 0.99} = \frac{0.0099}{0.0594} \approx 0.167$$

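Despite the test being 99% sensitive and 95% specific, only about 17% of positive results are true positives. The low base rate (1%) dominates: healthy people outnumber sick people 99 to 1, so even a 5% false positive rate produces far more false positives than true positives.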
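The same calculation can be written as a short function, which makes it easy to see how strongly the answer depends on the base rate. This is a sketch only: the function name `posterior_given_positive` is hypothetical, and the second call uses an assumed 10% prevalence purely for comparison.

```python
def posterior_given_positive(prevalence, sensitivity, specificity):
    """P(disease | positive test) via Bayes' theorem."""
    true_pos = sensitivity * prevalence                 # P(+ | D) * P(D)
    false_pos = (1 - specificity) * (1 - prevalence)    # P(+ | not D) * P(not D)
    return true_pos / (true_pos + false_pos)            # denominator is P(+)

# Numbers from the problem above: 1% prevalence, 99% sensitive, 95% specific.
p_rare = posterior_given_positive(0.01, 0.99, 0.95)

# Same test, assumed 10% prevalence, to show the effect of the base rate.
p_common = posterior_given_positive(0.10, 0.99, 0.95)

print(round(p_rare, 3), round(p_common, 3))
```

With the test held fixed, raising the prevalence from 1% to 10% lifts the posterior from roughly one in six to more than two in three.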

Worked Example: Two Dice

What is $P(\text{sum} = 7 \mid \text{first die} = 3)$?

The sample space is restricted to outcomes where the first die is 3: $\{(3,1), (3,2), (3,3), (3,4), (3,5), (3,6)\}$ — six equally likely outcomes.

Only $(3,4)$ gives a sum of 7. Therefore:

$$P(\text{sum} = 7 \mid \text{first} = 3) = \frac{1}{6}$$

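This equals $P(\text{sum} = 7) = 6/36 = 1/6$, so the events "sum is 7" and "first die is 3" are independent. This is special to a sum of 7: for any other total, knowing the first die changes the probability, so the sum as a random variable is not independent of the first die.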
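The dice example can be verified by enumerating all 36 outcomes. The sketch below is illustrative (the helper names are hypothetical); it also repeats the calculation for a total of 8 as a contrast case.

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))   # 36 equally likely (first, second) pairs

def p_sum(total):
    """Unconditional P(sum = total)."""
    return Fraction(sum(1 for r in rolls if sum(r) == total), len(rolls))

def p_sum_given_first(total, first):
    """P(sum = total | first die = first), computed on the restricted space."""
    restricted = [r for r in rolls if r[0] == first]
    hits = [r for r in restricted if sum(r) == total]
    return Fraction(len(hits), len(restricted))

# Conditional probabilities for each possible value of the first die.
seven_given_first = [p_sum_given_first(7, k) for k in range(1, 7)]
eight_given_first = [p_sum_given_first(8, k) for k in range(1, 7)]

print(p_sum(7), seven_given_first)
print(p_sum(8), eight_given_first)
```

For a total of 7, every conditional probability matches the unconditional 1/6. For a total of 8, the conditional values differ from the unconditional probability, which is what dependence looks like.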