The Multiplication Rule
Probability Theory
When Do A and B Both Happen? Chain Your Probabilities
The multiplication rule calculates the probability of A and B both occurring. It chains conditional probabilities to find joint likelihoods.
- General rule — P(A and B) equals P(A) times P(B given A)
- Independent events — When A does not affect B, the formula simplifies to P(A) times P(B)
- Sequential events — Drawing cards without replacement is the classic application
- Tree diagrams — Visualize conditional paths through multi-stage experiments
The multiplication rule is your tool for computing "both happen" probabilities.
What is the Multiplication Rule?
Definition
For any two events A and B, the multiplication rule gives the probability that both occur as the probability of A multiplied by the conditional probability of B given A.
General Multiplication Rule
$$P(A \cap B) = P(A) \cdot P(B \mid A)$$
Here,
- $P(A \cap B)$ = Probability of both A and B occurring
- $P(A)$ = Probability of event A
- $P(B \mid A)$ = Probability of B given that A occurred
import numpy as np
# Drawing cards without replacement
S = [f"{rank} of {suit}"
     for suit in ['Hearts', 'Diamonds', 'Clubs', 'Spades']
     for rank in ['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']]
# P(First card is Ace AND Second card is King)
aces = [c for c in S if c.startswith('A')]
p_first_ace = len(aces) / len(S)
# After drawing one ace, 51 cards remain (only the drawn ace leaves the deck)
drawn_ace = aces[0]
remaining = [c for c in S if c != drawn_ace]
kings_remaining = [c for c in remaining if c.startswith('K')]
p_second_king_given_first_ace = len(kings_remaining) / len(remaining)
p_both = p_first_ace * p_second_king_given_first_ace
print(f"P(First is Ace) = {p_first_ace:.4f}")
print(f"P(King | First was Ace) = {p_second_king_given_first_ace:.4f}")
print(f"P(Ace then King) = {p_both:.4f}")
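As a cross-check on the chained calculation, the same probability can be obtained by brute-force enumeration of every ordered two-card draw. This is a minimal sketch: the tuple representation of cards and the names `deck`, `ordered_draws`, and `p_exact` are illustrative choices, not part of the original example.

```python
from fractions import Fraction
from itertools import permutations

# Build a 52-card deck as (rank, suit) tuples (illustrative representation)
ranks = ['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']
suits = ['Hearts', 'Diamonds', 'Clubs', 'Spades']
deck = [(rank, suit) for suit in suits for rank in ranks]

# Every ordered pair of distinct cards is an equally likely two-card draw
ordered_draws = list(permutations(deck, 2))
favorable = [d for d in ordered_draws if d[0][0] == 'A' and d[1][0] == 'K']

p_exact = Fraction(len(favorable), len(ordered_draws))
p_chained = Fraction(4, 52) * Fraction(4, 51)  # P(Ace) * P(King | Ace)

print(f"Enumerated: {p_exact} = {float(p_exact):.4f}")
print(f"Chained:    {p_chained} = {float(p_chained):.4f}")
```

Exact fractions are used here so the two results can be compared without floating-point rounding.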
Independent Events
When A and B are independent, knowing A occurred doesn't change the probability of B:
Multiplication Rule for Independent Events
$$P(A \cap B) = P(A) \cdot P(B)$$
Here,
- $P(B \mid A) = P(B)$: Independence, meaning A doesn't affect B
# Independent events: flipping a coin and rolling a die
# P(Heads AND 6) = P(Heads) × P(6)
p_heads = 0.5
p_six = 1/6
p_both_independent = p_heads * p_six
print(f"P(Heads) = {p_heads:.4f}")
print(f"P(6) = {p_six:.4f}")
print(f"P(Heads and 6) = {p_both_independent:.4f}")
# Verify with simulation
np.random.seed(42)
n_sim = 100000
coins = np.random.choice(['H', 'T'], n_sim)
dice = np.random.randint(1, 7, n_sim)
simulated = np.sum((coins == 'H') & (dice == 6)) / n_sim
print(f"Simulated: {simulated:.4f}")
Testing for Independence
# Test: are "Heads" and "Even" independent when flipping a coin and rolling a die?
# P(Heads) = 0.5
# P(Even) = 0.5
# P(Heads and Even) should = 0.25 if independent
np.random.seed(42)
n_sim = 100000
coins = np.random.choice(['H', 'T'], n_sim)
dice = np.random.randint(1, 7, n_sim)
p_heads = np.mean(coins == 'H')
p_even = np.mean(dice % 2 == 0)
p_heads_and_even = np.mean((coins == 'H') & (dice % 2 == 0))
print(f"P(Heads) = {p_heads:.4f}")
print(f"P(Even) = {p_even:.4f}")
print(f"P(Heads ∩ Even) = {p_heads_and_even:.4f}")
print(f"P(Heads) × P(Even) = {p_heads * p_even:.4f}")
# Simulated estimates carry sampling noise, so compare with a loose absolute tolerance
print(f"Are they independent? {np.isclose(p_heads_and_even, p_heads * p_even, atol=0.01)}")
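For contrast, the same product test can also flag dependence. The sketch below uses a single die roll with two events chosen purely for illustration (they do not appear in the original text): "even" and "greater than 3". Exact fractions are used so that sampling noise plays no role.

```python
from fractions import Fraction

outcomes = range(1, 7)  # one fair six-sided die

def prob(event):
    """Exact probability of an event (a predicate) over equally likely outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def is_even(o):
    return o % 2 == 0

def is_high(o):
    return o > 3

p_even_exact = prob(is_even)                              # outcomes {2, 4, 6}
p_high_exact = prob(is_high)                              # outcomes {4, 5, 6}
p_joint_exact = prob(lambda o: is_even(o) and is_high(o))  # outcomes {4, 6}

print(f"P(Even) x P(High) = {p_even_exact * p_high_exact}")
print(f"P(Even and High)  = {p_joint_exact}")
print(f"Independent? {p_joint_exact == p_even_exact * p_high_exact}")
```

Because the joint probability differs from the product of the marginals, these two events are dependent: knowing the roll is greater than 3 makes an even result more likely.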
Chain Rule for Multiple Events
Chain Rule
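$$P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1) \cdot P(A_2 \mid A_1) \cdot P(A_3 \mid A_1 \cap A_2) \cdots P(A_n \mid A_1 \cap \cdots \cap A_{n-1})$$
Here,
- $P(A_1 \cap A_2 \cap \cdots \cap A_n)$ = Joint probability of all events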
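As a small worked instance of the chain rule, the sketch below computes the probability of drawing three aces in a row without replacement. The helper `chain_probability` is a hypothetical name introduced for illustration only.

```python
from fractions import Fraction

def chain_probability(conditionals):
    """Multiply P(A1), P(A2 | A1), P(A3 | A1 and A2), ... in order."""
    result = Fraction(1)
    for p in conditionals:
        result *= p
    return result

# Each factor conditions on all earlier draws having been aces
three_aces = chain_probability([
    Fraction(4, 52),  # P(first card is an ace)
    Fraction(3, 51),  # P(second is an ace | first was an ace)
    Fraction(2, 50),  # P(third is an ace | first two were aces)
])

print(f"P(three aces in a row) = {three_aces} = {float(three_aces):.6f}")
```

Each draw shrinks both the number of aces and the size of the deck, which is why every factor has a different numerator and denominator.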
Multiplication Rule in Machine Learning
| ML Application | Multiplication Rule Usage | Why |
|---|---|---|
| Naive Bayes | P(x1,x2\|y) = P(x1\|y)×P(x2\|y) (features conditionally independent given the class) | Core assumption |
| Chain rule | P(A∩B∩C) = P(A)×P(B\|A)×P(C\|A∩B) | Sequential prediction |
| Language models | P(w1,w2,...,wn) = ∏P(wi\|w1,...,wi-1) | LLMs use chain rule |
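To make the Naive Bayes row concrete, here is a toy sketch with made-up numbers. The class prior and the per-word likelihoods are hypothetical, chosen only for illustration. Under the conditional-independence assumption, the likelihood of all features given the class is the product of the per-feature likelihoods.

```python
# Hypothetical spam-filter numbers, for illustration only
p_spam = 0.4                       # prior P(spam)
p_word_given_spam = {              # per-feature likelihoods P(word | spam)
    'free': 0.30,
    'winner': 0.10,
}

# Naive Bayes assumption: features are independent once the class is known,
# so P(free, winner | spam) = P(free | spam) * P(winner | spam)
likelihood = 1.0
for p in p_word_given_spam.values():
    likelihood *= p

# General multiplication rule: P(spam, free, winner) = P(spam) * P(free, winner | spam)
p_joint = p_spam * likelihood

print(f"P(free, winner | spam) = {likelihood:.4f}")
print(f"P(spam, free, winner)  = {p_joint:.4f}")
```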
import numpy as np
# Chain rule in language modeling
# P("the cat sat") = P("the") × P("cat"|"the") × P("sat"|"the cat")
probabilities = {'P(the)': 0.05, 'P(cat|the)': 0.02, 'P(sat|the cat)': 0.01}
chain_prob = 1
for k, v in probabilities.items():
    print(f"{k} = {v}")
    chain_prob *= v
print(f"\nP('the cat sat') = {' × '.join(str(v) for v in probabilities.values())} = {chain_prob:.8f}")
print("Autoregressive LLMs such as GPT factor a sequence's probability with the chain rule.")
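In practice, a product of many small conditional probabilities can underflow floating point, so chain-rule scores are commonly accumulated as sums of log-probabilities. A minimal sketch, reusing the illustrative numbers from the example above (the variable names are assumptions):

```python
import math

# Same illustrative conditional probabilities as in the example above
conditionals = [0.05, 0.02, 0.01]  # P(the), P(cat | the), P(sat | the cat)

# The log of a product equals the sum of the logs
log_prob = sum(math.log(p) for p in conditionals)

print(f"log P('the cat sat') = {log_prob:.4f}")
print(f"P('the cat sat')     = {math.exp(log_prob):.8f}")
```

Exponentiating the summed logs recovers the same joint probability as the direct product, while the log form stays numerically stable for long sequences.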
Key Takeaways
Summary: Multiplication Rule
- General rule: P(A ∩ B) = P(A) × P(B|A) — chain of conditional probabilities
- Independent events: P(A ∩ B) = P(A) × P(B) — no conditioning needed
- Order matters in conditioning: P(A|B) ≠ P(B|A) in general, but the joint probability is symmetric: P(A ∩ B) = P(A) × P(B|A) = P(B) × P(A|B)
- Chain rule extends to any number of events: multiply conditional probabilities sequentially
- Test for independence: P(A ∩ B) = P(A) × P(B) — if true, A and B are independent
- Common mistake: assuming independence without verification
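A quick check of the symmetry of the joint probability: factoring through A or through B gives the same P(A ∩ B), even though the two conditional probabilities differ. The events below ("first card is an ace", "second card is a spade") and the helper `prob` are illustrative choices, not from the original text.

```python
from fractions import Fraction
from itertools import permutations

ranks = ['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']
suits = ['Hearts', 'Diamonds', 'Clubs', 'Spades']
deck = [(rank, suit) for suit in suits for rank in ranks]
draws = list(permutations(deck, 2))  # equally likely ordered two-card draws

def prob(event, given=None):
    """Exact P(event | given); with given=None this is the plain P(event)."""
    pool = [d for d in draws if given is None or given(d)]
    return Fraction(sum(1 for d in pool if event(d)), len(pool))

def first_is_ace(draw):
    return draw[0][0] == 'A'

def second_is_spade(draw):
    return draw[1][1] == 'Spades'

p_b_given_a = prob(second_is_spade, given=first_is_ace)
p_a_given_b = prob(first_is_ace, given=second_is_spade)
via_a = prob(first_is_ace) * p_b_given_a      # P(A) * P(B | A)
via_b = prob(second_is_spade) * p_a_given_b   # P(B) * P(A | B)

print(f"P(B | A) = {p_b_given_a}, P(A | B) = {p_a_given_b}")
print(f"P(A) P(B | A) = {via_a}, P(B) P(A | B) = {via_b}")
```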