Conditional Probability
Probability Theory
When New Information Changes Everything You Thought You Knew
Conditional probability answers: "Given that B happened, what is the probability of A?" It restricts the sample space and recalculates within that reduced world.
- Sample space reduction — Condition on B and ignore everything outside it
- Bayesian updating — Use new evidence to revise your beliefs systematically
- Probability trees — Visualize conditional paths through multi-stage experiments
- Independence test — A and B are independent if P(A given B) equals P(A)
Conditional probability is the engine of learning from data. It is how evidence changes beliefs.
What is Conditional Probability?
Definition
The conditional probability of event $A$ given that event $B$ has occurred is defined as:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \qquad P(B) > 0$$

This formula restricts the sample space to $B$ and recalculates probabilities within that reduced space.
The Sample Space Reduction
When we condition on $B$, we are saying: "Ignore everything outside $B$." The event $A \cap B$ is the part of $A$ that survives this restriction. Dividing by $P(B)$ normalizes the result so that $P(B \mid B) = 1$: the conditional probability of $B$ given itself is certain.
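The definition can be checked by counting on a small finite sample space. The sketch below assumes equally likely outcomes (one roll of a fair die) and uses exact fractions. The helpers `prob` and `cond_prob` are hypothetical names chosen for illustration, not a library API.

```python
from fractions import Fraction

# Assumed sample space: one roll of a fair die, six equally likely outcomes.
omega = {1, 2, 3, 4, 5, 6}

def prob(event):
    """P(event) under the uniform measure on omega."""
    return Fraction(len(event & omega), len(omega))

def cond_prob(a, b):
    """P(A | B) = P(A ∩ B) / P(B); undefined when P(B) = 0."""
    if prob(b) == 0:
        raise ValueError("P(B) must be positive")
    return prob(a & b) / prob(b)

A = {2, 4, 6}  # "roll is even"
B = {4, 5, 6}  # "roll is greater than 3"

print(prob(A))          # 1/2 (unconditional)
print(cond_prob(A, B))  # 2/3 (inside B = {4, 5, 6}, two of three outcomes are even)
print(cond_prob(B, B))  # 1 (conditioning on B makes B certain)
```

Dividing by $P(B)$ gives the same answer as counting only inside $B$: $|A \cap B| / |B| = 2/3$.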
Formal Properties
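Theorem: Conditional Probability is a Valid Probability Measure
For fixed $B$ with $P(B) > 0$, the function $P(\cdot \mid B)$ satisfies all Kolmogorov axioms:
- $P(A \mid B) \ge 0$ for all events $A$
- $P(\Omega \mid B) = 1$
- $P(\bigcup_{i} A_i \mid B) = \sum_{i} P(A_i \mid B)$ for pairwise disjoint $A_1, A_2, \ldots$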
Therefore, all theorems of probability apply to conditional probabilities as well.
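As a sanity check, the sketch below verifies the three axioms for $P(\cdot \mid B)$ by brute force over every event of a fair six-sided die, with $B$ = "roll greater than 3" chosen arbitrarily. Because the sample space is finite, countable additivity reduces to finite additivity, so checking every pair of disjoint events is enough (longer disjoint unions follow by induction). The helper names are illustrative.

```python
from fractions import Fraction
from itertools import chain, combinations

omega = frozenset(range(1, 7))  # assumed: one roll of a fair die
B = frozenset({4, 5, 6})        # assumed condition: roll > 3

def prob(event):
    return Fraction(len(event & omega), len(omega))

def cond_prob(a, b):
    return prob(a & b) / prob(b)

# Every subset of omega is an event: the power set has 2**6 = 64 members.
events = [frozenset(s) for s in chain.from_iterable(
    combinations(sorted(omega), r) for r in range(len(omega) + 1))]

# Axiom 1: non-negativity
nonneg = all(cond_prob(a, B) >= 0 for a in events)
# Axiom 2: normalization, P(Omega | B) = 1
normalized = cond_prob(omega, B) == 1
# Axiom 3 (finite case): additivity for every pair of disjoint events
additive = all(
    cond_prob(a | b, B) == cond_prob(a, B) + cond_prob(b, B)
    for a in events for b in events if not (a & b)
)
print(len(events), nonneg, normalized, additive)  # 64 True True True
```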
The Multiplication Rule
Multiplication Rule
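$$P(A \cap B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$$

Here,
- $P(A \cap B)$ = joint probability of $A$ and $B$
- $P(A \mid B)\,P(B)$ = one way to compute it
- $P(B \mid A)\,P(A)$ = an equivalent computation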
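To see the multiplication rule agree with direct counting, the following sketch assumes two cards drawn in order, without replacement, from a standard 52-card deck, and computes the probability that both are aces in two ways. The deck representation is an illustrative assumption.

```python
from fractions import Fraction
from itertools import permutations

# Assumed setup: a standard 52-card deck, two cards drawn in order without replacement.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = [(r, s) for r in ranks for s in suits]

draws = list(permutations(deck, 2))  # 52 * 51 = 2652 equally likely ordered pairs

# Brute force: fraction of ordered draws in which both cards are aces
both_aces = sum(1 for c1, c2 in draws if c1[0] == "A" and c2[0] == "A")
p_joint_counted = Fraction(both_aces, len(draws))

# Multiplication rule: P(A1 ∩ A2) = P(A2 | A1) P(A1)
p_first_ace = Fraction(4, 52)               # 4 aces among 52 cards
p_second_ace_given_first = Fraction(3, 51)  # 3 aces left among 51 cards
p_joint_rule = p_second_ace_given_first * p_first_ace

print(p_joint_counted, p_joint_rule)  # 1/221 1/221
```

The rule lets us skip enumerating 2652 pairs: each stage only needs a conditional probability within the reduced deck.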
Law of Total Probability
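Theorem: Law of Total Probability
If $B_1, B_2, \ldots, B_n$ partition the sample space $\Omega$ (pairwise disjoint, $\bigcup_{i=1}^{n} B_i = \Omega$, each with $P(B_i) > 0$), then for any event $A$:

$$P(A) = \sum_{i=1}^{n} P(A \mid B_i)\,P(B_i)$$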
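A small numeric instance, assuming the two-set partition {diseased, healthy} and reusing the prevalence, sensitivity, and specificity from the medical-testing example later in this article. The variable names are illustrative.

```python
# Partition of the population into D (has the disease) and not-D.
# Values reuse the medical-testing example (1% prevalence, 99% sensitivity, 95% specificity).
p_d = 0.01                # P(D), prevalence
p_pos_given_d = 0.99      # P(+ | D), sensitivity
p_pos_given_not_d = 0.05  # P(+ | not D) = 1 - specificity

# (P(B_i), P(+ | B_i)) for each block B_i of the partition
partition = [
    (p_d, p_pos_given_d),
    (1 - p_d, p_pos_given_not_d),
]
assert abs(sum(p_bi for p_bi, _ in partition) - 1) < 1e-12  # blocks cover Omega

# Law of total probability: P(+) = sum_i P(+ | B_i) P(B_i)
p_pos = sum(p_bi * p_pos_given_bi for p_bi, p_pos_given_bi in partition)
print(f"P(+) = {p_pos:.4f}")  # P(+) = 0.0594
```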
Conditional Independence
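Definition: Conditional Independence
Events $A$ and $B$ are conditionally independent given $C$ (with $P(C) > 0$) if:

$$P(A \cap B \mid C) = P(A \mid C)\,P(B \mid C)$$

Equivalently, when $P(B \cap C) > 0$: $P(A \mid B \cap C) = P(A \mid C)$. Knowing $B$ (in addition to $C$) provides no additional information about $A$.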
Conditional Independence ≠ Marginal Independence
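$A$ and $B$ can be marginally independent but conditionally dependent given $C$ (and vice versa); neither property implies the other. A closely related phenomenon is Simpson's paradox, in which an association seen in aggregated data weakens, disappears, or reverses once the data are stratified by a third variable, and it has profound implications for causal inference. For example, two medical treatments may appear equally effective overall, but when patients are stratified by age, one treatment is clearly superior.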
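A minimal counterexample, assuming two fair coin flips: let $A$ = "first flip is heads", $B$ = "second flip is heads", and $C$ = "the two flips agree". $A$ and $B$ are marginally independent, but once $C$ is known, learning $A$ determines $B$, so they are conditionally dependent. The sketch checks both definitions by exact counting; the helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Assumed experiment: two fair coin flips, four equally likely outcomes.
omega = set(product("HT", repeat=2))

def prob(event):
    return Fraction(len(event), len(omega))

def cond_prob(a, b):
    return prob(a & b) / prob(b)

A = {w for w in omega if w[0] == "H"}   # first flip is heads
B = {w for w in omega if w[1] == "H"}   # second flip is heads
C = {w for w in omega if w[0] == w[1]}  # the two flips agree

# Marginal independence: P(A ∩ B) = P(A) P(B)
marginal = prob(A & B) == prob(A) * prob(B)
# Conditional independence given C: P(A ∩ B | C) = P(A | C) P(B | C)
conditional = cond_prob(A & B, C) == cond_prob(A, C) * cond_prob(B, C)

print(marginal, conditional)  # True False
```

Given $C$, the only outcomes are HH and TT, so $P(A \cap B \mid C) = 1/2$ while $P(A \mid C)\,P(B \mid C) = 1/4$.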
The Base Rate Fallacy
Medical Testing: Why Intuition Fails
Problem: A disease affects 1% of the population. A test is 99% sensitive (true positive rate) and 95% specific (true negative rate). If a patient tests positive, what is the probability they have the disease?
Intuitive (wrong) answer: "The test is 99% accurate, so the probability is about 99%."
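Correct calculation using Bayes' theorem, writing $D$ for "has the disease" and $+$ for a positive test, with the denominator $P(+)$ expanded by the law of total probability:

$$P(D \mid +) = \frac{P(+ \mid D)\,P(D)}{P(+ \mid D)\,P(D) + P(+ \mid \neg D)\,P(\neg D)} = \frac{0.99 \times 0.01}{0.99 \times 0.01 + 0.05 \times 0.99} = \frac{0.0099}{0.0594} \approx 0.167$$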
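Although the test has 99% sensitivity, only about 17% of positive results are true positives. The low base rate (1%) dominates: false positives from the 99% of people who are healthy ($0.0495$) outnumber true positives ($0.0099$) five to one.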
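Wrapping the same calculation in a small function makes the role of the base rate easy to explore. This is a sketch with a hypothetical helper `posterior`; sensitivity and specificity are held at the example's values while prevalence varies, and the prevalences other than 1% are illustrative inputs, not data.

```python
def posterior(prevalence, sensitivity, specificity):
    """P(D | +) via Bayes' theorem, with P(+) from the law of total probability."""
    true_pos = sensitivity * prevalence               # P(+ | D) P(D)
    false_pos = (1 - specificity) * (1 - prevalence)  # P(+ | not D) P(not D)
    return true_pos / (true_pos + false_pos)

print(f"{posterior(0.01, 0.99, 0.95):.3f}")  # 0.167

# Same test, different base rates: the posterior is driven by prevalence.
for prevalence in [0.001, 0.01, 0.1, 0.5]:
    p = posterior(prevalence, 0.99, 0.95)
    print(f"prevalence {prevalence:>5}: P(D | +) = {p:.3f}")
```

With the test held fixed, a positive result becomes far more trustworthy as the prevalence rises, which is exactly what the intuitive answer ignores.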
Worked Example: Two Dice
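Two fair dice are rolled. What is $P(\text{sum} = 7 \mid \text{first die} = 3)$?
The sample space is restricted to outcomes where the first die is 3: $\{(3,1), (3,2), (3,3), (3,4), (3,5), (3,6)\}$, six equally likely outcomes.
Only $(3,4)$ gives a sum of 7. Therefore:

$$P(\text{sum} = 7 \mid \text{first die} = 3) = \frac{1}{6}$$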
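This equals the unconditional probability $P(\text{sum} = 7) = 6/36 = 1/6$, so the events $\{\text{first die} = 3\}$ and $\{\text{sum} = 7\}$ are independent. This is special to the sum 7: for instance, $P(\text{sum} = 8 \mid \text{first die} = 3) = 1/6$ while $P(\text{sum} = 8) = 5/36$, so the first die and the sum are not independent as random variables.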
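Enumerating all 36 outcomes confirms the result and shows why the independence is specific to the sum 7. The sketch assumes two fair dice and represents events as predicates on outcomes; the helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Assumed experiment: two fair dice, 36 equally likely ordered outcomes (first, second).
omega = list(product(range(1, 7), repeat=2))

def prob(pred):
    """P(event) for an event given as a predicate on outcomes."""
    return Fraction(sum(1 for w in omega if pred(w)), len(omega))

def cond_prob(pred_a, pred_b):
    """P(A | B) = P(A ∩ B) / P(B)."""
    return prob(lambda w: pred_a(w) and pred_b(w)) / prob(pred_b)

def first_is(k):
    return lambda w: w[0] == k

def sum_is(t):
    return lambda w: w[0] + w[1] == t

for t in (7, 8):
    print(t, cond_prob(sum_is(t), first_is(3)), prob(sum_is(t)))
# 7 1/6 1/6   -> equal: the events are independent
# 8 1/6 5/36  -> different: the events are dependent
```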
Conditional Probability in Machine Learning
| ML application | Conditional probability used | Why |
|---|---|---|
| Classification | P(class \| features) | Core of supervised learning |
| Naive Bayes | P(feature_i \| class) | Text classification |
| Bayesian networks | P(X \| parents(X)) | Causal inference |
| Weather prediction | P(rain \| humidity, pressure) | Probabilistic forecasting |
```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# A probabilistic classifier outputs the conditional distribution P(class | features).
# GaussianNB learns P(class) and P(feature_i | class), then applies Bayes' theorem.
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.3, random_state=0
)

model = GaussianNB()
model.fit(X_train, y_train)

# predict_proba returns one row per sample; each row is P(class | features)
sample = X_test[:1]
proba = model.predict_proba(sample)[0]

print("Conditional probability distribution P(class | features):")
for cls, p in zip(model.classes_, proba):
    bar = "█" * int(p * 50)  # text bar: 50 blocks would mean probability 1
    print(f"  Digit {cls}: {p:.4f} {bar}")

print(f"Sum over classes: {proba.sum():.4f}")  # a valid distribution sums to 1
print(f"\nPredicted class: {model.predict(sample)[0]} (highest conditional probability)")
```
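The "Naive Bayes" row of the table can be made concrete with a from-scratch computation. This sketch swaps in a Bernoulli-style model with binary word-presence features rather than the Gaussian model used by `GaussianNB` above, and the five training messages are made-up toy data. Each class is scored as $P(\text{class}) \prod_i P(x_i \mid \text{class})$, which relies on the features being conditionally independent given the class, and the scores are then normalized by $P(\text{features})$ from the law of total probability. The function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical toy training data: (set of words present in a message, label)
train = [
    ({"free", "win"}, "spam"),
    ({"free", "offer"}, "spam"),
    ({"win", "offer"}, "spam"),
    ({"meeting", "notes"}, "ham"),
    ({"free", "meeting"}, "ham"),
]
vocab = sorted(set().union(*(words for words, _ in train)))
class_counts = Counter(label for _, label in train)

def p_word_given_class(word, label):
    """Estimate P(word present | class) with add-one (Laplace) smoothing."""
    docs = [words for words, lbl in train if lbl == label]
    present = sum(word in d for d in docs)
    return Fraction(present + 1, len(docs) + 2)

def class_posterior(words):
    """P(class | features), assuming features are conditionally independent given the class."""
    joint = {}
    for label, count in class_counts.items():
        score = Fraction(count, len(train))  # prior P(class)
        for w in vocab:
            p = p_word_given_class(w, label)
            score *= p if w in words else 1 - p  # P(x_i | class), word present or absent
        joint[label] = score  # P(class) * prod_i P(x_i | class)
    evidence = sum(joint.values())  # P(features), by the law of total probability
    return {label: s / evidence for label, s in joint.items()}

for label, p in class_posterior({"free", "win"}).items():
    print(label, round(float(p), 3))
```

Add-one smoothing keeps a word never seen with a class from forcing that class's score to zero.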
Key Takeaways
Summary: Conditional Probability
- $P(A \mid B) = P(A \cap B)/P(B)$: the fundamental formula for updating probabilities given new information
- Conditional probability restricts the sample space: it re-normalizes within event $B$
- Multiplication rule: $P(A \cap B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$
- Law of total probability: $P(A) = \sum_i P(A \mid B_i)\,P(B_i)$, a sum over all causes $B_i$
- Base rate fallacy: ignoring the prior $P(A)$ leads to wildly incorrect conclusions
- Conditional probability is NOT symmetric: $P(A \mid B) \neq P(B \mid A)$ in general
- Conditional independence and marginal independence are different properties (neither implies the other), and conflating them leads to errors