
Conditional Probability — P(A|B) and Its Applications



When New Information Changes Everything You Thought You Knew

Conditional probability answers: "Given that B happened, what is the probability of A?" It restricts the sample space and recalculates within that reduced world.

  • Sample space reduction — Condition on B and ignore everything outside it
  • Bayesian updating — Use new evidence to revise your beliefs systematically
  • Probability trees — Visualize conditional paths through multi-stage experiments
  • Independence test — A and B are independent if P(A given B) equals P(A)

Conditional probability is the engine of learning from data. It is how evidence changes beliefs.


What is Conditional Probability?

Definition

The conditional probability of event $A$ given that event $B$ has occurred is defined as:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) > 0$$

This formula restricts the sample space to $B$ and recalculates probabilities within that reduced space.

The Sample Space Reduction

When we condition on $B$, we are saying: "Ignore everything outside $B$." The event $A \cap B$ represents the part of $A$ that survives this restriction. Dividing by $P(B)$ normalizes so that $P(B \mid B) = 1$: the conditional probability of $B$ given itself is certain.
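The restriction-and-renormalization idea can be sketched by direct counting. The example below is a minimal illustration, assuming a single roll of a fair die; the events and the helper names `prob` and `cond_prob` are hypothetical choices made for this sketch.

```python
from fractions import Fraction

def prob(event, space):
    """P(event) on a finite space of equally likely outcomes."""
    return Fraction(len(event & space), len(space))

def cond_prob(a, b, space):
    """P(A | B) = P(A ∩ B) / P(B); undefined when P(B) = 0."""
    p_b = prob(b, space)
    if p_b == 0:
        raise ValueError("P(B) must be positive")
    return prob(a & b, space) / p_b

omega = set(range(1, 7))   # assumed sample space: one roll of a fair die
a = {2, 4, 6}              # A: the roll is even
b = {4, 5, 6}              # B: the roll is greater than 3

p_a_given_b = cond_prob(a, b, omega)   # only {4, 6} survives inside B
p_b_given_b = cond_prob(b, b, omega)   # normalization makes this 1
print(p_a_given_b, p_b_given_b)
```

Using `Fraction` keeps the arithmetic exact, so the results can be compared with hand calculations without rounding concerns.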


Formal Properties

Theorem: Conditional Probability is a Valid Probability Measure

For fixed $B$ with $P(B) > 0$, the function $P(\cdot \mid B)$ satisfies all Kolmogorov axioms:

  1. $P(A \mid B) \geq 0$ for all $A$
  2. $P(\Omega \mid B) = 1$
  3. $P(A_1 \cup A_2 \cup \cdots \mid B) = P(A_1 \mid B) + P(A_2 \mid B) + \cdots$ for pairwise disjoint $A_i$

Therefore, all theorems of probability apply to conditional probabilities as well.
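As a rough sanity check rather than a proof, the axioms can be spot-checked on a small finite space. The sketch assumes a fair die and a conditioning event chosen for illustration; `p_given_b`, `a1`, and `a2` are hypothetical names. It also checks one derived theorem, the complement rule, in its conditional form.

```python
from fractions import Fraction

omega = frozenset(range(1, 7))   # assumed sample space: one roll of a fair die
b = frozenset({4, 5, 6})         # conditioning event B (illustrative choice)

def p_given_b(a):
    """P(A | B) by counting: |A ∩ B| / |B| for equally likely outcomes."""
    return Fraction(len(a & b), len(b))

a1 = frozenset({1, 4})           # two disjoint events
a2 = frozenset({2, 6})

# Axiom 1: non-negativity, checked here on the singleton events only
nonneg = all(p_given_b(frozenset({k})) >= 0 for k in omega)
# Axiom 2: the whole sample space has conditional probability 1
total = p_given_b(omega)
# Axiom 3: additivity for disjoint events (finite case)
additive = p_given_b(a1 | a2) == p_given_b(a1) + p_given_b(a2)
# Derived theorem: P(not A | B) = 1 - P(A | B)
complement = p_given_b(omega - a1) == 1 - p_given_b(a1)

print(nonneg, total, additive, complement)
```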


The Multiplication Rule

Multiplication Rule

$$P(A \cap B) = P(A \mid B) \cdot P(B) = P(B \mid A) \cdot P(A)$$

Here,

  • $P(A \cap B)$ = Joint probability of $A$ and $B$
  • $P(A \mid B) \cdot P(B)$ = One way to compute it
  • $P(B \mid A) \cdot P(A)$ = Equivalent computation
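To see the multiplication rule in action, consider drawing two cards without replacement and asking for the probability that both are aces. This card example is an illustrative assumption, not part of the lesson; the sketch compares the rule against brute-force enumeration of all ordered two-card draws.

```python
from fractions import Fraction
from itertools import permutations

# Standard 52-card deck; rank 1 stands for the ace
deck = [(rank, suit) for rank in range(1, 14) for suit in "SHDC"]

# Multiplication rule: P(both aces) = P(second ace | first ace) * P(first ace)
p_first_ace = Fraction(4, 52)
p_second_ace_given_first = Fraction(3, 51)   # one ace and one card are gone
p_both_rule = p_second_ace_given_first * p_first_ace

# Brute force: count ordered pairs of distinct cards that are both aces
draws = list(permutations(deck, 2))
both_aces = [d for d in draws if d[0][0] == 1 and d[1][0] == 1]
p_both_count = Fraction(len(both_aces), len(draws))

print(p_both_rule, p_both_count)
```

Both routes give the same joint probability, which is the point of the rule: a joint probability can be built one conditional step at a time.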

Law of Total Probability

Theorem: Law of Total Probability

If $B_1, B_2, \ldots, B_k$ partition the sample space (pairwise disjoint, $\bigcup_i B_i = \Omega$) with each $P(B_i) > 0$, then for any event $A$:

$$P(A) = \sum_{i=1}^k P(A \cap B_i) = \sum_{i=1}^k P(A \mid B_i) \, P(B_i)$$
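A small numeric sketch of the theorem: suppose three machines partition a factory's output, each with its own defect rate. All numbers and names below are hypothetical, chosen only to make the arithmetic easy to follow.

```python
from fractions import Fraction

# Hypothetical partition: share of total output per machine, P(B_i)
share = {
    "M1": Fraction(50, 100),
    "M2": Fraction(30, 100),
    "M3": Fraction(20, 100),
}
# Hypothetical conditional defect rates, P(A | B_i)
defect_rate = {
    "M1": Fraction(1, 100),
    "M2": Fraction(2, 100),
    "M3": Fraction(3, 100),
}

# The shares must sum to 1 for the machines to form a partition
assert sum(share.values()) == 1

# Law of total probability: P(A) = sum_i P(A | B_i) * P(B_i)
p_defect = sum(defect_rate[m] * share[m] for m in share)
print(p_defect)   # overall defect probability as an exact fraction
```

The overall rate is a weighted average of the per-machine rates, with the weights given by how much each machine contributes.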

Conditional Independence

Definition: Conditional Independence

Events $A$ and $B$ are conditionally independent given $C$ if:

$$P(A \cap B \mid C) = P(A \mid C) \cdot P(B \mid C)$$

Equivalently, when $P(B \cap C) > 0$: $P(A \mid B, C) = P(A \mid C)$. Knowing $B$ (in addition to $C$) provides no additional information about $A$.

Conditional Independence ≠ Marginal Independence

$A$ and $B$ can be marginally independent but conditionally dependent, and vice versa. A closely related phenomenon is Simpson's paradox, in which an association seen in aggregated data changes or even reverses once the data are stratified by a third variable; it has profound implications for causal inference. For example, two medical treatments may appear equally effective overall, but when stratified by patient age, one treatment is clearly superior.
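A tiny counterexample makes the gap between the two notions concrete. The sketch below assumes two fair coin flips (an illustrative setup, not from the lesson): the flips are independent, yet they become perfectly dependent once we condition on "exactly one head".

```python
from fractions import Fraction
from itertools import product

omega = set(product("HT", repeat=2))      # two fair coin flips
a = {w for w in omega if w[0] == "H"}     # A: first flip is heads
b = {w for w in omega if w[1] == "H"}     # B: second flip is heads
c = {w for w in omega if w[0] != w[1]}    # C: exactly one head

def prob(event, given=None):
    """P(event | given) by counting; `given` defaults to the whole space."""
    given = omega if given is None else given
    return Fraction(len(event & given), len(given))

# Marginal independence: P(A ∩ B) == P(A) P(B)
marginal_indep = prob(a & b) == prob(a) * prob(b)
# Conditional independence given C: P(A ∩ B | C) == P(A | C) P(B | C)
conditional_indep = prob(a & b, c) == prob(a, c) * prob(b, c)

print(marginal_indep, conditional_indep)
```

Given $C$, learning that the first flip is heads tells us the second must be tails, so the product rule fails even though it holds unconditionally.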


The Base Rate Fallacy

Medical Testing: Why Intuition Fails

Problem: A disease affects 1% of the population. A test is 99% sensitive (true positive rate) and 95% specific (true negative rate). If a patient tests positive, what is the probability they have the disease?

Intuitive (wrong) answer: "The test is 99% accurate, so the probability is about 99%."

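Correct calculation using Bayes' theorem. The denominator comes from the law of total probability, $P(+) = P(+ \mid D) \, P(D) + P(+ \mid D^c) \, P(D^c)$, where the false positive rate is $P(+ \mid D^c) = 1 - 0.95 = 0.05$ and $P(D^c) = 0.99$:

$$P(D \mid +) = \frac{P(+ \mid D) \, P(D)}{P(+)} = \frac{0.99 \times 0.01}{0.99 \times 0.01 + 0.05 \times 0.99} = \frac{0.0099}{0.0594} \approx 0.167$$

Despite the test being 99% sensitive, only about 17% of positive results are true positives. The low base rate (1%) dominates: false positives from the healthy 99% of the population (0.0495) outnumber true positives (0.0099) five to one.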
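The calculation above can be wrapped in a small function to explore how the answer depends on the base rate. This is a minimal sketch; the function name `posterior` and the 10% comparison prevalence are assumptions added for illustration.

```python
def posterior(prior, sensitivity, specificity):
    """P(disease | positive test) via Bayes' theorem."""
    true_pos = sensitivity * prior                 # P(+ | D) P(D)
    false_pos = (1 - specificity) * (1 - prior)    # P(+ | not D) P(not D)
    return true_pos / (true_pos + false_pos)

# The lesson's numbers: 1% prevalence, 99% sensitive, 95% specific
p_rare = posterior(prior=0.01, sensitivity=0.99, specificity=0.95)

# Same test, hypothetical 10% prevalence (comparison case, not from the lesson)
p_common = posterior(prior=0.10, sensitivity=0.99, specificity=0.95)

print(f"prevalence  1%: P(D | +) = {p_rare:.3f}")
print(f"prevalence 10%: P(D | +) = {p_common:.3f}")
```

Holding the test fixed and changing only the prior shows that the posterior is driven as much by the base rate as by the test's accuracy.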


Worked Example: Two Dice

What is $P(\text{sum} = 7 \mid \text{first die} = 3)$?

The sample space is restricted to outcomes where the first die is 3: $\{(3,1), (3,2), (3,3), (3,4), (3,5), (3,6)\}$, six equally likely outcomes.

Only $(3,4)$ gives a sum of 7. Therefore:

$$P(\text{sum} = 7 \mid \text{first die} = 3) = \frac{1}{6}$$

This equals $P(\text{sum} = 7) = 6/36 = 1/6$, showing that the events "sum is 7" and "first die is 3" are independent. This is special to a sum of 7: for any other target sum the conditional and unconditional probabilities differ, so the sum as a random variable is not independent of the first die.
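The worked example can be checked by enumerating all 36 rolls. The sketch below also tries a sum of 8 as a comparison case, which is an addition for illustration; the helper name `p_sum` is hypothetical.

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))   # all 36 ordered outcomes

def p_sum(target, first=None):
    """P(sum == target), optionally conditioned on the first die's value."""
    space = [r for r in rolls if first is None or r[0] == first]
    hits = [r for r in space if sum(r) == target]
    return Fraction(len(hits), len(space))

p7 = p_sum(7)
p7_given_3 = p_sum(7, first=3)
p8 = p_sum(8)
p8_given_3 = p_sum(8, first=3)

print(f"sum 7: unconditional {p7}, given first die = 3: {p7_given_3}")
print(f"sum 8: unconditional {p8}, given first die = 3: {p8_given_3}")
```

For a sum of 7 the two values agree, while for a sum of 8 they do not, so the independence seen in the worked example is a property of that particular event rather than of the sum in general.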


Conditional Probability in Machine Learning

| ML Application | Conditional Probability Usage | Why |
|---|---|---|
| Classification | $P(\text{class} \mid \text{features})$ | Core of supervised learning |
| Naive Bayes | $P(\text{feature}_i \mid \text{class})$ | Text classification |
| Bayesian networks | $P(X \mid \text{parents}(X))$ | Causal inference |
| Weather prediction | $P(\text{rain} \mid \text{humidity}, \text{pressure})$ | Probabilistic forecasting |

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# A probabilistic classifier estimates P(class | features)
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.3, random_state=0  # fixed seed for reproducibility
)

model = GaussianNB()
model.fit(X_train, y_train)

# model.predict_proba returns the estimated P(class | features) for each sample
sample = X_test[:1]
proba = model.predict_proba(sample)[0]
print("Conditional probability distribution P(class | features):")
for cls, p in zip(model.classes_, proba):
    bar = '█' * int(p * 50)
    print(f"  Digit {cls}: {p:.4f} {bar}")
print(f"\nPredicted class: {model.predict(sample)[0]} (highest conditional probability)")
```

Key Takeaways

Summary: Conditional Probability

  • $P(A \mid B) = P(A \cap B) / P(B)$: the fundamental formula for updating probabilities given new information
  • Conditional probability restricts the sample space: it re-normalizes within event $B$
  • Multiplication rule: $P(A \cap B) = P(A \mid B) \cdot P(B) = P(B \mid A) \cdot P(A)$
  • Law of total probability: $P(A) = \sum_i P(A \mid B_i) \, P(B_i)$, a sum over every part $B_i$ of a partition
  • Base rate fallacy: ignoring the prior probability (the base rate, $P(D)$ in the medical example) leads to wildly incorrect conclusions
  • Conditional probability is NOT symmetric: $P(A \mid B) \neq P(B \mid A)$ in general
  • Conditional independence and marginal independence are different properties: neither implies the other, and conflating them leads to errors
