
Introduction to Probability — Foundations and Definitions

The Mathematics of Uncertainty

Probability is the mathematical framework for quantifying uncertainty. It assigns a number between 0 and 1 to an event, measuring how likely that event is to occur.

  • Classical interpretation — Equal likelihood of outcomes; the fair coin, the rolled die
  • Frequentist interpretation — Long-run relative frequency from infinite repetitions
  • Subjective interpretation — Personal degree of belief updated by evidence
  • Kolmogorov axioms — The three mathematical rules that make probability work

Probability is not about certainty — it is about quantifying what we do not know.


What is Probability?

Definition

Probability is the mathematical framework for quantifying uncertainty. It assigns a number between 0 and 1 to an event, measuring how likely that event is to occur. A probability of 0 means the event is impossible; a probability of 1 means it is certain.

The foundations of modern probability were laid by Kolmogorov (1933), who axiomatized probability as a measure on a set of outcomes.


Three Interpretations of Probability

| Interpretation | Definition | Formalization | Example |
|---|---|---|---|
| Classical | Equal likelihood of outcomes | $P(A) = \frac{n(A)}{n(S)}$ | Fair coin: $P(H) = 1/2$ |
| Frequentist | Long-run relative frequency | $P(A) = \lim_{n \to \infty} \frac{n_A}{n}$ | 48 heads in 100 tosses -> $\hat{P}(H) \approx 0.48$ |
| Subjective | Personal degree of belief | Bayesian: $P(A) \in [0,1]$ encodes belief | "I'm 80% sure it will rain" |
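
As a small illustration of the classical formula $P(A) = n(A)/n(S)$, the sketch below counts outcomes for two fair dice. The two-dice example and the helper name `classical_probability` are illustrative assumptions, not part of the lesson, and the approach only applies when all outcomes are equally likely.

```python
from fractions import Fraction
from itertools import product

# Sample space S: all ordered outcomes of two fair six-sided dice.
# Assumption: the 36 pairs are equally likely.
sample_space = list(product(range(1, 7), repeat=2))


def classical_probability(event, space):
    """Return n(A) / n(S), where `event` is a predicate on outcomes."""
    favorable = [outcome for outcome in space if event(outcome)]
    return Fraction(len(favorable), len(space))


# Event A: the two dice sum to 7.
p_sum_seven = classical_probability(lambda o: o[0] + o[1] == 7, sample_space)
print(p_sum_seven)  # 1/6
```

Using `Fraction` keeps the result exact, which makes it easy to compare against a hand calculation.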

The Frequentist Interpretation

The frequentist interpretation defines probability as the limit of relative frequency in an infinite sequence of i.i.d. repetitions. Within the axiomatic framework, this is supported by the Strong Law of Large Numbers: if $A$ occurs with probability $P(A)$ in each of a sequence of independent trials, and $n_A$ counts the trials among the first $n$ in which $A$ occurs, then $\frac{n_A}{n} \to P(A)$ almost surely as $n \to \infty$.
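
The convergence of relative frequency can be seen in a quick simulation. This is a minimal sketch using Python's standard `random` module; the function name `relative_frequency`, the fixed seed, and the sample sizes are assumptions chosen for illustration. A finite simulation can only suggest the limit, not prove it.

```python
import random


def relative_frequency(n_tosses, p_heads=0.5, seed=0):
    """Simulate n_tosses independent coin tosses and return n_A / n for A = heads."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same numbers
    heads = sum(rng.random() < p_heads for _ in range(n_tosses))
    return heads / n_tosses


# The estimate should move closer to 0.5 as the number of tosses grows.
for n in (100, 10_000, 100_000):
    print(f"n = {n:>7}: relative frequency of heads = {relative_frequency(n):.4f}")
```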


The Axioms of Probability

Kolmogorov Axioms (1933)

Let $\Omega$ be the sample space (set of all possible outcomes) and $\mathcal{F}$ a $\sigma$-algebra of events. A probability measure $P: \mathcal{F} \to [0,1]$ satisfies:

  1. Non-negativity: $P(A) \geq 0$ for every event $A \in \mathcal{F}$.
  2. Normalization: $P(\Omega) = 1$.
  3. Countable additivity: If $A_1, A_2, \ldots$ are pairwise disjoint events, then:

$$P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$$

Finite Additivity vs Countable Additivity

Kolmogorov's third axiom requires countable (not just finite) additivity. This is necessary for rigorous measure-theoretic probability, particularly when dealing with infinite sample spaces (e.g., the uniform distribution on $[0,1]$).
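
On a finite sample space the axioms can be checked by brute force, because countable additivity reduces to finite additivity there (all but finitely many of the disjoint events must be empty), and additivity for pairs extends to any finite number of events by induction. The sketch below assumes a fair six-sided die and takes every subset of the sample space as an event; the names `omega`, `weights`, `P`, and `all_events` are illustrative.

```python
from fractions import Fraction
from itertools import combinations

# Assumed example: a fair six-sided die.
omega = frozenset(range(1, 7))
weights = {outcome: Fraction(1, 6) for outcome in omega}


def P(event):
    """Probability of an event (a subset of omega) as the sum of outcome weights."""
    return sum((weights[outcome] for outcome in event), Fraction(0))


def all_events(space):
    """Every subset of a finite sample space (its power set)."""
    items = sorted(space)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


events = all_events(omega)

non_negative = all(P(a) >= 0 for a in events)                   # axiom 1
normalized = P(omega) == 1                                      # axiom 2
additive = all(P(a | b) == P(a) + P(b)                          # axiom 3 (finite form)
               for a in events for b in events if not (a & b))

print(non_negative, normalized, additive)  # True True True
```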

Immediate Consequences of the Axioms

Theorem: Basic Properties from Kolmogorov Axioms

From the three axioms, we can derive:

  1. $P(\emptyset) = 0$ (the empty event has probability zero)
  2. $P(A^c) = 1 - P(A)$ (complement rule)
  3. $P(A) \leq 1$ (upper bound)
  4. If $A \subseteq B$, then $P(A) \leq P(B)$ (monotonicity)
  5. $P(A \cup B) = P(A) + P(B) - P(A \cap B)$ (inclusion-exclusion)
  6. Boole's inequality: $P\left(\bigcup_{i=1}^n A_i\right) \leq \sum_{i=1}^n P(A_i)$
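
As a sketch of how three of these properties follow from the axioms, the derivation below uses only normalization and additivity for disjoint events. It assumes finite additivity, which follows from countable additivity once $P(\emptyset) = 0$ is established (pad a finite collection with copies of $\emptyset$).

```latex
% Complement rule: A and A^c are disjoint and A \cup A^c = \Omega.
\[
  1 = P(\Omega) = P(A) + P(A^c)
  \quad\Longrightarrow\quad P(A^c) = 1 - P(A).
\]
% Monotonicity: for A \subseteq B, B is the disjoint union of A and B \setminus A,
% and P(B \setminus A) \ge 0 by non-negativity.
\[
  P(B) = P(A) + P(B \setminus A) \ge P(A).
\]
% Inclusion-exclusion: split A \cup B and B into disjoint pieces,
% then eliminate P(B \setminus A) between the two identities.
\[
  P(A \cup B) = P(A) + P(B \setminus A), \qquad
  P(B) = P(A \cap B) + P(B \setminus A),
\]
\[
  P(A \cup B) = P(A) + P(B) - P(A \cap B).
\]
```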

The Addition Rule

Addition Rule for Two Events

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

Here,

  • $A \cup B$ = Event A or B (or both)
  • $A \cap B$ = Event A and B
  • $P(A \cap B)$ = Joint probability

For mutually exclusive events ($A \cap B = \emptyset$):

$$P(A \cup B) = P(A) + P(B)$$
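
A worked example may help here. The sketch below assumes one roll of a fair die, with $A$ = "even" and $B$ = "greater than 3" chosen purely for illustration, and checks both forms of the addition rule with exact fractions.

```python
from fractions import Fraction

# Assumed example: one roll of a fair six-sided die.
omega = frozenset(range(1, 7))


def P(event):
    """Classical probability on a finite, equally likely sample space."""
    return Fraction(len(event), len(omega))


A = frozenset({2, 4, 6})  # even
B = frozenset({4, 5, 6})  # greater than 3

lhs = P(A | B)                 # P(A ∪ B), computed directly from the union
rhs = P(A) + P(B) - P(A & B)   # addition rule: subtract the overlap {4, 6} once
print(lhs, rhs)  # 2/3 2/3

# Mutually exclusive events: the overlap term vanishes.
C, D = frozenset({1}), frozenset({2})
print(P(C | D) == P(C) + P(D))  # True
```

Without the subtraction, the outcomes 4 and 6 would be counted twice, giving $1/2 + 1/2 = 1$ instead of $2/3$.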

Conditional Probability and Independence

Definition: Conditional Probability

For events $A$ and $B$ with $P(B) > 0$, the conditional probability of $A$ given $B$ is:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

This defines a valid probability measure on $\Omega$ for fixed $B$.
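
To make the definition concrete, the sketch below conditions on an event for two fair dice. The events ("sum is 8", "first die is even") and the helper names `prob` and `cond_prob` are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

# Assumed example: two fair six-sided dice, 36 equally likely ordered outcomes.
omega = set(product(range(1, 7), repeat=2))


def prob(event):
    return Fraction(len(event), len(omega))


def cond_prob(a, b):
    """P(A | B) = P(A ∩ B) / P(B); undefined when P(B) = 0."""
    if prob(b) == 0:
        raise ValueError("cannot condition on an event of probability zero")
    return prob(a & b) / prob(b)


sum_is_8 = {o for o in omega if sum(o) == 8}
first_even = {o for o in omega if o[0] % 2 == 0}

print(prob(sum_is_8))                   # 5/36
print(cond_prob(sum_is_8, first_even))  # 1/6

# For fixed B, P(. | B) behaves like a probability measure: the whole space has mass 1.
print(cond_prob(omega, first_even))     # 1
```

Here conditioning raises the probability from $5/36$ to $1/6$, because three of the five ways to roll an 8 have an even first die.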

Definition: Independence

Events $A$ and $B$ are independent if and only if:

$$P(A \cap B) = P(A) \cdot P(B)$$

Equivalently, when $P(B) > 0$: $P(A \mid B) = P(A)$ (knowing $B$ occurred does not change the probability of $A$).

Mutual Exclusivity vs Independence

Mutually exclusive ($A \cap B = \emptyset$) and independent ($P(A \cap B) = P(A)P(B)$) are not the same concept. In fact, if $P(A) > 0$ and $P(B) > 0$, then mutual exclusivity implies dependence (since $P(A \cap B) = 0 \neq P(A)P(B)$).
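
The distinction can be checked numerically. The sketch below assumes one roll of a fair die; the specific events and the helper names `is_independent` and `is_mutually_exclusive` are illustrative.

```python
from fractions import Fraction

# Assumed example: one roll of a fair six-sided die.
omega = frozenset(range(1, 7))


def P(event):
    return Fraction(len(event), len(omega))


def is_independent(a, b):
    """Independence: P(A ∩ B) equals P(A) * P(B)."""
    return P(a & b) == P(a) * P(b)


def is_mutually_exclusive(a, b):
    """Mutual exclusivity: A and B share no outcomes."""
    return not (a & b)


even = frozenset({2, 4, 6})
at_most_4 = frozenset({1, 2, 3, 4})
odd = frozenset({1, 3, 5})

# Independent but not mutually exclusive: P = 1/3 = (1/2)(2/3), overlap {2, 4}.
print(is_independent(even, at_most_4), is_mutually_exclusive(even, at_most_4))  # True False

# Mutually exclusive but dependent: P(even ∩ odd) = 0, while P(even)P(odd) = 1/4.
print(is_independent(even, odd), is_mutually_exclusive(even, odd))  # False True
```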


The Multiplication Rule

Multiplication Rule

$$P(A \cap B) = P(A) \cdot P(B \mid A) = P(B) \cdot P(A \mid B)$$

Here,

  • $P(A \cap B)$ = Joint probability of A and B
  • $P(B \mid A)$ = Conditional probability of B given A (requires $P(A) > 0$; likewise $P(A \mid B)$ requires $P(B) > 0$)
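
A standard illustration of the multiplication rule is drawing without replacement. The sketch below assumes a well-shuffled 52-card deck with 4 aces (an example not taken from the lesson) and cross-checks the result by direct counting.

```python
from fractions import Fraction
from math import comb

# Assumed example: draw two cards without replacement from a standard 52-card deck.
p_first_ace = Fraction(4, 52)               # P(A): first card is an ace
p_second_ace_given_first = Fraction(3, 51)  # P(B | A): 3 aces remain among 51 cards

# Multiplication rule: P(A ∩ B) = P(A) * P(B | A)
p_two_aces = p_first_ace * p_second_ace_given_first
print(p_two_aces)  # 1/221

# Cross-check by counting unordered pairs: C(4, 2) ace pairs out of C(52, 2) pairs.
p_by_counting = Fraction(comb(4, 2), comb(52, 2))
print(p_by_counting == p_two_aces)  # True
```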

Total Probability and Bayes' Theorem

Theorem: Law of Total Probability

If $B_1, B_2, \ldots, B_k$ partition the sample space (pairwise disjoint, with union $\Omega$) and each $P(B_i) > 0$, then for any event $A$:

$$P(A) = \sum_{i=1}^k P(A \cap B_i) = \sum_{i=1}^k P(A \mid B_i) \, P(B_i)$$

Theorem: Bayes' Theorem

Under the same conditions, and with $P(A) > 0$:

$$P(B_i \mid A) = \frac{P(A \mid B_i) \, P(B_i)}{\sum_{j=1}^k P(A \mid B_j) \, P(B_j)}$$

or equivalently:

$$P(B_i \mid A) = \frac{P(A \mid B_i) \, P(B_i)}{P(A)}$$

Bayes' theorem is the foundation of Bayesian statistics: it updates prior beliefs $P(B_i)$ in light of observed data $A$ to produce posterior beliefs $P(B_i \mid A)$.
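
The two theorems can be combined in a few lines of code. The sketch below is a minimal illustration with made-up numbers: a hypothetical diagnostic test with 1% prevalence, 95% sensitivity, and a 5% false-positive rate. The function name `posterior` and all of the figures are assumptions, not data from the lesson.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Return P(B_i | A) for each i, given priors P(B_i) and likelihoods P(A | B_i)."""
    if sum(priors) != 1:
        raise ValueError("priors must sum to 1 (the B_i must partition the sample space)")
    joint = [lik * prior for lik, prior in zip(likelihoods, priors)]  # P(A | B_i) P(B_i)
    evidence = sum(joint)  # law of total probability: P(A)
    if evidence == 0:
        raise ValueError("P(A) = 0, so the posterior is undefined")
    return [j / evidence for j in joint]


# Hypothetical numbers: B_1 = "has condition", B_2 = "does not"; A = "test is positive".
priors = [Fraction(1, 100), Fraction(99, 100)]
likelihoods = [Fraction(95, 100), Fraction(5, 100)]

post = posterior(priors, likelihoods)
print(post[0], f"{float(post[0]):.3f}")  # 19/118 0.161
```

Even with a fairly accurate test, the posterior probability of having the condition is only about 16% under these assumed numbers, because the prior is so low.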


Counting Principles

Theorem: Fundamental Counting Principle

If task 1 can be done in $n_1$ ways, task 2 in $n_2$ ways, ..., task $k$ in $n_k$ ways, then the total number of ways to perform all tasks is $n_1 \times n_2 \times \cdots \times n_k$.

Permutations and Combinations

$$\text{Permutations: } P(n,k) = \frac{n!}{(n-k)!} \qquad \text{Combinations: } \binom{n}{k} = \frac{n!}{k!(n-k)!}$$

Here,

  • $P(n,k)$ = Number of ordered arrangements of k items from n
  • $\binom{n}{k}$ = Number of unordered selections of k items from n
  • $n!$ = n factorial: n × (n−1) × ⋯ × 1
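
These counts are available in Python's standard library as `math.perm` and `math.comb` (Python 3.8+). The sketch below checks them against explicit enumeration with `itertools`; the item set, the outfit example, and the 6-of-49 lottery are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations, permutations, product
from math import comb, perm

items = "ABCDE"
n, k = len(items), 3

# Ordered arrangements of k items from n: P(n, k) = n! / (n - k)!
n_ordered = len(list(permutations(items, k)))
print(n_ordered, perm(n, k))    # 60 60

# Unordered selections of k items from n: C(n, k) = n! / (k! (n - k)!)
n_unordered = len(list(combinations(items, k)))
print(n_unordered, comb(n, k))  # 10 10

# Fundamental counting principle: 3 shirts x 2 trousers x 4 hats.
n_outfits = len(list(product(range(3), range(2), range(4))))
print(n_outfits)                # 24

# Probability in a finite, equally likely space: matching all 6 numbers drawn from 49.
p_jackpot = Fraction(1, comb(49, 6))
print(p_jackpot)                # 1/13983816
```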

Probability in Machine Learning

| ML Application | Probability Usage | Why |
|---|---|---|
| Classification | P(class \| features) | Core of supervised learning |
| Naive Bayes | P(feature \| class) × P(class) | Text classification baseline |
| Bayesian optimization | P(optimal params \| data) | Hyperparameter tuning |
| Uncertainty estimation | Confidence intervals | Trustworthy predictions |
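
The Naive Bayes row can be illustrated with scikit-learn's `GaussianNB` on the iris dataset:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load the iris data and hold out 30% for testing.
# random_state is fixed here only so that repeated runs give the same split.
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)

# Naive Bayes: applies Bayes' theorem directly,
# P(class | features) ∝ P(features | class) × P(class),
# treating the features as conditionally independent given the class.
model = GaussianNB()
model.fit(X_train, y_train)

# Posterior class probabilities for the first three test samples.
proba = model.predict_proba(X_test[:3])

print("Naive Bayes predictions (probability):")
for i, p in enumerate(proba):
    print(f"  Sample {i}: {p.round(3)} → class {np.argmax(p)}")
print(f"Accuracy: {model.score(X_test, y_test):.3f}")
print("ML IS applied probability theory!")
```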

Key Takeaways

Summary: Probability Foundations

  • Probability quantifies uncertainty — ranges from 0 (impossible) to 1 (certain)
  • Three interpretations: classical (equally likely), frequentist (long-run frequency), subjective (belief)
  • Kolmogorov axioms form the mathematical foundation: non-negativity, normalization, countable additivity
  • Conditional probability is defined as $P(A \mid B) = P(A \cap B)/P(B)$, the basis for all inference
  • Independence means $P(A \cap B) = P(A)P(B)$, which is distinct from mutual exclusivity
  • Bayes' theorem updates prior beliefs given observed data — the engine of Bayesian inference
  • Counting principles (permutations, combinations) enable computation of probabilities in finite sample spaces
