Bernoulli Distribution
The Simplest Random Experiment — Success or Failure
The Bernoulli distribution is the atom of probability: a single yes/no outcome. From this simplicity, entire empires of statistical theory are built.
- Coin flips — heads or tails, each with probability $1/2$ for a fair coin
- Medical tests — positive or negative diagnosis
- Quality control — item passes or fails inspection
- Machine learning — binary classification labels
Many richer models, including the binomial and geometric distributions and the Bernoulli process, are built from sequences of Bernoulli trials. Master this, and you master the foundation.
Core Concepts
The Bernoulli distribution is the canonical example of a discrete random variable and the fundamental building block for all counting processes. Every sequence of independent binary trials — coin flips, yes/no decisions, pass/fail outcomes — is modeled as a collection of Bernoulli random variables.
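Definition: Bernoulli Distribution
A random variable $X$ follows a Bernoulli distribution with parameter $p \in [0, 1]$, written $X \sim \text{Bernoulli}(p)$, if its probability mass function is:
$$P(X = x) = p^x (1-p)^{1-x}, \quad x \in \{0, 1\}.$$
The outcome $X = 1$ is called "success" and $X = 0$ is called "failure."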
PMF of Bernoulli Distribution
$$P(X = x) = p^x (1-p)^{1-x}, \quad x \in \{0, 1\}$$
Here,
- $p$ = Probability of success ($0 \le p \le 1$)
- $X$ = Bernoulli random variable
- $1 - p$ = Probability of failure
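As a quick check of the PMF, the sketch below evaluates it case by case and compares the result with the closed form $p^x (1-p)^{1-x}$. The function name `bernoulli_pmf` and the value `p = 0.7` are illustrative choices, not part of the original text.

```python
def bernoulli_pmf(x: int, p: float) -> float:
    """P(X = x) for X ~ Bernoulli(p); zero outside the support {0, 1}."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if x == 1:
        return p
    if x == 0:
        return 1.0 - p
    return 0.0


p = 0.7  # illustrative value
for x in (0, 1):
    closed_form = p**x * (1 - p) ** (1 - x)
    print(f"P(X={x}) = {bernoulli_pmf(x, p):.2f}  (closed form: {closed_form:.2f})")

# The two probabilities account for all of the mass
total = bernoulli_pmf(0, p) + bernoulli_pmf(1, p)
print(f"Total probability: {total:.2f}")
```

Returning zero outside $\{0, 1\}$ keeps the function a valid PMF on all integers, which is convenient when it is summed over a larger range.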
Generating Function
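The probability generating function of $X$ is
$$G_X(s) = E[s^X] = (1 - p) + ps.$$
This encodes all moments: $G_X'(1) = p = E[X]$ and $G_X''(1) = 0 = E[X(X-1)]$, consistent with $X(X-1) = 0$ for a binary variable.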
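The moment-extraction step can be sketched in a few lines by treating the generating function $G(s) = (1-p) + ps$ as a list of polynomial coefficients. The helpers `poly_eval` and `poly_deriv` and the value `p = 0.3` are hypothetical names and numbers introduced only for this illustration.

```python
def poly_eval(coeffs, s):
    """Evaluate sum_k coeffs[k] * s**k."""
    return sum(c * s**k for k, c in enumerate(coeffs))


def poly_deriv(coeffs):
    """Coefficients of the derivative polynomial (zero polynomial as [0.0])."""
    return [k * c for k, c in enumerate(coeffs)][1:] or [0.0]


p = 0.3  # illustrative value
pgf = [1 - p, p]              # G(s) = (1 - p) + p*s

g1 = poly_deriv(pgf)          # G'(s)  = p
g2 = poly_deriv(g1)           # G''(s) = 0

mean = poly_eval(g1, 1.0)                 # G'(1)  = E[X]
second_factorial = poly_eval(g2, 1.0)     # G''(1) = E[X(X - 1)]
variance = second_factorial + mean - mean**2

print(f"E[X] = {mean:.2f}, E[X(X-1)] = {second_factorial:.2f}, Var(X) = {variance:.2f}")
```

The variance line uses the identity $\text{Var}(X) = G''(1) + G'(1) - G'(1)^2$, which holds for any PGF with finite second moment.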
Mean and Variance: Derivation
Bernoulli Mean and Variance
$$E[X] = p, \qquad \text{Var}(X) = p(1-p)$$
Here,
- $p$ = Probability of success
- $E[X]$ = Expected value (mean)
- $\text{Var}(X)$ = Variance
Proof
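Mean: $E[X] = 0 \cdot (1-p) + 1 \cdot p = p$.
Variance: Since $X$ takes values in $\{0, 1\}$, we have $X^2 = X$, so $E[X^2] = E[X] = p$. Therefore:
$$\text{Var}(X) = E[X^2] - (E[X])^2 = p - p^2 = p(1-p).$$
Alternative proof via definition:
$$\text{Var}(X) = E[(X-p)^2] = (0-p)^2 (1-p) + (1-p)^2 p = p(1-p)\,[p + (1-p)] = p(1-p).$$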
Maximum Variance
The function $f(p) = p(1-p)$ satisfies $f'(p) = 1 - 2p = 0$ at $p = 1/2$, and $f''(p) = -2 < 0$, confirming a global maximum of $1/4$ at $p = 1/2$. This reflects maximum uncertainty when outcomes are equally likely.
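The derivation can be verified exactly by computing the mean and variance straight from the two-point PMF with rational arithmetic. The function name `mean_and_variance` and the 21-point grid are illustrative assumptions; any grid containing $1/2$ would do.

```python
from fractions import Fraction


def mean_and_variance(p):
    """Mean and variance computed directly from the two-point PMF."""
    pmf = {0: 1 - p, 1: p}
    mean = sum(x * prob for x, prob in pmf.items())
    var = sum((x - mean) ** 2 * prob for x, prob in pmf.items())
    return mean, var


# Exact arithmetic on the grid p = 0, 1/20, ..., 1
grid = [Fraction(k, 20) for k in range(21)]
for p in grid:
    mean, var = mean_and_variance(p)
    assert mean == p
    assert var == p * (1 - p)

# Locate the grid point with the largest variance
best_p = max(grid, key=lambda q: q * (1 - q))
print(f"Variance is largest at p = {best_p}, where it equals {best_p * (1 - best_p)}")
```

Using `Fraction` avoids floating-point rounding, so the equalities in the loop are exact rather than approximate.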
Higher Moments and the Bernoulli Property
Theorem: Idempotence of Bernoulli Variables
For $X \sim \text{Bernoulli}(p)$, $X^k = X$ for all $k \geq 1$. Consequently, all raw moments equal $p$: $E[X^k] = p$ for every $k \geq 1$.
Proof
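Since $X \in \{0, 1\}$, we have $0^k = 0$ and $1^k = 1$ for any $k \geq 1$. Thus $X^k = X$ pointwise, so $E[X^k] = E[X] = p$.
This implies all central moments can be expressed in terms of $p$ alone:
$$E[(X - p)^k] = (1-p)(-p)^k + p(1-p)^k.$$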
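Because $X$ takes only the values 0 and 1, the $k$-th central moment is a two-term sum, $E[(X-p)^k] = (1-p)(0-p)^k + p(1-p)^k$. The sketch below evaluates it for $k = 2, 3$ and derives the skewness; `central_moment` and `p = 0.3` are names and values chosen for illustration.

```python
import math


def central_moment(p: float, k: int) -> float:
    """E[(X - p)^k] for X ~ Bernoulli(p), summed over the two outcomes."""
    return (1 - p) * (0 - p) ** k + p * (1 - p) ** k


p = 0.3  # illustrative value
m2 = central_moment(p, 2)    # variance, equals p(1-p)
m3 = central_moment(p, 3)    # equals p(1-p)(1-2p)
skewness = m3 / m2**1.5      # equals (1-2p) / sqrt(p(1-p))

print(f"variance = {m2:.4f}")
print(f"third central moment = {m3:.4f}")
print(f"skewness = {skewness:.4f}")
```

For $p < 1/2$ the skewness is positive, since the rarer outcome (success) sits to the right of the mean.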
Cumulant and Moment Generating Functions
Moment Generating Function
$$M_X(t) = E[e^{tX}] = (1 - p) + p e^{t}$$
Here,
- $t$ = Real parameter
- $p$ = Success probability
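The cumulant generating function is $K_X(t) = \ln M_X(t) = \ln(1 - p + p e^{t})$, and the cumulants are:
$$\kappa_1 = p, \qquad \kappa_2 = p(1-p), \qquad \kappa_3 = p(1-p)(1-2p).$$
The third cumulant (skewness indicator) is positive for $p < 1/2$ and negative for $p > 1/2$, reflecting the asymmetry of the distribution.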
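The cumulants are the derivatives of $K(t) = \ln(1 - p + p e^t)$ at $t = 0$, so they can be checked numerically with central differences. This is a rough sketch: the step size `h = 1e-2`, the helper names, and the test values of `p` are assumptions made for the illustration, and the closed forms compared against are $p$, $p(1-p)$ and $p(1-p)(1-2p)$.

```python
import math


def cgf(t: float, p: float) -> float:
    """Cumulant generating function K(t) = log(1 - p + p * e^t)."""
    return math.log(1 - p + p * math.exp(t))


def cumulants_numeric(p: float, h: float = 1e-2):
    """Approximate K'(0), K''(0), K'''(0) with central differences."""
    k1 = (cgf(h, p) - cgf(-h, p)) / (2 * h)
    k2 = (cgf(h, p) - 2 * cgf(0.0, p) + cgf(-h, p)) / h**2
    k3 = (cgf(2 * h, p) - 2 * cgf(h, p) + 2 * cgf(-h, p) - cgf(-2 * h, p)) / (2 * h**3)
    return k1, k2, k3


for p in (0.3, 0.7):  # one value on each side of 1/2
    k1, k2, k3 = cumulants_numeric(p)
    exact = (p, p * (1 - p), p * (1 - p) * (1 - 2 * p))
    print(f"p={p}: numeric = ({k1:.4f}, {k2:.4f}, {k3:.4f}), "
          f"closed form = ({exact[0]:.4f}, {exact[1]:.4f}, {exact[2]:.4f})")
```

The sign of the third value flips between the two runs, matching the change in skew direction on either side of $p = 1/2$.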
Relationship to Other Distributions
Theorem: Bernoulli as Foundation
The Bernoulli distribution is the base case for several important distributions:
(i) Binomial: If $X_1, \ldots, X_n$ are iid $\text{Bernoulli}(p)$, then $\sum_{i=1}^{n} X_i \sim \text{Binomial}(n, p)$.
(ii) Geometric: The number of Bernoulli trials until the first success follows $\text{Geometric}(p)$.
(iii) Categorical: A generalization to $k$ outcomes; the Bernoulli is the $k = 2$ case.
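The binomial relationship can be checked without simulation: the PMF of a sum of independent variables is the convolution of their PMFs, so convolving the Bernoulli PMF with itself $n$ times should reproduce the binomial PMF. The values `p = 0.7` and `n = 5` and the helper `convolve` are illustrative assumptions; the geometric PMF is written for support $\{1, 2, \ldots\}$ (trial number of the first success).

```python
from math import comb


def convolve(a, b):
    """PMF of the sum of two independent non-negative integer variables."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


p, n = 0.7, 5  # illustrative values
bernoulli = [1 - p, p]

pmf_sum = [1.0]  # PMF of the empty sum, which is always 0
for _ in range(n):
    pmf_sum = convolve(pmf_sum, bernoulli)

binomial = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
for k, (a, b) in enumerate(zip(pmf_sum, binomial)):
    print(f"k={k}: convolution = {a:.5f}, binomial = {b:.5f}")

# Geometric(p): probability that the first success occurs on trial k
geometric = [(1 - p) ** (k - 1) * p for k in range(1, 6)]
print("Geometric PMF for k = 1..5:", [round(g, 5) for g in geometric])
```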
Bernoulli Process
An infinite sequence $X_1, X_2, \ldots$ of iid $\text{Bernoulli}(p)$ random variables forms a Bernoulli process, the discrete-time analogue of a Poisson process. Key properties:
- The inter-arrival times (trials between successes) are iid $\text{Geometric}(p)$.
- The number of successes in $n$ trials is $\text{Binomial}(n, p)$.
- It is the simplest example of a stochastic process.
Worked Example: Drug Efficacy Trial
Example: Clinical Trial
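In a clinical trial, each patient either responds to treatment ($X = 1$) or does not ($X = 0$). Suppose $p = 0.7$ (70% response rate).
Mean response rate: $E[X] = p = 0.7$.
Variance: $\text{Var}(X) = p(1-p) = 0.7 \times 0.3 = 0.21$.
Standard deviation: $\sigma = \sqrt{0.21} \approx 0.458$.
Probability of at least one success in 3 independent patients:
$$P(\text{at least one}) = 1 - (1-p)^3 = 1 - 0.3^3 = 1 - 0.027 = 0.973.$$
Expected number of successes in 5 patients: $5p = 5 \times 0.7 = 3.5$.
Variance of number of successes: $5p(1-p) = 5 \times 0.21 = 1.05$.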
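The arithmetic in this example is easy to reproduce. The sketch below uses the 70% response rate from the example; the variable names are illustrative.

```python
import math

p = 0.7                      # response rate from the example
mean = p
variance = p * (1 - p)
std_dev = math.sqrt(variance)

# Complement of "none of the 3 patients responds"
p_at_least_one = 1 - (1 - p) ** 3

# Sums of independent Bernoulli trials: means and variances add
expected_in_5 = 5 * p
variance_in_5 = 5 * variance

print(f"mean = {mean:.2f}, variance = {variance:.2f}, sd = {std_dev:.3f}")
print(f"P(at least one of 3) = {p_at_least_one:.3f}")
print(f"E[successes in 5] = {expected_in_5:.1f}, Var = {variance_in_5:.2f}")
```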
Python Implementation
import numpy as np
np.random.seed(42)  # fixed seed so the run is reproducible
# Simulate Bernoulli trials
p = 0.7
n = 10000
samples = np.random.binomial(1, p, size=n)
# Verify mean and variance
print(f"Bernoulli(p={p})")
print(f" Empirical mean: {np.mean(samples):.4f} (theoretical: {p})")
print(f" Empirical variance: {np.var(samples, ddof=0):.4f} (theoretical: {p*(1-p):.4f})")
# Verify X^2 = X property
print(f" E[X^2]: {np.mean(samples**2):.4f} (should equal E[X])")
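# Show variance as a function of p and locate its maximum on the grid
p_values = np.linspace(0.01, 0.99, 99)  # step 0.01, so p = 0.5 is on the grid
variances = p_values * (1 - p_values)
best = np.argmax(variances)
print(f"\nMax variance on grid at p={p_values[best]:.2f}: {variances[best]:.4f}")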
Python Implementation: Bernoulli Process Simulation
import numpy as np
np.random.seed(42)
# Simulate a Bernoulli process: sequence of iid trials
p = 0.4
n_trials = 20
process = np.random.binomial(1, p, size=n_trials)
# Compute inter-arrival times (trials between successes, including the success trial)
success_indices = np.where(process == 1)[0]
inter_arrival = np.diff(np.concatenate([[-1], success_indices]))
print(f"Bernoulli process (p={p}): {process}")
print(f"Success positions: {success_indices}")
print(f"Inter-arrival times: {inter_arrival}")
print(f"Theoretical mean inter-arrival: {1/p:.2f}")
print(f"Empirical mean inter-arrival: {np.mean(inter_arrival):.2f}")
# Number of successes in first 10 vs last 10 trials
print(f"Successes in first 10: {np.sum(process[:10])} (Binomial(10, {p}))")
print(f"Successes in last 10: {np.sum(process[10:])} (Binomial(10, {p}))")
Key Takeaways
Summary: Bernoulli Distribution
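- Models a single binary trial: $P(X = 1) = p$, $P(X = 0) = 1 - p$
- PMF: $P(X = x) = p^x (1-p)^{1-x}$ for $x \in \{0, 1\}$
- Mean: $E[X] = p$; Variance: $\text{Var}(X) = p(1-p)$
- Idempotence: $X^k = X$, so $E[X^k] = p$ for all $k \geq 1$
- MGF: $M_X(t) = 1 - p + p e^{t}$
- Foundation for the Binomial (sum of Bernoulli), Geometric (waiting time), and Bernoulli process
- Variance is maximized at $p = 1/2$ with value $1/4$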