
Bernoulli Distribution — Binary Outcomes


The Simplest Random Experiment — Success or Failure

The Bernoulli distribution is the atom of probability: a single yes/no outcome. The binomial, geometric, and categorical distributions are all built from it.

  • Coin flips: heads with probability $p$, tails with probability $1-p$
  • Medical tests: positive or negative diagnosis
  • Quality control: item passes or fails inspection
  • Machine learning: binary classification labels

Many more complex discrete models, including the binomial and geometric distributions, are assembled from Bernoulli trials, so a solid grasp of this case pays off throughout statistics.


Core Concepts

The Bernoulli distribution is the canonical example of a discrete random variable and the basic building block for counting successes in repeated trials. Any sequence of independent binary trials (coin flips, yes/no decisions, pass/fail outcomes) can be modeled as a collection of Bernoulli random variables.

Definition: Bernoulli Distribution

A random variable $X$ follows a Bernoulli distribution with parameter $p \in [0,1]$, written $X \sim \text{Bernoulli}(p)$, if its probability mass function is:

$$P(X = 1) = p, \qquad P(X = 0) = 1 - p.$$

The outcome $X=1$ is called "success" and $X=0$ is called "failure."

PMF of Bernoulli Distribution

$$P(X = x) = p^x (1-p)^{1-x}, \quad x \in \{0, 1\}$$

Here,

  • $p$ = probability of success ($0 \le p \le 1$)
  • $X$ = Bernoulli random variable
  • $1-p$ = probability of failure
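
The compact PMF formula can be checked directly in code. The following is a minimal sketch; the function name `bernoulli_pmf` and the value `p = 0.7` are illustrative choices, not part of the definition above.

```python
def bernoulli_pmf(x: int, p: float) -> float:
    """Return P(X = x) for X ~ Bernoulli(p) via p**x * (1 - p)**(1 - x)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if x not in (0, 1):
        return 0.0  # no probability mass outside {0, 1}
    return p**x * (1 - p) ** (1 - x)


p = 0.7  # illustrative success probability
pmf_values = {x: bernoulli_pmf(x, p) for x in (0, 1)}
print(pmf_values)                # mass at 0 and at 1
print(sum(pmf_values.values()))  # total mass, equal to 1 up to rounding
```

Setting $x = 1$ leaves only the factor $p$, and setting $x = 0$ leaves only $1-p$, which is why the single expression reproduces both cases of the definition.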

Generating Function

The probability generating function of $X \sim \text{Bernoulli}(p)$ is

$$G_X(s) = E[s^X] = (1-p) + ps = 1 - p + ps.$$

Its derivatives at $s = 1$ give the factorial moments: $E[X] = G_X'(1) = p$ and $E[X(X-1)] = G_X''(1) = 0$, consistent with $X^2 = X$ for a binary variable.
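
Because $G_X$ is a degree-one polynomial in $s$, the derivative identities can be verified with NumPy's polynomial class. This is a small sketch under the assumption $p = 0.7$ (an illustrative value); the variable names are hypothetical.

```python
import numpy as np

p = 0.7  # illustrative success probability

# G_X(s) = (1 - p) + p*s, stored as coefficients in increasing degree
G = np.polynomial.Polynomial([1 - p, p])

total_mass = G(1.0)                 # G(1) should equal 1
mean_from_pgf = G.deriv(1)(1.0)     # G'(1) = E[X]
second_factorial = G.deriv(2)(1.0)  # G''(1) = E[X(X-1)]

print(total_mass, mean_from_pgf, second_factorial)
```

The second derivative vanishes identically because the polynomial is linear, which matches $X(X-1) = 0$ for every outcome in $\{0, 1\}$.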


Mean and Variance: Derivation

Bernoulli Mean and Variance

$$E[X] = p, \quad \text{Var}(X) = p(1-p)$$

Here,

  • $p$ = probability of success
  • $E[X]$ = expected value (mean)
  • $\text{Var}(X)$ = variance

Proof

Mean: $E[X] = 1 \cdot P(X=1) + 0 \cdot P(X=0) = p$.

Variance: Since $X$ takes values in $\{0,1\}$, we have $X^2 = X$, so $E[X^2] = E[X] = p$. Therefore:

$$\text{Var}(X) = E[X^2] - (E[X])^2 = p - p^2 = p(1-p).$$

Alternative proof via definition:

$$\text{Var}(X) = E[(X-p)^2] = (1-p)^2 \cdot p + (0-p)^2 \cdot (1-p) = p(1-p)\bigl[(1-p) + p\bigr] = p(1-p).$$

Maximum Variance

The function $g(p) = p(1-p)$ satisfies $g'(p) = 1 - 2p = 0$ at $p = 1/2$, and $g''(p) = -2 < 0$, confirming a global maximum of $\text{Var}(X) = 1/4$ at $p = 1/2$. This reflects maximum uncertainty when outcomes are equally likely.


Higher Moments and the Bernoulli Property $X^2 = X$

Theorem: Idempotence of Bernoulli Variables

For $X \sim \text{Bernoulli}(p)$, $X^n = X$ for all $n \geq 1$. Consequently, all raw moments equal $p$: $E[X^n] = p$ for every $n \geq 1$.

Proof

Since $X \in \{0, 1\}$, we have $0^n = 0$ and $1^n = 1$ for any $n \geq 1$. Thus $X^n = X$ pointwise, so $E[X^n] = E[X] = p$.

This implies all central moments $\mu_n = E[(X-p)^n]$ can be expressed in terms of $p$ alone:

$$\mu_2 = p(1-p), \quad \mu_3 = p(1-p)(1-2p), \quad \mu_4 = p(1-p)(1 - 3p + 3p^2).$$
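
The closed forms for $\mu_2$, $\mu_3$, and $\mu_4$ can be compared against the definition $E[(X-p)^n]$ summed over the two outcomes. The sketch below assumes $p = 0.3$ purely for illustration; `central_moment` is a hypothetical helper name.

```python
def central_moment(n: int, p: float) -> float:
    """E[(X - p)^n] for X ~ Bernoulli(p), summed over the outcomes 1 and 0."""
    return (1 - p) ** n * p + (0 - p) ** n * (1 - p)


p = 0.3  # illustrative success probability

# Closed forms quoted in the text, keyed by the order n
closed_forms = {
    2: p * (1 - p),
    3: p * (1 - p) * (1 - 2 * p),
    4: p * (1 - p) * (1 - 3 * p + 3 * p**2),
}

for n, formula_value in closed_forms.items():
    print(n, central_moment(n, p), formula_value)
```

Each printed pair should agree up to floating-point rounding.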

Cumulant and Moment Generating Functions

Moment Generating Function

$$M_X(t) = E[e^{tX}] = (1-p) + pe^t$$

Here,

  • $t$ = real parameter
  • $p$ = success probability

The cumulant generating function is $K_X(t) = \log M_X(t) = \log(1-p+pe^t)$, and the cumulants are:

$$\kappa_1 = p, \qquad \kappa_2 = p(1-p), \qquad \kappa_3 = p(1-p)(1-2p).$$

The third cumulant (skewness indicator) is positive for $p < 1/2$, zero at $p = 1/2$, and negative for $p > 1/2$, reflecting the asymmetry of the distribution whenever $p \neq 1/2$.
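
One way to confirm the cumulants without differentiating $K_X$ is to use the standard relations between the first three cumulants and the raw moments, together with $E[X^n] = p$. The sketch below does this; the function name `bernoulli_cumulants` and the test values of $p$ are illustrative assumptions.

```python
def bernoulli_cumulants(p: float):
    """First three cumulants of Bernoulli(p) computed from raw moments."""
    m1 = m2 = m3 = p                      # all raw moments equal p since X**n == X
    k1 = m1                               # kappa_1 = mean
    k2 = m2 - m1**2                       # kappa_2 = variance
    k3 = m3 - 3 * m2 * m1 + 2 * m1**3     # kappa_3 = third central moment
    return k1, k2, k3


for p in (0.2, 0.5, 0.8):  # illustrative values on both sides of 1/2
    k1, k2, k3 = bernoulli_cumulants(p)
    print(p, k1, k2, k3, p * (1 - p) * (1 - 2 * p))
```

The sign of the third cumulant flips as $p$ crosses $1/2$, matching the closed form $p(1-p)(1-2p)$.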


Relationship to Other Distributions

Theorem: Bernoulli as Foundation

The Bernoulli distribution is the base case for several important distributions:

(i) Binomial: If $X_1, \ldots, X_n \overset{\text{iid}}{\sim} \text{Bernoulli}(p)$, then $Y = \sum_{i=1}^n X_i \sim \text{Binomial}(n, p)$.

(ii) Geometric: The number of independent $\text{Bernoulli}(p)$ trials up to and including the first success follows $\text{Geometric}(p)$.

(iii) Categorical: A generalization to $k$ outcomes; the Bernoulli is the $k=2$ case.
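
The binomial relationship in (i) can be checked exactly, without simulation: the PMF of a sum of independent variables is the convolution of their PMFs. The following sketch convolves the Bernoulli PMF with itself and compares the result to the binomial formula $\binom{n}{k} p^k (1-p)^{n-k}$. The values $p = 0.7$ and $n = 5$ are illustrative assumptions.

```python
from math import comb

import numpy as np

p, n = 0.7, 5  # illustrative parameters

# PMF of one Bernoulli(p) trial over the outcomes (0, 1)
single_trial = np.array([1 - p, p])

# Convolve n copies: start from the PMF of the constant 0
pmf_of_sum = np.array([1.0])
for _ in range(n):
    pmf_of_sum = np.convolve(pmf_of_sum, single_trial)

# Binomial(n, p) PMF from the closed-form expression
binomial_pmf = np.array(
    [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
)

print(np.allclose(pmf_of_sum, binomial_pmf))
```

The two arrays have length $n + 1$, one entry per possible number of successes.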

Bernoulli Process

An infinite sequence $(X_1, X_2, \ldots)$ of iid $\text{Bernoulli}(p)$ random variables forms a Bernoulli process, the discrete-time analogue of a Poisson process. Key properties:

  • The inter-arrival times (the number of trials from one success up to and including the next) are $\text{Geometric}(p)$, with mean $1/p$.
  • The number of successes in $n$ trials is $\text{Binomial}(n, p)$.
  • It is one of the simplest examples of a stochastic process.

Worked Example: Drug Efficacy Trial

Example: Clinical Trial

In a clinical trial, each patient either responds to treatment ($X=1$) or does not ($X=0$). Suppose $p = 0.7$ (70% response rate).

Mean response rate: $E[X] = 0.7$.

Variance: $\text{Var}(X) = 0.7 \times 0.3 = 0.21$.

Standard deviation: $\sigma = \sqrt{0.21} \approx 0.458$.

Probability of at least one success in 3 independent patients:

$$P(\text{at least one success}) = 1 - P(\text{all failures}) = 1 - (0.3)^3 = 1 - 0.027 = 0.973.$$

Expected number of successes in 5 patients: $E[\text{Binomial}(5, 0.7)] = 5 \times 0.7 = 3.5$.

Variance of number of successes: $\text{Var}(\text{Binomial}(5, 0.7)) = 5 \times 0.21 = 1.05$.
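
The arithmetic in this example can be reproduced with a few lines of plain Python. This is only a sketch; the variable names are illustrative, and the numbers are those given in the example.

```python
import math

p = 0.7  # response rate from the example

variance = p * (1 - p)                    # Var(X) for one patient
std_dev = math.sqrt(variance)             # standard deviation
p_at_least_one = 1 - (1 - p) ** 3         # at least one responder among 3
expected_successes = 5 * p                # mean of Binomial(5, 0.7)
variance_successes = 5 * variance         # variance of Binomial(5, 0.7)

print(f"Var(X)              = {variance:.2f}")
print(f"sigma               = {std_dev:.3f}")
print(f"P(at least one of 3) = {p_at_least_one:.3f}")
print(f"E[successes in 5]   = {expected_successes:.1f}")
print(f"Var[successes in 5] = {variance_successes:.2f}")
```

The binomial mean and variance are five times the single-trial values because the five patients are assumed independent with the same response rate.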


Python Implementation

import numpy as np

np.random.seed(42)

# Simulate Bernoulli trials
p = 0.7
n = 10000
samples = np.random.binomial(1, p, size=n)

# Verify mean and variance
print(f"Bernoulli(p={p})")
print(f"  Empirical mean:     {np.mean(samples):.4f}  (theoretical: {p})")
print(f"  Empirical variance: {np.var(samples, ddof=0):.4f}  (theoretical: {p*(1-p):.4f})")

# Verify X^2 = X property
print(f"  E[X^2]:             {np.mean(samples**2):.4f}  (should equal E[X])")

# Show variance as a function of p; the 21-point grid includes p = 0.5
p_values = np.linspace(0.01, 0.99, 21)
variances = p_values * (1 - p_values)
best = np.argmax(variances)
print(f"\nMax variance on grid: {variances[best]:.4f} at p={p_values[best]:.2f}")

Python Implementation: Bernoulli Process Simulation

import numpy as np

np.random.seed(42)

# Simulate a Bernoulli process: sequence of iid trials
p = 0.4
n_trials = 20
process = np.random.binomial(1, p, size=n_trials)

# Compute inter-arrival times (trials between successes, including the success trial)
success_indices = np.where(process == 1)[0]
inter_arrival = np.diff(np.concatenate([[-1], success_indices]))
print(f"Bernoulli process (p={p}): {process}")
print(f"Success positions: {success_indices}")
print(f"Inter-arrival times: {inter_arrival}")
print(f"Theoretical mean inter-arrival: {1/p:.2f}")
print(f"Empirical mean inter-arrival:   {np.mean(inter_arrival):.2f}")

# Number of successes in first 10 vs last 10 trials
print(f"Successes in first 10:  {np.sum(process[:10])} (Binomial(10, {p}))")
print(f"Successes in last 10:   {np.sum(process[10:])} (Binomial(10, {p}))")

Key Takeaways

Summary: Bernoulli Distribution

  • Models a single binary trial: $P(X=1) = p$, $P(X=0) = 1-p$
  • PMF: $P(X=x) = p^x(1-p)^{1-x}$ for $x \in \{0,1\}$
  • Mean: $E[X] = p$; Variance: $\text{Var}(X) = p(1-p)$
  • Idempotence: $X^2 = X$, so $E[X^n] = p$ for all $n \geq 1$
  • MGF: $M_X(t) = (1-p) + pe^t$
  • Foundation for the Binomial (sum of $n$ Bernoulli), Geometric (waiting time), and Bernoulli process
  • Variance is maximized at $p = 1/2$ with value $1/4$
