Bernoulli Distribution

The Simplest Random Experiment — Success or Failure

The Bernoulli distribution is the atom of probability: a single yes/no outcome. From this simplicity, entire empires of statistical theory are built.

Coin flips: heads with probability $p$, tails with probability $1-p$
Medical tests: positive or negative diagnosis
Quality control: item passes or fails inspection
Machine learning: binary classification labels

Many familiar distributions, including the Binomial and the Geometric, are built from sequences of Bernoulli trials. Master this, and you master the foundation.

Core Concepts

The Bernoulli distribution is the canonical example of a discrete random variable and the basic building block for counting successes in repeated trials. Every sequence of independent binary trials with a common success probability (coin flips, yes/no decisions, pass/fail outcomes) is modeled as a collection of Bernoulli random variables.

Definition: Bernoulli Distribution

A random variable $X$ follows a Bernoulli distribution with parameter $p \in [0,1]$, written $X \sim \text{Bernoulli}(p)$, if its probability mass function is:

P(X = 1) = p, \qquad P(X = 0) = 1 - p.

The outcome $X=1$ is called "success" and $X=0$ is called "failure."

PMF of Bernoulli Distribution

P(X = x) = p^x (1-p)^{1-x}, \quad x \in \{0, 1\}

Here,

$p$ = probability of success ($0 \le p \le 1$)
$X$ = Bernoulli random variable
$1-p$ = probability of failure
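The compact formula $p^x(1-p)^{1-x}$ can be evaluated directly. The following is a minimal sketch; the function name `bernoulli_pmf` is a hypothetical helper introduced here for illustration, and returning zero outside the support $\{0, 1\}$ is a convention assumed by this sketch.

```python
def bernoulli_pmf(x: int, p: float) -> float:
    """P(X = x) for X ~ Bernoulli(p), using p^x (1-p)^(1-x)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if x not in (0, 1):
        return 0.0  # outside the support {0, 1} the PMF is zero
    return p**x * (1 - p) ** (1 - x)


p = 0.7
print(f"P(X=1) = {bernoulli_pmf(1, p):.2f}")
print(f"P(X=0) = {bernoulli_pmf(0, p):.2f}")
print(f"Total  = {bernoulli_pmf(0, p) + bernoulli_pmf(1, p):.2f}")
```

Setting $x = 1$ leaves only the factor $p$, and setting $x = 0$ leaves only $1-p$, so the single expression reproduces both cases of the definition.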

Generating Function

The probability generating function of $X \sim \text{Bernoulli}(p)$ is

G_X(s) = E[s^X] = (1-p) + ps = 1 - p + ps.

Its derivatives at $s = 1$ give the factorial moments: $E[X] = G_X'(1) = p$ and $E[X(X-1)] = G_X''(1) = 0$, consistent with $X^2 = X$ for a binary variable.
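As a short worked check, the derivatives of $G_X(s) = 1 - p + ps$ can be written out and combined to recover the variance. This is only a sketch of the standard factorial-moment argument, using the symbols defined above.

```latex
G_X'(s) = p \;\Rightarrow\; E[X] = G_X'(1) = p,
\qquad
G_X''(s) = 0 \;\Rightarrow\; E[X(X-1)] = G_X''(1) = 0.

E[X^2] = E[X(X-1)] + E[X] = 0 + p = p,
\qquad
\text{Var}(X) = E[X^2] - (E[X])^2 = p - p^2 = p(1-p).
```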

Mean and Variance: Derivation

Bernoulli Mean and Variance

E[X] = p, \quad \text{Var}(X) = p(1-p)

Here,

$p$ = probability of success
$E[X]$ = expected value (mean)
$\text{Var}(X)$ = variance

Proof

Mean: $E[X] = 1 \cdot P(X=1) + 0 \cdot P(X=0) = p$.

Variance: Since $X$ takes values in $\{0,1\}$, we have $X^2 = X$, so $E[X^2] = E[X] = p$. Therefore:

\text{Var}(X) = E[X^2] - (E[X])^2 = p - p^2 = p(1-p).

Alternative proof via definition:

\text{Var}(X) = E[(X-p)^2] = (1-p)^2 \cdot p + (0-p)^2 \cdot (1-p) = p(1-p)\bigl[(1-p) + p\bigr] = p(1-p).

Maximum Variance

The function $g(p) = p(1-p)$ satisfies $g'(p) = 1 - 2p = 0$ at $p = 1/2$, and $g''(p) = -2 < 0$, confirming a global maximum of $\text{Var}(X) = 1/4$ at $p = 1/2$. This reflects maximum uncertainty when outcomes are equally likely.

Higher Moments and the Bernoulli Property $X^2 = X$

Theorem: Idempotence of Bernoulli Variables

For $X \sim \text{Bernoulli}(p)$, $X^n = X$ for all $n \geq 1$. Consequently, all raw moments equal $p$: $E[X^n] = p$ for every $n \geq 1$.

Proof

Since $X \in \{0, 1\}$, we have $0^n = 0$ and $1^n = 1$ for any $n \geq 1$. Thus $X^n = X$ pointwise, so $E[X^n] = E[X] = p$.

This implies all central moments $\mu_n = E[(X-p)^n]$ can be expressed in terms of $p$ alone:

\mu_2 = p(1-p), \quad \mu_3 = p(1-p)(1-2p), \quad \mu_4 = p(1-p)(1 - 3p + 3p^2).
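The closed forms above can be checked against the definition $\mu_n = E[(X-p)^n]$, which for a Bernoulli variable is a two-term sum. The following is a minimal sketch; `central_moment` is a hypothetical helper name and $p = 0.3$ is an arbitrary illustrative value.

```python
def central_moment(n: int, p: float) -> float:
    """E[(X - p)^n] for X ~ Bernoulli(p), summed over the outcomes 1 and 0."""
    return (1 - p) ** n * p + (0 - p) ** n * (1 - p)


p = 0.3  # arbitrary illustrative value
closed_forms = {
    2: p * (1 - p),
    3: p * (1 - p) * (1 - 2 * p),
    4: p * (1 - p) * (1 - 3 * p + 3 * p**2),
}

for n, closed_form in closed_forms.items():
    direct = central_moment(n, p)
    print(f"mu_{n}: direct = {direct:.6f}, closed form = {closed_form:.6f}")
```

The two columns should agree up to floating-point rounding for any $p \in [0, 1]$.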

Cumulant and Moment Generating Functions

Moment Generating Function

M_X(t) = E[e^{tX}] = (1-p) + pe^t

Here,

$t$ = real parameter
$p$ = success probability

The cumulant generating function is $K_X(t) = \log M_X(t) = \log(1-p+pe^t)$, and the cumulants are:

\kappa_1 = p, \qquad \kappa_2 = p(1-p), \qquad \kappa_3 = p(1-p)(1-2p).

The third cumulant, whose sign gives the direction of skew, is positive for $p < 1/2$, zero at $p = 1/2$, and negative for $p > 1/2$, reflecting the asymmetry of the distribution whenever $p \neq 1/2$.
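For readers who want the intermediate steps, the cumulants can be obtained by differentiating $K_X(t)$ and evaluating at $t = 0$. The derivation below is a sketch that writes $M = M_X(t) = 1 - p + pe^t$ as shorthand, so that $M = 1$ at $t = 0$.

```latex
K_X'(t) = \frac{p e^t}{M},
\qquad
\kappa_1 = K_X'(0) = p.

K_X''(t) = \frac{p e^t (M - p e^t)}{M^2} = \frac{p(1-p) e^t}{M^2},
\qquad
\kappa_2 = K_X''(0) = p(1-p).

K_X'''(t) = \frac{p(1-p) e^t (M - 2 p e^t)}{M^3},
\qquad
\kappa_3 = K_X'''(0) = p(1-p)(1-2p).
```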

Relationship to Other Distributions

Theorem: Bernoulli as Foundation

The Bernoulli distribution is the base case for several important distributions:

(i) Binomial: If $X_1, \ldots, X_n \overset{\text{iid}}{\sim} \text{Bernoulli}(p)$, then $Y = \sum_{i=1}^n X_i \sim \text{Binomial}(n, p)$.

(ii) Geometric: The number of Bernoulli trials up to and including the first success follows $\text{Geometric}(p)$ on $\{1, 2, \ldots\}$.

(iii) Categorical: A generalization to $k$ outcomes; the Bernoulli is the $k=2$ case.
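Statement (i) can be illustrated without simulation by building the distribution of the sum one Bernoulli trial at a time and comparing it with the Binomial formula $\binom{n}{k} p^k (1-p)^{n-k}$. The following is a minimal sketch; the function names are hypothetical helpers, and $n = 5$, $p = 0.7$ are illustrative values.

```python
from math import comb


def convolve_with_bernoulli(pmf: list[float], p: float) -> list[float]:
    """PMF of S + X, where S has the given PMF on 0..len(pmf)-1
    and X ~ Bernoulli(p) is independent of S."""
    out = [0.0] * (len(pmf) + 1)
    for k, mass in enumerate(pmf):
        out[k] += mass * (1 - p)   # X = 0: the sum stays at k
        out[k + 1] += mass * p     # X = 1: the sum moves to k + 1
    return out


def sum_of_bernoullis_pmf(n: int, p: float) -> list[float]:
    """PMF of X_1 + ... + X_n for iid Bernoulli(p) trials."""
    pmf = [1.0]  # the empty sum equals 0 with probability 1
    for _ in range(n):
        pmf = convolve_with_bernoulli(pmf, p)
    return pmf


n, p = 5, 0.7
built = sum_of_bernoullis_pmf(n, p)
formula = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

for k in range(n + 1):
    print(f"k={k}: convolution = {built[k]:.6f}, Binomial formula = {formula[k]:.6f}")
```

Each pass through the loop adds one independent trial, so after $n$ passes the list holds the PMF of the sum of $n$ Bernoulli variables.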

Bernoulli Process

An infinite sequence $(X_1, X_2, \ldots)$ of iid Bernoulli$(p)$ random variables forms a Bernoulli process, the discrete-time analogue of a Poisson process. Key properties:

The inter-arrival times (the number of trials from one success up to and including the next) are $\text{Geometric}(p)$ on $\{1, 2, \ldots\}$.
The number of successes in $n$ trials is $\text{Binomial}(n, p)$.
It is among the simplest examples of a stochastic process.

Worked Example: Drug Efficacy Trial

Example: Clinical Trial

In a clinical trial, each patient either responds to treatment ($X=1$) or does not ($X=0$). Suppose $p = 0.7$ (70% response rate).

Mean response rate: $E[X] = 0.7$.

Variance: $\text{Var}(X) = 0.7 \times 0.3 = 0.21$.

Standard deviation: $\sigma = \sqrt{0.21} \approx 0.458$.

Probability of at least one success in 3 independent patients:

P(\text{at least one success}) = 1 - P(\text{all failures}) = 1 - (0.3)^3 = 1 - 0.027 = 0.973.

Expected number of successes in 5 patients: $E[\text{Binomial}(5, 0.7)] = 5 \times 0.7 = 3.5$.

Variance of number of successes: $\text{Var}(\text{Binomial}(5, 0.7)) = 5 \times 0.21 = 1.05$.
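The arithmetic in this example can be reproduced in a few lines. The following is a minimal sketch using only the standard library; the variable names are chosen here for illustration.

```python
from math import sqrt

p = 0.7  # response rate from the example

mean = p
variance = p * (1 - p)
std_dev = sqrt(variance)

# At least one responder among 3 independent patients
at_least_one_of_3 = 1 - (1 - p) ** 3

# Number of responders among 5 patients is Binomial(5, p)
binom_mean_5 = 5 * p
binom_var_5 = 5 * variance

print(f"Mean:                     {mean:.3f}")
print(f"Variance:                 {variance:.3f}")
print(f"Standard deviation:       {std_dev:.3f}")
print(f"P(at least one of 3):     {at_least_one_of_3:.3f}")
print(f"Binomial(5, p) mean:      {binom_mean_5:.3f}")
print(f"Binomial(5, p) variance:  {binom_var_5:.3f}")
```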

Python Implementation

import numpy as np

np.random.seed(42)

# Simulate Bernoulli trials (a Binomial(1, p) draw is a Bernoulli(p) draw)
p = 0.7
n = 10000
samples = np.random.binomial(1, p, size=n)

# Verify mean and variance
print(f"Bernoulli(p={p})")
print(f"  Empirical mean:     {np.mean(samples):.4f}  (theoretical: {p})")
print(f"  Empirical variance: {np.var(samples, ddof=0):.4f}  (theoretical: {p*(1-p):.4f})")

# Verify X^2 = X property
print(f"  E[X^2]:             {np.mean(samples**2):.4f}  (should equal E[X])")

# Evaluate the variance p(1-p) on a grid of p values and locate its maximum
p_values = np.linspace(0.01, 0.99, 99)  # step 0.01, so the grid includes p = 0.5
variances = p_values * (1 - p_values)
best = np.argmax(variances)
print(f"\nMax variance on grid: {variances[best]:.4f} at p={p_values[best]:.2f}")

Python Implementation: Bernoulli Process Simulation

import numpy as np

np.random.seed(42)

# Simulate a Bernoulli process: sequence of iid trials
p = 0.4
n_trials = 20
process = np.random.binomial(1, p, size=n_trials)

# Compute inter-arrival times (trials between successes, including the success trial)
success_indices = np.where(process == 1)[0]
inter_arrival = np.diff(np.concatenate([[-1], success_indices]))
print(f"Bernoulli process (p={p}): {process}")
print(f"Success positions: {success_indices}")
print(f"Inter-arrival times: {inter_arrival}")
print(f"Theoretical mean inter-arrival: {1/p:.2f}")
print(f"Empirical mean inter-arrival:   {np.mean(inter_arrival):.2f}")

# Number of successes in first 10 vs last 10 trials
print(f"Successes in first 10:  {np.sum(process[:10])} (Binomial(10, {p}))")
print(f"Successes in last 10:   {np.sum(process[10:])} (Binomial(10, {p}))")

Key Takeaways

Summary: Bernoulli Distribution

Models a single binary trial: $P(X=1) = p$, $P(X=0) = 1-p$
PMF: $P(X=x) = p^x(1-p)^{1-x}$ for $x \in \{0,1\}$
Mean: $E[X] = p$; Variance: $\text{Var}(X) = p(1-p)$
Idempotence: $X^2 = X$, so $E[X^n] = p$ for all $n \geq 1$
MGF: $M_X(t) = (1-p) + pe^t$
Foundation for the Binomial (sum of $n$ Bernoulli), Geometric (waiting time), and Bernoulli process
Variance is maximized at $p = 1/2$ with value $1/4$
