Geometric Distribution

How Long Until First Success?

The geometric distribution answers a natural question: how long must we wait for the first success? It is the discrete analogue of the exponential distribution and the only discrete distribution possessing the memoryless property.

Quality control — how many items until first defect
Recruiting — how many interviews until first hire
Sales — how many calls until first sale
Sports — how many games until first win

The geometric distribution is the mathematics of waiting — and its memoryless property makes it unique.

Core Concepts

Consider a sequence of independent Bernoulli trials, each succeeding with the same probability $p$. The geometric distribution describes how many trials are needed to obtain the first success.

Definition: Geometric Distribution (Number of Trials)

A random variable $X$ follows a geometric distribution with parameter $p$ (where $0 < p \leq 1$), written $X \sim \text{Geometric}(p)$, if its PMF is:

P(X = k) = (1-p)^{k-1}\,p, \qquad k = 1, 2, 3, \ldots

Here $X$ counts the trial number on which the first success occurs.

Alternative Parametrization

Some authors define $Y = X - 1$ as the number of failures before the first success, with PMF $P(Y = k) = (1-p)^k p$ for $k = 0, 1, 2, \ldots$. The two conventions differ by a shift of 1. We use the "number of trials" convention throughout.
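The shift between the two conventions can be checked directly. The sketch below is illustrative only: the function names `pmf_trials` and `pmf_failures` and the value `p = 0.3` are assumptions, not part of the original text.

```python
def pmf_trials(k: int, p: float) -> float:
    """P(X = k): the first success occurs on trial k (k = 1, 2, ...)."""
    if k < 1:
        return 0.0
    return (1 - p) ** (k - 1) * p


def pmf_failures(k: int, p: float) -> float:
    """P(Y = k): exactly k failures precede the first success (k = 0, 1, ...)."""
    if k < 0:
        return 0.0
    return (1 - p) ** k * p


p = 0.3  # illustrative value (assumption)
for k in range(1, 6):
    # Y = X - 1, so P(X = k) must equal P(Y = k - 1)
    print(k, pmf_trials(k, p), pmf_failures(k - 1, p))
```

When using a library, it is worth checking which convention it follows. NumPy's `numpy.random.geometric` and SciPy's `scipy.stats.geom` both use the "number of trials" convention, with support starting at 1.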

PMF Derivation and Verification

Why This PMF Is Correct

The event $\{X = k\}$ means the first $k-1$ trials are failures and the $k$-th trial is a success. By independence:

P(X = k) = \underbrace{(1-p) \cdot (1-p) \cdots (1-p)}_{k-1 \text{ times}} \cdot p = (1-p)^{k-1}p.

Verification that it sums to 1:

\sum_{k=1}^{\infty} (1-p)^{k-1}p = p \sum_{j=0}^{\infty} (1-p)^j = p \cdot \frac{1}{1-(1-p)} = p \cdot \frac{1}{p} = 1.

(Using the geometric series formula with $|1-p| < 1$ for $0 < p \leq 1$.)
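As a numerical sanity check of the normalization, the partial sums of the PMF should approach 1, and by the geometric series they should equal $1 - (1-p)^n$ after $n$ terms. This is a minimal sketch; the helper names and `p = 0.2` are illustrative assumptions.

```python
def pmf(k: int, p: float) -> float:
    """P(X = k) = (1 - p)^(k - 1) * p for k >= 1."""
    return (1 - p) ** (k - 1) * p


def partial_sum(n: int, p: float) -> float:
    """Sum of the PMF over k = 1, ..., n."""
    return sum(pmf(k, p) for k in range(1, n + 1))


p = 0.2  # illustrative value (assumption)
for n in (1, 5, 20, 100):
    closed_form = 1 - (1 - p) ** n  # finite geometric series
    print(n, partial_sum(n, p), closed_form)
```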

CDF

CDF of Geometric Distribution

P(X \leq k) = 1 - (1-p)^k

Here,

$k$ = number of trials
$1-p$ = failure probability

Derivation

P(X \leq k) = 1 - P(X > k) = 1 - P(\text{first } k \text{ trials all fail}) = 1 - (1-p)^k.

The probability of no success in the first $k$ trials is the survival function $P(X > k) = (1-p)^k$, so the CDF is simply its complement.
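One practical use of the closed-form CDF is to ask how many trials are needed to reach a given chance of at least one success. The sketch below inverts $1 - (1-p)^k \geq \text{target}$; the function name `trials_needed` and the example values are assumptions made for illustration.

```python
import math


def cdf(k: int, p: float) -> float:
    """P(X <= k) = 1 - (1 - p)^k."""
    return 1 - (1 - p) ** k


def trials_needed(p: float, target: float) -> int:
    """Smallest k with P(X <= k) >= target, assuming 0 < p < 1 and 0 < target < 1."""
    # 1 - (1-p)^k >= target  <=>  k >= log(1 - target) / log(1 - p)
    k = max(1, math.ceil(math.log(1 - target) / math.log(1 - p)))
    # Adjust for floating-point rounding at the boundary
    while cdf(k, p) < target:
        k += 1
    while k > 1 and cdf(k - 1, p) >= target:
        k -= 1
    return k


# With a 5% success probability, trials needed for a 95% chance of at least one success
print(trials_needed(0.05, 0.95))
```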

Mean and Variance: Derivation

Geometric Mean and Variance

E[X] = \frac{1}{p}, \quad \text{Var}(X) = \frac{1-p}{p^2}

Here,

$p$ = probability of success
$1/p$ = expected number of trials until first success

Derivation of the Mean

Method 1: Direct summation

E[X] = \sum_{k=1}^{\infty} k\,(1-p)^{k-1}p = p\sum_{k=1}^{\infty} k\,q^{k-1} = p \cdot \frac{1}{(1-q)^2} = \frac{p}{p^2} = \frac{1}{p},

where $q = 1-p$ and we used $\sum_{k=1}^{\infty} k\,q^{k-1} = \frac{1}{(1-q)^2}$ (derivative of geometric series).

Method 2: Tail sum formula (for non-negative integer-valued $X$):

E[X] = \sum_{k=0}^{\infty} P(X > k) = \sum_{k=0}^{\infty} (1-p)^k = \frac{1}{1-(1-p)} = \frac{1}{p}.
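Both derivations can be checked numerically by truncating the infinite sums. This is a rough sketch: the truncation length of 2000 terms and the value `p = 0.05` are assumptions, chosen so that the neglected tail is negligible.

```python
def mean_direct(p: float, terms: int = 2000) -> float:
    """Method 1: truncated sum of k * (1 - p)^(k - 1) * p."""
    q = 1 - p
    return sum(k * q ** (k - 1) * p for k in range(1, terms + 1))


def mean_tail_sum(p: float, terms: int = 2000) -> float:
    """Method 2: truncated sum of P(X > k) = (1 - p)^k over k = 0, 1, ..."""
    q = 1 - p
    return sum(q ** k for k in range(terms))


p = 0.05  # illustrative value (assumption)
print(mean_direct(p), mean_tail_sum(p), 1 / p)
```

For very small $p$ the sums converge slowly, so `terms` would need to be increased accordingly.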

Derivation of the Variance

Use $E[X^2] = E[X(X-1)] + E[X]$ together with the second factorial moment:

E[X(X-1)] = \sum_{k=2}^{\infty} k(k-1)(1-p)^{k-1}p = pq \sum_{k=2}^{\infty} k(k-1)q^{k-2} = pq \cdot \frac{2}{(1-q)^3} = \frac{2pq}{p^3} = \frac{2(1-p)}{p^2}.

Therefore $E[X^2] = \frac{2(1-p)}{p^2} + \frac{1}{p} = \frac{2-p}{p^2}$, and:

\text{Var}(X) = E[X^2] - (E[X])^2 = \frac{2-p}{p^2} - \frac{1}{p^2} = \frac{1-p}{p^2}.
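The moments can be cross-checked with truncated sums, using the identity $E[X^2] = E[X(X-1)] + E[X]$ and the closed forms $E[X(X-1)] = 2(1-p)/p^2$ and $\text{Var}(X) = (1-p)/p^2$. The function name `moments`, the truncation length, and `p = 0.3` are illustrative assumptions.

```python
def moments(p: float, terms: int = 1000):
    """Return (E[X], E[X(X-1)], E[X^2], Var(X)) from truncated sums.

    Assumes p is not so small that 1000 terms leave a noticeable tail.
    """
    q = 1 - p
    e_x = 0.0
    e_fact = 0.0
    for k in range(1, terms + 1):
        prob = q ** (k - 1) * p
        e_x += k * prob
        e_fact += k * (k - 1) * prob
    e_x2 = e_fact + e_x       # E[X^2] = E[X(X-1)] + E[X]
    var = e_x2 - e_x ** 2     # Var(X) = E[X^2] - (E[X])^2
    return e_x, e_fact, e_x2, var


p = 0.3  # illustrative value (assumption)
e_x, e_fact, e_x2, var = moments(p)
print(f"E[X(X-1)] = {e_fact:.6f}  (closed form {2 * (1 - p) / p**2:.6f})")
print(f"E[X^2]    = {e_x2:.6f}  (closed form {(2 - p) / p**2:.6f})")
print(f"Var(X)    = {var:.6f}  (closed form {(1 - p) / p**2:.6f})")
```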

The Memoryless Property

Theorem: Memoryless Property

The geometric distribution is the only distribution on the positive integers satisfying the memoryless property: for all non-negative integers $s, t$,

P(X > s + t \mid X > s) = P(X > t).

Proof

Forward direction:

P(X > s+t \mid X > s) = \frac{P(X > s+t)}{P(X > s)} = \frac{(1-p)^{s+t}}{(1-p)^s} = (1-p)^t = P(X > t).

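Uniqueness: Suppose a non-negative integer-valued $X$ satisfies memorylessness, with $P(X > n) > 0$ for every $n$ so that the conditional probabilities are defined. Let $g(n) = P(X > n)$. Then $g(s+t) = g(s)\,g(t)$; taking $s = t = 0$ gives $g(0) = 1$, and induction gives $g(n) = g(1)^n = c^n$ with $c = g(1) \in (0,1)$. Setting $c = 1-p$ yields $P(X > n) = (1-p)^n$, which is the survival function of the geometric distribution.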

Intuition

If you've already failed $s$ times, the distribution of remaining trials is the same as starting fresh. Each trial is independent, so past failures carry no information about future success. This is the discrete analogue of the exponential distribution's memoryless property in continuous time.
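The forward direction of the theorem can be verified exactly with rational arithmetic, which avoids floating-point noise. This is a minimal sketch; the helper names and the choice $p = 3/10$ are assumptions.

```python
from fractions import Fraction


def survival(n: int, p: Fraction) -> Fraction:
    """P(X > n) = (1 - p)^n: the first n trials all fail."""
    return (1 - p) ** n


def conditional_survival(s: int, t: int, p: Fraction) -> Fraction:
    """P(X > s + t | X > s), computed from the definition of conditional probability."""
    return survival(s + t, p) / survival(s, p)


p = Fraction(3, 10)  # illustrative value (assumption)
for s in (0, 5, 20):
    for t in (1, 5):
        lhs = conditional_survival(s, t, p)
        rhs = survival(t, p)
        print(f"s={s}, t={t}: conditional={lhs}, unconditional={rhs}, equal={lhs == rhs}")
```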

Hazard Function

Hazard Rate

h(k) = P(X = k \mid X \geq k) = p, \quad \text{for all } k \geq 1

Here,

$h(k)$ = hazard rate at trial $k$
$p$ = constant hazard rate

The geometric distribution has a constant hazard rate: the probability of success on any trial, given that we haven't succeeded yet, is always $p$. This is another manifestation of memorylessness and distinguishes the geometric from distributions with increasing or decreasing hazard rates.
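The constant hazard rate can also be confirmed exactly from the PMF, computing $P(X \geq k)$ as one minus the mass below $k$ rather than assuming its closed form. The helper names and $p = 1/4$ are illustrative assumptions.

```python
from fractions import Fraction


def pmf(k: int, p: Fraction) -> Fraction:
    """P(X = k) = (1 - p)^(k - 1) * p for k >= 1."""
    return (1 - p) ** (k - 1) * p


def hazard(k: int, p: Fraction) -> Fraction:
    """h(k) = P(X = k) / P(X >= k), with P(X >= k) built from the PMF."""
    prob_at_least_k = 1 - sum(pmf(j, p) for j in range(1, k))
    return pmf(k, p) / prob_at_least_k


p = Fraction(1, 4)  # illustrative value (assumption)
for k in (1, 2, 3, 5, 10, 20):
    print(k, hazard(k, p))
```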

Relationship to Other Distributions

Theorem: Distributional Connections

(i) Sum of Geometrics = Negative Binomial: If $X_1, \ldots, X_r \overset{\text{iid}}{\sim} \text{Geometric}(p)$, then $\sum_{i=1}^r X_i \sim \text{Negative Binomial}(r, p)$ (counting trials to the $r$-th success).

(ii) Geometric ⊂ Negative Binomial: The geometric is a special case of the negative binomial with $r = 1$.

(iii) Discrete Analogue of Exponential: The geometric is to the discrete case what the exponential distribution is to the continuous case. Both are memoryless and characterized by constant hazard rates.

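(iv) Geometric as a sum of indicators: $X = 1 + \sum_{j=1}^{\infty} \prod_{i=1}^j (1 - X_i')$ where $X_i' \overset{\text{iid}}{\sim} \text{Bernoulli}(p)$ are the trial outcomes. The product $\prod_{i=1}^j (1 - X_i')$ equals 1 exactly when the first $j$ trials all fail, that is, when $X > j$.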
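Connection (i) can be checked exactly by convolving the geometric PMF with itself $r$ times and comparing against the negative binomial PMF in its "trials to the $r$-th success" form, $P(S = n) = \binom{n-1}{r-1} p^r (1-p)^{n-r}$. This is a sketch under assumed names (`sum_of_geometrics_pmf`, `negative_binomial_pmf`) and assumed values $r = 3$, $p = 1/4$.

```python
from fractions import Fraction
from math import comb


def geometric_pmf(k: int, p: Fraction) -> Fraction:
    """P(X = k) = (1 - p)^(k - 1) * p for k >= 1."""
    return (1 - p) ** (k - 1) * p


def sum_of_geometrics_pmf(r: int, p: Fraction, n_max: int) -> list:
    """Exact PMF of X_1 + ... + X_r on {0, ..., n_max} via repeated convolution."""
    dist = [Fraction(1)] + [Fraction(0)] * n_max  # empty sum: point mass at 0
    for _ in range(r):
        new = [Fraction(0)] * (n_max + 1)
        for a, prob_a in enumerate(dist):
            if prob_a == 0:
                continue
            for b in range(1, n_max - a + 1):
                new[a + b] += prob_a * geometric_pmf(b, p)
        dist = new
    return dist


def negative_binomial_pmf(n: int, r: int, p: Fraction) -> Fraction:
    """P(r-th success occurs on trial n)."""
    if n < r:
        return Fraction(0)
    return comb(n - 1, r - 1) * p ** r * (1 - p) ** (n - r)


r, p, n_max = 3, Fraction(1, 4), 15  # illustrative values (assumption)
conv = sum_of_geometrics_pmf(r, p, n_max)
for n in range(r, r + 5):
    print(n, conv[n], negative_binomial_pmf(n, r, p))
```

Because every summand is at least 1, truncating at `n_max` does not affect the probabilities of values up to `n_max`, so the comparison is exact on that range.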

Worked Example: Quality Control

Example: Defective Items on Assembly Line

A machine produces items with defect probability $p = 0.05$. An inspector checks items sequentially until finding the first defective one.

Expected items inspected: $E[X] = 1/p = 1/0.05 = 20$.

Variance: $\text{Var}(X) = (1-p)/p^2 = 0.95/0.0025 = 380$.

Standard deviation: $\sigma = \sqrt{380} \approx 19.49$.

Probability the first defect appears on or before trial 10:

P(X \leq 10) = 1 - (0.95)^{10} = 1 - 0.5987 \approx 0.4013.

Probability it takes more than 50 trials:

P(X > 50) = (0.95)^{50} \approx 0.0769.

Memoryless property in action: If the inspector has already checked 20 items with no defect found, the probability the next item is defective is still $p = 0.05$ — exactly the same as for a fresh start.
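The figures in this example can be reproduced with a few lines of arithmetic. The variable names below are illustrative assumptions; the formulas are the ones stated earlier in this section.

```python
import math

p = 0.05  # defect probability from the example

mean = 1 / p                            # E[X] = 1/p
variance = (1 - p) / p ** 2             # Var(X) = (1-p)/p^2
std_dev = math.sqrt(variance)
prob_within_10 = 1 - (1 - p) ** 10      # P(X <= 10)
prob_more_than_50 = (1 - p) ** 50       # P(X > 50)

print(f"Expected items inspected: {mean:.0f}")
print(f"Variance:                 {variance:.0f}")
print(f"Standard deviation:       {std_dev:.2f}")
print(f"P(X <= 10):               {prob_within_10:.4f}")
print(f"P(X > 50):                {prob_more_than_50:.4f}")
```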

Python Implementation

import numpy as np

np.random.seed(42)

# Simulate geometric random variables
p = 0.3
n = 10000
samples = np.random.geometric(p, size=n)

# Verify mean and variance
print(f"Geometric(p={p})")
print(f"  Empirical mean:     {np.mean(samples):.4f}  (theoretical: {1/p:.4f})")
print(f"  Empirical variance: {np.var(samples, ddof=0):.4f}  (theoretical: {(1-p)/p**2:.4f})")

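# Verify memoryless property: P(X > s+5 | X > s) should be close to P(X > 5)
theoretical = (1 - p) ** 5
for s in [5, 10, 20]:
    given_gt_s = samples[samples > s]
    if len(given_gt_s) == 0:
        # For large s there may be no samples to condition on; skip instead of averaging an empty array
        continue
    empirical = np.mean(given_gt_s > s + 5)
    print(f"  P(X > {s+5} | X > {s}): empirical={empirical:.4f}, theoretical P(X>5)={theoretical:.4f} (n={len(given_gt_s)})")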

Python Implementation: Hazard Rate Verification

import numpy as np

np.random.seed(42)

# Verify constant hazard rate for geometric distribution
p = 0.25
n = 50000
samples = np.random.geometric(p, size=n)

print(f"Geometric(p={p}) — Hazard Rate Verification")
print(f"{'k':>4} {'P(X=k | X>=k)':>14} {'p (theoretical)':>16}")
print("-" * 36)

for k in [1, 2, 3, 5, 10, 20]:
    given_ge_k = samples[samples >= k]
    if len(given_ge_k) > 0:
        hazard = np.mean(given_ge_k == k)
        print(f"{k:>4} {hazard:>14.4f} {p:>16.4f}")

# Show that inter-arrival times in a Bernoulli process follow a geometric distribution
print("\nBernoulli process inter-arrival times:")
bernoulli = np.random.binomial(1, p, size=10000)
successes = np.where(bernoulli == 1)[0]
# Prepend -1 so the first gap counts trials from the start of the process
inter_arrival = np.diff(np.concatenate([[-1], successes]))
print(f"  Mean inter-arrival: {np.mean(inter_arrival):.4f} (theoretical: {1/p:.4f})")

Key Takeaways

Summary: Geometric Distribution

Counts trials until first success: $P(X=k) = (1-p)^{k-1}p$ for $k \geq 1$
Mean: $E[X] = 1/p$; Variance: $\text{Var}(X) = (1-p)/p^2$
Memoryless property: $P(X > s+t \mid X > s) = P(X > t)$ — the only discrete distribution with this property
Constant hazard rate: $P(X = k \mid X \geq k) = p$ for all $k$
CDF: $P(X \leq k) = 1 - (1-p)^k$
Foundation for the Negative Binomial (sum of $r$ independent geometrics)
Discrete analogue of the exponential distribution
