Geometric Distribution — Waiting Time for First Success

How Long Until First Success?

The geometric distribution answers a natural question: how long must we wait for the first success? It is the discrete analogue of the exponential distribution and the only discrete distribution possessing the memoryless property.

  • Quality control — how many items until first defect
  • Recruiting — how many interviews until first hire
  • Sales — how many calls until first sale
  • Sports — how many games until first win

The geometric distribution is the mathematics of waiting — and its memoryless property makes it unique.


Core Concepts

Throughout, we work with a sequence of independent Bernoulli trials, each of which succeeds with the same probability $p$, where $0 < p \leq 1$. The geometric distribution describes how long we wait for the first success in such a sequence.

Definition: Geometric Distribution (Number of Trials)

A random variable $X$ follows a geometric distribution with parameter $p$, written $X \sim \text{Geometric}(p)$, if its PMF is:

$$P(X = k) = (1-p)^{k-1}\,p, \qquad k = 1, 2, 3, \ldots$$

Here $X$ counts the trial number on which the first success occurs.

Alternative Parametrization

Some authors define $Y = X - 1$ as the number of failures before the first success, with PMF $P(Y = k) = (1-p)^k\,p$ for $k = 0, 1, 2, \ldots$ The two conventions differ by a shift of 1. We use the "number of trials" convention throughout.
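As a rough illustration of the two conventions, the sketch below compares both PMFs against `scipy.stats.geom`. It assumes SciPy's `geom` uses the "number of trials" convention (support starting at 1) and that passing `loc=-1` shifts it to the "number of failures" convention. The value `p = 0.3` and the range of `k` are arbitrary choices for the example.

```python
import numpy as np
from scipy import stats

p = 0.3                      # arbitrary success probability for the illustration
k = np.arange(1, 8)          # trial numbers 1..7

# "Number of trials" convention: support 1, 2, 3, ...
pmf_trials = stats.geom.pmf(k, p)
manual_trials = (1 - p) ** (k - 1) * p

# "Number of failures" convention: same distribution shifted down by one
j = k - 1                    # failure counts 0..6
pmf_failures = stats.geom.pmf(j, p, loc=-1)
manual_failures = (1 - p) ** j * p

print("trials convention matches formula:  ", np.allclose(pmf_trials, manual_trials))
print("failures convention matches formula:", np.allclose(pmf_failures, manual_failures))
print("P(X = k) equals P(Y = k - 1):       ", np.allclose(pmf_trials, pmf_failures))
```

When reading library documentation or another textbook, checking whether the support starts at 0 or 1 is usually the quickest way to tell which convention is in use.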


PMF Derivation and Verification

Why This PMF Is Correct

The event $\{X = k\}$ means the first $k-1$ trials are failures and the $k$-th trial is a success. By independence:

$$P(X = k) = \underbrace{(1-p) \cdot (1-p) \cdots (1-p)}_{k-1 \text{ times}} \cdot p = (1-p)^{k-1}p.$$

Verification that it sums to 1:

$$\sum_{k=1}^{\infty} (1-p)^{k-1}p = p \sum_{j=0}^{\infty} (1-p)^j = p \cdot \frac{1}{1-(1-p)} = p \cdot \frac{1}{p} = 1.$$

(Using the geometric series formula with $|1-p| < 1$ for $0 < p \leq 1$.)


CDF

CDF of Geometric Distribution

$$P(X \leq k) = 1 - (1-p)^k$$

Here,

  • $k$ = Number of trials
  • $1-p$ = Failure probability

Derivation

$$P(X \leq k) = 1 - P(X > k) = 1 - P(\text{first } k \text{ trials all fail}) = 1 - (1-p)^k.$$

This is elegant: the probability that we haven't succeeded after $k$ trials is $(1-p)^k$, the complementary cumulative probability.
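The closed form can be checked numerically by accumulating the PMF term by term. The following is a minimal sketch with an arbitrary `p = 0.2` and the first 30 trials; the variable names are illustrative only.

```python
import numpy as np

p = 0.2                         # arbitrary success probability
k = np.arange(1, 31)            # trials 1..30

pmf = (1 - p) ** (k - 1) * p    # P(X = k)
cdf_from_sum = np.cumsum(pmf)   # running total P(X <= k)
cdf_closed_form = 1 - (1 - p) ** k

max_gap = np.max(np.abs(cdf_from_sum - cdf_closed_form))
print(f"largest difference between the two CDFs: {max_gap:.2e}")
print(f"P(X <= 30) = {cdf_closed_form[-1]:.6f}")
```

The running total also approaches 1 as `k` grows, which is the normalization property of the PMF seen from the CDF side.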


Mean and Variance: Derivation

Geometric Mean and Variance

$$E[X] = \frac{1}{p}, \quad \text{Var}(X) = \frac{1-p}{p^2}$$

Here,

  • $p$ = Probability of success
  • $1/p$ = Expected trials until first success

Derivation of the Mean

Method 1: Direct summation

$$E[X] = \sum_{k=1}^{\infty} k\,(1-p)^{k-1}p = p\sum_{k=1}^{\infty} k\,q^{k-1} = p \cdot \frac{1}{(1-q)^2} = \frac{p}{p^2} = \frac{1}{p},$$

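where $q = 1-p$ and we used $\sum_{k=1}^{\infty} k\,q^{k-1} = \frac{1}{(1-q)^2}$, which follows from differentiating the geometric series $\sum_{k=0}^{\infty} q^k = \frac{1}{1-q}$ term by term with respect to $q$ (valid for $|q| < 1$).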

Method 2: Tail sum formula (for non-negative integer-valued $X$):

$$E[X] = \sum_{k=0}^{\infty} P(X > k) = \sum_{k=0}^{\infty} (1-p)^k = \frac{1}{1-(1-p)} = \frac{1}{p}.$$
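Both routes to the mean can be sanity-checked by truncating the infinite sums. This is only a sketch: the truncation point `K = 200` and `p = 0.3` are arbitrary, chosen so that the neglected tail $(1-p)^K$ is far below floating-point precision.

```python
import numpy as np

p = 0.3      # arbitrary success probability
K = 200      # truncation point; (1 - p)**K is negligible here

# Method 1: direct summation of k * P(X = k)
k = np.arange(1, K + 1)
direct_sum = np.sum(k * (1 - p) ** (k - 1) * p)

# Method 2: tail sum of P(X > k) = (1 - p)**k for k = 0, 1, 2, ...
tail_sum = np.sum((1 - p) ** np.arange(0, K))

print(f"direct summation: {direct_sum:.6f}")
print(f"tail sum:         {tail_sum:.6f}")
print(f"1/p:              {1 / p:.6f}")
```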

Derivation of the Variance

Use $E[X^2] = E[X(X-1)] + E[X]$ together with:

$$E[X(X-1)] = \sum_{k=2}^{\infty} k(k-1)\,q^{k-1}p = pq \sum_{k=2}^{\infty} k(k-1)\,q^{k-2} = pq \cdot \frac{2}{(1-q)^3} = \frac{2pq}{p^3} = \frac{2(1-p)}{p^2},$$

where $q = 1-p$ and $\sum_{k=2}^{\infty} k(k-1)\,q^{k-2} = \frac{2}{(1-q)^3}$ is the second derivative of the geometric series.

Therefore $E[X^2] = \frac{2(1-p)}{p^2} + \frac{1}{p} = \frac{2-p}{p^2}$, and:

$$\text{Var}(X) = E[X^2] - (E[X])^2 = \frac{2-p}{p^2} - \frac{1}{p^2} = \frac{1-p}{p^2}.$$
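The variance can be checked the same way, by computing the moments from a truncated PMF and comparing against the closed forms $E[X^2] = (2-p)/p^2$ and $\text{Var}(X) = (1-p)/p^2$. The sketch below uses arbitrary values `p = 0.3` and `K = 400`; the variable names are illustrative.

```python
import numpy as np

p = 0.3      # arbitrary success probability
K = 400      # truncation point; the neglected tail is negligible for this p

k = np.arange(1, K + 1)
pmf = (1 - p) ** (k - 1) * p

mean = np.sum(k * pmf)                        # E[X]
factorial_moment = np.sum(k * (k - 1) * pmf)  # E[X(X-1)]
second_moment = factorial_moment + mean       # E[X^2] = E[X(X-1)] + E[X]
variance = second_moment - mean ** 2          # Var(X) = E[X^2] - (E[X])^2

print(f"E[X^2]: {second_moment:.6f}  (closed form: {(2 - p) / p**2:.6f})")
print(f"Var(X): {variance:.6f}  (closed form: {(1 - p) / p**2:.6f})")
```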

The Memoryless Property

Theorem: Memoryless Property

The geometric distribution is the only discrete distribution satisfying the memoryless property: for all non-negative integers $s, t$,

$$P(X > s + t \mid X > s) = P(X > t).$$

Proof

Forward direction:

$$P(X > s+t \mid X > s) = \frac{P(X > s+t)}{P(X > s)} = \frac{(1-p)^{s+t}}{(1-p)^s} = (1-p)^t = P(X > t).$$

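Uniqueness: Suppose a non-negative integer-valued $X$ satisfies memorylessness. Let $g(n) = P(X > n)$. Then $g(s+t)/g(s) = g(t)$, that is, $g(s+t) = g(s)\,g(t)$ for all $s, t \geq 0$. Taking $s = t = 0$ gives $g(0) = g(0)^2$, so $g(0) = 1$ (it cannot be 0, since the conditioning requires $g(0) > 0$). By induction, $g(n) = g(1)^n$, so $g$ is exponential: $g(n) = c^n$ with $c = g(1)$. Excluding the degenerate cases $c = 0$ (success always on the first trial) and $c = 1$ (success never occurs), we have $c \in (0,1)$. Setting $c = 1-p$ recovers the geometric distribution.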

Intuition

If you've already failed ss times, the distribution of remaining trials is the same as starting fresh. Each trial is independent, so past failures carry no information about future success. This is the discrete analogue of the exponential distribution's memoryless property in continuous time.
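To make the property concrete, the sketch below checks the identity $P(X > s+t \mid X > s) = P(X > t)$ exactly, using rational arithmetic, for a geometric survival function and for a fair six-sided die as a contrast. The die, the helper names, and the value $p = 1/4$ are all assumptions introduced for illustration; they are not part of the lesson.

```python
from fractions import Fraction

def geometric_survival(n, p):
    """P(X > n) = (1 - p)^n for the trials-based geometric distribution."""
    return (1 - p) ** n

def die_survival(n, sides=6):
    """P(X > n) for one roll of a fair die (uniform on 1..sides)."""
    return Fraction(max(sides - n, 0), sides)

def is_memoryless(survival, max_s, max_t):
    """Check P(X > s+t | X > s) == P(X > t) exactly on a finite grid of s, t."""
    for s in range(max_s + 1):
        if survival(s) == 0:
            continue  # conditioning event has probability zero, skip it
        for t in range(max_t + 1):
            if survival(s + t) / survival(s) != survival(t):
                return False
    return True

p = Fraction(1, 4)
geom_ok = is_memoryless(lambda n: geometric_survival(n, p), 5, 5)
die_ok = is_memoryless(die_survival, 5, 5)

print("geometric is memoryless on the grid:", geom_ok)
print("fair die is memoryless on the grid: ", die_ok)
```

A finite grid cannot prove the property, but the die fails already at $s = t = 1$, where $P(X > 2 \mid X > 1) = 4/5$ while $P(X > 1) = 5/6$.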


Hazard Function

Hazard Rate

$$h(k) = P(X = k \mid X \geq k) = p, \quad \text{for all } k \geq 1$$

Here,

  • $h(k)$ = Hazard rate at trial $k$
  • $p$ = Constant hazard rate

The geometric distribution has a constant hazard rate: the probability of success on any trial, given that we haven't succeeded yet, is always $p$. This is another manifestation of memorylessness and distinguishes the geometric from distributions with increasing or decreasing hazard rates.


Relationship to Other Distributions

Theorem: Distributional Connections

(i) Sum of Geometrics = Negative Binomial: If $X_1, \ldots, X_r \overset{\text{iid}}{\sim} \text{Geometric}(p)$, then $\sum_{i=1}^r X_i \sim \text{Negative Binomial}(r, p)$ (counting trials to the $r$-th success).

(ii) Geometric ⊂ Negative Binomial: The geometric is the special case of the negative binomial with $r = 1$.

(iii) Discrete Analogue of Exponential: The geometric is to the discrete case what the exponential distribution is to the continuous case. Both are memoryless and characterized by constant hazard rates.

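(iv) Geometric as a sum of products of Bernoulli indicators: $X = 1 + \sum_{j=1}^{\infty} \prod_{i=1}^j (1 - X_i')$ where $X_i' \overset{\text{iid}}{\sim} \text{Bernoulli}(p)$. The $j$-th product equals 1 exactly when the first $j$ trials all fail, so the sum counts the failures before the first success.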
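Connection (i) can be checked numerically by convolving the geometric PMF with itself and comparing against SciPy's negative binomial. This is a sketch under one stated assumption: `scipy.stats.nbinom` counts failures before the $r$-th success, so `loc=r` is used to shift it to the "number of trials" convention. The values `p = 0.3`, `r = 3`, and the cutoff `K = 40` are arbitrary.

```python
import numpy as np
from scipy import stats

p, r, K = 0.3, 3, 40          # arbitrary parameters and support cutoff
k = np.arange(0, K + 1)

# Trials-based geometric PMF on 0..K (probability 0 at k = 0)
geom_pmf = stats.geom.pmf(k, p)

# PMF of X_1 + ... + X_r by repeated convolution, truncated to 0..K.
# Entries up to index K only involve terms with indices <= K, so they are exact.
sum_pmf = geom_pmf.copy()
for _ in range(r - 1):
    sum_pmf = np.convolve(sum_pmf, geom_pmf)[: K + 1]

# Negative binomial counting trials to the r-th success (failures + r)
nb_pmf = stats.nbinom.pmf(k, r, p, loc=r)

print("convolution matches negative binomial:", np.allclose(sum_pmf, nb_pmf))
print(f"P(sum = {r}) = {sum_pmf[r]:.6f}  (p**r = {p**r:.6f})")
```

Setting `r = 1` skips the convolution loop entirely, which is connection (ii): the geometric is the negative binomial with a single required success.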


Worked Example: Quality Control

Example: Defective Items on Assembly Line

A machine produces items with defect probability $p = 0.05$. An inspector checks items sequentially until finding the first defective one.

Expected items inspected: $E[X] = 1/p = 1/0.05 = 20$.

Variance: $\text{Var}(X) = (1-p)/p^2 = 0.95/0.0025 = 380$.

Standard deviation: $\sigma = \sqrt{380} \approx 19.49$.

Probability the first defect appears on or before trial 10:

$$P(X \leq 10) = 1 - (0.95)^{10} \approx 1 - 0.5987 = 0.4013.$$

Probability it takes more than 50 trials:

$$P(X > 50) = (0.95)^{50} \approx 0.0769.$$

Memoryless property in action: If the inspector has already checked 20 items with no defect found, the probability the next item is defective is still $p = 0.05$, exactly the same as for a fresh start.
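The figures in this example can be reproduced with a frozen `scipy.stats.geom` object, which (as assumed here) uses the same "number of trials" convention as this lesson. The sketch also computes the conditional probability from the memoryless remark; variable names are illustrative.

```python
from scipy import stats

p = 0.05
X = stats.geom(p)    # frozen geometric distribution, support 1, 2, 3, ...

mean = X.mean()
variance = X.var()
std = X.std()
p_within_10 = X.cdf(10)       # P(X <= 10)
p_more_than_50 = X.sf(50)     # P(X > 50), the survival function

# P(X = 21 | X > 20): defect on the next item after 20 clean inspections
p_next_given_20 = X.pmf(21) / X.sf(20)

print(f"mean = {mean:.2f}, variance = {variance:.2f}, std = {std:.2f}")
print(f"P(X <= 10) = {p_within_10:.4f}")
print(f"P(X > 50)  = {p_more_than_50:.4f}")
print(f"P(X = 21 | X > 20) = {p_next_given_20:.4f}")
```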


Python Implementation

import numpy as np
from scipy import stats

np.random.seed(42)

# Simulate geometric random variables
p = 0.3
n = 10000
samples = np.random.geometric(p, size=n)

# Verify mean and variance
print(f"Geometric(p={p})")
print(f"  Empirical mean:     {np.mean(samples):.4f}  (theoretical: {1/p:.4f})")
print(f"  Empirical variance: {np.var(samples, ddof=0):.4f}  (theoretical: {(1-p)/p**2:.4f})")

# Verify memoryless property
for s in [5, 10, 20]:
    given_gt_s = samples[samples > s]
    empirical = np.mean(given_gt_s > s + 5)  # P(X > s+5 | X > s) ≈ P(X > 5)
    theoretical = (1-p)**5
    print(f"  P(X > {s+5} | X > {s}): empirical={empirical:.4f}, theoretical P(X>5)={theoretical:.4f}")

Python Implementation: Hazard Rate Verification

import numpy as np

np.random.seed(42)

# Verify constant hazard rate for geometric distribution
p = 0.25
n = 50000
samples = np.random.geometric(p, size=n)

print(f"Geometric(p={p}) — Hazard Rate Verification")
print(f"{'k':>4} {'P(X=k | X>=k)':>14} {'p (theoretical)':>16}")
print("-" * 36)

for k in [1, 2, 3, 5, 10, 20]:
    given_ge_k = samples[samples >= k]
    if len(given_ge_k) > 0:
        hazard = np.mean(given_ge_k == k)
        print(f"{k:>4} {hazard:>14.4f} {p:>16.4f}")

# Inter-arrival times between successes in a Bernoulli process are geometric;
# the [-1] sentinel makes the first gap count trials from the start of the sequence
print(f"\nBernoulli process inter-arrival times:")
bernoulli = np.random.binomial(1, p, size=10000)
successes = np.where(bernoulli == 1)[0]
inter_arrival = np.diff(np.concatenate([[-1], successes]))
print(f"  Mean inter-arrival: {np.mean(inter_arrival):.4f} (theoretical: {1/p:.4f})")

Key Takeaways

Summary: Geometric Distribution

  • Counts trials until first success: $P(X=k) = (1-p)^{k-1}p$ for $k \geq 1$
  • Mean: $E[X] = 1/p$; Variance: $\text{Var}(X) = (1-p)/p^2$
  • Memoryless property: $P(X > s+t \mid X > s) = P(X > t)$; the geometric is the only discrete distribution with this property
  • Constant hazard rate: $P(X = k \mid X \geq k) = p$ for all $k$
  • CDF: $P(X \leq k) = 1 - (1-p)^k$
  • Foundation for the Negative Binomial (sum of $r$ independent geometrics)
  • Discrete analogue of the exponential distribution
