🎉 75% of content is free forever — Unlock Premium from $10/mo →
CW
Search courses…
💼 Servicesℹ️ About✉️ ContactView Pricing Plansfrom $10

Negative Binomial Distribution — Waiting for r-th Success

Foundations of StatisticsProbability Distributions🟢 Free Lesson

Advertisement

Negative Binomial Distribution

Probability Distributions

Waiting for the r-th Success — Overdispersed Counts

The negative binomial distribution generalizes the geometric: instead of waiting for the first success, we wait for the rr-th success. It is the go-to model for overdispersed count data.

  • Insurance — claims per policyholder (variance > mean)
  • Epidemiology — disease cases per region (heterogeneous rates)
  • Ecology — species counts per quadrat (aggregated populations)
  • Transportation — passengers per bus (bursty arrivals)

When the Poisson's mean-equals-variance assumption fails, the negative binomial saves the day.


Core Concepts

The negative binomial distribution generalizes the geometric distribution: instead of waiting for the first success, we wait for the rr-th success. It arises naturally as a sum of independent geometric random variables and serves as a flexible model for overdispersed count data.

DfNegative Binomial Distribution (Number of Failures)

A random variable XX follows a negative binomial distribution with parameters (r,p)(r, p), written XNB(r,p)X \sim \text{NB}(r, p), if its PMF is:

P(X=k)=(k+r1k)pr(1p)k,k=0,1,2,P(X = k) = \binom{k + r - 1}{k}\, p^r\, (1-p)^k, \qquad k = 0, 1, 2, \ldots

Here XX counts the number of failures before the rr-th success, with success probability pp per trial.

Alternative Parametrization

Some authors use XX to count the total number of trials (successes + failures) needed to achieve rr successes. Under that convention, Y=X+rY = X + r with PMF P(Y=y)=(y1r1)pr(1p)yrP(Y = y) = \binom{y-1}{r-1}p^r(1-p)^{y-r} for y=r,r+1,y = r, r+1, \ldots Be careful to check which convention a textbook uses.


PMF Derivation

Why This PMF Is Correct

To observe exactly kk failures before the rr-th success, two conditions must hold:

  1. The last trial must be a success (the rr-th success). This contributes probability pp.

  2. Among the first k+r1k + r - 1 trials, there must be exactly r1r - 1 successes and kk failures. The number of ways to arrange these is (k+r1k)\binom{k+r-1}{k}, and each arrangement has probability pr1(1p)kp^{r-1}(1-p)^k.

Combining:

P(X=k)=(k+r1k)pr1(1p)kp=(k+r1k)pr(1p)k.P(X = k) = \binom{k+r-1}{k}\, p^{r-1}(1-p)^k \cdot p = \binom{k+r-1}{k}\, p^r(1-p)^k.

Verification via the negative binomial series:

k=0(k+r1k)pr(1p)k=prk=0(k+r1k)(1p)k=pr1(1(1p))r=pr1pr=1.\sum_{k=0}^{\infty} \binom{k+r-1}{k} p^r(1-p)^k = p^r \sum_{k=0}^{\infty} \binom{k+r-1}{k} (1-p)^k = p^r \cdot \frac{1}{(1-(1-p))^r} = p^r \cdot \frac{1}{p^r} = 1.

This uses the negative binomial expansion (1x)r=k=0(k+r1k)xk(1-x)^{-r} = \sum_{k=0}^{\infty}\binom{k+r-1}{k}x^k.


Mean and Variance

Negative Binomial Mean and Variance

E[X]=r(1p)p,Var(X)=r(1p)p2E[X] = \frac{r(1-p)}{p}, \quad \text{Var}(X) = \frac{r(1-p)}{p^2}

Here,

  • rr=Target number of successes
  • pp=Success probability
  • 1p1-p=Failure probability

Derivation

XX is the sum of rr independent geometric(p)(p) random variables Y1,,YrY_1, \ldots, Y_r, each counting failures before one success. By linearity:

E[X]=rE[Y1]=r1pp.E[X] = r \cdot E[Y_1] = r \cdot \frac{1-p}{p}.

By independence:

Var(X)=rVar(Y1)=r1pp2.\text{Var}(X) = r \cdot \text{Var}(Y_1) = r \cdot \frac{1-p}{p^2}.

Variance-to-Mean Ratio

The mean-to-variance relationship is:

Var(X)=E[X]p=E[X]+E[X]2r.\text{Var}(X) = \frac{E[X]}{p} = E[X] + \frac{E[X]^2}{r}.

This shows Var(X)>E[X]\text{Var}(X) > E[X] (overdispersion) for all finite rr. As rr \to \infty with E[X]E[X] fixed, Var(X)E[X]\text{Var}(X) \to E[X], recovering the Poisson limit.


Cumulant Generating Function

Moment Generating Function

MX(t)=(p1(1p)et)r,t<log(1p)M_X(t) = \left(\frac{p}{1 - (1-p)e^t}\right)^r, \quad t < -\log(1-p)

Here,

  • rr=Number of successes
  • pp=Success probability

Derivation

Since X=Y1++YrX = Y_1 + \cdots + Y_r with YiGeometric(p)Y_i \sim \text{Geometric}(p) iid, and MYi(t)=p1(1p)etM_{Y_i}(t) = \frac{p}{1-(1-p)e^t}:

MX(t)=(MY1(t))r=(p1(1p)et)r.M_X(t) = \left(M_{Y_1}(t)\right)^r = \left(\frac{p}{1-(1-p)e^t}\right)^r.

The cumulant generating function is:

KX(t)=rlogprlog ⁣(1(1p)et),K_X(t) = r\log p - r\log\!\left(1-(1-p)e^t\right),

yielding cumulants:

κ1=r(1p)p,κ2=r(1p)p2,κ3=r(1p)(2p)p3.\kappa_1 = \frac{r(1-p)}{p}, \qquad \kappa_2 = \frac{r(1-p)}{p^2}, \qquad \kappa_3 = \frac{r(1-p)(2-p)}{p^3}.

The Negative Binomial as a Poisson-Gamma Mixture

ThGamma-Poisson Mixture Representation

If XλPoisson(λ)X \mid \lambda \sim \text{Poisson}(\lambda) and λGamma(r,p1p)\lambda \sim \text{Gamma}(r, \frac{p}{1-p}) (shape rr, rate p/(1p)p/(1-p)), then the marginal distribution of XX is Negative Binomial(r,p)\text{Negative Binomial}(r, p).

Proof

P(X=k)=0λkeλk!(β)rλr1eβλΓ(r)dλ=βrk!Γ(r)0λk+r1e(1+β)λdλP(X = k) = \int_0^{\infty} \frac{\lambda^k e^{-\lambda}}{k!} \cdot \frac{(\beta)^r \lambda^{r-1} e^{-\beta\lambda}}{\Gamma(r)}\,d\lambda = \frac{\beta^r}{k!\,\Gamma(r)} \int_0^{\infty} \lambda^{k+r-1} e^{-(1+\beta)\lambda}\,d\lambda
=βrk!Γ(r)Γ(k+r)(1+β)k+r=(k+r1k)βr(1+β)k+r=(k+r1k)(β1+β)r(11+β)k.= \frac{\beta^r}{k!\,\Gamma(r)} \cdot \frac{\Gamma(k+r)}{(1+\beta)^{k+r}} = \binom{k+r-1}{k} \frac{\beta^r}{(1+\beta)^{k+r}} = \binom{k+r-1}{k} \left(\frac{\beta}{1+\beta}\right)^r \left(\frac{1}{1+\beta}\right)^k.

Setting p=β1+βp = \frac{\beta}{1+\beta} (so β=1pp\beta = \frac{1-p}{p}) gives the NB(r,p)\text{NB}(r,p) PMF.

Why This Matters

This representation explains overdispersion in count data. If counts are Poisson but the rate λ\lambda varies across observations (heterogeneity), the marginal distribution becomes negative binomial. The parameter rr controls the degree of heterogeneity: smaller rr means more variation in λ\lambda, hence more overdispersion.


Overdispersion in Practice

Poisson vs Negative Binomial

For Poisson data, Var(X)=E[X]\text{Var}(X) = E[X]. When the empirical variance substantially exceeds the mean, the Poisson model is inadequate and the negative binomial provides a better fit.

Rule of thumb: If Var(X)/E[X]>1.5\text{Var}(X) / E[X] > 1.5, consider the negative binomial.

Estimation: Given data x1,,xnx_1, \ldots, x_n, the method of moments estimators are:

p^=xˉs2,r^=xˉp^1p^=xˉ2s2xˉ,\hat{p} = \frac{\bar{x}}{s^2}, \qquad \hat{r} = \frac{\bar{x}\,\hat{p}}{1 - \hat{p}} = \frac{\bar{x}^2}{s^2 - \bar{x}},

where xˉ\bar{x} is the sample mean and s2s^2 is the sample variance.


Worked Example: Call Center Modeling

Example: Customer Calls Per Hour

A call center receives an average of μ=8\mu = 8 calls per hour, but the variance is s2=18s^2 = 18 (overdispersed). We model calls XNB(r,p)X \sim \text{NB}(r, p).

Method of moments:

p^=xˉs2=8180.444,r^=xˉ2s2xˉ=6410=6.4.\hat{p} = \frac{\bar{x}}{s^2} = \frac{8}{18} \approx 0.444, \qquad \hat{r} = \frac{\bar{x}^2}{s^2 - \bar{x}} = \frac{64}{10} = 6.4.

Verification: E[X]=r(1p)/p=6.4×0.556/0.4448.0E[X] = r(1-p)/p = 6.4 \times 0.556/0.444 \approx 8.0

Var(X)=r(1p)/p2=6.4×0.556/0.19718.0\text{Var}(X) = r(1-p)/p^2 = 6.4 \times 0.556/0.197 \approx 18.0

Compare with Poisson: A Poisson(8) model would predict Var=8\text{Var} = 8, severely underestimating the true variability of 18. This has direct consequences for staffing: the negative binomial predicts more extreme fluctuations (busy and idle periods).


Python Implementation

import numpy as np
from scipy import stats

np.random.seed(42)

# Negative binomial parameters
r, p = 6, 0.4
n = 10000

# Simulate
samples = np.random.negative_binomial(r, p, size=n)

# Verify mean and variance
mean_theory = r * (1 - p) / p
var_theory = r * (1 - p) / p**2
print(f"NB(r={r}, p={p}):")
print(f"  Theoretical mean: {mean_theory:.4f}, variance: {var_theory:.4f}")
print(f"  Empirical mean:   {np.mean(samples):.4f}, variance: {np.var(samples, ddof=0):.4f}")

# Show relationship to geometric sum
geom_samples = np.random.geometric(p, size=(n, r))
geom_sum = geom_samples.sum(axis=1) - r  # convert from "trials" to "failures"
print(f"\n  Sum of {r} Geometric(p={p}): mean={np.mean(geom_sum):.4f}, var={np.var(geom_sum, ddof=0):.4f}")
print(f"  (Should match NB values above)")

# Show Poisson-Gamma mixture
lam = np.random.gamma(shape=r, scale=(1-p)/p, size=n)
poisson_samples = np.random.poisson(lam)
print(f"\n  Poisson-Gamma mixture: mean={np.mean(poisson_samples):.4f}, var={np.var(poisson_samples, ddof=0):.4f}")

Python Implementation: Overdispersion Detection

import numpy as np
from scipy import stats

np.random.seed(42)

# Generate overdispersed count data (NB instead of Poisson)
true_mu = 5
true_alpha = 2.0  # dispersion parameter: r = 1/alpha
r = 1 / true_alpha
p = r / (r + true_mu)
n = 500

data = np.random.negative_binomial(r, p, size=n)

sample_mean = np.mean(data)
sample_var = np.var(data, ddof=1)
dispersion_ratio = sample_var / sample_mean

print(f"Overdispersion Test")
print(f"  Sample mean:  {sample_mean:.4f}")
print(f"  Sample var:   {sample_var:.4f}")
print(f"  Var/Mean:     {dispersion_ratio:.4f}")
print(f"  (Var/Mean > 1 suggests overdispersion; Poisson requires Var/Mean ≈ 1)")

# Method of moments estimates for NB
p_hat = sample_mean / sample_var
r_hat = sample_mean * p_hat / (1 - p_hat)
print(f"\n  Method of moments estimates:")
print(f"    p̂ = {p_hat:.4f}, r̂ = {r_hat:.4f}")
print(f"    Estimated mean: {r_hat*(1-p_hat)/p_hat:.4f}")
print(f"    Estimated var:  {r_hat*(1-p_hat)/p_hat**2:.4f}")

Key Takeaways

Summary: Negative Binomial Distribution

  • Counts failures before the rr-th success: P(X=k)=(k+r1k)pr(1p)kP(X=k) = \binom{k+r-1}{k}p^r(1-p)^k
  • Mean: r(1p)/pr(1-p)/p; Variance: r(1p)/p2r(1-p)/p^2
  • Sum of rr iid geometrics: X=Y1++YrX = Y_1 + \cdots + Y_r with YiGeometric(p)Y_i \sim \text{Geometric}(p)
  • When r=1r = 1, reduces to the geometric distribution
  • Poisson-Gamma mixture: XNB(r,p)X \sim \text{NB}(r, p) arises from Poisson with Gamma-distributed rate
  • Models overdispersed count data where variance exceeds mean: Var(X)>E[X]\text{Var}(X) > E[X]
  • As rr \to \infty with E[X]E[X] fixed, converges to Poisson (law of large numbers for Gamma)
  • Variance-to-mean ratio: Var/E[X]=1/p=1+E[X]/r\text{Var}/E[X] = 1/p = 1 + E[X]/r, controlled by rr

Premium Content

Negative Binomial Distribution — Waiting for r-th Success

Unlock this lesson and 900+ advanced tutorials with a Premium plan.

🎯End-to-end Projects
💼Interview Prep
📜Certificates
🤝Community Access

Already a member? Log in

Need Expert Statistics Help?

Get personalized tutoring, project support, or professional consulting.

Advertisement