Negative Binomial Distribution

Probability Distributions

Waiting for the r-th Success — Overdispersed Counts

The negative binomial distribution generalizes the geometric: instead of waiting for the first success, we wait for the $r$ -th success. It is the go-to model for overdispersed count data.

Insurance — claims per policyholder (variance > mean)
Epidemiology — disease cases per region (heterogeneous rates)
Ecology — species counts per quadrat (aggregated populations)
Transportation — passengers per bus (bursty arrivals)

When the Poisson's mean-equals-variance assumption fails, the negative binomial saves the day.

Core Concepts

The negative binomial distribution generalizes the geometric distribution: instead of waiting for the first success, we wait for the $r$ -th success. It arises naturally as a sum of independent geometric random variables and serves as a flexible model for overdispersed count data.

DfNegative Binomial Distribution (Number of Failures)

A random variable $X$ follows a negative binomial distribution with parameters $(r, p)$ , written $X \sim \text{NB}(r, p)$ , if its PMF is:

P(X = k) = \binom{k + r - 1}{k}\, p^r\, (1-p)^k, \qquad k = 0, 1, 2, \ldots

Here $X$ counts the number of failures before the $r$ -th success, with success probability $p$ per trial.

Alternative Parametrization

Some authors use $X$ to count the total number of trials (successes + failures) needed to achieve $r$ successes. Under that convention, $Y = X + r$ with PMF $P(Y = y) = \binom{y-1}{r-1}p^r(1-p)^{y-r}$ for $y = r, r+1, \ldots$ Be careful to check which convention a textbook uses.

PMF Derivation

Why This PMF Is Correct

To observe exactly $k$ failures before the $r$ -th success, two conditions must hold:

The last trial must be a success (the $r$ -th success). This contributes probability $p$ .
Among the first $k + r - 1$ trials, there must be exactly $r - 1$ successes and $k$ failures. The number of ways to arrange these is $\binom{k+r-1}{k}$ , and each arrangement has probability $p^{r-1}(1-p)^k$ .

Combining:

P(X = k) = \binom{k+r-1}{k}\, p^{r-1}(1-p)^k \cdot p = \binom{k+r-1}{k}\, p^r(1-p)^k.

Verification via the negative binomial series:

\sum_{k=0}^{\infty} \binom{k+r-1}{k} p^r(1-p)^k = p^r \sum_{k=0}^{\infty} \binom{k+r-1}{k} (1-p)^k = p^r \cdot \frac{1}{(1-(1-p))^r} = p^r \cdot \frac{1}{p^r} = 1.

This uses the negative binomial expansion $(1-x)^{-r} = \sum_{k=0}^{\infty}\binom{k+r-1}{k}x^k$ .

Mean and Variance

Negative Binomial Mean and Variance

E[X] = \frac{r(1-p)}{p}, \quad \text{Var}(X) = \frac{r(1-p)}{p^2}

Here,

$r$ =Target number of successes
$p$ =Success probability
$1-p$ =Failure probability

Derivation

$X$ is the sum of $r$ independent geometric $(p)$ random variables $Y_1, \ldots, Y_r$ , each counting failures before one success. By linearity:

E[X] = r \cdot E[Y_1] = r \cdot \frac{1-p}{p}.

By independence:

\text{Var}(X) = r \cdot \text{Var}(Y_1) = r \cdot \frac{1-p}{p^2}.

Variance-to-Mean Ratio

The mean-to-variance relationship is:

\text{Var}(X) = \frac{E[X]}{p} = E[X] + \frac{E[X]^2}{r}.

This shows $\text{Var}(X) > E[X]$ (overdispersion) for all finite $r$ . As $r \to \infty$ with $E[X]$ fixed, $\text{Var}(X) \to E[X]$ , recovering the Poisson limit.

Cumulant Generating Function

Moment Generating Function

M_X(t) = \left(\frac{p}{1 - (1-p)e^t}\right)^r, \quad t < -\log(1-p)

Here,

$r$ =Number of successes
$p$ =Success probability

Derivation

Since $X = Y_1 + \cdots + Y_r$ with $Y_i \sim \text{Geometric}(p)$ iid, and $M_{Y_i}(t) = \frac{p}{1-(1-p)e^t}$ :

M_X(t) = \left(M_{Y_1}(t)\right)^r = \left(\frac{p}{1-(1-p)e^t}\right)^r.

The cumulant generating function is:

K_X(t) = r\log p - r\log\!\left(1-(1-p)e^t\right),

yielding cumulants:

\kappa_1 = \frac{r(1-p)}{p}, \qquad \kappa_2 = \frac{r(1-p)}{p^2}, \qquad \kappa_3 = \frac{r(1-p)(2-p)}{p^3}.

The Negative Binomial as a Poisson-Gamma Mixture

ThGamma-Poisson Mixture Representation

If $X \mid \lambda \sim \text{Poisson}(\lambda)$ and $\lambda \sim \text{Gamma}(r, \frac{p}{1-p})$ (shape $r$ , rate $p/(1-p)$ ), then the marginal distribution of $X$ is $\text{Negative Binomial}(r, p)$ .

Proof

P(X = k) = \int_0^{\infty} \frac{\lambda^k e^{-\lambda}}{k!} \cdot \frac{(\beta)^r \lambda^{r-1} e^{-\beta\lambda}}{\Gamma(r)}\,d\lambda = \frac{\beta^r}{k!\,\Gamma(r)} \int_0^{\infty} \lambda^{k+r-1} e^{-(1+\beta)\lambda}\,d\lambda

= \frac{\beta^r}{k!\,\Gamma(r)} \cdot \frac{\Gamma(k+r)}{(1+\beta)^{k+r}} = \binom{k+r-1}{k} \frac{\beta^r}{(1+\beta)^{k+r}} = \binom{k+r-1}{k} \left(\frac{\beta}{1+\beta}\right)^r \left(\frac{1}{1+\beta}\right)^k.

Setting $p = \frac{\beta}{1+\beta}$ (so $\beta = \frac{1-p}{p}$ ) gives the $\text{NB}(r,p)$ PMF.

Why This Matters

This representation explains overdispersion in count data. If counts are Poisson but the rate $\lambda$ varies across observations (heterogeneity), the marginal distribution becomes negative binomial. The parameter $r$ controls the degree of heterogeneity: smaller $r$ means more variation in $\lambda$ , hence more overdispersion.

Overdispersion in Practice

Poisson vs Negative Binomial

For Poisson data, $\text{Var}(X) = E[X]$ . When the empirical variance substantially exceeds the mean, the Poisson model is inadequate and the negative binomial provides a better fit.

Rule of thumb: If $\text{Var}(X) / E[X] > 1.5$ , consider the negative binomial.

Estimation: Given data $x_1, \ldots, x_n$ , the method of moments estimators are:

\hat{p} = \frac{\bar{x}}{s^2}, \qquad \hat{r} = \frac{\bar{x}\,\hat{p}}{1 - \hat{p}} = \frac{\bar{x}^2}{s^2 - \bar{x}},

where $\bar{x}$ is the sample mean and $s^2$ is the sample variance.

Worked Example: Call Center Modeling

Example: Customer Calls Per Hour

A call center receives an average of $\mu = 8$ calls per hour, but the variance is $s^2 = 18$ (overdispersed). We model calls $X \sim \text{NB}(r, p)$ .

Method of moments:

\hat{p} = \frac{\bar{x}}{s^2} = \frac{8}{18} \approx 0.444, \qquad \hat{r} = \frac{\bar{x}^2}{s^2 - \bar{x}} = \frac{64}{10} = 6.4.

Verification: $E[X] = r(1-p)/p = 6.4 \times 0.556/0.444 \approx 8.0$ ✓

$\text{Var}(X) = r(1-p)/p^2 = 6.4 \times 0.556/0.197 \approx 18.0$ ✓

Compare with Poisson: A Poisson(8) model would predict $\text{Var} = 8$ , severely underestimating the true variability of 18. This has direct consequences for staffing: the negative binomial predicts more extreme fluctuations (busy and idle periods).

Python Implementation

import numpy as np
from scipy import stats

np.random.seed(42)

# Negative binomial parameters
r, p = 6, 0.4
n = 10000

# Simulate
samples = np.random.negative_binomial(r, p, size=n)

# Verify mean and variance
mean_theory = r * (1 - p) / p
var_theory = r * (1 - p) / p**2
print(f"NB(r={r}, p={p}):")
print(f"  Theoretical mean: {mean_theory:.4f}, variance: {var_theory:.4f}")
print(f"  Empirical mean:   {np.mean(samples):.4f}, variance: {np.var(samples, ddof=0):.4f}")

# Show relationship to geometric sum
geom_samples = np.random.geometric(p, size=(n, r))
geom_sum = geom_samples.sum(axis=1) - r  # convert from "trials" to "failures"
print(f"\n  Sum of {r} Geometric(p={p}): mean={np.mean(geom_sum):.4f}, var={np.var(geom_sum, ddof=0):.4f}")
print(f"  (Should match NB values above)")

# Show Poisson-Gamma mixture
lam = np.random.gamma(shape=r, scale=(1-p)/p, size=n)
poisson_samples = np.random.poisson(lam)
print(f"\n  Poisson-Gamma mixture: mean={np.mean(poisson_samples):.4f}, var={np.var(poisson_samples, ddof=0):.4f}")

Python Implementation: Overdispersion Detection

import numpy as np
from scipy import stats

np.random.seed(42)

# Generate overdispersed count data (NB instead of Poisson)
true_mu = 5
true_alpha = 2.0  # dispersion parameter: r = 1/alpha
r = 1 / true_alpha
p = r / (r + true_mu)
n = 500

data = np.random.negative_binomial(r, p, size=n)

sample_mean = np.mean(data)
sample_var = np.var(data, ddof=1)
dispersion_ratio = sample_var / sample_mean

print(f"Overdispersion Test")
print(f"  Sample mean:  {sample_mean:.4f}")
print(f"  Sample var:   {sample_var:.4f}")
print(f"  Var/Mean:     {dispersion_ratio:.4f}")
print(f"  (Var/Mean > 1 suggests overdispersion; Poisson requires Var/Mean ≈ 1)")

# Method of moments estimates for NB
p_hat = sample_mean / sample_var
r_hat = sample_mean * p_hat / (1 - p_hat)
print(f"\n  Method of moments estimates:")
print(f"    p̂ = {p_hat:.4f}, r̂ = {r_hat:.4f}")
print(f"    Estimated mean: {r_hat*(1-p_hat)/p_hat:.4f}")
print(f"    Estimated var:  {r_hat*(1-p_hat)/p_hat**2:.4f}")

Key Takeaways

Summary: Negative Binomial Distribution

Counts failures before the $r$ -th success: $P(X=k) = \binom{k+r-1}{k}p^r(1-p)^k$
Mean: $r(1-p)/p$ ; Variance: $r(1-p)/p^2$
Sum of $r$ iid geometrics: $X = Y_1 + \cdots + Y_r$ with $Y_i \sim \text{Geometric}(p)$
When $r = 1$ , reduces to the geometric distribution
Poisson-Gamma mixture: $X \sim \text{NB}(r, p)$ arises from Poisson with Gamma-distributed rate
Models overdispersed count data where variance exceeds mean: $\text{Var}(X) > E[X]$
As $r \to \infty$ with $E[X]$ fixed, converges to Poisson (law of large numbers for Gamma)
Variance-to-mean ratio: $\text{Var}/E[X] = 1/p = 1 + E[X]/r$ , controlled by $r$

Negative Binomial Distribution — Waiting for r-th Success