Binomial Distribution — Count of Successes in n Trials

Counting Successes in Fixed Trials

The binomial distribution answers a simple but profound question: if I repeat the same experiment $n$ times, how many times will I succeed?

  • Quality control — number of defective items in a batch of 100
  • Medical trials — patients responding to treatment out of 50 enrolled
  • Finance — days the market goes up out of 252 trading days
  • A/B testing — conversions out of 1,000 visitors

The binomial is the bridge between individual probability and aggregate statistics.


The Bernoulli Trial

Definition

A Bernoulli trial is a random experiment with exactly two outcomes: success (with probability $p$) and failure (with probability $1-p$). A random variable $X$ representing a single trial has:

$$X = \begin{cases} 1 & \text{with probability } p \\ 0 & \text{with probability } 1-p \end{cases}$$

This is the Bernoulli distribution: $X \sim \text{Bernoulli}(p)$.

Bernoulli PMF and Moments

$$P(X = x) = p^x (1-p)^{1-x}, \quad x \in \{0, 1\}$$

Here,

  • $p$ = probability of success
  • $1-p$ = probability of failure
  • $E[X] = p$ (mean)
  • $\text{Var}(X) = p(1-p)$ (variance)
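
The mean and variance listed above follow directly from the PMF. A minimal sketch that checks this numerically, assuming an illustrative $p = 0.3$ (not a value from the lesson) and hypothetical helper names:

```python
import random

def bernoulli_pmf(x: int, p: float) -> float:
    """P(X = x) = p^x (1-p)^(1-x) for x in {0, 1}; zero elsewhere."""
    if x not in (0, 1):
        return 0.0
    return p**x * (1 - p) ** (1 - x)

def bernoulli_moments(p: float) -> tuple:
    """Mean and variance computed directly from the PMF."""
    mean = sum(x * bernoulli_pmf(x, p) for x in (0, 1))
    second_moment = sum(x**2 * bernoulli_pmf(x, p) for x in (0, 1))
    return mean, second_moment - mean**2

p = 0.3  # illustrative value, not taken from the lesson
mean, var = bernoulli_moments(p)
print(f"E[X]   = {mean:.4f} (formula p = {p})")
print(f"Var(X) = {var:.4f} (formula p(1-p) = {p * (1 - p):.4f})")

# Simulated trials should land close to the theoretical mean.
rng = random.Random(0)  # fixed seed so the run is repeatable
draws = [1 if rng.random() < p else 0 for _ in range(100_000)]
sample_mean = sum(draws) / len(draws)
print(f"Sample mean over {len(draws)} trials: {sample_mean:.4f}")
```

The simulation is only a sanity check; the sample mean fluctuates around $p$ from run to run when the seed changes.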

From Bernoulli to Binomial

Definition

Let $X_1, X_2, \ldots, X_n$ be i.i.d. $\text{Bernoulli}(p)$ random variables. The binomial random variable counts the total number of successes:

$$X = \sum_{i=1}^n X_i = \text{number of successes in } n \text{ trials}$$

We write $X \sim \text{Bin}(n, p)$.

Theorem: Binomial Probability Mass Function

$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad k = 0, 1, \ldots, n$$

where $\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$ is the binomial coefficient (read: "$n$ choose $k$").

Why the Binomial Coefficient Appears

The binomial coefficient $\binom{n}{k}$ counts the number of distinct ways to arrange $k$ successes among $n$ trials. Each arrangement has probability $p^k(1-p)^{n-k}$ (by independence). Summing over all $\binom{n}{k}$ arrangements gives the total probability.
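
The counting argument can be checked by brute force for a small case. The sketch below enumerates every success/failure sequence of length $n$, keeps those with exactly $k$ successes, and compares the result with the closed-form PMF. The values $n = 5$, $k = 2$, $p = 0.3$ and the function names are illustrative assumptions.

```python
from itertools import product
from math import comb

def pmf_by_enumeration(n: int, k: int, p: float):
    """Count the sequences with exactly k successes and add up their probabilities."""
    count = 0
    total = 0.0
    for outcome in product((0, 1), repeat=n):  # all 2^n sequences
        if sum(outcome) == k:
            count += 1
            # Every such sequence has the same probability, by independence.
            total += p**k * (1 - p) ** (n - k)
    return count, total

def binom_pmf(k: int, n: int, p: float) -> float:
    """Closed-form binomial PMF."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, k, p = 5, 2, 0.3  # small illustrative case
count, total = pmf_by_enumeration(n, k, p)
print(f"arrangements found: {count}, C(n, k) = {comb(n, k)}")
print(f"enumerated P(X = {k}) = {total:.6f}, formula = {binom_pmf(k, n, p):.6f}")
```

Enumeration costs $2^n$ steps, so it is only practical for small $n$; the closed form is what makes larger cases tractable.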


Derivation of Mean and Variance

Theorem: Mean and Variance of Binomial

For $X \sim \text{Bin}(n, p)$:

$$E[X] = np$$
$$\text{Var}(X) = np(1-p)$$

Proof. Since $X = \sum_{i=1}^n X_i$ with $X_i \sim \text{Bernoulli}(p)$ i.i.d.:

$$E[X] = \sum_{i=1}^n E[X_i] = np$$
$$\text{Var}(X) = \sum_{i=1}^n \text{Var}(X_i) = np(1-p)$$

The first line uses linearity of expectation; the second uses the additivity of variance for independent random variables. $\square$

For fixed $n$, the variance $np(1-p)$ is maximized when $p = 0.5$, giving $\text{Var}(X) = n/4$.
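
As a numerical cross-check of the theorem, the sketch below computes the mean and variance directly from the PMF and then scans a grid of $p$ values to locate the maximum of the variance. The choice $n = 20$, $p = 0.4$ mirrors the parameters used in the Python implementation later in the lesson; the helper names are assumptions.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def moments_from_pmf(n: int, p: float):
    """Mean and variance as explicit sums over k = 0..n."""
    ks = range(n + 1)
    mean = sum(k * binom_pmf(k, n, p) for k in ks)
    var = sum((k - mean) ** 2 * binom_pmf(k, n, p) for k in ks)
    return mean, var

n, p = 20, 0.4
mean, var = moments_from_pmf(n, p)
print(f"mean from PMF = {mean:.6f}, np       = {n * p:.6f}")
print(f"var  from PMF = {var:.6f}, np(1-p)  = {n * p * (1 - p):.6f}")

# Scan p on a grid to see where np(1-p) peaks for fixed n.
grid = [i / 100 for i in range(101)]
best_p = max(grid, key=lambda q: n * q * (1 - q))
print(f"variance is largest at p = {best_p}, where it equals {n * best_p * (1 - best_p)}")
```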


Generating Function

Probability Generating Function

$$G_X(s) = E[s^X] = (1-p+ps)^n$$

Here,

  • $G_X(s)$ = probability generating function
  • $s$ = complex variable ($|s| \leq 1$)

The moment generating function (MGF) is $M_X(t) = (1-p+pe^t)^n$. All moments can be derived from it: the $r$-th moment $E[X^r]$ is the $r$-th derivative of $M_X$ evaluated at $t = 0$.
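
Both generating functions can be checked numerically. The sketch below compares the closed-form PGF with the defining sum $\sum_k s^k P(X = k)$, then estimates the first two moments from the MGF using finite differences at $t = 0$. The step size $h$, the test point $s = 0.7$, and the function names are assumptions chosen for illustration; a symbolic derivative would be exact, whereas finite differences are only approximate.

```python
from math import comb, exp

def binom_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def pgf_from_pmf(s: float, n: int, p: float) -> float:
    """E[s^X] written out as a sum over the PMF."""
    return sum(s**k * binom_pmf(k, n, p) for k in range(n + 1))

def pgf_closed(s: float, n: int, p: float) -> float:
    return (1 - p + p * s) ** n

def mgf(t: float, n: int, p: float) -> float:
    return (1 - p + p * exp(t)) ** n

n, p = 20, 0.4
s = 0.7  # any real s with |s| <= 1 works for this check
print(pgf_from_pmf(s, n, p), pgf_closed(s, n, p))

# Central differences approximate M'(0) = E[X] and M''(0) = E[X^2].
h = 1e-4
first_moment = (mgf(h, n, p) - mgf(-h, n, p)) / (2 * h)
second_moment = (mgf(h, n, p) - 2 * mgf(0.0, n, p) + mgf(-h, n, p)) / h**2
variance = second_moment - first_moment**2
print(f"E[X] ~ {first_moment:.4f}, Var(X) ~ {variance:.4f}")
```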


Normal Approximation to the Binomial

Theorem: Normal Approximation to Binomial

When $n$ is large and $p$ is not too close to 0 or 1:

$$X \sim \text{Bin}(n, p) \;\approx\; Y \sim \mathcal{N}(np, \, np(1-p))$$

The approximation improves as $np \to \infty$ and $n(1-p) \to \infty$. A continuity correction (replacing $P(X \leq k)$ with $P(Y \leq k + 0.5)$) significantly improves accuracy.

Rule of thumb: The normal approximation is adequate when $np \geq 10$ and $n(1-p) \geq 10$.

Continuity correction:

$$P(X \leq k) \approx P\left(Z \leq \frac{k + 0.5 - np}{\sqrt{np(1-p)}}\right)$$
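
To see how much the continuity correction matters, the sketch below measures the largest CDF error of the normal approximation over all $k$, with and without the $+0.5$ shift. The parameters $n = 50$, $p = 0.4$ are an illustrative choice that satisfies the rule of thumb ($np = 20$, $n(1-p) = 30$); the function names are assumptions.

```python
from math import comb, erf, sqrt

def binom_cdf(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) by summing the PMF."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))

def norm_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def normal_approx_cdf(k: int, n: int, p: float, correction: bool = True) -> float:
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    shift = 0.5 if correction else 0.0
    return norm_cdf((k + shift - mu) / sigma)

def max_abs_error(n: int, p: float, correction: bool) -> float:
    """Largest gap between exact and approximate CDF over k = 0..n."""
    return max(
        abs(binom_cdf(k, n, p) - normal_approx_cdf(k, n, p, correction))
        for k in range(n + 1)
    )

n, p = 50, 0.4
err_plain = max_abs_error(n, p, correction=False)
err_corrected = max_abs_error(n, p, correction=True)
print(f"max CDF error without correction: {err_plain:.4f}")
print(f"max CDF error with correction:    {err_corrected:.4f}")
```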

Poisson Approximation

When $n$ is large and $p$ is small (with $np = \lambda$ fixed):

Poisson Approximation to Binomial

$$\text{Bin}(n, p) \;\approx\; \text{Poisson}(\lambda), \quad \lambda = np$$

Here,

  • $\lambda$ = expected number of successes ($np$)
  • $n$ = number of trials (large)
  • $p$ = success probability (small)

The Poisson approximation is appropriate when $n \geq 20$ and $p \leq 0.05$, or $n \geq 100$ and $np \leq 10$.
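
The limiting behaviour can be observed by holding $\lambda = np$ fixed and letting $n$ grow. The sketch below reports the largest pointwise gap between the two PMFs for several $n$. The value $\lambda = 3$, the list of $n$ values, and the cutoff `k_max` (which keeps the powers and factorials in a safe floating-point range) are illustrative assumptions.

```python
from math import comb, exp, factorial

def binom_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k: int, lam: float) -> float:
    return exp(-lam) * lam**k / factorial(k)

def max_pmf_gap(n: int, lam: float, k_max: int = 40) -> float:
    """Largest |Bin(n, lam/n) - Poisson(lam)| PMF difference for k <= k_max."""
    p = lam / n
    ks = range(min(n, k_max) + 1)
    return max(abs(binom_pmf(k, n, p) - poisson_pmf(k, lam)) for k in ks)

lam = 3.0
gaps = {n: max_pmf_gap(n, lam) for n in (20, 100, 1000)}
for n, gap in gaps.items():
    print(f"n = {n:5d}, p = {lam / n:.4f}, max PMF gap = {gap:.6f}")
```

With $\lambda = 3$, nearly all of the probability mass lies well below $k = 40$, so truncating the comparison there does not hide any meaningful discrepancy.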


Worked Example: Quality Control

Example: Defective Items in a Batch

A quality control inspector checks $n = 200$ items, each with defect probability $p = 0.03$. What is $P(X \leq 10)$?

Exact (binomial): $P(X \leq 10) = \sum_{k=0}^{10} \binom{200}{k}(0.03)^k(0.97)^{200-k}$

Normal approximation: $\mu = 200(0.03) = 6$, $\sigma^2 = 200(0.03)(0.97) = 5.82$

$$P(X \leq 10) \approx P\left(Z \leq \frac{10.5 - 6}{\sqrt{5.82}}\right) = P(Z \leq 1.865) \approx 0.9689$$

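Poisson approximation: $\lambda = 6$

$$P(X \leq 10) \approx \sum_{k=0}^{10} \frac{e^{-6} 6^k}{k!} \approx 0.9574$$

The Poisson approximation is more accurate here because $p$ is small. The exact binomial sum is about 0.960, so the Poisson value is off by less than 0.003, while the normal value is off by about 0.009. Note also that $np = 6$ falls below the $np \geq 10$ rule of thumb for the normal approximation.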
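
The three calculations in this example can be reproduced with the Python standard library, which is a convenient way to check hand-computed values. This is a minimal sketch; the variable names are assumptions, and the normal CDF is built from `math.erf` rather than a statistics package.

```python
from math import comb, erf, exp, factorial, sqrt

n, p, k = 200, 0.03, 10

# Exact binomial CDF: sum the PMF from 0 to k.
exact = sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))

# Normal approximation with continuity correction.
mu = n * p
sigma = sqrt(n * p * (1 - p))
z = (k + 0.5 - mu) / sigma
normal = 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Poisson approximation with lambda = n * p.
lam = n * p
poisson = sum(exp(-lam) * lam**j / factorial(j) for j in range(k + 1))

print(f"exact binomial : {exact:.4f}")
print(f"normal (z={z:.3f}): {normal:.4f}, error {abs(normal - exact):.4f}")
print(f"poisson        : {poisson:.4f}, error {abs(poisson - exact):.4f}")
```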


Worked Example: Coin Flipping

Example: Fair and Biased Coins

You flip a fair coin ($p = 0.5$) 10 times. What is the probability of getting exactly 7 heads?

$$P(X = 7) = \binom{10}{7}(0.5)^7(0.5)^3 = \frac{10!}{7!\,3!}(0.5)^{10} = 120 \times \frac{1}{1024} \approx 0.1172$$

  • Mean: $E[X] = 10(0.5) = 5$ heads
  • Variance: $\text{Var}(X) = 10(0.5)(0.5) = 2.5$
  • Standard deviation: $\sigma = \sqrt{2.5} \approx 1.58$

So getting 7 heads is $(7 - 5)/1.58 \approx 1.27$ standard deviations above the mean: unusual but not extreme.
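
The same arithmetic in code, as a short sketch using only `math.comb`. The tail probability $P(X \geq 7)$ is not part of the example above; it is added here as extra context for judging how unusual 7 heads is.

```python
from math import comb, sqrt

n, p, k = 10, 0.5, 7

# P(X = 7): choose which 7 of the 10 flips are heads.
prob_exactly_7 = comb(n, k) * p**k * (1 - p) ** (n - k)

# P(X >= 7): add the PMF for 7, 8, 9 and 10 heads.
prob_at_least_7 = sum(
    comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1)
)

mean = n * p
sd = sqrt(n * p * (1 - p))
z_score = (k - mean) / sd

print(f"P(X = 7)  = {prob_exactly_7:.4f}")
print(f"P(X >= 7) = {prob_at_least_7:.4f}")
print(f"mean = {mean}, sd = {sd:.2f}, z-score of 7 heads = {z_score:.2f}")
```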


Worked Example: A/B Testing

Example: Website Conversion Rate

An A/B test compares two versions of a landing page:

  • Version A: 1,000 visitors, 120 conversions → $\hat{p}_A = 0.120$
  • Version B: 1,000 visitors, 156 conversions → $\hat{p}_B = 0.156$

Under the binomial model:

  • $X_A \sim \text{Bin}(1000, 0.12)$: $\mu_A = 120$, $\sigma_A = \sqrt{1000 \times 0.12 \times 0.88} \approx 10.28$
  • $X_B \sim \text{Bin}(1000, 0.156)$: $\mu_B = 156$, $\sigma_B = \sqrt{1000 \times 0.156 \times 0.844} \approx 11.47$

Test statistic: $z = \frac{\hat{p}_B - \hat{p}_A}{\sqrt{\hat{p}(1-\hat{p})(1/n_A + 1/n_B)}}$, where $\hat{p} = (120 + 156)/2000 = 0.138$ is the pooled conversion rate.

$$z = \frac{0.156 - 0.120}{\sqrt{0.138 \times 0.862 \times 0.002}} = \frac{0.036}{0.01542} \approx 2.33$$

The two-sided $p$-value is $\approx 0.020$, so Version B has a significantly higher conversion rate at $\alpha = 0.05$.
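
The pooled two-proportion z-test used in this example can be sketched in a few lines. The function name `two_proportion_z_test` is hypothetical, and the p-value is computed as two-sided, which is an assumption about the intended test; `math.erfc` supplies the normal tail probability.

```python
from math import erfc, sqrt

def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Pooled z-test for H0: p_A = p_B. Returns (z, two-sided p-value)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # common rate under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = erfc(abs(z) / sqrt(2.0))
    return z, p_value

z, p_value = two_proportion_z_test(120, 1000, 156, 1000)
print(f"z = {z:.3f}, two-sided p-value = {p_value:.4f}")
print("significant at alpha = 0.05" if p_value < 0.05 else "not significant at alpha = 0.05")
```

A one-sided test (is B better than A?) would halve the p-value; which version is appropriate depends on the hypothesis fixed before the experiment.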


Relationship to Other Distributions

| Distribution | Relationship to Binomial |
| --- | --- |
| Bernoulli | Special case: $n = 1$ |
| Poisson | Limit as $n \to \infty$, $p \to 0$, $np = \lambda$ |
| Normal | Limit as $n \to \infty$, $np \to \infty$, $n(1-p) \to \infty$ |
| Negative Binomial | Counts trials until $r$ successes (different parameterization) |
| Hypergeometric | Sampling without replacement (vs. with replacement for binomial) |

Python Implementation

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

np.random.seed(42)

# Binomial distribution: n=20, p=0.4
n, p = 20, 0.4
x = np.arange(0, n+1)
pmf = stats.binom.pmf(x, n, p)

print(f"Binomial(n={n}, p={p})")
print(f"Mean:     {stats.binom.mean(n, p):.1f}")
print(f"Variance: {stats.binom.var(n, p):.2f}")
print(f"Std Dev:  {stats.binom.std(n, p):.2f}")
print(f"P(X=8):   {stats.binom.pmf(8, n, p):.4f}")
print(f"P(X<=8):  {stats.binom.cdf(8, n, p):.4f}")

# Normal approximation comparison
mu, sigma = n*p, np.sqrt(n*p*(1-p))
x_norm = np.linspace(0, n, 100)
pdf_norm = stats.norm.pdf(x_norm, mu, sigma)

fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# PMF vs normal approximation
axes[0].bar(x, pmf, alpha=0.7, label=f'Binomial({n},{p})')
axes[0].plot(x_norm, pdf_norm, 'r-', lw=2, label=f'Normal(mean={mu:.1f}, sd={sigma:.2f})')
axes[0].set_xlabel('k')
axes[0].set_ylabel('P(X=k)')
axes[0].set_title('Binomial vs Normal Approximation')
axes[0].legend()

# Effect of p
for p_val, color in [(0.2, '#ef4444'), (0.5, '#6366f1'), (0.8, '#22c55e')]:
    pmf_p = stats.binom.pmf(x, n, p_val)
    axes[1].bar(x, pmf_p, alpha=0.6, label=f'p={p_val}', color=color)
axes[1].set_xlabel('k')
axes[1].set_ylabel('P(X=k)')
axes[1].set_title('Effect of Success Probability')
axes[1].legend()

plt.tight_layout()
plt.savefig('binomial_distribution.png', dpi=150)
plt.show()

Key Takeaways

Counts successes in $n$ independent Bernoulli trials: $X \sim \text{Bin}(n, p)$

PMF: $P(X = k) = \binom{n}{k} p^k(1-p)^{n-k}$, $k = 0, 1, \ldots, n$

Mean: $E[X] = np$; Variance: $\text{Var}(X) = np(1-p)$

Normal approximation valid when $np \geq 10$ and $n(1-p) \geq 10$ (with continuity correction)

Poisson approximation valid when $n$ is large and $p$ is small

The binomial is the foundation of hypothesis testing for proportions and confidence intervals

"The binomial distribution is the discrete analog of the normal distribution — and like the normal, it appears everywhere."
