
Beta Distribution — Modeling Probabilities and Proportions

The Bayesian Workhorse for Probabilities

The beta distribution is the conjugate prior for binomial data, making it essential for Bayesian inference about proportions. Its flexibility on [0,1] makes it perfect for modeling uncertain probabilities and updating beliefs with data.

  • A/B Testing — Updating conversion rate estimates as website test data accumulates
  • Political Polling — Incorporating prior knowledge into election probability forecasts
  • Quality Control — Modeling defect rates in manufacturing with Bayesian methods

Combined with observed data, the beta distribution turns prior knowledge into an updated posterior belief.


Core Concepts

The beta distribution is defined on $[0, 1]$ and is the conjugate prior for the Bernoulli/Binomial likelihood in Bayesian inference. It provides a flexible family for modeling probabilities, proportions, and rates.

Definition: Beta Distribution

A continuous random variable $X$ has a beta distribution with shape parameters $\alpha > 0$ and $\beta > 0$ if its pdf is:

$$f(x) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha, \beta)}, \quad 0 \leq x \leq 1$$

Written $X \sim \text{Beta}(\alpha, \beta)$. It naturally models probabilities and proportions.
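As a minimal sketch of the definition above, the density can be evaluated with the Python standard library alone. The function name `beta_pdf` is a hypothetical helper, and the normalizing constant is computed through the log-gamma function, which assumes the standard identity between the beta and gamma functions.

```python
import math

def beta_pdf(x: float, alpha: float, beta: float) -> float:
    """Density of Beta(alpha, beta) at x (hypothetical helper, not a library API)."""
    if not (alpha > 0 and beta > 0):
        raise ValueError("alpha and beta must be positive")
    if not (0.0 < x < 1.0):
        # Simplification: the endpoints 0 and 1 are treated as outside the support.
        return 0.0
    # log B(alpha, beta) via log-gamma, so large shape parameters do not overflow.
    log_b = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    log_pdf = (alpha - 1) * math.log(x) + (beta - 1) * math.log1p(-x) - log_b
    return math.exp(log_pdf)

# Density of Beta(2, 5) at x = 0.2.
density = beta_pdf(0.2, 2, 5)
print(round(density, 4))
```

For Beta(2, 5) the constant is $B(2, 5) = 1/30$, so the density at $x = 0.2$ is $30 \cdot 0.2 \cdot 0.8^4 = 2.4576$.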


Interactive Visualization

[Interactive figure: Beta Distribution Interactive Explorer. Density $f(x)$ plotted against $x \in [0, 1]$ for Beta(a=2, b=5), with mean μ = 0.2857 and mode 0.20 marked.]
[Interactive figure: Shape Flexibility of the Beta Distribution. Same axes, shown for Beta(a=2, b=5), mean μ = 0.2857.]

The Beta Function

Definition: The Beta Function

The beta function is defined for $\alpha, \beta > 0$ by:

$$B(\alpha, \beta) = \int_0^1 t^{\alpha-1}(1-t)^{\beta-1} \, dt = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}$$

The second equality (connecting the beta and gamma functions) can be proved via the convolution of gamma distributions or through the Dirichlet integral.

Theorem: Proof that $B(\alpha, \beta) = \Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha+\beta)$

Consider $X \sim \text{Gamma}(\alpha, 1)$ and $Y \sim \text{Gamma}(\beta, 1)$ independent. Let $U = X/(X+Y)$ and $V = X+Y$.

The joint pdf of $(U, V)$ is derived via the Jacobian of the transformation $(x,y) = (uv, v(1-u))$:

$$f_{U,V}(u,v) = f_X(uv) f_Y(v(1-u)) \cdot |J| = \frac{(uv)^{\alpha-1} e^{-uv}}{\Gamma(\alpha)} \cdot \frac{(v(1-u))^{\beta-1} e^{-v(1-u)}}{\Gamma(\beta)} \cdot v$$

$$= \frac{u^{\alpha-1}(1-u)^{\beta-1} v^{\alpha+\beta-1} e^{-v}}{\Gamma(\alpha)\Gamma(\beta)}$$

Integrating out $v$ to get $f_U(u)$:

$$f_U(u) = \frac{u^{\alpha-1}(1-u)^{\beta-1}}{\Gamma(\alpha)\Gamma(\beta)} \int_0^\infty v^{\alpha+\beta-1} e^{-v} \, dv = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} u^{\alpha-1}(1-u)^{\beta-1}$$

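Thus $U \sim \text{Beta}(\alpha, \beta)$. Because $f_U$ is a probability density, it integrates to 1 over $[0, 1]$, so $\int_0^1 u^{\alpha-1}(1-u)^{\beta-1} \, du = \Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha+\beta)$, confirming $B(\alpha,\beta) = \Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha+\beta)$.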
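The identity can also be checked numerically. The sketch below is a sanity check, not a proof: it compares a midpoint-rule approximation of the integral with the gamma-function ratio, then simulates the ratio $X/(X+Y)$ of independent gamma draws. The helper names, seed, and parameter values are illustrative assumptions.

```python
import math
import random

def beta_fn_gamma(a: float, b: float) -> float:
    """B(a, b) from the gamma-function ratio."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def beta_fn_integral(a: float, b: float, steps: int = 200_000) -> float:
    """B(a, b) from a midpoint-rule approximation of the defining integral."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += t ** (a - 1) * (1 - t) ** (b - 1)
    return total * h

a, b = 2.0, 5.0
print(beta_fn_gamma(a, b), beta_fn_integral(a, b))

# Simulate U = X / (X + Y) with X ~ Gamma(a, 1) and Y ~ Gamma(b, 1).
rng = random.Random(0)
ratios = []
for _ in range(50_000):
    x = rng.gammavariate(a, 1.0)  # second argument is the scale
    y = rng.gammavariate(b, 1.0)
    ratios.append(x / (x + y))
sample_mean = sum(ratios) / len(ratios)
print(sample_mean, a / (a + b))  # sample mean of U versus the Beta(a, b) mean
```

The midpoint rule is used here because it never evaluates the integrand at the endpoints, which matters when a shape parameter is below 1.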

PDF of Beta Distribution

$$f(x) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha, \beta)}, \quad 0 \leq x \leq 1$$

Here,

  • $\alpha$ = First shape parameter (successes + 1)
  • $\beta$ = Second shape parameter (failures + 1)
  • $B(\alpha, \beta)$ = Beta function (normalizing constant)

Mean, Variance, and Higher Moments

Theorem: Derivation of $E[X]$ and $\text{Var}(X)$

Using the identity $B(\alpha+1, \beta) = \frac{\alpha}{\alpha+\beta} B(\alpha, \beta)$:

Mean:

$$E[X] = \int_0^1 x \cdot \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha,\beta)} \, dx = \frac{B(\alpha+1, \beta)}{B(\alpha, \beta)} = \frac{\alpha}{\alpha + \beta}$$

Second moment:

$$E[X^2] = \frac{B(\alpha+2, \beta)}{B(\alpha, \beta)} = \frac{\alpha(\alpha+1)}{(\alpha+\beta)(\alpha+\beta+1)}$$

Variance:

$$\text{Var}(X) = \frac{\alpha(\alpha+1)}{(\alpha+\beta)(\alpha+\beta+1)} - \frac{\alpha^2}{(\alpha+\beta)^2} = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$$
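The last equality in the variance skips an algebra step. One way to fill it in, using only the two moments derived above, is to put both terms over a common denominator:

```latex
\begin{aligned}
\text{Var}(X)
  &= \frac{\alpha(\alpha+1)(\alpha+\beta) - \alpha^2(\alpha+\beta+1)}
          {(\alpha+\beta)^2(\alpha+\beta+1)} \\
  &= \frac{\alpha\left[(\alpha^2 + \alpha\beta + \alpha + \beta)
          - (\alpha^2 + \alpha\beta + \alpha)\right]}
          {(\alpha+\beta)^2(\alpha+\beta+1)} \\
  &= \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}
\end{aligned}
```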

Beta Mean and Variance

$$E[X] = \frac{\alpha}{\alpha + \beta}, \quad \text{Var}(X) = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$$

Here,

  • $\alpha, \beta$ = Shape parameters
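A quick simulation can confirm the two formulas. The sketch below compares them with sample moments from the standard library's `random.betavariate`. The function names, seed, and the choice of Beta(2, 5) are illustrative assumptions.

```python
import random

def beta_mean(alpha: float, beta: float) -> float:
    """Closed-form mean alpha / (alpha + beta)."""
    return alpha / (alpha + beta)

def beta_variance(alpha: float, beta: float) -> float:
    """Closed-form variance alpha*beta / ((alpha+beta)^2 (alpha+beta+1))."""
    total = alpha + beta
    return alpha * beta / (total ** 2 * (total + 1))

rng = random.Random(1)
alpha, beta = 2.0, 5.0
draws = [rng.betavariate(alpha, beta) for _ in range(100_000)]

sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / (len(draws) - 1)

print("mean:    ", beta_mean(alpha, beta), sample_mean)
print("variance:", beta_variance(alpha, beta), sample_var)
```

For Beta(2, 5) the closed forms give a mean of $2/7 \approx 0.2857$ and a variance of $10/392 \approx 0.0255$. The simulated values should agree to roughly two or three decimal places.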

Interpretation of Parameters

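  • Relative to a uniform $\text{Beta}(1, 1)$ starting point, $\alpha$ can be thought of as "number of successes + 1" and $\beta$ as "number of failures + 1"
  • Under that reading, the prior sample size is $\alpha + \beta - 2$ (effective number of prior observations beyond the uniform baseline); when the prior is weighed against data in the posterior mean, its weight is $\alpha + \beta$
  • The prior mean is $\alpha/(\alpha+\beta)$; increasing $\alpha$ shifts mass toward 1, increasing $\beta$ shifts mass toward 0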

Symmetry and Shape

Shape Properties

  • When $\alpha = \beta$: the distribution is symmetric about 0.5
  • When $\alpha > \beta$: left-skewed (mass concentrated near 1)
  • When $\alpha < \beta$: right-skewed (mass concentrated near 0)
  • When $\alpha = \beta = 1$: $\text{Beta}(1,1) = \text{Unif}(0,1)$ (uniform distribution)
  • When $\alpha, \beta > 1$: unimodal with mode at $(\alpha-1)/(\alpha+\beta-2)$
  • When $\alpha, \beta < 1$: U-shaped (the density is unbounded near 0 and 1)
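The shape rules above translate directly into a small classifier. This is a sketch covering only the cases listed; `beta_mode` and `describe_shape` are hypothetical helper names, and mixed cases such as $\alpha < 1 \leq \beta$ are reported as not covered rather than guessed at.

```python
from typing import Optional

def beta_mode(alpha: float, beta: float) -> Optional[float]:
    """Mode (alpha-1)/(alpha+beta-2), defined here only when both shapes exceed 1."""
    if alpha > 1 and beta > 1:
        return (alpha - 1) / (alpha + beta - 2)
    return None

def describe_shape(alpha: float, beta: float) -> str:
    """Label the density using the rules from the list above."""
    if alpha == 1 and beta == 1:
        return "uniform"
    if alpha < 1 and beta < 1:
        base = "U-shaped"
    elif alpha > 1 and beta > 1:
        base = "unimodal"
    else:
        base = "other (case not covered by the list above)"
    if alpha == beta:
        skew = "symmetric"
    elif alpha > beta:
        skew = "left-skewed"
    else:
        skew = "right-skewed"
    return f"{base}, {skew}"

for params in [(2, 5), (5, 2), (3, 3), (0.5, 0.5), (1, 1)]:
    print(params, describe_shape(*params), beta_mode(*params))
```

For Beta(2, 5) this reports a unimodal, right-skewed density with mode $1/5 = 0.2$.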

The Conjugate Prior Property

Theorem: Beta is Conjugate to the Binomial Likelihood

Setup: We observe $n$ Bernoulli trials with $s$ successes and $f = n - s$ failures. The likelihood is:

$$L(p \mid s, f) \propto p^s (1-p)^f$$

If the prior is $p \sim \text{Beta}(\alpha, \beta)$, then the posterior is:

$$p \mid s, f \sim \text{Beta}(\alpha + s, \beta + f)$$

Proof:

$$\pi(p \mid s, f) \propto L(p \mid s, f) \cdot \pi(p) = p^s(1-p)^f \cdot \frac{p^{\alpha-1}(1-p)^{\beta-1}}{B(\alpha,\beta)}$$

$$\propto p^{(\alpha+s)-1}(1-p)^{(\beta+f)-1}$$

This is the kernel of $\text{Beta}(\alpha+s, \beta+f)$.

Bayesian Updating in Practice

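Start with a $\text{Beta}(\alpha, \beta)$ prior. After observing $s$ successes in $n$ trials:

  • Posterior mean = $\frac{\alpha + s}{\alpha + \beta + n} = \frac{\alpha+\beta}{\alpha+\beta+n} \cdot \frac{\alpha}{\alpha+\beta} + \frac{n}{\alpha+\beta+n} \cdot \frac{s}{n}$, a weighted average of the prior mean and the sample proportion
  • Prior strength = $\alpha + \beta$; data strength = $n$
  • As $n \to \infty$, the posterior mean approaches the sample proportion $s/n$ (data overwhelms prior)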
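The update rule and the weighted-average reading of the posterior mean can be sketched in a few lines. The function names are hypothetical, and the Beta(2, 2) prior with 7 successes in 10 trials is made-up illustrative data, not taken from the lesson.

```python
from typing import Tuple

def update_beta(alpha: float, beta: float, successes: int, failures: int) -> Tuple[float, float]:
    """Conjugate update: Beta(alpha, beta) prior -> Beta(alpha + s, beta + f) posterior."""
    return alpha + successes, beta + failures

def weighted_posterior_mean(alpha: float, beta: float, successes: int, n: int) -> float:
    """Posterior mean written as a weighted average of prior mean and sample proportion."""
    prior_weight = (alpha + beta) / (alpha + beta + n)
    prior_mean = alpha / (alpha + beta)
    sample_proportion = successes / n
    return prior_weight * prior_mean + (1 - prior_weight) * sample_proportion

# Hypothetical data: Beta(2, 2) prior, then 7 successes in 10 trials.
post_alpha, post_beta = update_beta(2, 2, successes=7, failures=3)
direct_mean = post_alpha / (post_alpha + post_beta)
averaged_mean = weighted_posterior_mean(2, 2, successes=7, n=10)

print((post_alpha, post_beta))
print(direct_mean, averaged_mean)  # the two routes agree
```

Here the posterior is Beta(9, 5) with mean $9/14 \approx 0.643$, which sits between the prior mean 0.5 and the sample proportion 0.7, closer to the data because $n = 10$ outweighs $\alpha + \beta = 4$.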

MGF and Moments

MGF of Beta Distribution

$$M_X(t) = 1 + \sum_{k=1}^{\infty} \frac{(\alpha)_k}{(\alpha+\beta)_k} \cdot \frac{t^k}{k!}$$

Here,

  • $(\alpha)_k$ = Rising factorial: $\alpha(\alpha+1)\cdots(\alpha+k-1)$

No Simple Closed-Form MGF

Unlike the gamma distribution, the beta MGF has no closed form in elementary functions; its power series is Kummer's confluent hypergeometric function ${}_1F_1(\alpha; \alpha+\beta; t)$. However, all moments are tractable via the rising factorial (Pochhammer symbol): $E[X^k] = \frac{(\alpha)_k}{(\alpha+\beta)_k}$.
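Both the moment formula and the series are easy to evaluate. The sketch below uses hypothetical helper names and truncates the MGF series at a fixed number of terms, an assumption that is adequate for moderate $|t|$ but is not a general-purpose implementation.

```python
import math

def rising_factorial(x: float, k: int) -> float:
    """Pochhammer symbol (x)_k = x (x+1) ... (x+k-1), with (x)_0 = 1."""
    result = 1.0
    for i in range(k):
        result *= x + i
    return result

def beta_raw_moment(alpha: float, beta: float, k: int) -> float:
    """E[X^k] = (alpha)_k / (alpha + beta)_k."""
    return rising_factorial(alpha, k) / rising_factorial(alpha + beta, k)

def beta_mgf(alpha: float, beta: float, t: float, terms: int = 80) -> float:
    """Truncated power series for the MGF (illustrative, fixed number of terms)."""
    total = 1.0
    term = 1.0
    for k in range(1, terms + 1):
        # Ratio of consecutive terms: (alpha+k-1)/(alpha+beta+k-1) * t/k.
        term *= (alpha + k - 1) / (alpha + beta + k - 1) * t / k
        total += term
    return total

print(beta_raw_moment(2, 5, 1), beta_raw_moment(2, 5, 2))
print(beta_mgf(1, 1, 1.0), math.e - 1)  # Beta(1, 1) is uniform: MGF is (e^t - 1)/t
```

The uniform case is a convenient check because its MGF, $(e^t - 1)/t$, is known in closed form.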


Connection to the F Distribution

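Theorem: Beta and F Distribution Relationship

If $X \sim \text{Beta}(\alpha, \beta)$, then:

$$\frac{\beta X}{\alpha(1-X)} \sim F(2\alpha, 2\beta)$$

where $F(2\alpha, 2\beta)$ is the F-distribution with $2\alpha$ and $2\beta$ degrees of freedom. To see this, write $X = G_1/(G_1 + G_2)$ with independent $G_1 \sim \text{Gamma}(\alpha, 1)$ and $G_2 \sim \text{Gamma}(\beta, 1)$, so that $X/(1-X) = G_1/G_2$. Since $2G_1 \sim \chi^2_{2\alpha}$ and $2G_2 \sim \chi^2_{2\beta}$, the quantity $\frac{2G_1/(2\alpha)}{2G_2/(2\beta)} = \frac{\beta X}{\alpha(1-X)}$ is a ratio of independent chi-squared variables, each divided by its degrees of freedom, which is the definition of an F variable.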
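A simulation gives a rough check of this relationship. The sketch below transforms beta draws and compares them with F draws built directly from scaled chi-squared variables (generated as gamma variates with scale 2). The helper names, seed, and the choice $\alpha = 3$, $\beta = 6$ are illustrative assumptions. The reference mean $d_2/(d_2 - 2)$ of an $F(d_1, d_2)$ variable requires $d_2 > 2$.

```python
import random
import statistics

def beta_to_f(x: float, alpha: float, beta: float) -> float:
    """Map a Beta(alpha, beta) draw to beta*x / (alpha*(1-x))."""
    return beta * x / (alpha * (1.0 - x))

def f_variate(rng: random.Random, d1: float, d2: float) -> float:
    """F(d1, d2) draw as a ratio of chi-squared variables over their degrees of freedom."""
    chi1 = rng.gammavariate(d1 / 2.0, 2.0)  # chi-squared with d1 degrees of freedom
    chi2 = rng.gammavariate(d2 / 2.0, 2.0)  # chi-squared with d2 degrees of freedom
    return (chi1 / d1) / (chi2 / d2)

rng = random.Random(42)
alpha, beta = 3.0, 6.0
n = 100_000

transformed = [beta_to_f(rng.betavariate(alpha, beta), alpha, beta) for _ in range(n)]
direct = [f_variate(rng, 2 * alpha, 2 * beta) for _ in range(n)]

mean_transformed = statistics.fmean(transformed)
mean_direct = statistics.fmean(direct)
median_transformed = statistics.median(transformed)
median_direct = statistics.median(direct)

print(mean_transformed, mean_direct, beta / (beta - 1))  # theoretical mean is 1.2
print(median_transformed, median_direct)
```

Matching means and medians do not prove equality in distribution, but a clear mismatch would flag an error in the transformation.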


Connection to the Binomial Distribution

Theorem: Beta-Binomial Relationship

Mixing a binomial over a beta-distributed success probability gives the beta-binomial distribution. Specifically, if $p \sim \text{Beta}(\alpha, \beta)$, $n$ is a positive integer, and $X \mid p \sim \text{Binomial}(n, p)$, then for $k = 0, 1, \dots, n$:

$$P(X = k) = \binom{n}{k} \frac{B(\alpha + k, \beta + n - k)}{B(\alpha, \beta)}$$

This is the beta-binomial distribution, an overdispersed binomial in which $p$ is randomly drawn from a beta prior.
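The pmf can be evaluated directly from the formula. The sketch below uses hypothetical helper names and illustrative parameters ($n = 10$, $\alpha = 2$, $\beta = 5$). It checks that the probabilities sum to 1 and compares the variance with that of a plain binomial with the same mean success probability, to make the overdispersion visible.

```python
import math

def log_beta_fn(a: float, b: float) -> float:
    """log B(a, b) via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_binomial_pmf(k: int, n: int, alpha: float, beta: float) -> float:
    """P(X = k) = C(n, k) * B(alpha + k, beta + n - k) / B(alpha, beta)."""
    log_ratio = log_beta_fn(alpha + k, beta + n - k) - log_beta_fn(alpha, beta)
    return math.comb(n, k) * math.exp(log_ratio)

n, alpha, beta = 10, 2.0, 5.0
pmf = [beta_binomial_pmf(k, n, alpha, beta) for k in range(n + 1)]

total = sum(pmf)
mean = sum(k * p for k, p in enumerate(pmf))
variance = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))

p_bar = alpha / (alpha + beta)
binomial_variance = n * p_bar * (1 - p_bar)

print(total, mean, n * p_bar)
print(variance, binomial_variance)  # beta-binomial variance is the larger one
```

With $\alpha = \beta = 1$ the pmf reduces to $1/(n+1)$ for every $k$, which is another simple check.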


Worked Example

Example: A/B Testing with Bayesian Inference

We test two website layouts. Layout A gets 120 clicks out of 500 visitors; Layout B gets 150 out of 500.

Prior: $\text{Beta}(1, 1)$ (uniform, non-informative) for each.

Posteriors:

  • $p_A \mid \text{data} \sim \text{Beta}(1 + 120, 1 + 380) = \text{Beta}(121, 381)$
  • $p_B \mid \text{data} \sim \text{Beta}(1 + 150, 1 + 350) = \text{Beta}(151, 351)$

Posterior means:

  • $\hat{p}_A = 121/502 \approx 0.2410$
  • $\hat{p}_B = 151/502 \approx 0.3008$

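Probability B > A: Compute $P(p_B > p_A)$ via Monte Carlo simulation from both posteriors. With 100,000 samples, $P(p_B > p_A) \approx 0.98$, which is strong evidence that Layout B is better. A normal approximation agrees: the posterior means differ by about 0.060 and the standard deviation of the difference is about 0.028, giving a z-score near 2.1.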
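The Monte Carlo step can be sketched with the standard library's `random.betavariate`. The function name `prob_b_beats_a`, the seed, and the default sample count are illustrative choices. The posterior parameters are those from the example above.

```python
import random
from typing import Tuple

def prob_b_beats_a(
    a_params: Tuple[float, float],
    b_params: Tuple[float, float],
    n_samples: int = 100_000,
    seed: int = 0,
) -> float:
    """Estimate P(p_B > p_A) by drawing independent pairs from the two beta posteriors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_samples):
        p_a = rng.betavariate(*a_params)
        p_b = rng.betavariate(*b_params)
        if p_b > p_a:
            wins += 1
    return wins / n_samples

# Posteriors from the example: A ~ Beta(121, 381), B ~ Beta(151, 351).
estimate = prob_b_beats_a((121, 381), (151, 351))
print(f"Estimated P(p_B > p_A) = {estimate:.3f}")
```

The estimate varies slightly with the seed; increasing `n_samples` tightens it at the cost of run time.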


Specific Applications

  1. Bayesian A/B testing — Prior/posterior on click-through rates, conversion rates.
  2. Bayesian statistics — Conjugate prior for Bernoulli, binomial, and negative binomial likelihoods.
  3. Modeling rates and proportions — Prevalence rates, completion rates, success probabilities.
  4. Project management — PERT uses a rescaled beta distribution to model uncertain task durations.

Key Takeaways

Summary: Beta Distribution

  • Defined on $[0, 1]$; ideal for modeling probabilities and proportions
  • PDF: $f(x) = x^{\alpha-1}(1-x)^{\beta-1}/B(\alpha,\beta)$
  • Mean: $\alpha/(\alpha+\beta)$; symmetric when $\alpha = \beta$; mode at $(\alpha-1)/(\alpha+\beta-2)$ when $\alpha, \beta > 1$
  • Conjugate prior for Bernoulli/Binomial: prior $\text{Beta}(\alpha,\beta)$ + $s$ successes, $f$ failures $\to$ posterior $\text{Beta}(\alpha+s, \beta+f)$
  • Beta function: $B(\alpha,\beta) = \Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha+\beta)$
  • $\text{Beta}(1,1) = \text{Unif}(0,1)$; connects to F distribution and beta-binomial
