Beta Distribution — Modeling Probabilities and Proportions

Foundations of Statistics

The Bayesian Workhorse for Probabilities

The beta distribution is the conjugate prior for binomial data, making it essential for Bayesian inference about proportions. Because its shape is flexible on $[0, 1]$, it is well suited to modeling uncertain probabilities and to updating beliefs as data arrive.

A/B Testing — Updating conversion rate estimates as website test data accumulates
Political Polling — Incorporating prior knowledge into election probability forecasts
Quality Control — Modeling defect rates in manufacturing with Bayesian methods

The beta distribution turns prior knowledge, combined with data, into an updated posterior belief.

Core Concepts

The beta distribution is defined on $[0, 1]$ and is the conjugate prior for the Bernoulli/Binomial likelihood in Bayesian inference. It provides a flexible family for modeling probabilities, proportions, and rates.

Definition: Beta Distribution

A continuous random variable $X$ has a beta distribution with shape parameters $\alpha > 0$ and $\beta > 0$ if its pdf is:

f(x) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha, \beta)}, \quad 0 \leq x \leq 1

We write $X \sim \text{Beta}(\alpha, \beta)$. Because its support is the unit interval, it naturally models probabilities and proportions.
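A minimal sketch of this density in Python, using only the standard library. It assumes we evaluate $B(\alpha, \beta)$ through `math.lgamma` (log space) so that large shape parameters do not overflow; the helper names `log_beta` and `beta_pdf` are our own, not part of any library.

```python
import math


def log_beta(a: float, b: float) -> float:
    """log B(a, b) computed from log-gamma values, which stay finite for large a, b."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_pdf(x: float, a: float, b: float) -> float:
    """Density of Beta(a, b) at a point x strictly inside (0, 1)."""
    if a <= 0 or b <= 0:
        raise ValueError("shape parameters must be positive")
    if not 0.0 < x < 1.0:
        # Endpoints are excluded: for a < 1 or b < 1 the density is unbounded there.
        raise ValueError("x must lie strictly between 0 and 1")
    log_f = (a - 1) * math.log(x) + (b - 1) * math.log1p(-x) - log_beta(a, b)
    return math.exp(log_f)


print(beta_pdf(0.2, 2, 5))  # analytically 30 * 0.2 * 0.8**4 = 2.4576
print(beta_pdf(0.7, 1, 1))  # Beta(1, 1) is uniform, so the density is 1
```

Working in log space is a design choice: $\Gamma(\alpha)$ overflows a double for $\alpha$ above roughly 171, while `math.lgamma` stays finite.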


The Beta Function

Definition: The Beta Function

The beta function is defined for $\alpha, \beta > 0$ by:

B(\alpha, \beta) = \int_0^1 t^{\alpha-1}(1-t)^{\beta-1} \, dt = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}

The second equality (connecting the beta and gamma functions) can be proved via the convolution of gamma distributions or through the Dirichlet integral.
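The identity can be checked numerically. The sketch below, standard library only, compares a midpoint-rule approximation of the defining integral with the gamma-function form. The function names and the grid size `n` are illustrative assumptions, and the simple quadrature is only reliable when $\alpha, \beta \geq 1$.

```python
import math


def beta_via_gamma(a: float, b: float) -> float:
    """B(a, b) from the identity Gamma(a) Gamma(b) / Gamma(a + b), evaluated in log space."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))


def beta_via_integral(a: float, b: float, n: int = 100_000) -> float:
    """Midpoint-rule estimate of the integral of t^(a-1) (1-t)^(b-1) over [0, 1].

    Reliable for a, b >= 1. If a < 1 or b < 1 the integrand blows up at an
    endpoint and this simple rule converges slowly.
    """
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h  # midpoint of the i-th subinterval
        total += t ** (a - 1) * (1 - t) ** (b - 1)
    return total * h


for a, b in [(1.0, 1.0), (2.0, 3.0), (2.5, 4.0)]:
    print(a, b, beta_via_integral(a, b), beta_via_gamma(a, b))
```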

Proof that B(α, β) = Γ(α)Γ(β)/Γ(α+β)

Let $X \sim \text{Gamma}(\alpha, 1)$ and $Y \sim \text{Gamma}(\beta, 1)$ be independent, and define $U = X/(X+Y)$ and $V = X+Y$.

The inverse transformation is $(x, y) = (uv, v(1-u))$ for $0 < u < 1$, $v > 0$, with Jacobian determinant $|J| = v$, so the joint pdf of $(U, V)$ is:

f_{U,V}(u,v) = f_X(uv) f_Y(v(1-u)) \cdot |J| = \frac{(uv)^{\alpha-1} e^{-uv}}{\Gamma(\alpha)} \cdot \frac{(v(1-u))^{\beta-1} e^{-v(1-u)}}{\Gamma(\beta)} \cdot v

= \frac{u^{\alpha-1}(1-u)^{\beta-1} v^{\alpha+\beta-1} e^{-v}}{\Gamma(\alpha)\Gamma(\beta)}

Integrating out $v$ gives the marginal density of $U$:

f_U(u) = \frac{u^{\alpha-1}(1-u)^{\beta-1}}{\Gamma(\alpha)\Gamma(\beta)} \int_0^\infty v^{\alpha+\beta-1} e^{-v} dv = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} u^{\alpha-1}(1-u)^{\beta-1}

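Because $f_U$ is a probability density, it integrates to 1 over $(0, 1)$. Rearranging gives $\int_0^1 u^{\alpha-1}(1-u)^{\beta-1}\,du = \Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha+\beta)$, which confirms $B(\alpha,\beta) = \Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha+\beta)$. As a by-product, $U \sim \text{Beta}(\alpha, \beta)$.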
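The construction in this proof also gives a way to simulate beta variables. A sketch, assuming Python's `random.gammavariate(shape, scale)` is an adequate gamma sampler: draw the two gammas, form $U = X/(X+Y)$, and compare the sample mean and variance with the beta moments $\alpha/(\alpha+\beta)$ and $\alpha\beta/((\alpha+\beta)^2(\alpha+\beta+1))$ derived in the next section. The parameters $(2, 5)$ and the seed are arbitrary choices.

```python
import random
import statistics


def gamma_ratio_samples(a: float, b: float, n: int, seed: int = 0) -> list[float]:
    """Draw U = X / (X + Y) with X ~ Gamma(a, 1) and Y ~ Gamma(b, 1) independent."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        x = rng.gammavariate(a, 1.0)  # shape a, scale 1
        y = rng.gammavariate(b, 1.0)  # shape b, scale 1
        samples.append(x / (x + y))
    return samples


a, b = 2.0, 5.0
u = gamma_ratio_samples(a, b, 100_000)
sample_mean = statistics.fmean(u)
sample_var = statistics.variance(u)
print("mean:", sample_mean, "vs", a / (a + b))
print("var: ", sample_var, "vs", a * b / ((a + b) ** 2 * (a + b + 1)))
```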

PDF of Beta Distribution

f(x) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha, \beta)}, \quad 0 \leq x \leq 1

Here,

$\alpha$ =First shape parameter (successes + 1)
$\beta$ =Second shape parameter (failures + 1)
$B(\alpha, \beta)$ =Beta function (normalizing constant)

Mean, Variance, and Higher Moments

Derivation of E[X] and Var(X)

Using the identity $B(\alpha+1, \beta) = \frac{\alpha}{\alpha+\beta} B(\alpha, \beta)$, which follows from $\Gamma(z+1) = z\Gamma(z)$:

Mean:

E[X] = \int_0^1 x \cdot \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha,\beta)} dx = \frac{B(\alpha+1, \beta)}{B(\alpha, \beta)} = \frac{\alpha}{\alpha + \beta}

Second moment:

E[X^2] = \frac{B(\alpha+2, \beta)}{B(\alpha, \beta)} = \frac{\alpha(\alpha+1)}{(\alpha+\beta)(\alpha+\beta+1)}

Variance:

\text{Var}(X) = \frac{\alpha(\alpha+1)}{(\alpha+\beta)(\alpha+\beta+1)} - \frac{\alpha^2}{(\alpha+\beta)^2} = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}
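The same ratio-of-beta-functions argument can be evaluated directly. This sketch (standard library, hypothetical parameters $\alpha = 2$, $\beta = 5$) computes $E[X^k] = B(\alpha+k, \beta)/B(\alpha, \beta)$ in log space and checks the mean and variance against the closed forms above.

```python
import math


def log_beta(a: float, b: float) -> float:
    """log B(a, b) via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def raw_moment(k: int, a: float, b: float) -> float:
    """E[X^k] = B(a + k, b) / B(a, b) for X ~ Beta(a, b)."""
    return math.exp(log_beta(a + k, b) - log_beta(a, b))


a, b = 2.0, 5.0            # hypothetical shape parameters
m1 = raw_moment(1, a, b)   # E[X]
m2 = raw_moment(2, a, b)   # E[X^2]
var = m2 - m1 ** 2         # Var(X) = E[X^2] - E[X]^2
print(m1, a / (a + b))
print(var, a * b / ((a + b) ** 2 * (a + b + 1)))
```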

Beta Mean and Variance

E[X] = \frac{\alpha}{\alpha + \beta}, \quad \text{Var}(X) = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}

Here,

$\alpha, \beta$ =Shape parameters

Interpretation of Parameters

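$\alpha$ can be thought of as "number of prior successes + 1" and $\beta$ as "number of prior failures + 1", counting from a uniform $\text{Beta}(1,1)$ starting point
Under that reading the prior is worth $\alpha + \beta - 2$ pseudo-observations; in the posterior-mean formula, however, the prior carries weight $\alpha + \beta$, so the two common conventions for "prior sample size" differ by 2
The prior mean is $\alpha/(\alpha+\beta)$; increasing $\alpha$ shifts mass toward 1, and increasing $\beta$ shifts it toward 0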

Symmetry and Shape

Shape Properties

When $\alpha = \beta$: the distribution is symmetric about 0.5
When $\alpha > \beta$: left-skewed (mass concentrated near 1)
When $\alpha < \beta$: right-skewed (mass concentrated near 0)
When $\alpha = \beta = 1$: $\text{Beta}(1,1) = \text{Unif}(0,1)$ (uniform distribution)
When $\alpha, \beta > 1$: unimodal with mode at $(\alpha-1)/(\alpha+\beta-2)$
When $\alpha, \beta < 1$: U-shaped (the density is unbounded at both 0 and 1)
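The rules above translate into a small classifier. This is a sketch that covers only the listed cases; mixed cases such as $\alpha < 1 \leq \beta$ (J-shaped or monotone densities) are lumped together, and the function name `beta_shape` is our own.

```python
def beta_shape(a: float, b: float) -> str:
    """Classify Beta(a, b) using the shape rules listed above."""
    if a <= 0 or b <= 0:
        raise ValueError("shape parameters must be positive")
    if a == 1 and b == 1:
        return "uniform"
    if a < 1 and b < 1:
        return "U-shaped"
    if a > 1 and b > 1:
        mode = (a - 1) / (a + b - 2)
        if a == b:
            skew = "symmetric"
        elif a > b:
            skew = "left-skewed"
        else:
            skew = "right-skewed"
        return f"unimodal, {skew}, mode {mode:.4f}"
    # Mixed cases (e.g. a < 1 <= b) are J-shaped or monotone; the list does not cover them.
    return "J-shaped or monotone"


for a, b in [(1, 1), (0.5, 0.5), (2, 5), (5, 2), (3, 3), (0.5, 2)]:
    print((a, b), beta_shape(a, b))
```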

The Conjugate Prior Property

Theorem: Beta is Conjugate to the Binomial Likelihood

Setup: We observe $n$ Bernoulli trials with $s$ successes and $f = n - s$ failures. The likelihood is:

L(p \mid s, f) \propto p^s (1-p)^f

If the prior is $p \sim \text{Beta}(\alpha, \beta)$, then the posterior is:

p \mid s, f \sim \text{Beta}(\alpha + s, \beta + f)

Proof:

\pi(p \mid s, f) \propto L(p \mid s, f) \cdot \pi(p) = p^s(1-p)^f \cdot \frac{p^{\alpha-1}(1-p)^{\beta-1}}{B(\alpha,\beta)}

\propto p^{(\alpha+s)-1}(1-p)^{(\beta+f)-1}

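This is the kernel of $\text{Beta}(\alpha+s, \beta+f)$. Since the posterior must integrate to 1, its normalizing constant is $B(\alpha+s, \beta+f)$, and the posterior is exactly $\text{Beta}(\alpha+s, \beta+f)$.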
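Because the posterior depends on the data only through $s$ and $f$, updating one trial at a time must land on the same posterior as a single batch update. A minimal sketch of that check; the outcome list and the $\text{Beta}(2, 2)$ prior are hypothetical.

```python
from collections.abc import Iterable


def update(a: float, b: float, successes: int, failures: int) -> tuple[float, float]:
    """Beta(a, b) prior + binomial data -> Beta(a + s, b + f) posterior parameters."""
    return a + successes, b + failures


def update_sequentially(a: float, b: float, outcomes: Iterable[int]) -> tuple[float, float]:
    """Apply the conjugate update one Bernoulli outcome (1 = success, 0 = failure) at a time."""
    for y in outcomes:
        a, b = update(a, b, y, 1 - y)
    return a, b


data = [1, 0, 0, 1, 1, 0, 1, 1]  # hypothetical trial outcomes
s = sum(data)
f = len(data) - s
print(update(2.0, 2.0, s, f))               # batch update
print(update_sequentially(2.0, 2.0, data))  # one trial at a time
```

Both calls return $(7.0, 5.0)$, that is, a $\text{Beta}(7, 5)$ posterior.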

Bayesian Updating in Practice

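Start with a $\text{Beta}(\alpha, \beta)$ prior. After observing $s$ successes in $n$ trials:

Posterior mean = $\frac{\alpha + s}{\alpha + \beta + n} = \frac{\alpha+\beta}{\alpha+\beta+n}\cdot\frac{\alpha}{\alpha+\beta} + \frac{n}{\alpha+\beta+n}\cdot\frac{s}{n}$, a weighted average of the prior mean and the sample proportion
Prior strength = $\alpha + \beta$; data strength = $n$
As $n \to \infty$, the posterior mean $\to s/n$ (the data overwhelm the prior)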
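A short numeric check of the weighted-average form and of the prior washing out as $n$ grows. The prior $\text{Beta}(8, 2)$ and the data (a sample proportion of 0.3 at each $n$) are hypothetical.

```python
def posterior_mean(a: float, b: float, s: int, n: int) -> float:
    """Mean of the Beta(a + s, b + n - s) posterior."""
    return (a + s) / (a + b + n)


def weighted_average_form(a: float, b: float, s: int, n: int) -> float:
    """Same quantity written as a blend of the prior mean and the sample proportion."""
    w = (a + b) / (a + b + n)  # weight on the prior mean
    return w * (a / (a + b)) + (1 - w) * (s / n)


a, b = 8.0, 2.0  # hypothetical prior: mean 0.8, strength a + b = 10
for n, s in [(10, 3), (100, 30), (10_000, 3_000)]:  # sample proportion 0.3 each time
    print(n, posterior_mean(a, b, s, n), weighted_average_form(a, b, s, n))
```

At $n = 10$ the prior and the data get equal weight (posterior mean 0.55); by $n = 10{,}000$ the posterior mean is within about 0.001 of $s/n = 0.3$.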

MGF and Moments

MGF of Beta Distribution

M_X(t) = 1 + \sum_{k=1}^{\infty} \frac{(\alpha)_k}{(\alpha+\beta)_k} \cdot \frac{t^k}{k!}

Here,

$(\alpha)_k$ =Rising factorial: α(α+1)···(α+k-1)

No Simple Closed-Form MGF

Unlike the gamma distribution, the beta MGF does not reduce to elementary functions; the series above is Kummer's confluent hypergeometric function ${}_1F_1(\alpha; \alpha+\beta; t)$. However, all moments are tractable via the rising factorial (Pochhammer symbol): $E[X^k] = \frac{(\alpha)_k}{(\alpha+\beta)_k}$.
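Although the MGF has no elementary closed form, the series converges quickly and is easy to evaluate. This sketch builds each term from the previous one using the ratio $\frac{\alpha+k}{\alpha+\beta+k}\cdot\frac{t}{k+1}$ and compares the result with a direct numerical integral of $E[e^{tX}]$. The parameter values, tolerance, and grid size are assumptions for illustration.

```python
import math


def beta_mgf_series(t: float, a: float, b: float,
                    tol: float = 1e-15, max_terms: int = 10_000) -> float:
    """Sum 1 + sum_k (a)_k / (a+b)_k * t^k / k!, building each term from the previous one."""
    total, term, k = 1.0, 1.0, 0
    while k < max_terms:
        # term_{k+1} = term_k * (a + k) / (a + b + k) * t / (k + 1)
        term *= (a + k) / (a + b + k) * t / (k + 1)
        total += term
        k += 1
        if abs(term) < tol * abs(total):
            break
    return total


def beta_mgf_quadrature(t: float, a: float, b: float, n: int = 100_000) -> float:
    """Midpoint-rule estimate of E[exp(tX)] for X ~ Beta(a, b); intended for a, b >= 1."""
    log_b = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    h = 1.0 / n
    acc = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        acc += math.exp(t * x + (a - 1) * math.log(x) + (b - 1) * math.log1p(-x) - log_b)
    return acc * h


a, b, t = 2.0, 3.0, 1.5
series = beta_mgf_series(t, a, b)
quad = beta_mgf_quadrature(t, a, b)
print(series, quad)
```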

Connection to the F Distribution

Theorem: Beta and F Distribution Relationship

If $X \sim \text{Beta}(\alpha, \beta)$, then:

\frac{\beta X}{\alpha(1-X)} \sim F(2\alpha, 2\beta)

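where $F(2\alpha, 2\beta)$ is the F-distribution with $2\alpha$ and $2\beta$ degrees of freedom. To see why, use the gamma representation from the proof above: write $X = G_1/(G_1+G_2)$ with independent $G_1 \sim \text{Gamma}(\alpha, 1)$ and $G_2 \sim \text{Gamma}(\beta, 1)$, so that $X/(1-X) = G_1/G_2$. Since $2G_1 \sim \chi^2_{2\alpha}$ and $2G_2 \sim \chi^2_{2\beta}$, we get $\frac{\beta X}{\alpha(1-X)} = \frac{2G_1/(2\alpha)}{2G_2/(2\beta)}$, a ratio of independent chi-squared variables each divided by its degrees of freedom, which is the definition of $F(2\alpha, 2\beta)$.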
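A Monte Carlo sanity check of this relationship, using only Python's `random` module: transform draws from $\text{Beta}(\alpha, \beta)$ and compare them with $F(2\alpha, 2\beta)$ draws built directly from chi-squared variables (generated as gammas with shape $d/2$ and scale 2). The choice $\alpha = 2$, $\beta = 6$ is arbitrary; it makes $d_2 = 2\beta > 2$ so that the F mean $d_2/(d_2 - 2)$ exists.

```python
import random
import statistics

rng = random.Random(1)
a, b = 2.0, 6.0        # Beta shape parameters (arbitrary illustrative values)
d1, d2 = 2 * a, 2 * b  # degrees of freedom implied by the theorem
n = 100_000

# Route 1: transform Beta(a, b) draws with x -> b x / (a (1 - x)).
from_beta = []
for _ in range(n):
    x = rng.betavariate(a, b)
    from_beta.append(b * x / (a * (1 - x)))

# Route 2: F(d1, d2) built directly from chi-squared draws,
# using chi^2 with d degrees of freedom = Gamma(shape d/2, scale 2).
direct = []
for _ in range(n):
    c1 = rng.gammavariate(d1 / 2, 2.0)
    c2 = rng.gammavariate(d2 / 2, 2.0)
    direct.append((c1 / d1) / (c2 / d2))

mean_beta, mean_direct = statistics.fmean(from_beta), statistics.fmean(direct)
print("means:  ", mean_beta, mean_direct, "theory:", d2 / (d2 - 2))
print("medians:", statistics.median(from_beta), statistics.median(direct))
```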

Connection to the Binomial Distribution

Theorem: Beta-Binomial Relationship

The beta distribution also generates a discrete distribution through mixing. If $p \sim \text{Beta}(\alpha, \beta)$ and, conditional on $p$, $X \mid p \sim \text{Binomial}(n, p)$ for a positive integer $n$, then integrating $p$ out gives, for $k = 0, 1, \dots, n$:

P(X = k) = \binom{n}{k} \frac{B(\alpha + k, \beta + n - k)}{B(\alpha, \beta)}

This is the beta-binomial distribution: an overdispersed binomial in which the success probability $p$ is itself drawn at random from a beta distribution.
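A sketch of the beta-binomial pmf using the standard library, with hypothetical $n = 10$, $\alpha = 2$, $\beta = 3$. It checks that the probabilities sum to 1 and shows the overdispersion: the mean equals that of a binomial with $p = \alpha/(\alpha+\beta)$, but the variance is larger.

```python
import math


def log_beta(a: float, b: float) -> float:
    """log B(a, b) via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_pmf(k: int, n: int, a: float, b: float) -> float:
    """P(X = k) = C(n, k) B(a + k, b + n - k) / B(a, b)."""
    if not 0 <= k <= n:
        return 0.0
    return math.comb(n, k) * math.exp(log_beta(a + k, b + n - k) - log_beta(a, b))


n, a, b = 10, 2.0, 3.0
pmf = [beta_binomial_pmf(k, n, a, b) for k in range(n + 1)]
mean = sum(k * p for k, p in enumerate(pmf))
var = sum(k * k * p for k, p in enumerate(pmf)) - mean ** 2
p_bar = a / (a + b)
print(sum(pmf))                       # total probability
print(mean, n * p_bar)                # same mean as Binomial(n, p_bar)
print(var, n * p_bar * (1 - p_bar))   # larger variance: overdispersion
```

Here the mean is 4 for both, while the variance is 6 for the beta-binomial versus 2.4 for the plain binomial.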

Worked Example

Example: A/B Testing with Bayesian Inference

We test two website layouts. Layout A gets 120 clicks out of 500 visitors; Layout B gets 150 out of 500.

Prior: $\text{Beta}(1, 1)$ (uniform, non-informative) for each.

Posteriors:

$p_A \mid \text{data} \sim \text{Beta}(1 + 120, 1 + 380) = \text{Beta}(121, 381)$
$p_B \mid \text{data} \sim \text{Beta}(1 + 150, 1 + 350) = \text{Beta}(151, 351)$

Posterior means:

$\hat{p}_A = 121/502 \approx 0.2410$
$\hat{p}_B = 151/502 \approx 0.3008$

Probability B > A: Compute $P(p_B > p_A)$ by Monte Carlo simulation, drawing from both posteriors. With 100,000 samples, $P(p_B > p_A) \approx 0.98$: strong, though not overwhelming, evidence that Layout B has the higher conversion rate.
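The Monte Carlo step can be sketched with Python's built-in `random.betavariate`; the function name, seed, and sample count are our own choices.

```python
import random


def prob_b_beats_a(post_a, post_b, n_samples=100_000, seed=42):
    """Monte Carlo estimate of P(p_B > p_A) for independent Beta posteriors.

    post_a and post_b are (alpha, beta) tuples for the two posteriors.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_samples):
        p_a = rng.betavariate(*post_a)
        p_b = rng.betavariate(*post_b)
        wins += p_b > p_a  # True counts as 1
    return wins / n_samples


post_a = (121, 381)  # Beta(1 + 120, 1 + 380): uniform prior + Layout A data
post_b = (151, 351)  # Beta(1 + 150, 1 + 350): uniform prior + Layout B data
p_win = prob_b_beats_a(post_a, post_b)
print(p_win)
```

With these posteriors the estimate comes out close to 0.98. At 100,000 draws the Monte Carlo standard error is roughly 0.0004, so the second decimal place is stable across seeds.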

Specific Applications

Bayesian A/B testing — Prior/posterior on click-through rates, conversion rates.
Bayesian statistics — Conjugate prior for Bernoulli, binomial, and negative binomial likelihoods.
Modeling rates and proportions — Prevalence rates, completion rates, success probabilities.
Project management — PERT uses a beta distribution, rescaled to the range between optimistic and pessimistic estimates, to model task durations.

Key Takeaways

Summary: Beta Distribution

Defined on $[0, 1]$; ideal for modeling probabilities and proportions
PDF: $f(x) = x^{\alpha-1}(1-x)^{\beta-1}/B(\alpha,\beta)$
Mean: $\alpha/(\alpha+\beta)$; symmetric when $\alpha = \beta$; mode at $(\alpha-1)/(\alpha+\beta-2)$ when $\alpha, \beta > 1$
Conjugate prior for Bernoulli/Binomial: prior $\text{Beta}(\alpha,\beta)$ + $s$ successes, $f$ failures $\to$ posterior $\text{Beta}(\alpha+s, \beta+f)$
Beta function: $B(\alpha,\beta) = \Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha+\beta)$
$\text{Beta}(1,1) = \text{Unif}(0,1)$; connects to the F distribution and the beta-binomial
