Beta Distribution — Modeling Probabilities and Proportions
Foundations of Statistics
The Bayesian Workhorse for Probabilities
The beta distribution is the conjugate prior for binomial data, making it essential for Bayesian inference about proportions. Its flexibility on [0,1] makes it perfect for modeling uncertain probabilities and updating beliefs with data.
- A/B Testing — Updating conversion rate estimates as website test data accumulates
- Political Polling — Incorporating prior knowledge into election probability forecasts
- Quality Control — Modeling defect rates in manufacturing with Bayesian methods
The beta distribution turns prior knowledge into posterior certainty.
Core Concepts
The beta distribution is defined on $[0, 1]$ and is the conjugate prior for the Bernoulli/Binomial likelihood in Bayesian inference. It provides a flexible family for modeling probabilities, proportions, and rates.
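Definition: Beta Distribution
A continuous random variable $X$ has a beta distribution with shape parameters $\alpha > 0$ and $\beta > 0$ if its pdf is:
$$f(x) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha, \beta)}, \quad 0 < x < 1$$
Written $X \sim \text{Beta}(\alpha, \beta)$. It naturally models probabilities and proportions.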
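As a minimal sketch, the density $x^{\alpha-1}(1-x)^{\beta-1}/B(\alpha,\beta)$ can be evaluated with the standard library alone. The function name `beta_pdf` is illustrative and not part of the original text; working on the log scale is a design choice to avoid overflow for large shape parameters.

```python
import math

def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x; returns 0 outside the open interval (0, 1)."""
    if not 0.0 < x < 1.0:
        return 0.0
    # log B(a, b) = lgamma(a) + lgamma(b) - lgamma(a + b)
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    log_kernel = (a - 1) * math.log(x) + (b - 1) * math.log1p(-x)
    return math.exp(log_kernel - log_norm)

# Beta(1, 1) is flat, so its density is 1 everywhere on (0, 1).
print(beta_pdf(0.5, 1.0, 1.0))
# Beta(2, 5): B(2, 5) = 1/30, so the density at 0.2 is 30 * 0.2 * 0.8**4.
print(beta_pdf(0.2, 2.0, 5.0))
```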
The Beta Function
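Definition: The Beta Function
The beta function is defined for $\alpha, \beta > 0$ by:
$$B(\alpha, \beta) = \int_0^1 t^{\alpha-1}(1-t)^{\beta-1}\,dt = \frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha+\beta)}$$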
The second equality (connecting the beta and gamma functions) can be proved via the convolution of gamma distributions or through the Dirichlet integral.
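The agreement between the integral form and the gamma-function form can be checked numerically. The sketch below uses a plain midpoint rule, which is adequate for shape parameters of at least 1 (for smaller values the integrand is unbounded at the endpoints and this simple rule would be a poor choice). The helper names are hypothetical.

```python
import math

def beta_via_gamma(a, b):
    """B(a, b) from the gamma-function identity."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def beta_via_integral(a, b, n=100_000):
    """B(a, b) as a midpoint-rule approximation of the defining integral."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        total += t ** (a - 1) * (1.0 - t) ** (b - 1)
    return h * total

integral_value = beta_via_integral(3.0, 4.0)
gamma_value = beta_via_gamma(3.0, 4.0)  # equals 2! * 3! / 6! = 1/60
print(integral_value, gamma_value)
```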
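Theorem: B(α, β) = Γ(α)Γ(β)/Γ(α+β)
Proof: Consider $X \sim \text{Gamma}(\alpha, 1)$ and $Y \sim \text{Gamma}(\beta, 1)$ independent. Let $U = X/(X+Y)$ and $V = X + Y$.
The joint pdf of $(U, V)$ is derived via the Jacobian of the transformation $x = uv$, $y = v(1-u)$, for which $|J| = v$:
$$f_{U,V}(u, v) = \frac{(uv)^{\alpha-1}\,(v(1-u))^{\beta-1}\,e^{-v}}{\Gamma(\alpha)\,\Gamma(\beta)}\cdot v = \frac{u^{\alpha-1}(1-u)^{\beta-1}\,v^{\alpha+\beta-1}e^{-v}}{\Gamma(\alpha)\,\Gamma(\beta)}$$
Integrating out $v$ to get $f_U(u)$:
$$f_U(u) = \frac{u^{\alpha-1}(1-u)^{\beta-1}}{\Gamma(\alpha)\,\Gamma(\beta)}\int_0^\infty v^{\alpha+\beta-1}e^{-v}\,dv = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,u^{\alpha-1}(1-u)^{\beta-1}$$
Thus $U \sim \text{Beta}(\alpha, \beta)$ and $f_U$ integrates to 1, confirming $B(\alpha, \beta) = \Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha+\beta)$.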
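The gamma-ratio construction used in this proof can also be illustrated by simulation: draw two independent unit-scale gamma variables and form the ratio of the first to their sum. The sketch below assumes shape parameters 2 and 3 purely for illustration and compares the sample mean to the Beta(2, 3) mean of 0.4. The function name is hypothetical.

```python
import random

def gamma_ratio_samples(a, b, n, seed=0):
    """Draw n values of X / (X + Y) with X ~ Gamma(a, 1), Y ~ Gamma(b, 1) independent."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        x = rng.gammavariate(a, 1.0)  # second argument is the scale
        y = rng.gammavariate(b, 1.0)
        samples.append(x / (x + y))
    return samples

samples = gamma_ratio_samples(2.0, 3.0, 50_000)
sample_mean = sum(samples) / len(samples)
print(sample_mean)  # should be close to 2 / (2 + 3)
```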
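PDF of Beta Distribution
$$f(x; \alpha, \beta) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha, \beta)}, \quad 0 < x < 1$$
Here,
- $\alpha$ = First shape parameter (successes + 1)
- $\beta$ = Second shape parameter (failures + 1)
- $B(\alpha, \beta)$ = Beta function (normalizing constant)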
Mean, Variance, and Higher Moments
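Theorem: Derivation of E[X] and Var(X)
Using the identity $B(\alpha+1, \beta) = \frac{\alpha}{\alpha+\beta}\,B(\alpha, \beta)$, which follows from $\Gamma(z+1) = z\,\Gamma(z)$:
Mean:
$$E[X] = \frac{1}{B(\alpha,\beta)}\int_0^1 x^{\alpha}(1-x)^{\beta-1}\,dx = \frac{B(\alpha+1, \beta)}{B(\alpha, \beta)} = \frac{\alpha}{\alpha+\beta}$$
Second moment:
$$E[X^2] = \frac{B(\alpha+2, \beta)}{B(\alpha, \beta)} = \frac{\alpha(\alpha+1)}{(\alpha+\beta)(\alpha+\beta+1)}$$
Variance:
$$\text{Var}(X) = E[X^2] - (E[X])^2 = \frac{\alpha(\alpha+1)}{(\alpha+\beta)(\alpha+\beta+1)} - \frac{\alpha^2}{(\alpha+\beta)^2} = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$$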
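Beta Mean and Variance
$$E[X] = \frac{\alpha}{\alpha+\beta}, \qquad \text{Var}(X) = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$$
Here,
- $\alpha, \beta$ = Shape parameters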
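The closed forms for the mean, $\alpha/(\alpha+\beta)$, and the variance, $\alpha\beta/((\alpha+\beta)^2(\alpha+\beta+1))$, can be sanity-checked against simulated draws. This is a rough sketch using `random.betavariate`; the shape values 2 and 5 and the function names are illustrative assumptions.

```python
import random

def beta_mean(a, b):
    """Mean of Beta(a, b)."""
    return a / (a + b)

def beta_variance(a, b):
    """Variance of Beta(a, b)."""
    return a * b / ((a + b) ** 2 * (a + b + 1))

rng = random.Random(0)
draws = [rng.betavariate(2.0, 5.0) for _ in range(50_000)]
sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / (len(draws) - 1)

print(beta_mean(2.0, 5.0), sample_mean)
print(beta_variance(2.0, 5.0), sample_var)
```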
Interpretation of Parameters
- $\alpha$ can be thought of as "number of successes + 1" and $\beta$ as "number of failures + 1"
- The prior sample size is $\alpha + \beta$ (effective number of prior observations)
- The prior mean is $\alpha/(\alpha+\beta)$; increasing $\alpha$ shifts mass toward 1, increasing $\beta$ shifts mass toward 0
Symmetry and Shape
Shape Properties
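- When $\alpha = \beta$: the distribution is symmetric about 0.5
- When $\alpha > \beta$: left-skewed (mass concentrated near 1)
- When $\alpha < \beta$: right-skewed (mass concentrated near 0)
- When $\alpha = \beta = 1$: $\text{Beta}(1, 1) = \text{Uniform}(0, 1)$ (uniform distribution)
- When $\alpha, \beta > 1$: unimodal with mode at $\frac{\alpha-1}{\alpha+\beta-2}$
- When $\alpha, \beta < 1$: U-shaped (density piles up near 0 and 1)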
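The shape rules in this list can be collected into a small helper. This is only a sketch: the function names and returned labels are hypothetical, and parameter combinations the list does not cover (one shape parameter at most 1 and the other at least 1, other than the uniform case) are lumped together as "other".

```python
def beta_mode(a, b):
    """Interior mode (a - 1) / (a + b - 2), defined only when both shapes exceed 1."""
    if a > 1 and b > 1:
        return (a - 1) / (a + b - 2)
    return None

def describe_shape(a, b):
    """Classify Beta(a, b) using the rules listed in the text."""
    if a == 1 and b == 1:
        return "uniform"
    if a < 1 and b < 1:
        return "U-shaped"
    if a > 1 and b > 1:
        if a == b:
            return "unimodal, symmetric"
        return "unimodal, left-skewed" if a > b else "unimodal, right-skewed"
    return "other (monotone or J-shaped)"

for shapes in [(1, 1), (0.5, 0.5), (5, 2), (2, 5), (3, 3)]:
    print(shapes, describe_shape(*shapes), beta_mode(*shapes))
```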
The Conjugate Prior Property
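Theorem: Beta is Conjugate to the Binomial Likelihood
Setup: We observe $n$ Bernoulli trials with $k$ successes and $n - k$ failures. The likelihood is:
$$L(p) \propto p^{k}(1-p)^{n-k}$$
If the prior is $p \sim \text{Beta}(\alpha, \beta)$, then the posterior is:
$$p \mid \text{data} \sim \text{Beta}(\alpha + k,\; \beta + n - k)$$
Proof:
$$\pi(p \mid \text{data}) \propto p^{k}(1-p)^{n-k}\cdot p^{\alpha-1}(1-p)^{\beta-1} = p^{\alpha+k-1}(1-p)^{\beta+n-k-1}$$
This is the kernel of $\text{Beta}(\alpha + k,\; \beta + n - k)$.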
Bayesian Updating in Practice
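Start with a $\text{Beta}(\alpha, \beta)$ prior. After observing $k$ successes in $n$ trials:
- Posterior mean = $\frac{\alpha + k}{\alpha + \beta + n}$ = weighted average of prior mean and sample proportion
- Prior strength = $\alpha + \beta$; data strength = $n$
- As $n \to \infty$, the posterior mean approaches the sample proportion $k/n$ (data overwhelms prior)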
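The update rule amounts to adding the observed successes and failures to the two shape parameters. The sketch below also recomputes the posterior mean as a weighted blend of the prior mean and the sample proportion, with weights proportional to the prior strength and the number of trials. Function names and the example counts are illustrative assumptions.

```python
def update_beta(a, b, successes, failures):
    """Posterior shape parameters after observing Bernoulli outcomes."""
    return a + successes, b + failures

def posterior_mean_as_blend(a, b, successes, failures):
    """Posterior mean written as a weighted average (requires at least one trial)."""
    n = successes + failures
    prior_weight = (a + b) / (a + b + n)
    prior_mean = a / (a + b)
    sample_proportion = successes / n
    return prior_weight * prior_mean + (1 - prior_weight) * sample_proportion

post_a, post_b = update_beta(2, 2, successes=7, failures=3)
direct_mean = post_a / (post_a + post_b)
blended_mean = posterior_mean_as_blend(2, 2, successes=7, failures=3)
print(post_a, post_b, direct_mean, blended_mean)
```

Both routes give the same posterior mean, which is the sense in which the prior acts like $\alpha + \beta$ earlier observations.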
MGF and Moments
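MGF of Beta Distribution
$$M_X(t) = E[e^{tX}] = \sum_{k=0}^{\infty} \frac{(\alpha)_k}{(\alpha+\beta)_k}\,\frac{t^k}{k!}$$
Here,
- $(\alpha)_k$ = Rising factorial: $\alpha(\alpha+1)\cdots(\alpha+k-1)$, with $(\alpha)_0 = 1$
No Simple Closed-Form MGF
Unlike the gamma distribution, the beta MGF does not simplify to an elementary closed form; the series is the confluent hypergeometric function ${}_1F_1(\alpha;\, \alpha+\beta;\, t)$. However, all moments are tractable via the rising factorial (Pochhammer symbol): $E[X^k] = \frac{(\alpha)_k}{(\alpha+\beta)_k}$.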
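As a sketch of how the rising factorial makes moments easy to compute, the helper below returns the $k$-th raw moment as a ratio of two rising factorials, and a second helper sums the first terms of the MGF power series. The names and the truncation at 60 terms are assumptions chosen for illustration; the truncation is only reasonable for moderate values of `t`.

```python
import math

def rising_factorial(x, k):
    """x * (x + 1) * ... * (x + k - 1), with the empty product equal to 1."""
    result = 1.0
    for i in range(k):
        result *= x + i
    return result

def beta_raw_moment(a, b, k):
    """E[X^k] for X ~ Beta(a, b)."""
    return rising_factorial(a, k) / rising_factorial(a + b, k)

def beta_mgf(t, a, b, terms=60):
    """Truncated power series for E[exp(tX)]."""
    total = 0.0
    term = 1.0  # k = 0 term
    for k in range(terms):
        total += term
        # Ratio of consecutive terms: (a + k) / (a + b + k) * t / (k + 1)
        term *= (a + k) / (a + b + k) * t / (k + 1)
    return total

m1 = beta_raw_moment(2.0, 3.0, 1)
m2 = beta_raw_moment(2.0, 3.0, 2)
print(m1, m2, m2 - m1 ** 2)
# For Beta(1, 1) the series sums to (exp(t) - 1) / t.
print(beta_mgf(1.0, 1.0, 1.0), math.e - 1)
```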
Connection to the F Distribution
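Theorem: Beta and F Distribution Relationship
If $X \sim \text{Beta}(\alpha, \beta)$, then:
$$\frac{\beta X}{\alpha(1 - X)} \sim F(2\alpha, 2\beta)$$
where $F(2\alpha, 2\beta)$ is the F-distribution with $2\alpha$ and $2\beta$ degrees of freedom. This follows by writing $X = U/(U+V)$ with independent $U \sim \chi^2_{2\alpha}$ and $V \sim \chi^2_{2\beta}$, so that $\frac{\beta X}{\alpha(1-X)} = \frac{U/(2\alpha)}{V/(2\beta)}$, the ratio of scaled chi-squared variables that defines the F distribution.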
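A rough simulation check of the transformation $\beta X / (\alpha(1 - X))$: the standard library has no F sampler, so this sketch only compares the sample mean of the transformed draws with the known mean of an F distribution, $d_2/(d_2 - 2)$ for $d_2 > 2$. The shape values 3 and 10 and the function name are illustrative assumptions, and matching one moment is a sanity check rather than a proof of the distributional claim.

```python
import random

def beta_to_f(x, a, b):
    """Map a Beta(a, b) draw x to b * x / (a * (1 - x))."""
    return (b * x) / (a * (1.0 - x))

a, b = 3.0, 10.0
rng = random.Random(1)
draws = [beta_to_f(rng.betavariate(a, b), a, b) for _ in range(100_000)]
sample_mean = sum(draws) / len(draws)

# With d2 = 2 * b degrees of freedom in the denominator, the F mean is d2 / (d2 - 2).
f_mean = (2 * b) / (2 * b - 2)
print(sample_mean, f_mean)
```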
Connection to the Binomial Distribution
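Theorem: Beta-Binomial Relationship
Mixing a binomial over a beta-distributed success probability gives a compound distribution. Specifically, if $p \sim \text{Beta}(\alpha, \beta)$, $n$ is a positive integer, and $X \mid p \sim \text{Binomial}(n, p)$, then:
$$P(X = k) = \binom{n}{k}\,\frac{B(k + \alpha,\; n - k + \beta)}{B(\alpha, \beta)}, \quad k = 0, 1, \dots, n$$
is the beta-binomial distribution, an overdispersed binomial where $p$ is randomly drawn from a beta prior.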
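The beta-binomial pmf, $\binom{n}{k} B(k+\alpha, n-k+\beta)/B(\alpha,\beta)$, is straightforward to evaluate with log-gamma functions. The sketch below checks that the probabilities sum to 1 and that the mean equals $n\alpha/(\alpha+\beta)$; the function names and the parameter values are illustrative assumptions.

```python
import math

def log_beta(a, b):
    """log B(a, b) via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_binomial_pmf(k, n, a, b):
    """P(X = k) when X | p ~ Binomial(n, p) and p ~ Beta(a, b)."""
    return math.comb(n, k) * math.exp(log_beta(k + a, n - k + b) - log_beta(a, b))

n, a, b = 10, 2.0, 3.0
pmf = [beta_binomial_pmf(k, n, a, b) for k in range(n + 1)]
total = sum(pmf)
mean = sum(k * p for k, p in enumerate(pmf))
print(total, mean)
```

With a uniform Beta(1, 1) mixing distribution every count from 0 to n is equally likely, which is a quick way to see the extra spread relative to a plain binomial.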
Worked Example
Example: A/B Testing with Bayesian Inference
We test two website layouts. Layout A gets 120 clicks out of 500 visitors; Layout B gets 150 out of 500.
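Prior: $\text{Beta}(1, 1)$ (uniform, non-informative) for each.
Posteriors: $p_A \mid \text{data} \sim \text{Beta}(121, 381)$ and $p_B \mid \text{data} \sim \text{Beta}(151, 351)$.
Posterior means: $E[p_A \mid \text{data}] = 121/502 \approx 0.241$ and $E[p_B \mid \text{data}] = 151/502 \approx 0.301$.
Probability B > A: Compute $P(p_B > p_A \mid \text{data})$ via Monte Carlo simulation from both posteriors. With 100,000 samples, $P(p_B > p_A \mid \text{data}) \approx 0.98$, which is strong evidence that Layout B is better.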
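The Monte Carlo step might look like the sketch below. It assumes the uniform Beta(1, 1) prior from the example, so 120 clicks out of 500 gives Beta(121, 381) for Layout A and 150 out of 500 gives Beta(151, 351) for Layout B. The function name, the seed, and the use of `random.betavariate` are choices made for illustration.

```python
import random

def prob_b_beats_a(post_a, post_b, n_samples=100_000, seed=42):
    """Estimate P(p_B > p_A) by sampling independently from two beta posteriors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_samples):
        p_a = rng.betavariate(*post_a)
        p_b = rng.betavariate(*post_b)
        if p_b > p_a:
            wins += 1
    return wins / n_samples

# Posterior parameters: (1 + clicks, 1 + non-clicks) for each layout.
estimate = prob_b_beats_a((1 + 120, 1 + 380), (1 + 150, 1 + 350))
print(estimate)
```

The estimate varies slightly with the seed and sample size; increasing `n_samples` tightens it.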
Specific Applications
- Bayesian A/B testing — Prior/posterior on click-through rates, conversion rates.
- Bayesian statistics — Conjugate prior for Bernoulli, binomial, and negative binomial likelihoods.
- Modeling rates and proportions — Prevalence rates, completion rates, success probabilities.
- Project management — PERT uses a rescaled beta distribution to model uncertain task durations.
Key Takeaways
Summary: Beta Distribution
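- Defined on $[0, 1]$; ideal for modeling probabilities and proportions
- PDF: $f(x) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha, \beta)}$
- Mean: $\frac{\alpha}{\alpha+\beta}$; symmetric when $\alpha = \beta$; mode at $\frac{\alpha-1}{\alpha+\beta-2}$ (for $\alpha, \beta > 1$)
- Conjugate prior for Bernoulli/Binomial: $\text{Beta}(\alpha, \beta)$ prior + $k$ successes, $n - k$ failures $\to$ $\text{Beta}(\alpha + k,\; \beta + n - k)$ posterior
- Beta function: $B(\alpha, \beta) = \frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha+\beta)}$
- Connects to the F distribution and the beta-binomial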