
Discrete Random Variables — PMF, CDF, and Properties

When Outcomes Are Numbers, Not Just Names

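A discrete random variable assigns a number to each outcome of a random experiment and takes values in a countable set, each value with an associated probability. This turns qualitative outcomes (heads or tails, a class label) into quantities that can be summed, averaged, and compared.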

  • PMF — The probability mass function assigns a probability to each possible value
  • CDF — The cumulative distribution function gives P(X ≤ x) for any value
  • Expected value — The probability-weighted average of all possible outcomes
  • Variance — How much the random variable fluctuates around its expected value

Discrete random variables are the gateway from probability theory to statistical inference.


What is a Discrete Random Variable?

Definition

A discrete random variable takes on a countable number of distinct values, each with an associated probability. The probabilities of these values are described by the probability mass function (PMF).
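
As a small illustration of the definition, the sketch below builds a discrete random variable by hand. It assumes an experiment that is not part of the lesson (two fair coin flips) and a hypothetical helper `count_heads` that maps each outcome to a number; the PMF then follows by counting outcomes.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space of two fair coin flips: HH, HT, TH, TT (each equally likely)
sample_space = list(product("HT", repeat=2))

def count_heads(outcome):
    """The random variable X: maps an outcome to its number of heads."""
    return outcome.count("H")

# Count how many outcomes map to each value of X, then normalize
counts = Counter(count_heads(o) for o in sample_space)
pmf = {x: Fraction(c, len(sample_space)) for x, c in sorted(counts.items())}

for x, p in pmf.items():
    print(f"P(X={x}) = {p}")
```

The set of values {0, 1, 2} is finite, hence countable, and the probabilities are exact fractions that sum to 1.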

Probability Mass Function (PMF)

$$P(X = x_i) = p_i \quad \text{where} \quad \sum_{i} p_i = 1 \quad \text{and} \quad p_i \geq 0$$

Here,

  • $X$ = The discrete random variable
  • $x_i$ = The i-th possible value
  • $p_i$ = Probability that $X = x_i$

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Example: fair die
values = np.array([1, 2, 3, 4, 5, 6])
probs = np.array([1/6, 1/6, 1/6, 1/6, 1/6, 1/6])

print("PMF of a Fair Die:")
for v, p in zip(values, probs):
    print(f"  P(X={v}) = {p:.4f}")
print(f"  Sum = {probs.sum():.4f}")
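
The two PMF conditions (non-negative probabilities that sum to 1) can be checked mechanically. The following is a minimal sketch using only the standard library; the helper name `is_valid_pmf` and the tolerance `tol` are assumptions introduced for illustration, not part of the lesson.

```python
import math

def is_valid_pmf(probs, tol=1e-9):
    """Return True if all probabilities are >= 0 and they sum to 1 (within tol)."""
    non_negative = all(p >= 0 for p in probs)
    sums_to_one = math.isclose(sum(probs), 1.0, abs_tol=tol)
    return non_negative and sums_to_one

fair_die = [1 / 6] * 6
print(is_valid_pmf(fair_die))      # fair die satisfies both conditions
print(is_valid_pmf([0.5, 0.6]))    # fails: sums to 1.1
print(is_valid_pmf([1.2, -0.2]))   # fails: contains a negative entry
```

A tolerance is used because floating-point sums such as six copies of 1/6 may differ from 1 by a rounding error.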

Cumulative Distribution Function (CDF)

CDF of Discrete RV

$$F(x) = P(X \leq x) = \sum_{x_i \leq x} p_i$$

Here,

  • $F(x)$ = Cumulative probability up to x
  • $p_i$ = PMF at value $x_i$

# CDF of a fair die
cumulative_probs = np.cumsum(probs)
print("\nCDF of a Fair Die:")
for v, cp in zip(values, cumulative_probs):
    print(f"  F({v}) = P(X ≤ {v}) = {cp:.4f}")
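
Because the CDF accumulates the PMF, it is a step function defined for every real x, and interval probabilities follow by subtraction: P(a < X ≤ b) = F(b) − F(a). The sketch below illustrates this for the fair die with exact fractions; the helper `cdf` is a hypothetical name introduced here.

```python
from bisect import bisect_right
from fractions import Fraction
from itertools import accumulate

values = [1, 2, 3, 4, 5, 6]
probs = [Fraction(1, 6)] * 6
cumulative = list(accumulate(probs))  # running sums of the PMF

def cdf(x):
    """F(x) = P(X <= x), valid for any real x, not only the die faces."""
    k = bisect_right(values, x)  # number of support points <= x
    return cumulative[k - 1] if k > 0 else Fraction(0)

print(f"F(0)   = {cdf(0)}")      # below the smallest value
print(f"F(2.5) = {cdf(2.5)}")    # equals F(2): the CDF is flat between faces
print(f"P(2 < X <= 4) = {cdf(4) - cdf(2)}")
```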

Expected Value and Variance

Expected Value

$$E[X] = \mu = \sum_{i} x_i \cdot p_i$$

Here,

  • $E[X]$ = Expected value (mean) of X
  • $x_i$ = Possible values
  • $p_i$ = Associated probabilities

Variance

$$\text{Var}(X) = \sigma^2 = \sum_{i} (x_i - \mu)^2 \cdot p_i$$

Here,

  • $\text{Var}(X)$ = Variance of X
  • $\mu$ = Expected value of X
  • $\sigma$ = Standard deviation

expected_value = np.sum(values * probs)
variance = np.sum((values - expected_value)**2 * probs)
std_dev = np.sqrt(variance)

print(f"\nE[X] = {expected_value:.4f}")
print(f"Var(X) = {variance:.4f}")
print(f"SD(X) = {std_dev:.4f}")
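
Expanding the square in the variance definition gives $\sum_i (x_i - \mu)^2 p_i = \sum_i x_i^2 p_i - 2\mu \sum_i x_i p_i + \mu^2 \sum_i p_i = E[X^2] - \mu^2$, so the variance can also be written as $E[X^2] - (E[X])^2$. The sketch below checks that both forms agree for the fair die, using exact fractions rather than floats; the variable names are illustrative.

```python
from fractions import Fraction

values = [1, 2, 3, 4, 5, 6]
probs = [Fraction(1, 6)] * 6  # fair die, exact arithmetic

mean = sum(x * p for x, p in zip(values, probs))              # E[X]
second_moment = sum(x**2 * p for x, p in zip(values, probs))  # E[X^2]

# Variance from the definition and from the shortcut form
var_definition = sum((x - mean) ** 2 * p for x, p in zip(values, probs))
var_shortcut = second_moment - mean**2

print(f"E[X]   = {mean}")
print(f"E[X^2] = {second_moment}")
print(f"Var(X) = {var_definition} (definition), {var_shortcut} (shortcut)")
```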

Common Discrete Distributions

fig, axes = plt.subplots(2, 2, figsize=(10, 8))

# Bernoulli
x_ber = np.array([0, 1])
p_ber = [0.3, 0.7]
axes[0, 0].bar(x_ber, p_ber, color='steelblue')
axes[0, 0].set_title('Bernoulli(p=0.7)')
axes[0, 0].set_xlabel('x')
axes[0, 0].set_ylabel('P(X=x)')

# Binomial
x_bin = np.arange(0, 11)
p_bin = stats.binom.pmf(x_bin, n=10, p=0.5)
axes[0, 1].bar(x_bin, p_bin, color='steelblue')
axes[0, 1].set_title('Binomial(n=10, p=0.5)')
axes[0, 1].set_xlabel('x')

# Poisson
x_poi = np.arange(0, 12)
p_poi = stats.poisson.pmf(x_poi, mu=3)
axes[1, 0].bar(x_poi, p_poi, color='steelblue')
axes[1, 0].set_title('Poisson(λ=3)')
axes[1, 0].set_xlabel('x')

# Geometric
x_geo = np.arange(1, 11)
p_geo = stats.geom.pmf(x_geo, p=0.3)
axes[1, 1].bar(x_geo, p_geo, color='steelblue')
axes[1, 1].set_title('Geometric(p=0.3)')
axes[1, 1].set_xlabel('x')

plt.tight_layout()
plt.savefig('discrete-distributions.png', dpi=150)
plt.show()

| Distribution | PMF | Mean | Variance |
|---|---|---|---|
| Bernoulli(p) | $p^x(1-p)^{1-x}$ | $p$ | $p(1-p)$ |
| Binomial(n,p) | $\binom{n}{x}p^x(1-p)^{n-x}$ | $np$ | $np(1-p)$ |
| Poisson(λ) | $\frac{e^{-\lambda}\lambda^x}{x!}$ | $\lambda$ | $\lambda$ |
| Geometric(p) | $(1-p)^{x-1}p$ | $1/p$ | $(1-p)/p^2$ |
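
The closed-form means and variances in the table can be checked numerically by summing over each PMF. The sketch below uses only the standard library; the parameter values (n=10, p=0.3, λ=3) and the helper `mean_var` are assumptions chosen for illustration, and the infinite Poisson and geometric supports are truncated where the remaining tail mass is negligible.

```python
import math

def mean_var(pmf, support):
    """Mean and variance computed directly from a PMF over the given support."""
    mean = sum(x * pmf(x) for x in support)
    var = sum((x - mean) ** 2 * pmf(x) for x in support)
    return mean, var

n, p, lam = 10, 0.3, 3.0  # illustrative parameter choices

def binom_pmf(x):
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

def poisson_pmf(x):
    return math.exp(-lam) * lam**x / math.factorial(x)

def geom_pmf(x):
    return (1 - p) ** (x - 1) * p

binom_mv = mean_var(binom_pmf, range(0, n + 1))
poisson_mv = mean_var(poisson_pmf, range(0, 60))  # truncated infinite support
geom_mv = mean_var(geom_pmf, range(1, 500))       # truncated infinite support

print(f"Binomial:  computed {binom_mv}, formula ({n * p}, {n * p * (1 - p)})")
print(f"Poisson:   computed {poisson_mv}, formula ({lam}, {lam})")
print(f"Geometric: computed {geom_mv}, formula ({1 / p}, {(1 - p) / p**2})")
```

The computed values match the formulas up to floating-point rounding. The Bernoulli row is the Binomial row with n = 1.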

Discrete Random Variables in Machine Learning

| ML Application | Discrete Variable Usage | Why |
|---|---|---|
| Classification | Target class ~ discrete distribution | Softmax output |
| Generative models | Discrete tokens in LLMs | Autoregressive generation |
| Markov chains | State transitions | NLP, speech recognition |
| Poisson processes | Event counts | Arrival time prediction |

import numpy as np
from scipy.stats import binom, poisson  # SciPy names the binomial distribution `binom`

# Classification output IS a discrete random variable
# Softmax gives probability distribution over discrete classes
logits = np.array([2.0, 1.0, 0.5, 0.1])  # raw model output
softmax = np.exp(logits) / np.sum(np.exp(logits))
print(f"Model logits: {logits}")
print(f"Softmax probabilities: {softmax.round(4)}")
print(f"Predicted class: {np.argmax(softmax)} (most likely discrete outcome)")

# Poisson for count prediction
# Number of customer arrivals per hour
lambda_param = 5  # average arrivals/hour
arrivals = poisson.rvs(lambda_param, size=10)
print(f"\nPoisson(λ={lambda_param}) arrivals: {arrivals}")
print(f"Mean: {arrivals.mean():.1f}, Std: {arrivals.std():.1f}")
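
Taking the argmax returns only the most likely class, but the softmax output is a full PMF that can also be sampled from, which is roughly what autoregressive generation does with tokens. The sketch below reuses the logits from the example above and draws samples with the standard library; the seed, the sample size, and subtracting the maximum logit for numerical stability are choices made here, not part of the lesson.

```python
import math
import random
from collections import Counter

logits = [2.0, 1.0, 0.5, 0.1]  # raw model output, as in the example above

# Subtracting the max logit avoids overflow and leaves the softmax unchanged
max_logit = max(logits)
exps = [math.exp(z - max_logit) for z in logits]
total = sum(exps)
softmax = [e / total for e in exps]

# Draw class indices from the categorical distribution defined by softmax
rng = random.Random(0)  # fixed seed so repeated runs give the same draws
samples = rng.choices(range(len(logits)), weights=softmax, k=10_000)
freq = Counter(samples)

for k, prob in enumerate(softmax):
    print(f"class {k}: softmax={prob:.4f}, empirical={freq[k] / len(samples):.4f}")
```

With enough samples the empirical frequencies approach the softmax probabilities, which is the long-run interpretation of a PMF.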

Key Takeaways

Summary: Discrete Random Variables

  • PMF assigns probabilities to each possible value: $P(X = x_i) = p_i$
  • CDF gives cumulative probability: $F(x) = P(X \leq x)$, which is always non-decreasing
  • Expected value $E[X] = \sum_i x_i \cdot p_i$ is the long-run average
  • Variance $\text{Var}(X) = E[X^2] - (E[X])^2$ measures spread
  • Common distributions: Bernoulli, Binomial, Poisson, Geometric
  • All probabilities sum to 1: $\sum_i p_i = 1$
