
Relative Frequency — Proportions and Probability Estimation

From Counts to Proportions — The Bridge to Probability

Relative frequency converts raw counts into proportions, revealing how often each category occurs relative to the whole. It is the empirical bridge between data and probability.

  • Probability estimation: observed proportions serve as estimates of the underlying probabilities
  • Cross-dataset comparison: proportions put datasets of different sizes on equal footing
  • Law of Large Numbers: for independent repeated observations, relative frequency converges to the true probability as n grows
  • Foundation for histograms: relative-frequency histograms plot proportions on the y-axis; density histograms divide each proportion by its bin width so that the bar areas sum to 1

When you divide every count by the total, you unlock the connection between data and probability.
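
To see why proportions put differently sized datasets on equal footing, here is a minimal sketch using only the standard library. The survey answers, the group sizes, and the helper name `relative_frequencies` are made up for illustration.

```python
from collections import Counter

def relative_frequencies(values):
    """Return {category: count / total} for a sequence of observations."""
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in counts.items()}

# Hypothetical survey answers from two groups of very different sizes
small_group = ["yes"] * 12 + ["no"] * 8        # 20 responses
large_group = ["yes"] * 450 + ["no"] * 550     # 1000 responses

small_rf = relative_frequencies(small_group)
large_rf = relative_frequencies(large_group)

# Raw counts (12 vs 450) make the large group look far more positive,
# but the proportions show the opposite.
print(f"small group: share of yes = {small_rf['yes']:.2f}")   # 0.60
print(f"large group: share of yes = {large_rf['yes']:.2f}")   # 0.45
```

Comparing 12 with 450 says little because the totals differ; comparing 0.60 with 0.45 is meaningful because both are fractions of their own whole.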


What is Relative Frequency?

Definition

Relative frequency is the proportion (or percentage) of observations in a dataset that fall in a given category: the category's count divided by the total number of observations. It serves as an estimate of that category's probability.

Relative Frequency Formula

$$\text{Relative Frequency} = \frac{\text{Frequency of category}}{\text{Total number of observations}} = \frac{f_i}{n}$$

Here,

  • $f_i$ = Frequency of the i-th category
  • $n$ = Total number of observations
  • $\sum_i f_i / n = 1$: the relative frequencies of all categories sum to 1

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(42)

# Generate sample data
colors = np.random.choice(['Red', 'Blue', 'Green', 'Yellow'], size=100, p=[0.3, 0.25, 0.25, 0.2])

# Compute relative frequencies
value_counts = pd.Series(colors).value_counts()
relative_freq = value_counts / len(colors)

print("Absolute and Relative Frequencies:")
print(pd.DataFrame({
    'Count': value_counts,
    'Relative Frequency': relative_freq.round(4),
    'Percentage': (relative_freq * 100).round(1).astype(str) + '%'
}))

Cumulative Relative Frequency

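# Cumulative relative frequency
# Note: value_counts() sorts by count (descending), so the running total follows
# that order; for ordinal data, call .sort_index() first to accumulate in category order
cumulative_freq = relative_freq.cumsum()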
print("\nCumulative Relative Frequency:")
print(cumulative_freq.round(4))

Cumulative Relative Frequency

$$F(x_k) = \sum_{i=1}^{k} \frac{f_i}{n}$$

Here,

  • $F(x_k)$ = Cumulative relative frequency up to category k
  • $f_i$ = Frequency of the i-th category
  • $n$ = Total number of observations
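
As a worked instance of the formula, the sketch below accumulates relative frequencies over categories that have a natural order. The rating scale and the counts are hypothetical, chosen only so the arithmetic is easy to follow.

```python
from itertools import accumulate

# Hypothetical counts of satisfaction ratings, listed in their natural order
ratings = [1, 2, 3, 4, 5]
counts = [5, 10, 20, 40, 25]

n = sum(counts)                                       # total observations: 100
relative = [f / n for f in counts]                    # f_i / n
# Running sum of counts divided by n, which equals the sum of f_i / n for i <= k
cumulative = [c / n for c in accumulate(counts)]      # F(x_k)

for rating, rel, cum in zip(ratings, relative, cumulative):
    print(f"rating {rating}: relative = {rel:.2f}, cumulative = {cum:.2f}")
```

Here F(x_3) = 0.35 means that 35% of the ratings are 3 or lower, and the final cumulative value is 1 because every observation has been counted.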

Visualization

Relative Frequency Distribution
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

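# Bar chart of relative frequencies
# value_counts() orders categories by count, so map each bar's color by name
color_map = {'Red': '#e74c3c', 'Blue': '#3498db', 'Green': '#2ecc71', 'Yellow': '#f39c12'}
relative_freq.plot(kind='bar', color=[color_map[c] for c in relative_freq.index], ax=axes[0])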
axes[0].set_title('Relative Frequency Distribution')
axes[0].set_ylabel('Relative Frequency')
axes[0].set_ylim(0, 0.4)

# Frequency polygon (a line through the bar heights; most meaningful when categories have a natural order)
relative_freq.plot(kind='line', marker='o', ax=axes[1])
axes[1].set_title('Frequency Polygon')
axes[1].set_ylabel('Relative Frequency')

plt.tight_layout()
plt.savefig('relative-frequency.png', dpi=150)
plt.show()
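
For numeric data grouped into bins, the same idea gives the bar heights of a histogram. The sketch below, with made-up measurements and bin edges, contrasts a relative-frequency histogram (heights are proportions) with a density histogram (heights are proportions divided by bin width). It uses plain Python rather than a plotting library so the arithmetic stays visible.

```python
# Hypothetical measurements and unequal-width bins [0, 1), [1, 2), [2, 4)
data = [0.2, 0.5, 0.9, 1.1, 1.4, 1.8, 2.5, 3.0, 3.2, 3.9]
edges = [0, 1, 2, 4]

n = len(data)
widths, rel_freqs, densities = [], [], []
for left, right in zip(edges[:-1], edges[1:]):
    count = sum(left <= x < right for x in data)
    rel_freq = count / n                   # bar height in a relative-frequency histogram
    density = rel_freq / (right - left)    # bar height in a density histogram
    widths.append(right - left)
    rel_freqs.append(rel_freq)
    densities.append(density)
    print(f"[{left}, {right}): relative frequency = {rel_freq:.2f}, density = {density:.2f}")

# Heights sum to 1 for relative frequency; areas sum to 1 for density
total_height = sum(rel_freqs)
total_area = sum(d * w for d, w in zip(densities, widths))
```

With equal-width bins of width 1 the two versions coincide, which is why the distinction only becomes visible in the last, wider bin.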

Probability Estimation from Data

# Using relative frequency as probability estimate
print("Probability Estimates:")
for color, freq in relative_freq.items():
    print(f"  P({color}) ≈ {freq:.4f}")

# Verify sum equals 1
print(f"\nSum of probabilities: {relative_freq.sum():.4f}")

Law of Large Numbers

As the number of independent observations grows, the relative frequency of an event converges to its true probability. This is the foundation of the frequentist interpretation of probability, which treats an event's probability as its long-run relative frequency.
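
A small simulation can make this convergence concrete. The sketch below assumes an event with true probability 0.3 (an arbitrary choice) and tracks the relative frequency at a few sample sizes. The seed and the checkpoint sizes are likewise illustrative.

```python
import random

random.seed(0)      # fixed seed so the run is repeatable
true_p = 0.3        # assumed true probability of the event

checkpoints = {10, 100, 1_000, 10_000, 100_000}
errors = {}         # sample size -> |relative frequency - true_p|
successes = 0
for n in range(1, 100_001):
    successes += random.random() < true_p    # True counts as 1
    if n in checkpoints:
        rel_freq = successes / n
        errors[n] = abs(rel_freq - true_p)
        print(f"n = {n:>6}: relative frequency = {rel_freq:.4f}, error = {errors[n]:.4f}")
```

The error typically shrinks on the order of 1/sqrt(n), though not monotonically: a particular larger sample can land slightly further from the true value than a smaller one.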


Relative Frequency in Machine Learning

ML Application      | Relative Freq Usage              | Why
Class balance       | Check target distribution        | Detect imbalance
NLP                 | Word frequency (Zipf's law)      | Tokenization strategy
Feature engineering | Frequency encoding               | Replace categories with freq
Data validation     | Expected vs observed proportions | Detect data drift

import numpy as np
import pandas as pd

np.random.seed(42)

# Class imbalance detection
y = np.random.choice(['fraud', 'legit'], 10000, p=[0.02, 0.98])
freq = pd.Series(y).value_counts(normalize=True)
print("Relative frequency (class balance):")
print(freq.round(4))
print(f"\nFraud rate: {freq['fraud']:.2%} — extreme imbalance!")
print("Solutions: SMOTE, class weights, undersampling")

# Frequency encoding
categories = np.random.choice(['A', 'B', 'C', 'D'], 1000, p=[0.5, 0.3, 0.15, 0.05])
freq_map = pd.Series(categories).value_counts(normalize=True).to_dict()
encoded = [freq_map[c] for c in categories]
print(f"\nFrequency encoding: {freq_map}")
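
The data validation row of the table can be sketched in the same spirit: compare the proportions observed in a new batch with reference proportions. The example below is one possible approach, not a standard recipe. The batch, the helper name `total_variation_distance`, and the 0.1 threshold are all assumptions made for illustration.

```python
from collections import Counter

def total_variation_distance(expected, observed):
    """Half the sum of absolute differences between two sets of proportions."""
    categories = set(expected) | set(observed)
    return 0.5 * sum(abs(expected.get(c, 0.0) - observed.get(c, 0.0)) for c in categories)

# Reference proportions (hypothetical, e.g. measured on training data)
expected = {"A": 0.5, "B": 0.3, "C": 0.15, "D": 0.05}

# Hypothetical new batch of 200 observations
new_batch = ["A"] * 70 + ["B"] * 60 + ["C"] * 50 + ["D"] * 20
counts = Counter(new_batch)
observed = {c: k / len(new_batch) for c, k in counts.items()}

tvd = total_variation_distance(expected, observed)
DRIFT_THRESHOLD = 0.1    # arbitrary cutoff chosen for illustration
drift_flagged = tvd > DRIFT_THRESHOLD
print(f"Total variation distance: {tvd:.3f}")
print("Possible drift" if drift_flagged else "No drift flagged")
```

A fixed threshold ignores sampling noise, so for small batches a formal goodness-of-fit test may be a better fit than a raw distance.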

Key Takeaways

Summary: Relative Frequency

  • Relative frequency = count / total — converts counts to proportions
  • Sum of all relative frequencies = 1 — represents the entire sample
  • Cumulative relative frequency shows the running total of proportions
  • Relative frequency estimates probability — more data -> better estimates
  • pandas value_counts(normalize=True) computes relative frequencies directly
  • Frequency polygons visualize the shape of the distribution
