Relative Frequency

Descriptive Statistics

From Counts to Proportions — The Bridge to Probability

Relative frequency converts raw counts into proportions, revealing how often each category occurs relative to the whole. It is the empirical bridge between data and probability.

Probability estimation — Use observed proportions as estimates of true probabilities
Cross-dataset comparison — Compare distributions of different sizes on equal footing
Law of Large Numbers — Relative frequency converges to true probability as n grows
Foundation for histograms — Relative-frequency histograms plot these proportions on the y-axis; density histograms divide them further by bin width

When you divide every count by the total, you unlock the connection between data and probability.
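To illustrate the cross-dataset comparison point, here is a minimal sketch with two made-up survey groups of very different sizes. The group names and counts are hypothetical and chosen only for the example; `value_counts(normalize=True)` is the pandas shortcut for dividing each count by the total.

```python
import pandas as pd

# Hypothetical survey results from two groups of very different sizes
small_group = pd.Series(['Yes'] * 12 + ['No'] * 8)       # n = 20
large_group = pd.Series(['Yes'] * 540 + ['No'] * 460)    # n = 1000

# normalize=True divides each count by the total, giving relative frequencies
comparison = pd.DataFrame({
    'Small count': small_group.value_counts(),
    'Large count': large_group.value_counts(),
    'Small rel. freq.': small_group.value_counts(normalize=True),
    'Large rel. freq.': large_group.value_counts(normalize=True),
})
print(comparison)
```

The raw counts (12 versus 540 "Yes" answers) cannot be compared directly, but the proportions (0.60 versus 0.54) can.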

What is Relative Frequency?

Definition

Relative frequency is the proportion (or percentage) of observations in a dataset that fall in a given category, relative to the total number of observations. It serves as an estimate of that category's probability.

Relative Frequency Formula

$$\text{Relative Frequency} = \frac{\text{Frequency of category}}{\text{Total number of observations}} = \frac{f_i}{n}$$

Here,

$f_i$ = Frequency of the i-th category
$n$ = Total number of observations
$\sum_i f_i / n$ = Sum of all relative frequencies, which always equals 1

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(42)

# Generate sample data
colors = np.random.choice(['Red', 'Blue', 'Green', 'Yellow'], size=100, p=[0.3, 0.25, 0.25, 0.2])

# Compute relative frequencies
value_counts = pd.Series(colors).value_counts()
relative_freq = value_counts / len(colors)

print("Absolute and Relative Frequencies:")
print(pd.DataFrame({
    'Count': value_counts,
    'Relative Frequency': relative_freq.round(4),
    'Percentage': (relative_freq * 100).round(1).astype(str) + '%'
}))

Cumulative Relative Frequency

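# Cumulative relative frequency
# Note: value_counts() sorts by count (most frequent first), so the running
# total follows that order; for ordinal data, sort the categories first
cumulative_freq = relative_freq.cumsum()
print("\nCumulative Relative Frequency:")
print(cumulative_freq.round(4))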

Cumulative Relative Frequency Formula

$$F(x_k) = \sum_{i=1}^{k} \frac{f_i}{n}$$

Here,

$F(x_k)$ = Cumulative relative frequency up to category k
$f_i$ = Frequency of the i-th category
$n$ = Total number of observations
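The cumulative formula is easiest to interpret when the categories have a natural order. The following sketch applies it to a hypothetical set of course grades; the grade labels and counts are invented for illustration.

```python
import pandas as pd

# Hypothetical counts for an ordered variable (grades, lowest to highest)
grade_order = ['F', 'D', 'C', 'B', 'A']
counts = pd.Series([2, 5, 13, 20, 10], index=grade_order)

n = counts.sum()                   # total number of observations (50)
rel_freq = counts / n              # f_i / n for each category
cum_rel_freq = rel_freq.cumsum()   # F(x_k): running sum of f_i / n up to category k

print(pd.DataFrame({
    'Count': counts,
    'Relative Frequency': rel_freq,
    'Cumulative': cum_rel_freq,
}))
```

Here the cumulative value at grade C is 0.40, meaning 40% of the observations are a C or lower, and the final cumulative value is 1.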

Visualization

Relative Frequency Distribution

fig, axes = plt.subplots(1, 2, figsize=(12, 5))

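# Bar chart of relative frequencies
# value_counts() orders categories by count, not by name, so look up each
# bar's color by category instead of relying on list position
palette = {'Red': '#e74c3c', 'Blue': '#3498db', 'Green': '#2ecc71', 'Yellow': '#f39c12'}
relative_freq.plot(kind='bar', color=[palette[c] for c in relative_freq.index], ax=axes[0])
axes[0].set_title('Relative Frequency Distribution')
axes[0].set_ylabel('Relative Frequency')
axes[0].set_ylim(0, 0.4)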

# Frequency polygon
relative_freq.plot(kind='line', marker='o', ax=axes[1])
axes[1].set_title('Frequency Polygon')
axes[1].set_ylabel('Relative Frequency')

plt.tight_layout()
plt.savefig('relative-frequency.png', dpi=150)
plt.show()
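For numeric data, the same idea carries over to histograms. The sketch below uses simulated measurements (an assumption made purely for illustration) to contrast a relative-frequency histogram, whose bar heights sum to 1, with a density histogram, whose bar areas sum to 1.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=50, scale=10, size=1000)   # hypothetical measurements

# Ten equal-width bins spanning the data
counts, edges = np.histogram(data, bins=10)
widths = np.diff(edges)

rel_freq = counts / counts.sum()                            # bar heights sum to 1
density, _ = np.histogram(data, bins=edges, density=True)   # bar areas sum to 1

print("Sum of relative frequencies:", rel_freq.sum())
print("Total area under density bars:", (density * widths).sum())
print("density equals rel_freq / width:", np.allclose(density, rel_freq / widths))
```

The two versions differ only by the bin width: each density value is the bin's relative frequency divided by its width.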

Probability Estimation from Data

# Using relative frequency as probability estimate
print("Probability Estimates:")
for color, freq in relative_freq.items():
    print(f"  P({color}) ≈ {freq:.4f}")

# Verify sum equals 1
print(f"\nSum of probabilities: {relative_freq.sum():.4f}")

Law of Large Numbers

As the sample size increases, the relative frequency of an event converges to its true probability, provided the observations are independent and drawn from the same distribution. This is the foundation of the frequentist interpretation of probability.
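A short simulation makes this convergence visible. The sketch below assumes an event with a true probability of 0.3 (a value chosen only for illustration) and tracks the running relative frequency over an increasing number of independent trials.

```python
import numpy as np

rng = np.random.default_rng(42)
true_p = 0.3            # assumed true probability of the event
n_trials = 100_000

# Simulate independent trials: True whenever the event occurs
outcomes = rng.random(n_trials) < true_p

# Running relative frequency: (occurrences so far) / (trials so far)
running_rel_freq = np.cumsum(outcomes) / np.arange(1, n_trials + 1)

for n in (10, 100, 1_000, 10_000, 100_000):
    estimate = running_rel_freq[n - 1]
    print(f"n = {n:>7,}: relative frequency = {estimate:.4f}, "
          f"error = {abs(estimate - true_p):.4f}")
```

The error is not guaranteed to shrink at every step, but it tends toward zero as n grows, which is what the Law of Large Numbers describes.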