Frequency Distributions — Tables, Relative Frequency, Cumulative

From Raw Numbers to Understandable Patterns

A frequency distribution organizes raw data into a table or chart showing how often each value occurs. It transforms an unreadable list of numbers into an interpretable summary.

  • Absolute frequency — Count observations in each category to see the raw totals
  • Relative frequency — Convert counts to proportions for comparison across datasets
  • Cumulative frequency — Track running totals to answer "at or below" questions
  • Grouped distributions — Handle continuous data by binning into intervals

Before you calculate any statistic, organize your data. Frequency distributions are the first step to understanding.


What is a Frequency Distribution?

Definition

A frequency distribution is a table or chart that lists each value (or range of values) in a dataset together with how often it occurs.


Absolute Frequency

Absolute frequency is the number of observations that fall in each category or interval.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Example: Final exam scores of 50 students
np.random.seed(42)
scores = np.random.normal(72, 12, 50).clip(0, 100).round(0).astype(int)

# --- Ungrouped frequency table: one row per distinct score ---
from collections import Counter
freq = Counter(scores.tolist())  # .tolist() gives plain Python ints as keys
freq_df = pd.DataFrame(sorted(freq.items()), columns=['Score', 'Frequency'])
print("First 10 rows of frequency table:")
print(freq_df.head(10))

Grouped Frequency Distribution

When data is continuous or has many unique values, we group into class intervals (bins).

Steps to build:

  1. Find range = max − min
  2. Choose number of classes (typically 5–20; Sturges' rule: k = 1 + 3.322 × log₁₀(n))
  3. Class width = Range / k (round up)
  4. Create non-overlapping, equal-width intervals
  5. Count observations in each interval
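
The five steps above can be sketched as one small function. This is a minimal illustration, not a standard API: the name `grouped_frequency_table`, the choice to round the class width up to a whole number, and the choice to start the first class at the floor of the minimum are all assumptions made for this example, and the sample scores are made up.

```python
import math
import numpy as np
import pandas as pd

def grouped_frequency_table(data, k=None):
    """Build a grouped frequency table (hypothetical helper, not a pandas API)."""
    values = np.asarray(data, dtype=float)
    n = len(values)

    # Step 1: range = max - min
    data_range = values.max() - values.min()

    # Step 2: number of classes (Sturges' rule unless k is supplied)
    if k is None:
        k = math.ceil(1 + 3.322 * math.log10(n))

    # Step 3: class width = range / k, rounded up to a whole number
    width = max(1, math.ceil(data_range / k))

    # Step 4: non-overlapping, equal-width intervals [lo, lo + width)
    start = math.floor(values.min())
    edges = [start + i * width for i in range(k + 1)]
    # Extend by one class if the maximum would fall on or past the last edge
    while values.max() >= edges[-1]:
        edges.append(edges[-1] + width)

    # Step 5: count observations in each interval
    intervals = pd.cut(values, bins=edges, right=False)
    counts = pd.Series(intervals).value_counts(sort=False)
    return pd.DataFrame({
        "Interval": [str(iv) for iv in counts.index],
        "Frequency": counts.to_numpy(),
    })

# Made-up sample of 19 scores
sample = [48, 52, 55, 61, 63, 64, 67, 70, 71, 72, 74, 75, 78, 81, 83, 85, 88, 90, 94]
table = grouped_frequency_table(sample)
print(table.to_string(index=False))
```

For this sample, Sturges' rule gives k = 6 and the width rounds up to 8, so the classes start at 48 and run in steps of 8. In practice you would often round the width to a friendlier number such as 5 or 10.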
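
# Sturges' rule for number of bins
n = len(scores)
k_sturges = int(np.ceil(1 + 3.322 * np.log10(n)))
print(f"Sturges' rule: k = {k_sturges} bins")

# Build grouped frequency table
min_score, max_score = scores.min(), scores.max()
# Class width = range / k, rounded up to the next multiple of 10 for readable intervals
bin_width = int(np.ceil((max_score - min_score) / k_sturges / 10) * 10)
# Start at the multiple of bin_width at or below the minimum and extend past the maximum
start = int(min_score // bin_width * bin_width)
bins = range(start, int(max_score) + bin_width, bin_width)  # [40,50), [50,60), ...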

labels = [f"{b}-{b+9}" for b in bins[:-1]]
score_series = pd.Series(scores)
grouped = pd.cut(score_series, bins=list(bins), right=False, labels=labels)

freq_table = grouped.value_counts(sort=False).reset_index()
# Column names produced by reset_index() differ across pandas versions, so set them explicitly
freq_table.columns = ['Interval', 'Frequency']

# Add relative and cumulative frequency
freq_table['Relative Freq'] = freq_table['Frequency'] / n
freq_table['Relative %'] = (freq_table['Relative Freq'] * 100).round(1)
freq_table['Cumulative Freq'] = freq_table['Frequency'].cumsum()
freq_table['Cumulative %'] = (freq_table['Cumulative Freq'] / n * 100).round(1)

print("\nGrouped Frequency Distribution:")
print(freq_table.to_string(index=False))

Example output (the exact counts depend on the generated sample):

Grouped Frequency Distribution:
 Interval  Frequency  Relative Freq  Relative %  Cumulative Freq  Cumulative %
    40-49          2          0.040         4.0                2           4.0
    50-59          7          0.140        14.0                9          18.0
    60-69         14          0.280        28.0               23          46.0
    70-79         16          0.320        32.0               39          78.0
    80-89         10          0.200        20.0               49          98.0
    90-99          1          0.020         2.0               50         100.0

Figure: histogram of the score frequency distribution and ogive of the cumulative frequency.

Relative Frequency

$$\text{Relative Frequency} = \frac{f_i}{n}$$

Here,

  • $f_i$ = frequency of the i-th class
  • $n$ = total number of observations

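Relative frequency divides each class frequency by the total number of observations n, so the relative frequencies of all classes sum to 1. Because the result no longer depends on sample size, it is useful for comparing distributions across datasets of different sizes.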
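
To see why proportions make datasets of different sizes comparable, here is a small sketch. The two classes and their grade counts are made up for illustration, and `relative_frequencies` is a hypothetical helper name rather than a library function.

```python
from collections import Counter

def relative_frequencies(observations):
    """Return {category: f_i / n} for a list of observations."""
    n = len(observations)
    counts = Counter(observations)
    return {category: count / n for category, count in counts.items()}

# Two hypothetical classes of different sizes (made-up grades)
class_a = ["A"] * 6 + ["B"] * 9 + ["C"] * 5       # n = 20
class_b = ["A"] * 30 + ["B"] * 45 + ["C"] * 25    # n = 100

rel_a = relative_frequencies(class_a)
rel_b = relative_frequencies(class_b)

for grade in ["A", "B", "C"]:
    print(f"{grade}: class A = {rel_a[grade]:.2f}, class B = {rel_b[grade]:.2f}")
```

The absolute counts differ by a factor of five, yet the relative frequencies are identical, which shows that the two classes have the same grade distribution.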


Cumulative Frequency

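Cumulative frequency is the running total of the frequencies up to and including a given class. As a count it answers "How many observations fall at or below this value?"; divided by n it answers "What fraction of observations fall at or below this value?"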
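
As a quick worked example, the running totals can be computed directly from the class counts. The sketch below copies the counts from the grouped table shown earlier in this lesson and uses only the standard library; the variable names are illustrative.

```python
from itertools import accumulate

# Class counts copied from the grouped table shown earlier in this lesson
intervals = ["40-49", "50-59", "60-69", "70-79", "80-89", "90-99"]
frequencies = [2, 7, 14, 16, 10, 1]
n = sum(frequencies)

# Running total of counts, then the same totals as percentages of n
cumulative = list(accumulate(frequencies))
cumulative_pct = [100 * c / n for c in cumulative]

for interval, c, pct in zip(intervals, cumulative, cumulative_pct):
    upper = interval.split("-")[1]
    print(f"score <= {upper}: {c} students ({pct:.1f}%)")
```

Reading the third row: 23 of the 50 students (46%) scored 69 or below.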

# Histogram, relative frequency histogram, and cumulative frequency chart (ogive)
fig, axes = plt.subplots(1, 3, figsize=(14, 4))

# Histogram
axes[0].hist(scores, bins=list(bins), edgecolor='black', color='steelblue', alpha=0.7)
axes[0].set_title('Histogram\n(Absolute Frequency)')
axes[0].set_xlabel('Score')
axes[0].set_ylabel('Frequency')

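# Relative frequency histogram: weight each observation by 1/n so bar heights sum to 1
# (density=True would instead scale the bars so that their total *area* is 1)
axes[1].hist(scores, bins=list(bins), weights=np.ones(n) / n,
             edgecolor='black', color='coral', alpha=0.7)
axes[1].set_title('Relative Frequency Histogram')
axes[1].set_xlabel('Score')
axes[1].set_ylabel('Relative Frequency')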

# Cumulative frequency (Ogive)
sorted_scores = np.sort(scores)
cumulative = np.arange(1, n+1) / n
axes[2].plot(sorted_scores, cumulative, 'b-', linewidth=2)
axes[2].set_title('Ogive (Cumulative Freq)')
axes[2].set_xlabel('Score')
axes[2].set_ylabel('Cumulative Proportion')
axes[2].axhline(0.5, color='red', linestyle='--', label='Median')
axes[2].legend()

plt.tight_layout()
plt.savefig('frequency_distributions.png', dpi=150)
plt.show()
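
Reading a percentile off the ogive means finding the first point where the curve reaches the target proportion. The sketch below does this numerically. The helper name `percentile_from_ogive` and the sample scores are hypothetical, and the method returns an observed value with no interpolation, so it can differ slightly from the default of `np.percentile`.

```python
import numpy as np

def percentile_from_ogive(values, p):
    """Smallest observed value whose cumulative proportion is at least p (0 < p <= 1)."""
    sorted_values = np.sort(np.asarray(values))
    cumulative = np.arange(1, len(sorted_values) + 1) / len(sorted_values)
    # First position where the cumulative curve reaches p
    idx = np.searchsorted(cumulative, p, side="left")
    return sorted_values[idx]

example_scores = [55, 61, 64, 68, 70, 72, 75, 79, 83, 91]  # hypothetical data
median_score = percentile_from_ogive(example_scores, 0.5)
p90_score = percentile_from_ogive(example_scores, 0.9)
print(f"50th percentile: {median_score}, 90th percentile: {p90_score}")
```

This is the numerical counterpart of drawing a horizontal line at 0.5 on the ogive and dropping down to the score axis.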

Frequency Distributions in Machine Learning

  • Class balance: check target frequencies
  • Feature distributions: check skewness
  • Binning: continuous → discrete
  • Data quality: missing values

Frequency analysis is the first step in every ML EDA pipeline.

In ML, frequency distributions are critical for:

ML Use Case | Frequency Technique | Why It Matters
Classification targets | Class frequency table | Detect class imbalance → resample
Feature engineering | Histogram of features | Decide binning for continuous variables
NLP tokenization | Word frequency (Zipf's law) | Stop words, vocabulary pruning
Recommendation systems | User-item frequency | Sparse matrix handling
Fraud detection | Event frequency | Extreme class imbalance (99.9% legit)

import numpy as np
import pandas as pd

# ML example: class imbalance detection
np.random.seed(42)
n = 10000

# Simulated fraud dataset (99% legit, 1% fraud)
labels = np.random.choice(['legit', 'fraud'], n, p=[0.99, 0.01])

# Frequency distribution reveals the problem
from collections import Counter
freq = Counter(labels)
print("=== Class Frequency Distribution ===")
for cls, count in sorted(freq.items()):
    print(f"  {cls}: {count} ({count/n:.1%})")

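# One possible fix: randomly oversample the minority class (with replacement)
# until it matches the majority. In practice, resample only the training split.
fraud_idx = np.where(labels == 'fraud')[0]
legit_idx = np.where(labels == 'legit')[0]
fraud_oversampled = np.random.choice(fraud_idx, size=len(legit_idx), replace=True)
balanced_labels = np.concatenate([labels[legit_idx], labels[fraud_oversampled]])
# .tolist() converts NumPy strings to plain Python strings for a cleaner printout
print(f"\nAfter oversampling: {Counter(balanced_labels.tolist())}")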
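
The NLP row of the table above uses the same counting idea on words. The following is a minimal sketch on a tiny made-up corpus. Splitting on whitespace stands in for a real tokenizer, and the `min_count` threshold is a hypothetical choice for illustration.

```python
from collections import Counter

# Tiny made-up corpus; a real pipeline would use a proper tokenizer
corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "a cat and a dog",
]
tokens = [word for sentence in corpus for word in sentence.split()]
word_freq = Counter(tokens)

# The most frequent words are stop-word candidates
print(word_freq.most_common(3))

# Vocabulary pruning: keep words seen at least `min_count` times (hypothetical threshold)
min_count = 2
vocabulary = sorted(w for w, c in word_freq.items() if c >= min_count)
print(vocabulary)
```

Here "the" dominates the counts, and words that appear only once ("mat", "log", "and") are dropped from the vocabulary.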

Key Takeaways

Summary: Frequency Distributions

  1. Frequency distributions transform raw data into interpretable summaries
  2. Absolute frequency = counts; Relative frequency = proportions; Cumulative = running total
  3. Grouped distributions are needed for continuous data — bin width matters for interpretation
  4. Sturges' rule (k = 1 + 3.322 log₁₀n) is a starting point for number of bins
  5. The ogive (cumulative frequency curve) allows you to read off percentiles
  6. Different bin widths reveal different features — always try several widths
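
To illustrate the last takeaway, the sketch below bins the same made-up scores with three different widths using `np.histogram`. The data (two clusters, near 60 and near 85) and the fixed 40 to 100 range are assumptions chosen to make the effect visible.

```python
import numpy as np

# Hypothetical scores with two clusters (around 60 and around 85)
data = np.array([57, 58, 59, 60, 61, 62, 63, 83, 84, 85, 86, 87, 88])

counts_by_width = {}
for width in (5, 10, 20):
    # Equal-width bin edges from 40 up to 100
    edges = np.arange(40, 100 + width, width)
    counts, _ = np.histogram(data, bins=edges)
    counts_by_width[width] = counts.tolist()
    print(f"width={width:>2}: {counts_by_width[width]}")
```

With a width of 20 every bin is occupied and the gap between the clusters disappears. A width of 10 exposes the empty 70 to 79 class, and a width of 5 shows the gap in more detail at the cost of many empty bins.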
