
Cumulative Frequency — Ogives and Percentile Estimation


Running Totals That Unlock Percentiles

Cumulative frequency is the running total of frequencies up to a given value, showing how many observations fall at or below each point.

  • Percentile estimation: read off any percentile directly from the cumulative curve
  • Ogive plots: the cumulative frequency polygon visualizes the entire distribution
  • ECDF: the empirical cumulative distribution function is the ungrouped, non-parametric counterpart of the ogive, a step function built directly from the raw observations
  • Median and quartiles: read the ogive at 25%, 50%, and 75% of the total frequency to find Q1, the median (50th percentile), and Q3

The cumulative frequency curve is the most direct path from raw data to percentiles.


What is Cumulative Frequency?

Definition

Formally, sort the values in ascending order. The cumulative frequency at the k-th value adds up the frequencies of the first k values, so it counts every observation at or below that value.

Cumulative Frequency

$$CF(x_k) = \sum_{i=1}^{k} f_i$$

Here,

  • $f_i$ = frequency of the i-th value
  • $x_k$ = the k-th value (values sorted in ascending order)
  • $CF(x_k)$ = cumulative frequency up to and including $x_k$

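The formula simply adds frequencies from left to right. A minimal sketch on a small, hypothetical frequency table (the values and counts are made up for illustration), using the standard-library `itertools.accumulate` to form the running total:

```python
from itertools import accumulate

# Hypothetical sorted values x_k and their frequencies f_k
values = [10, 20, 30, 40]
freqs = [2, 5, 3, 4]

# CF(x_k) = f_1 + f_2 + ... + f_k, computed as a running total
cum_freqs = list(accumulate(freqs))

for x, f, cf in zip(values, freqs, cum_freqs):
    print(f"x = {x}: f = {f}, CF = {cf}")

# The last cumulative frequency equals the total number of observations
assert cum_freqs[-1] == sum(freqs)
```

For these counts the running totals are 2, 7, 10, 14; `np.cumsum` in the next example does the same thing for NumPy arrays.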
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(42)

# Generate test scores
scores = np.random.normal(75, 12, 200).clip(0, 100).astype(int)

# Compute frequency distribution
bins = np.arange(0, 105, 5)
freq, edges = np.histogram(scores, bins=bins)
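cum_freq = np.cumsum(freq)  # running total: CF at each class's upper boundary

df = pd.DataFrame({
    'Class Interval': [f'{edges[i]}-{edges[i+1]}' for i in range(len(freq))],
    'Frequency': freq,
    'Cumulative Frequency': cum_freq,
    'Cumulative Relative Frequency': (cum_freq / cum_freq[-1]).round(4)
})
# Print all 20 classes; head(10) would show only the sparsely populated 0-50 range
print(df.to_string(index=False))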

Ogive Plot (Cumulative Frequency Graph)

Ogive — Cumulative Frequency Graph
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

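# Ogive: starts at (first lower boundary, 0), then CF at each upper boundary
ogive_y = np.concatenate([[0], cum_freq])
axes[0].plot(edges, ogive_y, marker='o', linewidth=2)
axes[0].fill_between(edges, ogive_y, alpha=0.3)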
axes[0].set_title('Ogive (Cumulative Frequency Graph)')
axes[0].set_xlabel('Score')
axes[0].set_ylabel('Cumulative Frequency')
axes[0].grid(True, alpha=0.3)

# ECDF (Empirical CDF)
sorted_scores = np.sort(scores)
ecdf = np.arange(1, len(sorted_scores) + 1) / len(sorted_scores)
axes[1].step(sorted_scores, ecdf, where='post', linewidth=2)  # ECDF jumps at each observation
axes[1].set_title('Empirical CDF (ECDF)')
axes[1].set_xlabel('Score')
axes[1].set_ylabel('Cumulative Probability')
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('cumulative-frequency.png', dpi=150)
plt.show()
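Besides being plotted, the ECDF can be evaluated at any point as the fraction of observations less than or equal to x. A minimal sketch with NumPy; the helper name `ecdf_at` and the toy data are hypothetical, not part of any library:

```python
import numpy as np

def ecdf_at(data, x):
    """Empirical CDF: fraction of observations less than or equal to x."""
    sorted_data = np.sort(np.asarray(data))
    # side='right' counts observations equal to x as "at or below" x
    count_at_or_below = np.searchsorted(sorted_data, x, side='right')
    return count_at_or_below / len(sorted_data)

data = [3, 1, 2, 2]
print(ecdf_at(data, 0))    # 0.0  (nothing at or below 0)
print(ecdf_at(data, 2))    # 0.75 (1, 2, 2 are at or below 2)
print(ecdf_at(data, 2.5))  # 0.75 (flat between observations)
print(ecdf_at(data, 3))    # 1.0  (every observation)
```

Because `np.searchsorted` also accepts an array of query points, the same helper evaluates the ECDF over a whole grid of x values in one call.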

Percentile Estimation from Ogive

# Read the ogive: interpolate linearly between (class boundary, cumulative frequency) points
n = cum_freq[-1]
ogive_y = np.concatenate([[0], cum_freq])   # ogive starts at 0 on the first lower boundary
target_freq = 0.5 * n
estimated_median = np.interp(target_freq, ogive_y, edges)
print(f"Estimated median from ogive: {estimated_median:.1f}")
print(f"Actual median: {np.median(scores)}")

# Estimate quartiles the same way
for q, name in [(0.25, 'Q1'), (0.50, 'Median'), (0.75, 'Q3')]:
    est = np.interp(q * n, ogive_y, edges)
    print(f"{name}: Estimated = {est:.1f}, Actual = {np.percentile(scores, q*100):.1f}")

Reading an Ogive

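To find a percentile from an ogive: (1) convert the percentile p to a cumulative frequency, p/100 × n (skip this step if the y-axis already shows percentages), (2) locate that value on the y-axis, (3) draw a horizontal line to the curve, (4) drop a vertical line to the x-axis, and (5) read the value.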
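The graphical steps correspond to the standard grouped-data formula $P_p = L + \frac{pn/100 - CF_{prev}}{f}\,h$, where $L$ is the lower boundary of the class containing the target rank, $CF_{prev}$ the cumulative frequency just below that class, $f$ its frequency, and $h$ its width. A minimal sketch; the function name `grouped_percentile` and the frequency table are hypothetical:

```python
import numpy as np

def grouped_percentile(edges, freq, p):
    """Estimate the p-th percentile (0 < p <= 100) of grouped data.

    Uses L + (p/100 * n - CF_prev) / f * h, i.e. linear interpolation
    along the ogive within the class that contains the target rank.
    """
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    edges = np.asarray(edges, dtype=float)
    freq = np.asarray(freq, dtype=float)
    cum_freq = np.cumsum(freq)
    target = p / 100 * cum_freq[-1]               # target cumulative frequency
    k = int(np.searchsorted(cum_freq, target))    # first class with CF >= target
    cf_prev = cum_freq[k - 1] if k > 0 else 0.0   # CF just below that class
    lower = edges[k]                              # L: lower class boundary
    width = edges[k + 1] - edges[k]               # h: class width
    return lower + (target - cf_prev) / freq[k] * width

# Hypothetical grouped data: classes 0-10, 10-20, 20-30, 30-40
edges = [0, 10, 20, 30, 40]
freq = [5, 15, 20, 10]

for p in (25, 50, 75):
    print(f"P{p} = {grouped_percentile(edges, freq, p):.2f}")
```

With n = 50 the median's target rank is 25, which falls in the 20-30 class (CF rises from 20 to 40), giving 20 + (25 - 20) / 20 × 10 = 22.5. This is the same answer as `np.interp(25, [0, 5, 20, 40, 50], edges)`, since both interpolate along the ogive's straight segments.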


Cumulative Frequency in Machine Learning

| ML Application | Cumulative Frequency Usage | Why |
|---|---|---|
| ROC curves | Cumulative TPR vs. FPR | Model threshold selection |
| Calibration | Predicted vs. observed cumulative | Reliability diagrams |
| Survival analysis | Kaplan-Meier curves | Time-to-event prediction |

import numpy as np
from sklearn.metrics import roc_curve, auc
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

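X, y = make_classification(n_samples=500, random_state=42)
model = LogisticRegression(random_state=42).fit(X, y)
y_proba = model.predict_proba(X)[:, 1]  # scored on the training data for brevity

fpr, tpr, thresholds = roc_curve(y, y_proba)
roc_auc = auc(fpr, tpr)
print(f"AUC-ROC: {roc_auc:.3f}")
# fpr is non-decreasing: take the last operating point with FPR <= 0.1
idx = np.searchsorted(fpr, 0.1, side='right') - 1
print(f"TPR at FPR <= 0.1: {tpr[idx]:.3f}")
print("Each ROC point pairs the cumulative share of positives (TPR) and negatives (FPR) scored at or above a threshold")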
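The link between ROC curves and cumulative frequency can be checked by hand: sort examples by score from highest to lowest, keep running totals of positives and negatives, and divide by the class totals. A minimal sketch, assuming 0/1 labels, both classes present, and no tied scores. The helper `roc_from_cumsum` and the toy data are hypothetical. scikit-learn's `roc_curve` groups tied scores into one threshold and by default drops some collinear points, so its arrays can be shorter.

```python
import numpy as np

def roc_from_cumsum(y_true, scores):
    """FPR/TPR at every threshold, built from cumulative counts.

    Assumes labels are 0/1, both classes occur, and scores have no ties.
    """
    y_true = np.asarray(y_true)
    order = np.argsort(-np.asarray(scores))  # highest score first
    y_sorted = y_true[order]
    # Running totals of positives / negatives scored at or above each threshold
    tp = np.cumsum(y_sorted == 1)
    fp = np.cumsum(y_sorted == 0)
    # Prepend the (0, 0) point for a threshold above every score
    tpr = np.concatenate([[0], tp / tp[-1]])
    fpr = np.concatenate([[0], fp / fp[-1]])
    return fpr, tpr

y_toy = [1, 0, 1, 1, 0]
s_toy = [0.9, 0.8, 0.7, 0.4, 0.2]
fpr_toy, tpr_toy = roc_from_cumsum(y_toy, s_toy)
print("FPR:", fpr_toy)
print("TPR:", tpr_toy)
```

Both arrays are cumulative relative frequencies, so, like an ogive, each is non-decreasing and ends at 1.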

Key Takeaways

Summary: Cumulative Frequency

  • Cumulative frequency = running total of frequencies from lowest to highest
  • Ogive plots show cumulative frequency as a graph — used to estimate percentiles
  • ECDF (Empirical CDF) plots cumulative proportion (0 to 1) from the raw data: the ungrouped counterpart of a relative-frequency ogive
  • Percentile estimation: find the desired percentage on the y-axis, read the corresponding x-value
  • Median = value where cumulative relative frequency reaches 50% (cumulative frequency reaches n/2)
  • Ogives are non-decreasing — they never go down as you move right
