Five-Number Summary

Five Numbers That Describe Any Distribution

The five-number summary is a compact, non-parametric description of a numeric dataset: it makes no assumptions about the shape of the distribution.

Minimum and Maximum — The boundaries of your data
Q1 and Q3 — The edges of the middle 50%
Median — The center that splits data exactly in half
Box plot foundation — These five numbers draw every box plot
IQR from Q1 and Q3 — The robust measure of spread comes directly from this summary

Together, these five numbers describe the center, spread, and range of a distribution, which makes the five-number summary a versatile first tool in descriptive statistics.

What is the Five-Number Summary?

Definition

The five-number summary consists of five descriptive statistics that split the sorted data into four parts, each holding roughly 25% of the observations: Minimum, Q1 (25th percentile), Median (50th percentile), Q3 (75th percentile), and Maximum.

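Statistic	Description
Minimum	Smallest value in the data (a Tukey box plot ends its lower whisker at the smallest non-outlier value instead)
Q1	25th percentile (lower quartile)
Median	50th percentile
Q3	75th percentile (upper quartile)
Maximum	Largest value in the data (a Tukey box plot ends its upper whisker at the largest non-outlier value instead)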
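
As a quick sanity check of the five statistics listed above, the sketch below computes them for a small made-up list. The data and the `five_number_summary` helper name are illustrative assumptions. `np.percentile` uses linear interpolation by default, so other quartile conventions can give slightly different values for Q1 and Q3.

```python
import numpy as np

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using NumPy's default linear interpolation."""
    arr = np.asarray(values, dtype=float)
    q1, med, q3 = np.percentile(arr, [25, 50, 75])
    return float(arr.min()), float(q1), float(med), float(q3), float(arr.max())

# Hypothetical toy data: the integers 1 through 9
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
summary = five_number_summary(data)
print(summary)  # (1.0, 3.0, 5.0, 7.0, 9.0)
```

With nine evenly spaced values the quartiles land exactly on data points, which makes the result easy to verify by hand.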

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

tips = sns.load_dataset('tips')

print("=== Five-Number Summary: Total Bill ===")
bill = tips['total_bill']
q1, med, q3 = np.percentile(bill, [25, 50, 75])
iqr = q3 - q1
lower_fence = q1 - 1.5*iqr
upper_fence = q3 + 1.5*iqr
# Whisker ends: the most extreme values that still lie inside the fences
not_outlier = bill[(bill >= lower_fence) & (bill <= upper_fence)]

# The five-number summary itself uses the true minimum and maximum
print(f"Min:               ${bill.min():.2f}")
print(f"Q1:                ${q1:.2f}")
print(f"Median:            ${med:.2f}")
print(f"Q3:                ${q3:.2f}")
print(f"Max:               ${bill.max():.2f}")
print(f"IQR:               ${iqr:.2f}")
print(f"Lower fence:       ${lower_fence:.2f}")
print(f"Upper fence:       ${upper_fence:.2f}")
print(f"Whiskers:          ${not_outlier.min():.2f} to ${not_outlier.max():.2f}")
outliers = bill[(bill < lower_fence) | (bill > upper_fence)]
print(f"Outliers: {sorted(outliers.tolist())}")

IQR Outlier Fences

$$\text{Lower Fence} = Q_1 - 1.5 \times IQR \quad;\quad \text{Upper Fence} = Q_3 + 1.5 \times IQR$$

Here,

$Q_1$ = First quartile (25th percentile)
$Q_3$ = Third quartile (75th percentile)
$IQR$ = Interquartile range = $Q_3 - Q_1$
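
To make the fence arithmetic concrete, here is a minimal sketch on made-up data with one extreme value. The numbers are hypothetical and the quartiles again rely on NumPy's default linear interpolation.

```python
import numpy as np

# Hypothetical toy data with one obviously extreme value
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 50], dtype=float)

q1, q3 = np.percentile(data, [25, 75])   # 3.25 and 7.75 with linear interpolation
iqr = q3 - q1                            # 4.5
lower_fence = q1 - 1.5 * iqr             # 3.25 - 6.75 = -3.5
upper_fence = q3 + 1.5 * iqr             # 7.75 + 6.75 = 14.5

# Points outside the fences are flagged as outliers
is_outlier = (data < lower_fence) | (data > upper_fence)
outliers = data[is_outlier].tolist()
inliers = data[~is_outlier]

print(f"fences: [{lower_fence}, {upper_fence}]")
print(f"outliers: {outliers}")  # [50.0]
print(f"whiskers: {inliers.min()} to {inliers.max()}")
```

Only 50 falls outside the fences, so a box plot of this data would draw its whiskers from 1 to 9 and show 50 as a separate point.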

pandas describe() — Extended Summary

print(tips.describe().round(2))
# Shows: count, mean, std, min, 25%, 50%, 75%, max for all numeric columns
# (the rows labeled 25%, 50%, and 75% are Q1, the median, and Q3)
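
If only the five-number summary is wanted, the relevant rows can be selected from the `describe()` output by label. The sketch below does this on a hypothetical toy Series and also shows the `percentiles` argument for requesting extra cut points.

```python
import pandas as pd

# Hypothetical toy column
s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9], name="x")

# describe() labels the quartiles '25%', '50%', and '75%'
five = s.describe().loc[["min", "25%", "50%", "75%", "max"]]
print(five)

# Extra percentiles can be requested through the `percentiles` argument
extended = s.describe(percentiles=[0.05, 0.25, 0.5, 0.75, 0.95])
print(extended.index.tolist())
```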

Comparing Groups with Five-Number Summaries

fig, ax = plt.subplots(figsize=(10, 5))
# 'day' is categorical; observed=True keeps only the days present in the data
groups = tips.groupby('day', observed=True)['total_bill']
labels = []

for i, (day, group) in enumerate(groups):
    q1, med, q3 = np.percentile(group, [25, 50, 75])
    iqr = q3 - q1
    # Whiskers end at the most extreme values inside the 1.5*IQR fences
    whisker_lo = group[group >= q1 - 1.5*iqr].min()
    whisker_hi = group[group <= q3 + 1.5*iqr].max()
    outliers = group[(group < whisker_lo) | (group > whisker_hi)]
    labels.append(str(day))

    # Draw box, median line, whiskers, and outlier points
    ax.barh(i, q3 - q1, left=q1, height=0.4, color='steelblue', alpha=0.7)
    ax.plot([med, med], [i - 0.2, i + 0.2], color='red', lw=2)
    ax.plot([whisker_lo, q1], [i, i], color='black', lw=1)
    ax.plot([q3, whisker_hi], [i, i], color='black', lw=1)
    ax.scatter(outliers, [i]*len(outliers), color='red', zorder=5, s=20)
    print(f"{day}: Min={group.min():.1f} Q1={q1:.1f} Med={med:.1f} Q3={q3:.1f} "
          f"Max={group.max():.1f} (whiskers {whisker_lo:.1f} to {whisker_hi:.1f})")

# Label the rows from the groups themselves instead of hard-coding the days
ax.set_yticks(range(len(labels)))
ax.set_yticklabels(labels)
ax.set_xlabel('Total Bill ($)')
ax.set_title('Five-Number Summary: Total Bill by Day')
plt.tight_layout()
plt.savefig('five_num_summary.png', dpi=150)
plt.show()
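
When a table is enough and no plot is needed, the same grouped comparison can be sketched with `groupby` and `quantile`. The DataFrame, its column names, and the output column labels below are hypothetical; quantiles 0 and 1 stand in for the minimum and maximum.

```python
import pandas as pd

# Hypothetical toy data: two groups on different scales
df = pd.DataFrame({
    "group": ["a"] * 5 + ["b"] * 5,
    "value": [1, 2, 3, 4, 5, 10, 20, 30, 40, 50],
})

# One row per group, one column per requested quantile
table = (
    df.groupby("group")["value"]
      .quantile([0, 0.25, 0.5, 0.75, 1])
      .unstack()
)
table.columns = ["min", "q1", "median", "q3", "max"]
print(table)
```

Each row of `table` is one group's five-number summary, so groups can be compared column by column.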

Five-Number Summary in Machine Learning

ML Application	5-Number Usage	Why
EDA	Quick data understanding	First step before modeling
Box plots	Visual model diagnostics	Compare model errors
Data validation	Check for data issues	Pipeline monitoring
Feature profiling	Summary statistics per feature	Automated EDA reports

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

# Five-number summary for EDA
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

print("Five-Number Summary for Iris Dataset:")
for col in df.columns:
    q1, med, q3 = df[col].quantile([0.25, 0.5, 0.75])
    print(f"  {col:25s}: Min={df[col].min():.1f}, Q1={q1:.1f}, "
          f"Med={med:.1f}, Q3={q3:.1f}, Max={df[col].max():.1f}")
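
For the data-validation use case in the table above, one possible approach is to compare each incoming batch against the five-number summary of a trusted reference sample. This is only a sketch: the `check_batch` helper, the two rules it applies, and the data are all illustrative assumptions, not an established monitoring API.

```python
import numpy as np

def five_number_summary(values):
    """Return the five-number summary as a dict (linear-interpolation quartiles)."""
    arr = np.asarray(values, dtype=float)
    q1, med, q3 = np.percentile(arr, [25, 50, 75])
    return {"min": arr.min(), "q1": q1, "median": med, "q3": q3, "max": arr.max()}

def check_batch(reference, batch):
    """Hypothetical check: list warnings for a batch judged against reference data."""
    ref = five_number_summary(reference)
    new = five_number_summary(batch)
    iqr = ref["q3"] - ref["q1"]
    lower, upper = ref["q1"] - 1.5 * iqr, ref["q3"] + 1.5 * iqr
    warnings = []
    # Rule 1 (assumed): batch extremes should stay inside the reference fences
    if new["min"] < lower or new["max"] > upper:
        warnings.append("values outside the reference fences")
    # Rule 2 (assumed): batch median should stay inside the reference middle 50%
    if not (ref["q1"] <= new["median"] <= ref["q3"]):
        warnings.append("median outside the reference middle 50%")
    return warnings

reference = [10, 12, 13, 14, 15, 16, 17, 18, 20]   # made-up training-time values
good_batch = [12, 13, 14, 15, 16, 17]
bad_batch = [40, 42, 45, 47, 50]

print(check_batch(reference, good_batch))  # []
print(check_batch(reference, bad_batch))   # both warnings
```

The thresholds here are deliberately simple; a real pipeline would likely tune them per feature.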

Key Takeaways

Summary: Five-Number Summary

Min, Q1, Median, Q3, Max define a box plot: the box plot is the five-number summary drawn as a picture, with whiskers and outlier points added under the 1.5×IQR convention
No distributional assumptions required — works for any shape
Outliers are defined by the 1.5×IQR rule, not by the min/max
pandas describe() reports count, mean, and std alongside the five-number summary
Compare distributions across groups side by side with grouped five-number summaries
Skewness is often visible: a median closer to Q1 than to Q3 suggests right skew; a median closer to Q3 suggests left skew
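
The skewness reading in the last point can be put into a single number with Bowley's quartile skewness, (Q3 + Q1 - 2 × Median) / (Q3 - Q1), which is positive when the median sits closer to Q1. The sketch below uses made-up data, and the function name is an illustrative choice.

```python
import numpy as np

def quartile_skewness(values):
    """Bowley's quartile skewness: (Q3 + Q1 - 2*median) / (Q3 - Q1)."""
    q1, med, q3 = np.percentile(values, [25, 50, 75])
    return float((q3 + q1 - 2 * med) / (q3 - q1))

# Hypothetical toy data
symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]
right_skewed = [1, 2, 2, 3, 3, 4, 8, 15, 30]

print(quartile_skewness(symmetric))     # 0.0
print(quartile_skewness(right_skewed))  # positive: median is closer to Q1
```

The measure is undefined when Q1 equals Q3, so data with an IQR of zero needs separate handling.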
