
Five-Number Summary — Box Plot Foundation



Five Numbers That Describe Any Distribution

The five-number summary provides a compact, non-parametric description of any numeric dataset; it requires no assumptions about the shape of the distribution.

  • Minimum and Maximum — The boundaries of your data
  • Q1 and Q3 — The edges of the middle 50%
  • Median — The center that splits data exactly in half
  • Box plot foundation — The box runs from Q1 to Q3 with a line at the median, and the whiskers extend toward the minimum and maximum
  • IQR from Q1 and Q3 — The robust measure of spread comes directly from this summary

Five numbers, one compact picture: the five-number summary shows center, spread, and skew at a glance, though it can hide finer features such as multiple peaks.


What is the Five-Number Summary?

Definition

The five-number summary consists of five descriptive statistics that split a sorted dataset into four parts, each holding roughly a quarter of the observations: Minimum, Q1 (25th percentile), Median (50th percentile), Q3 (75th percentile), and Maximum.
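As a minimal worked sketch, assuming NumPy's default linear-interpolation percentile method (other quartile conventions, such as Tukey's hinges, can give slightly different Q1 and Q3), here is the summary for nine illustrative values:

```python
import numpy as np

# Nine illustrative values, already sorted for readability
data = np.array([1, 3, 4, 6, 8, 9, 12, 15, 20])

# Percentiles 0 and 100 return the minimum and maximum
summary = np.percentile(data, [0, 25, 50, 75, 100])
print(summary.tolist())  # [1.0, 4.0, 8.0, 12.0, 20.0]
```

With nine values, each quartile lands exactly on a data point (positions 2, 4, and 6 of the sorted array), so no interpolation is needed.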

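Statistic | Description
--- | ---
Minimum | Smallest observed value
Q1 | 25th percentile (lower quartile)
Median | 50th percentile
Q3 | 75th percentile (upper quartile)
Maximum | Largest observed value

Tukey-style box plots draw their whiskers only to the smallest and largest values that are not outliers, which can differ from the true minimum and maximum.
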
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load the example restaurant-tips dataset bundled with seaborn
tips = sns.load_dataset('tips')

print("=== Five-Number Summary: Total Bill ===")
bill = tips['total_bill']
q1, med, q3 = np.percentile(bill, [25, 50, 75])
iqr = q3 - q1

# Tukey's 1.5×IQR fences; points outside them are flagged as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
not_outlier = bill[(bill >= lower_fence) & (bill <= upper_fence)]
outliers = bill[(bill < lower_fence) | (bill > upper_fence)]

print(f"Min:               ${bill.min():.2f}")
print(f"Q1:                ${q1:.2f}")
print(f"Median:            ${med:.2f}")
print(f"Q3:                ${q3:.2f}")
print(f"Max:               ${bill.max():.2f}")
print(f"IQR:               ${iqr:.2f}")
print(f"Lower fence:       ${lower_fence:.2f}")
print(f"Upper fence:       ${upper_fence:.2f}")
# Whisker ends: the most extreme values still inside the fences
print(f"Lower whisker:     ${not_outlier.min():.2f}")
print(f"Upper whisker:     ${not_outlier.max():.2f}")
print(f"Outliers: {sorted(outliers.round(2).tolist())}")
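To cross-check manual fence and whisker logic, matplotlib exposes the statistics behind its own `boxplot` through `matplotlib.cbook.boxplot_stats`. The sketch below uses a small made-up sample rather than the tips data so every number can be verified by hand; `whis=1.5` is the fence multiplier.

```python
import numpy as np
from matplotlib.cbook import boxplot_stats

# Illustrative sample with one obviously high value
sample = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

# boxplot_stats returns one dict of statistics per dataset passed in
stats = boxplot_stats(sample, whis=1.5)[0]

print(f"Q1={stats['q1']}, Median={stats['med']}, Q3={stats['q3']}")
print(f"Whiskers: {stats['whislo']} to {stats['whishi']}")
print(f"Fliers (outliers): {stats['fliers'].tolist()}")
```

Here IQR = 7.75 - 3.25 = 4.5, so the upper fence is 7.75 + 6.75 = 14.5: the whisker stops at 9 and 100 is reported as a flier.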

IQR Outlier Fences

$$\text{Lower Fence} = Q_1 - 1.5 \times IQR \quad;\quad \text{Upper Fence} = Q_3 + 1.5 \times IQR$$

Here,

  • $Q_1$ = first quartile (25th percentile)
  • $Q_3$ = third quartile (75th percentile)
  • $IQR$ = interquartile range, $Q_3 - Q_1$
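A quick arithmetic check of the fence formulas, written as a small hypothetical helper `iqr_fences` (not a library function):

```python
def iqr_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) Tukey fences for the given quartiles.

    k is the multiplier; 1.5 is the conventional choice used above.
    """
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Example: Q1 = 10 and Q3 = 20 give IQR = 10,
# so the fences are 10 - 15 = -5 and 20 + 15 = 35
print(iqr_fences(10, 20))  # (-5.0, 35.0)
```

With these quartiles, a value of 40 would be flagged as an outlier, while 34 would not.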

pandas describe() — Extended Summary

print(tips.describe().round(2))
# Rows: count, mean, std, min, 25%, 50%, 75%, max (numeric columns only, by default)
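If you want only the five numbers, one option is to select the matching rows of `describe()`, or to call `Series.quantile` with 0 and 1 as the extreme quantiles. The sketch below uses a small synthetic DataFrame (`demo` is a made-up example) so it runs without downloading the tips data:

```python
import pandas as pd

demo = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [10, 20, 30, 40, 50]})

# Keep only the five-number rows from describe()
five_num = demo.describe().loc[["min", "25%", "50%", "75%", "max"]]
print(five_num)

# Equivalent per column: quantile 0 is the minimum, quantile 1 the maximum
col_a = demo["a"].quantile([0, 0.25, 0.5, 0.75, 1]).tolist()
print(col_a)  # values 1, 2, 3, 4, 5
```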

Comparing Groups with Five-Number Summaries

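fig, ax = plt.subplots(figsize=(10, 5))
# observed=True keeps only the days present in the data (and avoids the
# FutureWarning newer pandas versions emit when grouping by a categorical)
groups = tips.groupby('day', observed=True)['total_bill']

labels = []
for i, (day, group) in enumerate(groups):
    q1, med, q3 = np.percentile(group, [25, 50, 75])
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers end at the most extreme values inside the fences
    whisker_lo = group[group >= lower_fence].min()
    whisker_hi = group[group <= upper_fence].max()
    outliers = group[(group < lower_fence) | (group > upper_fence)]

    # Box from Q1 to Q3, red median line, black whiskers, red outlier dots
    ax.barh(i, q3 - q1, left=q1, height=0.4, color='steelblue', alpha=0.7)
    ax.plot([med, med], [i - 0.2, i + 0.2], color='red', lw=2)
    ax.plot([whisker_lo, q1], [i, i], color='black', lw=1)
    ax.plot([q3, whisker_hi], [i, i], color='black', lw=1)
    ax.scatter(outliers, [i] * len(outliers), color='red', zorder=5, s=20)
    labels.append(day)
    print(f"{day}: Min={group.min():.1f} WhiskerLo={whisker_lo:.1f} Q1={q1:.1f} "
          f"Med={med:.1f} Q3={q3:.1f} WhiskerHi={whisker_hi:.1f} Max={group.max():.1f}")

# Label rows from the groups themselves instead of hard-coding day names
ax.set_yticks(range(len(labels)))
ax.set_yticklabels(labels)
ax.set_xlabel('Total Bill ($)')
ax.set_title('Five-Number Summary: Total Bill by Day')
plt.tight_layout()
plt.savefig('five_num_summary.png', dpi=150)
plt.show()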
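For a numeric table instead of a plot, one possible approach is `groupby(...).quantile` with a list of quantiles, followed by `unstack` to put one group per row. The helper name `grouped_five_number` and the toy data are illustrative assumptions:

```python
import pandas as pd


def grouped_five_number(df: pd.DataFrame, by: str, col: str) -> pd.DataFrame:
    """Five-number summary of `col` for each group in `by` (hypothetical helper)."""
    # observed=True only matters for categorical keys such as tips['day']
    table = (
        df.groupby(by, observed=True)[col]
        .quantile([0, 0.25, 0.5, 0.75, 1])
        .unstack()
    )
    # Rename quantile columns 0.0 ... 1.0 to readable labels
    return table.set_axis(["min", "q1", "median", "q3", "max"], axis=1)


toy = pd.DataFrame({
    "group": ["x"] * 5 + ["y"] * 5,
    "value": [1, 2, 3, 4, 5, 10, 20, 30, 40, 50],
})
result = grouped_five_number(toy, "group", "value")
print(result)
```

With the tips data, the equivalent call would be `grouped_five_number(tips, 'day', 'total_bill')`.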

Five-Number Summary in Machine Learning

ML application | Five-number summary usage | Why
--- | --- | ---
EDA | Quick data understanding | First step before modeling
Box plots | Visual model diagnostics | Compare model errors
Data validation | Check for data issues | Pipeline monitoring
Feature profiling | Summary statistics per feature | Automated EDA reports

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

# Five-number summary for EDA
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

print("Five-Number Summary for Iris Dataset:")
for col in df.columns:
    q1, med, q3 = df[col].quantile([0.25, 0.5, 0.75])
    print(f"  {col:25s}: Min={df[col].min():.1f}, Q1={q1:.1f}, "
          f"Med={med:.1f}, Q3={q3:.1f}, Max={df[col].max():.1f}")
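The data-validation row of the table can be sketched as a simple drift check: compute fences from a reference batch and measure what share of a new batch falls outside them. The helper `share_outside_fences` and the synthetic normal data are assumptions for illustration, not a standard API:

```python
import numpy as np


def share_outside_fences(reference, new, k=1.5):
    """Fraction of `new` values outside the k×IQR fences of `reference`."""
    q1, q3 = np.percentile(reference, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    new = np.asarray(new)
    return float(np.mean((new < lower) | (new > upper)))


rng = np.random.default_rng(0)
reference = rng.normal(loc=0, scale=1, size=1_000)
same_dist = rng.normal(loc=0, scale=1, size=1_000)
shifted = rng.normal(loc=3, scale=1, size=1_000)

print(f"Same distribution: {share_outside_fences(reference, same_dist):.3f}")
print(f"Shifted by 3 SDs:  {share_outside_fences(reference, shifted):.3f}")
```

For normally distributed data, roughly 0.7% of values fall outside the 1.5×IQR fences, so a share far above that in a new batch is worth investigating; the exact alert threshold is a judgment call.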

Key Takeaways

Summary: Five-Number Summary

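  • Min, Q1, Median, Q3, Max are the skeleton of a box plot: the box spans Q1 to Q3 with a line at the median, and the whiskers reach toward Min and Max (stopping at the 1.5×IQR fences in Tukey-style plots)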
  • No distributional assumptions required — works for any shape
  • Outliers are defined by the 1.5×IQR rule, not by the min/max
  • pandas describe() adds count, mean, and std to the five-number summary
  • Compare distributions across groups side by side with grouped five-number summaries
  • Skewness is visible: a median closer to Q1 suggests right skew; a median closer to Q3 suggests left skew
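The median-position rule can be turned into a single number with Bowley's quartile skewness, (Q3 + Q1 - 2·Median) / (Q3 - Q1), which is positive for right skew and negative for left skew. A minimal sketch, assuming NumPy's default percentile method and returning 0 when the IQR is zero (that fallback is an arbitrary choice):

```python
import numpy as np


def bowley_skewness(x):
    """Quartile (Bowley) skewness: 0 when the median is centred in the box, within [-1, 1]."""
    q1, med, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    if iqr == 0:
        return 0.0  # degenerate box: no spread between the quartiles
    return float((q3 + q1 - 2 * med) / iqr)


print(bowley_skewness([1, 2, 3, 4, 5]))    # 0.0 (median centred in the box)
print(bowley_skewness([1, 2, 3, 10, 30]))  # 0.75 (median close to Q1)
```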
