Percentiles and Quartiles

Descriptive Statistics

Where Does Any Value Stand Relative to the Rest?

Percentiles tell you the relative standing of any value within a dataset. Quartiles are special percentiles that divide sorted data into four groups containing equal numbers of observations.

Percentile rank — "You scored better than 85% of test takers"
Quartiles — Q1, Q2 (median), Q3 split data into four equal groups
Deciles — Ten equal groups for finer-grained comparison
Interpolation methods — Different calculators give slightly different answers; know why

Percentiles turn raw scores into meaningful rankings. They are the language of standardized testing and performance evaluation.

What are Percentiles and Quartiles?

Definition

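The $p$th percentile is the value at or below which approximately $p$% of the observations fall; with finite data the exact value depends on the interpolation method used. Quartiles (Q1 = 25th, Q2 = 50th, Q3 = 75th) are special cases.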

Percentile Rank

$$\text{Percentile Rank} = \frac{\text{Number of values below } x}{n} \times 100$$

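Here,

$x$ = the value being ranked
$n$ = the total number of observations

This version counts only values strictly below $x$. Some tools count values at or below $x$ instead, so the two conventions give different answers whenever $x$ itself appears in the data.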
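The formula translates almost line for line into code. The sketch below uses a hypothetical helper, `percentile_rank`, that counts values strictly below `x` exactly as the formula does, and checks it against SciPy's `percentileofscore` with `kind='strict'`, which uses the same "strictly below" count. The dataset is the same one used in the example that follows.

```python
import numpy as np
from scipy import stats

def percentile_rank(values, x):
    """Hypothetical helper: percent of observations strictly below x."""
    values = np.asarray(values)
    below = np.count_nonzero(values < x)   # "number of values below x"
    return below / values.size * 100       # divide by n and scale to a percent

data = np.array([15, 20, 35, 40, 50, 12, 27, 45, 38, 22, 18, 55, 30, 42, 25])
pr = percentile_rank(data, 40)
scipy_pr = stats.percentileofscore(data, 40, kind='strict')
print(f"Percentile rank of 40 (by hand): {pr:.2f}")
print(f"Percentile rank of 40 (SciPy):   {scipy_pr:.2f}")
```

Ten of the fifteen values lie below 40, so both calls should report a rank of about 66.67.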

import numpy as np
from scipy import stats  # used later for percentileofscore

data = np.array([15, 20, 35, 40, 50, 12, 27, 45, 38, 22, 18, 55, 30, 42, 25])
sorted_d = np.sort(data)
print(f"Sorted: {sorted_d}")

# np.percentile takes percents (0-100); its default method is 'linear'
for p in [10, 25, 50, 75, 90]:
    print(f"P{p:2d}: {np.percentile(data, p):.2f}")
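If the data already lives in pandas, `Series.quantile` gives the same numbers, because its default `interpolation='linear'` matches NumPy's default. The one trap is that pandas expects fractions between 0 and 1 rather than percents. A minimal sketch on the same dataset:

```python
import numpy as np
import pandas as pd

data = np.array([15, 20, 35, 40, 50, 12, 27, 45, 38, 22, 18, 55, 30, 42, 25])
s = pd.Series(data)

# Series.quantile takes fractions (0-1), not percents
q = s.quantile([0.10, 0.25, 0.50, 0.75, 0.90])
print(q)

# NumPy's default linear method should agree
np_q = np.percentile(data, [10, 25, 50, 75, 90])
print("Same as NumPy:", np.allclose(q.to_numpy(), np_q))

# describe() reports the 25%, 50% and 75% quartiles alongside min, max, mean and std
summary = s.describe()
print(summary)
```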

NumPy Interpolation Methods

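import numpy as np

# Ten values: the 50th percentile falls between 5 and 6
d = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
for method in ['linear', 'lower', 'higher', 'midpoint', 'nearest']:
    # `method` replaced the deprecated `interpolation` keyword in NumPy 1.22
    val = np.percentile(d, 50, method=method)
    print(f"  method='{method}': {val}")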
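Beyond the five classic options, NumPy 1.22 and later also accept the sample-quantile definitions described by Hyndman and Fan, under names such as `'weibull'`, `'hazen'` and `'median_unbiased'`. The sketch below (which assumes NumPy 1.22 or newer) evaluates the 25th percentile of the same ten values under several of them; because 25% of ten observations does not land on a single value, the methods disagree.

```python
import numpy as np  # assumes NumPy >= 1.22 for the `method` keyword and these names

d = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
methods = ['inverted_cdf', 'hazen', 'weibull', 'linear',
           'median_unbiased', 'normal_unbiased']

# Same data, same percentile, different estimators
results = {m: float(np.percentile(d, 25, method=m)) for m in methods}
for m, v in results.items():
    print(f"  method='{m}': {v:.4f}")
```

All of these estimators converge as the sample grows; on small samples the choice matters most.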

Interpolation Methods

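When the requested percentile falls between two observations, a rule is needed to choose a value in the gap, and this is why different calculators can report slightly different percentiles for the same data. NumPy supports several such rules: linear (default), lower, higher, midpoint, and nearest. Since NumPy 1.22 the rule is selected with the method= keyword; the older interpolation= keyword is deprecated. The default 'linear' method is appropriate for most use cases, and the differences between methods shrink as the sample grows.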
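The default linear rule places the $p$th percentile at the fractional position $h = (n-1)\,p/100$ in the sorted data (0-based), then interpolates between the two neighbouring values. The sketch below is a from-scratch version for a single percentile; `linear_percentile` is a hypothetical helper written for illustration, not a NumPy function, and it is checked against `np.percentile`.

```python
import numpy as np

def linear_percentile(values, p):
    """Hypothetical helper mirroring NumPy's default 'linear' rule for one p in [0, 100]."""
    x = np.sort(np.asarray(values, dtype=float))
    h = (x.size - 1) * p / 100          # fractional 0-based position in the sorted data
    lo = int(np.floor(h))
    hi = min(lo + 1, x.size - 1)        # guard the top end when p == 100
    frac = h - lo
    return x[lo] + frac * (x[hi] - x[lo])  # move `frac` of the way from x[lo] to x[hi]

data = np.array([15, 20, 35, 40, 50, 12, 27, 45, 38, 22, 18, 55, 30, 42, 25])
for p in [10, 25, 50, 75, 90]:
    print(f"P{p:2d}: by hand {linear_percentile(data, p):.2f}, NumPy {np.percentile(data, p):.2f}")
```

For example, P25 sits at position 14 × 0.25 = 3.5, halfway between the 4th and 5th sorted values (20 and 22), giving 21.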

Five-Number Summary

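import numpy as np

def five_num(data, label=''):
    """Print the five-number summary plus the IQR and Tukey's 1.5 x IQR outlier fences."""
    data = np.asarray(data)
    q1, q2, q3 = np.percentile(data, [25, 50, 75])
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # values outside are potential outliers
    if label:
        print(f"\n=== {label} ===")
    print(f"Min: {data.min():.2f}  Q1: {q1:.2f}  Median: {q2:.2f}  Q3: {q3:.2f}  Max: {data.max():.2f}")
    print(f"IQR: {iqr:.2f}  Fences: [{lower:.2f}, {upper:.2f}]")

# Simulated exam scores: normal(mean 75, sd 12), clipped to the 0-100 range
np.random.seed(42)
exam = np.random.normal(75, 12, 200).clip(0, 100)
five_num(exam, "Exam Scores")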
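The fences are only useful once they are applied. A minimal sketch that regenerates the same simulated exam scores and lists the values falling outside Tukey's 1.5 × IQR fences (these are candidates for inspection, not automatically errors):

```python
import numpy as np

np.random.seed(42)
exam = np.random.normal(75, 12, 200).clip(0, 100)

q1, q3 = np.percentile(exam, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # Tukey's fences

# Boolean mask selects scores below the lower fence or above the upper fence
outliers = exam[(exam < lower) | (exam > upper)]
print(f"Fences: [{lower:.2f}, {upper:.2f}]")
print(f"{outliers.size} potential outliers: {np.sort(outliers).round(1)}")
```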

Quartile	Percentile	Description
Q1	25th	Lower quartile — 25% of data falls below this
Q2	50th	Median — middle value of the dataset
Q3	75th	Upper quartile — 75% of data falls below this
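The table's claims can be checked empirically. The sketch below assumes a right-skewed lognormal sample (an illustrative choice, not data from this article) and confirms that about 25%, 50% and 75% of observations sit at or below Q1, Q2 and Q3, while the four quartile groups span very different widths: equal counts, not equal intervals.

```python
import numpy as np

rng = np.random.default_rng(0)
skewed = rng.lognormal(mean=3, sigma=1, size=1000)  # assumed right-skewed sample

q1, q2, q3 = np.percentile(skewed, [25, 50, 75])
for name, q in [("Q1", q1), ("Q2", q2), ("Q3", q3)]:
    share = np.mean(skewed <= q) * 100  # percent of observations at or below this quartile
    print(f"{name} = {q:8.2f}  ({share:.1f}% of data at or below)")

# Each quartile group holds ~250 values, but the ranges they cover differ a lot
widths = np.diff([skewed.min(), q1, q2, q3, skewed.max()])
print("Width of each quartile group:", widths.round(1))
```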

Percentile Rank

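score = 88
# kind='weak' counts values at or below the score;
# kind='strict' counts only values below it, matching the formula above
rank = stats.percentileofscore(exam, score, kind='weak')
print(f"Score of {score} is at the {rank:.1f}th percentile")
print(f"{rank:.1f}% of students scored at or below {score}")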
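The `kind` argument only matters when the score appears in the data. A small sketch with a tie at 3 shows all four options SciPy's `percentileofscore` accepts: `'strict'` counts values below, `'weak'` counts values at or below, `'mean'` averages those two, and `'rank'` (the default) averages the ranks of tied values.

```python
from scipy import stats

scores = [1, 2, 3, 3, 4]  # small example with a tie at the score being ranked

ranks = {k: stats.percentileofscore(scores, 3, kind=k)
         for k in ['strict', 'weak', 'mean', 'rank']}
for k, r in ranks.items():
    print(f"kind='{k}': {r:.1f}")
```

Two of five values lie below 3 (40%) and four lie at or below it (80%), which brackets the other two results.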

Deciles

deciles = np.percentile(exam, range(10, 100, 10))
for i, val in enumerate(deciles, 1):
    print(f"D{i} ({i*10}th pct): {val:.1f}")
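The nine decile cut points split the data into ten groups. To assign each score to its group, `np.searchsorted` with `side='right'` counts how many cut points are at or below the score. A minimal sketch on the same simulated exam scores; the 1-10 group numbering is a convention chosen here for illustration.

```python
import numpy as np

np.random.seed(42)
exam = np.random.normal(75, 12, 200).clip(0, 100)
deciles = np.percentile(exam, np.arange(10, 100, 10))  # D1..D9 cut points

# Number of cut points <= score, plus 1, gives a decile group from 1 to 10
groups = np.searchsorted(deciles, exam, side='right') + 1
counts = np.bincount(groups, minlength=11)[1:]
print("Students per decile group:", counts)  # roughly 20 each

my_group = int(np.searchsorted(deciles, 88, side='right')) + 1
print(f"A score of 88 falls in decile group {my_group}")
```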

Percentiles in Machine Learning

ML Application	Percentile Usage	Why
Quantile regression	Predict percentiles, not mean	Robust to skewed targets
Feature binning	Cut into quantile bins	Discretize continuous features
Performance metrics	P95 latency, P99 response time	SLA monitoring
Data preprocessing	Clip outliers at percentiles	Robust scaling
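Two rows of this table take only a few lines of NumPy. The sketch below uses synthetic response times (assumed lognormal, a common long-tailed shape, not measured data) to report P50/P95/P99 latency, then clips values outside the 1st to 99th percentile range, a form of winsorizing.

```python
import numpy as np

rng = np.random.default_rng(7)
latency_ms = rng.lognormal(mean=4, sigma=0.6, size=10_000)  # synthetic response times

# Tail percentiles describe what slow requests experience; the mean hides this
p50, p95, p99 = np.percentile(latency_ms, [50, 95, 99])
print(f"P50={p50:.0f} ms  P95={p95:.0f} ms  P99={p99:.0f} ms  mean={latency_ms.mean():.0f} ms")

# Percentile clipping: cap everything outside the 1st-99th percentile range
lo, hi = np.percentile(latency_ms, [1, 99])
clipped = np.clip(latency_ms, lo, hi)
print(f"max before={latency_ms.max():.0f} ms, after={clipped.max():.0f} ms")
```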

import numpy as np
from scipy.stats import skew
from sklearn.preprocessing import QuantileTransformer

np.random.seed(42)

# Quantile binning: cut points at the quartiles give four equal-count bins
data = np.random.lognormal(3, 1, 1000)
bins = np.percentile(data, [0, 25, 50, 75, 100])
binned = np.digitize(data, bins[1:-1])  # bin index 0-3 from the three inner cut points
print(f"Quantile bins: {bins.round(1)}")
print(f"Counts per bin: {np.bincount(binned)}")

# QuantileTransformer maps values through the empirical CDF onto a normal distribution
qt = QuantileTransformer(n_quantiles=100, output_distribution='normal')
transformed = qt.fit_transform(data.reshape(-1, 1)).ravel()
# skew() with its default bias=True computes the population skewness
print(f"\nOriginal skewness: {skew(data):.3f}")
print(f"Transformed skewness: {skew(transformed):.3f}")
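Quantile regression fits a model by minimizing the pinball (quantile) loss rather than squared error. As a minimal illustration of why that loss targets a percentile, the sketch below searches over constant predictions only; a real quantile regression model (for example scikit-learn's `QuantileRegressor`) minimizes the same loss over model parameters. The grid search and the `pinball_loss` helper are written here for illustration.

```python
import numpy as np

def pinball_loss(y, pred, q):
    """Average pinball loss for quantile level q in (0, 1)."""
    diff = y - pred
    # Under-predictions cost q per unit, over-predictions cost (1 - q) per unit
    return np.mean(np.maximum(q * diff, (q - 1) * diff))

rng = np.random.default_rng(42)
y = rng.lognormal(3, 1, 1000)

q = 0.9
candidates = np.linspace(y.min(), y.max(), 5001)  # constant predictions to try
losses = np.array([pinball_loss(y, c, q) for c in candidates])
best = candidates[losses.argmin()]
p90 = np.percentile(y, 90)
print(f"Constant minimizing pinball loss: {best:.2f}")
print(f"np.percentile(y, 90):            {p90:.2f}")
```

The two numbers should nearly coincide: the best constant under pinball loss at level 0.9 is the 90th percentile, which is what lets quantile regression predict percentiles instead of the mean.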

Key Takeaways

Summary: Percentiles and Quartiles

P50 = median — percentiles generalize the median to any fraction
Quartiles divide data into 4 equal-frequency groups (not equal-width intervals)
IQR = Q3 − Q1 spans the middle 50% of the data and sets the 1.5 × IQR outlier fences
Percentile rank answers "where does this value fall in the distribution?"
NumPy's default method='linear' is appropriate for most cases
Percentiles are non-parametric — no distributional assumptions needed
