Range and IQR
Descriptive Statistics
The Simplest Measures of How Spread Out Your Data Is
Measures of spread tell us how scattered the data is. Range and IQR are the simplest measures — they use only specific order statistics.
- Range: The difference between max and min; simple but brutally sensitive to outliers
- IQR: The width of the middle 50% of the data; robust and reliable for skewed distributions
- Outlier detection: The 1.5×IQR rule automatically flags values that lie far outside the middle 50%
- Box plot foundation: The IQR forms the box in every box plot you will ever make
Spread matters as much as center. Two datasets with the same mean can behave very differently.
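To make this concrete, here is a minimal sketch with two small made-up samples (`tight` and `wide` are hypothetical, chosen only so that both means equal 50). The helper name `spread_summary` is likewise an illustration, not part of any library.

```python
import numpy as np

# Two hypothetical samples chosen so that both means equal 50.
tight = np.array([48, 49, 50, 51, 52])
wide = np.array([10, 30, 50, 70, 90])

def spread_summary(values):
    """Return (mean, range, IQR) for a 1-D array."""
    q1, q3 = np.percentile(values, [25, 75])
    return float(values.mean()), float(values.max() - values.min()), float(q3 - q1)

tight_mean, tight_range, tight_iqr = spread_summary(tight)
wide_mean, wide_range, wide_iqr = spread_summary(wide)

print(f"tight: mean={tight_mean:.1f}, range={tight_range:.1f}, IQR={tight_iqr:.1f}")
print(f"wide:  mean={wide_mean:.1f}, range={wide_range:.1f}, IQR={wide_iqr:.1f}")
```

Both samples report the same mean, yet the range and IQR of `wide` are far larger, which is exactly the information a measure of center cannot convey.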
What are Range and IQR?
Definition
Measures of spread (dispersion) quantify how scattered the data is. Range and IQR are the simplest because each is built from just two order statistics: the minimum and maximum for the range, and the first and third quartiles for the IQR.
Range
$$\text{Range} = x_{\max} - x_{\min}$$
Here,
- $x_{\max}$ = Maximum value in the dataset
- $x_{\min}$ = Minimum value in the dataset
Simple but highly sensitive to outliers — one extreme value changes it completely.
import numpy as np
import matplotlib.pyplot as plt  # used by the plotting example later on

# Ten well-behaved values, plus a copy with one extreme value appended
data = np.array([12, 15, 14, 10, 18, 20, 16, 11, 13, 17])
data_with_outlier = np.append(data, 100)
print(f"Data: {np.sort(data).tolist()}")
print(f"Range = {data.max()} - {data.min()} = {data.max() - data.min()}")
print("\nWith outlier (100 added):")
print(f"Range = {data_with_outlier.max()} - {data_with_outlier.min()} = {data_with_outlier.max() - data_with_outlier.min()}")
print("Range grew ninefold (10 to 90) due to one outlier!")
Interquartile Range (IQR)
$$\text{IQR} = Q_3 - Q_1$$
Here,
- $Q_3$ = Third quartile (75th percentile)
- $Q_1$ = First quartile (25th percentile)
The range of the middle 50% of the data. Robust to outliers.
# Computing quartiles, IQR, and the 1.5×IQR outlier fences
def five_number_summary(data):
    """Print the five-number summary, IQR, and fences; return (q1, q2, q3, iqr)."""
    q1, q2, q3 = np.percentile(data, [25, 50, 75])
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    print(f"Min: {data.min():.2f}")
    print(f"Q1: {q1:.2f}")
    print(f"Median: {q2:.2f}")
    print(f"Q3: {q3:.2f}")
    print(f"Max: {data.max():.2f}")
    print(f"IQR: {iqr:.2f}")
    print(f"Lower fence (Q1 - 1.5×IQR): {lower_fence:.2f}")
    print(f"Upper fence (Q3 + 1.5×IQR): {upper_fence:.2f}")
    return q1, q2, q3, iqr

print("=== Original data ===")
*_, iqr_clean = five_number_summary(data)
print("\n=== Data with outlier ===")
*_, iqr_outlier = five_number_summary(data_with_outlier)
print(f"IQR moved only from {iqr_clean:.2f} to {iqr_outlier:.2f}: robust!")
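It can help to see where the quartile values come from. The sketch below reproduces the linear-interpolation rule that `np.percentile` applies by default, using the ten-value sample from this section. The helper `percentile_linear` is a hypothetical name written for illustration. Other tools may use different quartile conventions, so their Q1 and Q3 can differ slightly on small samples.

```python
import numpy as np

def percentile_linear(values, p):
    """Percentile p (0-100) by linear interpolation between order statistics."""
    ordered = np.sort(np.asarray(values, dtype=float))
    position = (p / 100) * (len(ordered) - 1)  # fractional index into sorted data
    lower = int(np.floor(position))
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])

sample = [12, 15, 14, 10, 18, 20, 16, 11, 13, 17]
q1_manual = percentile_linear(sample, 25)  # position 2.25: between 12 and 13
q3_manual = percentile_linear(sample, 75)  # position 6.75: between 16 and 17
q1_numpy, q3_numpy = np.percentile(sample, [25, 75])

print(f"Manual: Q1={q1_manual:.2f}, Q3={q3_manual:.2f}, IQR={q3_manual - q1_manual:.2f}")
print(f"NumPy:  Q1={q1_numpy:.2f}, Q3={q3_numpy:.2f}, IQR={q3_numpy - q1_numpy:.2f}")
```

Because the quartiles depend only on the values near the 25% and 75% positions, appending one extreme value shifts those positions slightly but cannot drag the IQR far.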
Visualizing Range and IQR
# Two datasets with a similar center (about 50) but very different spread
np.random.seed(0)
dataset_a = np.random.uniform(0, 100, 200)  # Uniform: large range and IQR
dataset_b = np.random.normal(50, 10, 200)   # Normal: smaller range and IQR
fig, axes = plt.subplots(1, 2, figsize=(12, 5))
# The loop variable is named `values` so it does not overwrite the earlier `data` array
for ax, values, label, color in zip(axes,
                                    [dataset_a, dataset_b],
                                    ['Uniform', 'Normal'],
                                    ['steelblue', 'coral']):
    ax.hist(values, bins=30, color=color, edgecolor='black', alpha=0.7, density=True)
    q1, q2, q3 = np.percentile(values, [25, 50, 75])
    ax.axvline(values.min(), color='gray', linestyle=':', label=f'Min={values.min():.0f}')
    ax.axvline(q1, color='blue', linestyle='--', label=f'Q1={q1:.0f}')
    ax.axvline(q2, color='red', linestyle='-', linewidth=2, label=f'Median={q2:.0f}')
    ax.axvline(q3, color='blue', linestyle='--', label=f'Q3={q3:.0f}')
    ax.axvline(values.max(), color='gray', linestyle=':', label=f'Max={values.max():.0f}')
    # Shade the middle 50% of the data (the IQR) across the full height of the axes
    ax.axvspan(q1, q3, alpha=0.2, color='yellow', label=f'IQR={q3-q1:.0f}')
    ax.set_title(f'{label} Distribution\nRange={values.max()-values.min():.0f}, IQR={q3-q1:.0f}')
    ax.legend(fontsize=7)
plt.tight_layout()
plt.savefig('range_iqr.png', dpi=150)
plt.show()
Comparing Spread Measures
| Measure | Formula | Breakdown Point | Outlier Sensitivity |
|---|---|---|---|
| Range | Max - Min | 0% | Very sensitive |
| IQR | Q3 - Q1 | 25% | Robust |
| Std Dev | √(Σ(xᵢ-x̄)²/(n-1)) | 0% | Sensitive |
| MAD | Median(\|xᵢ - Median(x)\|) | 50% | Highly robust |
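The four measures in the table can be compared directly on the small sample used earlier, once without and once with the value 100 appended. This is a rough sketch: `spread_measures` is a hypothetical helper, the standard deviation uses the n-1 denominator (`ddof=1`) to match the table, and MAD is taken as the unscaled median absolute deviation.

```python
import numpy as np

def spread_measures(values):
    """Return range, IQR, sample std dev, and unscaled MAD as a dict."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    median = np.median(values)
    return {
        "range": values.max() - values.min(),
        "iqr": q3 - q1,
        "std": values.std(ddof=1),                  # n-1 denominator
        "mad": np.median(np.abs(values - median)),  # median absolute deviation
    }

clean = np.array([12, 15, 14, 10, 18, 20, 16, 11, 13, 17])
contaminated = np.append(clean, 100)  # one extreme value added

before = spread_measures(clean)
after = spread_measures(contaminated)
for name in before:
    print(f"{name:>5}: {before[name]:7.2f} -> {after[name]:7.2f}")
```

On this sample the range and standard deviation jump sharply after contamination, while the IQR and MAD move only slightly, which is what their breakdown points predict.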
Range and IQR in Machine Learning
| ML Application | Range/IQR Usage | Why |
|---|---|---|
| Outlier detection | IQR fence = Q1-1.5×IQR to Q3+1.5×IQR | Robust to skewed data |
| Feature selection | Zero/near-zero range → remove feature | No information content |
| Min-Max normalization | Scale to [0,1] using range | Puts features on a common scale; many models train better on bounded inputs |
| Box plots | IQR defines the box | Visual model diagnostics |
| Anomaly detection | IQR-based thresholds | Production data monitoring |
import numpy as np
from sklearn.preprocessing import MinMaxScaler
np.random.seed(42)
# IQR-based outlier detection
data = np.concatenate([np.random.normal(50, 10, 100), [200, -50]])
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5*iqr, q3 + 1.5*iqr
outliers = data[(data < lower) | (data > upper)]
print(f"IQR: {iqr:.2f}, Fences: [{lower:.2f}, {upper:.2f}]")
print(f"Outliers detected: {len(outliers)} ({outliers})")
# Min-Max normalization using range
data_features = np.random.randn(100, 3) * [10, 1, 100] # very different ranges
scaler = MinMaxScaler()
normalized = scaler.fit_transform(data_features)
print(f"\nOriginal ranges: {[f'{d.max()-d.min():.1f}' for d in data_features.T]}")
print(f"Normalized ranges: {[f'{d.max()-d.min():.3f}' for d in normalized.T]}")
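The feature-selection row of the table can be sketched in the same style. The example below drops columns whose range is effectively zero. The feature matrix is synthetic, and the threshold `min_range` is an assumed value for illustration that would need tuning on real data.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical feature matrix: the middle column is constant, so its range is zero.
features = np.column_stack([
    rng.normal(50, 10, 100),  # informative feature
    np.full(100, 3.0),        # constant feature
    rng.uniform(0, 1, 100),   # informative feature
])

min_range = 1e-8  # assumed threshold; tune per dataset
feature_ranges = features.max(axis=0) - features.min(axis=0)
keep = feature_ranges > min_range
filtered = features[:, keep]

print(f"Ranges: {np.round(feature_ranges, 3)}")
print(f"Kept columns: {np.flatnonzero(keep).tolist()}")
```

A constant column also breaks min-max scaling by hand, since dividing by a zero range is undefined, so filtering such columns first is a reasonable precaution.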
Key Takeaways
Summary: Range and IQR
- Range is simple but misleading when outliers are present: a single extreme value can dominate it
- IQR is a simple, robust spread measure: it spans the middle 50% of the data and ignores the tails
- The 1.5×IQR rule for outlier detection is built into most box plot implementations
- For symmetric data without outliers, standard deviation is more informative than IQR
- For skewed data or data with outliers, report IQR instead of (or alongside) standard deviation
- IQR = 0 means Q1 and Q3 coincide, so roughly half of the data or more shares a single value; this is common in count data
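The last point has a practical consequence worth checking before applying the 1.5×IQR rule. In the sketch below the count data is made up (80 zeros out of 100 values). With an IQR of zero both fences collapse onto the same value, so every observation that differs from it is flagged.

```python
import numpy as np

# Hypothetical count data: events per user, mostly zero.
counts = np.array([0] * 80 + [1] * 10 + [2] * 5 + [7] * 5)

q1, q3 = np.percentile(counts, [25, 75])
iqr = q3 - q1
lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
flagged = counts[(counts < lower_fence) | (counts > upper_fence)]

print(f"Q1={q1}, Q3={q3}, IQR={iqr}")
print(f"Values flagged by the 1.5×IQR rule: {flagged.size} of {counts.size}")
```

When this happens the rule is no longer separating unusual values from typical ones, so a different threshold or a model suited to count data is likely a better choice.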