Z-Scores — Standardization and the Normal Table
Unlock the Power of Standardization
Z-scores put values from any dataset on a common scale, which makes it possible to compare values measured in different units, detect outliers, and look up probabilities in the standard normal table.
Key things this concept helps with:
- Cross-Scale Comparison — Compare a math test score to an essay rubric rating fairly
- Outlier Detection — Identify values that fall unusually far from the mean
- Probability Calculation — Convert any normal distribution into the standard normal for easy probability lookup
- Machine Learning Pipelines — Standardize features so algorithms treat them equally
Once you master z-scores, every normal distribution becomes as familiar as the standard normal table.
What is a Z-Score?
Definition
A z-score (standard score) indicates how many standard deviations an element is from the mean. It standardizes values for cross-scale comparison.
Z-Score Formula
$$z = \frac{x - \mu}{\sigma}$$
Here,
- $x$ = the raw data value
- $\mu$ = the population mean
- $\sigma$ = the population standard deviation
- $z$ = the number of standard deviations from the mean
import numpy as np
from scipy.stats import norm
from sklearn.preprocessing import StandardScaler
scores = np.array([72, 85, 91, 68, 77, 94, 83, 79, 88, 62])
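# ddof=1 gives the sample standard deviation (divides by n - 1);
# use ddof=0 when the data is the whole population, as in the definition above
mean, std = scores.mean(), scores.std(ddof=1)
z = (scores - mean) / std
print(f"Mean={mean:.2f}, Std={std:.2f}\n")
for s, zz in zip(scores, z):
    print(f" Score={s:3d} z={zz:+.3f} ({abs(zz):.1f}σ {'above' if zz>0 else 'below'} mean)")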
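The listing above divides by the sample standard deviation (`ddof=1`), while the definition describes σ as the population standard deviation. The difference can be sketched with Python's standard `statistics` module instead of NumPy. The helper name `z_scores` and its `population` flag are hypothetical, chosen only for illustration.

```python
import statistics

def z_scores(values, population=True):
    """Return z-scores using the population (n) or sample (n - 1) standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values) if population else statistics.stdev(values)
    return [(v - mean) / sd for v in values]

scores = [72, 85, 91, 68, 77, 94, 83, 79, 88, 62]
z_pop = z_scores(scores, population=True)    # same divisor as scores.std(ddof=0)
z_samp = z_scores(scores, population=False)  # same divisor as scores.std(ddof=1)

for s, zp, zs in zip(scores, z_pop, z_samp):
    print(f"Score={s:3d}  population z={zp:+.3f}  sample z={zs:+.3f}")
```

The sample standard deviation is slightly larger, so the sample z-scores are slightly smaller in magnitude. The gap shrinks as the number of observations grows.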
Comparing Across Different Tests
alice_math = 85; mu_m = 75; sd_m = 10 # Math test
alice_essay = 4.5; mu_e = 4.0; sd_e = 0.5 # Essay rubric (0-6)
z_math = (alice_math - mu_m) / sd_m
z_essay = (alice_essay - mu_e) / sd_e
print(f"Math z-score: {z_math:.2f}")
print(f"Essay z-score: {z_essay:.2f}")
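# Equal z-scores mean equal relative performance, so handle the tie explicitly
if np.isclose(z_math, z_essay):
    print("Relative performance is the same on both tests")
else:
    print(f"Better performance: {'Math' if z_math > z_essay else 'Essay'}")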
Example: Comparing Scores
Alice scores 85 on a math test (mean=75, sd=10) and 4.5 on an essay rubric (mean=4.0, sd=0.5).
- Math z-score: $z = \frac{85 - 75}{10} = 1.0$
- Essay z-score: $z = \frac{4.5 - 4.0}{0.5} = 1.0$
Both z-scores are equal, so her relative performance is the same on both tests.
Z-Scores and Normal Probabilities
print("Key z-score probabilities:")
print(f"P(Z < 1.645) = {norm.cdf(1.645):.4f} (95th pct)")
print(f"P(Z < 1.960) = {norm.cdf(1.960):.4f} (97.5th pct)")
print(f"P(Z < 2.326) = {norm.cdf(2.326):.4f} (99th pct)")
print(f"P(-1.96<Z<1.96) = {norm.cdf(1.96)-norm.cdf(-1.96):.4f} (95%)")
# Given probability -> find z
print(f"\nZ for 90th percentile: {norm.ppf(0.90):.4f}")
print(f"Z for 97.5th: {norm.ppf(0.975):.4f}")
print(f"Z for 99th: {norm.ppf(0.99):.4f}")
Key Probabilities
- $P(-1.96 < Z < 1.96) \approx 0.95$: the foundation of 95% confidence intervals
- $|z| > 3$: flags potential outliers in approximately normal data
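The same lookups work for any normal distribution once a value is standardized. The sketch below reuses the math test parameters from the earlier example (mean 75, sd 10) and assumes the scores are normally distributed, which the example itself does not state. It uses the standard library's `statistics.NormalDist` rather than `scipy.stats.norm` so that it runs without extra packages. The helper `prob_below` is a hypothetical name.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)  # the standard normal, Z

mu, sigma = 75.0, 10.0  # assumed normal model for the math test scores

def prob_below(x):
    """P(X < x) for X ~ N(mu, sigma), computed by standardizing x first."""
    z = (x - mu) / sigma
    return std_normal.cdf(z)

p_below_85 = prob_below(85)                      # P(Z < 1.0)
p_within_1sd = prob_below(85) - prob_below(65)   # P(-1 < Z < 1)

# Inverse lookup: the raw score at the 95th percentile is mu + z * sigma
z_95 = std_normal.inv_cdf(0.95)
score_95 = mu + z_95 * sigma

print(f"P(X < 85)      = {p_below_85:.4f}")
print(f"P(65 < X < 85) = {p_within_1sd:.4f}")
print(f"95th percentile score = {score_95:.2f}")
```

Standardizing first means only one table (or one CDF) is ever needed: every question about X becomes a question about Z.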
Standardization for Machine Learning
X = np.array([
[170, 70, 25],
[180, 85, 30],
[160, 55, 22],
[175, 80, 35],
[165, 60, 28],
], dtype=float)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
print("Standardized features (z-scores):")
print(X_scaled.round(3))
print(f"Column means (should be ~0): {X_scaled.mean(axis=0).round(6)}")
print(f"Column stds (should be ~1): {X_scaled.std(axis=0).round(4)}")
Z-Scores in Machine Learning
| ML Application | Z-Score Usage | Why |
|---|---|---|
| StandardScaler | z = (x - μ) / σ | Neural networks need normalized input |
| Anomaly detection | \|z\| > 3 → outlier | Production monitoring |
| Feature normalization | Compare features on same scale | Distance-based algorithms |
| Batch normalization | Normalize layer activations | Stabilizes deep learning |
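The batch normalization row applies the same z-score idea to layer activations: each feature is standardized across the current batch. The following is a rough pure-Python sketch of a training-time forward pass only. The function name `batch_norm`, the parameters `gamma`, `beta`, and `eps`, and their default values are assumptions made for illustration; framework layers typically learn the scale and shift and also track running statistics, which this sketch omits.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Standardize each feature (column) across the batch, then scale and shift."""
    n = len(batch)
    columns = list(zip(*batch))  # one tuple per feature
    normalized_columns = []
    for col in columns:
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / n  # population (biased) variance
        normalized_columns.append(
            [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in col]
        )
    # Transpose back so each inner list is one example again
    return [list(row) for row in zip(*normalized_columns)]

# Hypothetical activations: 4 examples, 2 features on very different scales
activations = [
    [0.5, 12.0],
    [1.5, 8.0],
    [2.5, 10.0],
    [3.5, 14.0],
]
normalized = batch_norm(activations)
for row in normalized:
    print([round(v, 3) for v in row])
```

The small `eps` term guards against division by zero when a feature is constant across the batch; it is the only difference from a plain z-score when `gamma=1` and `beta=0`.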
import numpy as np
from sklearn.preprocessing import StandardScaler
np.random.seed(42)
# Z-score based anomaly detection
data = np.random.randn(100, 3)
z_scores = np.abs((data - data.mean(axis=0)) / data.std(axis=0))
anomalies = (z_scores > 3).any(axis=1)
print(f"Data points with any |z| > 3: {anomalies.sum()}")
# StandardScaler = z-score normalization
scaler = StandardScaler()
scaled = scaler.fit_transform(data)
print(f"\nOriginal: mean={data.mean():.4f}, std={data.std():.4f}")
print(f"Scaled: mean={scaled.mean():.4f}, std={scaled.std():.4f}")
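Standard-normal draws rarely exceed |z| = 3 (roughly 0.3% of values), so the random data above may produce few or no flags. A small sketch with one planted spike shows the rule firing. The readings are made-up values and `flag_outliers` is a hypothetical helper; only the standard library is used.

```python
import statistics

def flag_outliers(values, threshold=3.0):
    """Return the values whose |z| exceeds the threshold (population std)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [v for v in values if abs((v - mean) / sd) > threshold]

# Hypothetical sensor readings: 19 values near 10.0 plus one planted spike
readings = [10.1, 9.8, 10.0, 10.2, 9.9, 10.0, 10.1, 9.7, 10.3, 9.9,
            10.0, 10.2, 9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.8, 25.0]

flagged = flag_outliers(readings)
print("Flagged:", flagged)
```

One caveat: an extreme value inflates the mean and standard deviation it is measured against. With the population standard deviation, no |z| can exceed √(n − 1), so a threshold of 3 can never fire on a sample of 10 or fewer values.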
Key Takeaways
- The z-score formula z = (x − μ)/σ converts any value to units of standard deviations from the mean
- Z-scores enable fair comparison across different scales, such as a math score against an essay rubric rating
- P(−1.96 < Z < 1.96) ≈ 95% is the foundation of confidence intervals and hypothesis testing
- |z| > 3 is a common rule of thumb for flagging potential outliers in approximately normal data
Z-scores are the bridge between raw data and the standard normal table — once you standardize, the entire normal distribution is at your fingertips.