Z-Scores — Standardization and the Normal Table

Descriptive Statistics

Unlock the Power of Standardization

Z-scores transform any dataset into a universal language — making it possible to compare values from completely different scales, detect outliers, and unlock probabilities from the normal table.

Key things this concept helps with:

Cross-Scale Comparison — Compare a math test score to an essay rubric rating fairly
Outlier Detection — Identify values that fall unusually far from the mean
Probability Calculation — Convert any normal distribution into the standard normal for easy probability lookup
Machine Learning Pipelines — Standardize features so algorithms treat them equally

Once you master z-scores, every normal distribution becomes as familiar as the standard normal table.

What is a Z-Score?

Definition

A z-score (standard score) indicates how many standard deviations an element is from the mean. It standardizes values for cross-scale comparison.

Z-Score Formula

z = \frac{x - \mu}{\sigma}

Here,

$x$ =The raw data value
$\mu$ =The population mean
$\sigma$ =The population standard deviation
$z$ =The number of standard deviations from the mean

import numpy as np
from scipy.stats import norm
from sklearn.preprocessing import StandardScaler

scores = np.array([72, 85, 91, 68, 77, 94, 83, 79, 88, 62])
mean, std = scores.mean(), scores.std(ddof=1)
z = (scores - mean) / std

print(f"Mean={mean:.2f}, Std={std:.2f}\n")
for s, zz in zip(scores, z):
    print(f"  Score={s:3d}  z={zz:+.3f}  ({abs(zz):.1f}σ {'above' if zz>0 else 'below'} mean)")

Comparing Across Different Tests

alice_math  = 85;  mu_m = 75; sd_m = 10   # Math test
alice_essay = 4.5; mu_e = 4.0; sd_e = 0.5  # Essay rubric (0-6)

z_math  = (alice_math  - mu_m) / sd_m
z_essay = (alice_essay - mu_e) / sd_e

print(f"Math z-score:  {z_math:.2f}")
print(f"Essay z-score: {z_essay:.2f}")
print(f"Better performance: {'Math' if z_math > z_essay else 'Essay'}")

Example: Comparing Scores

Alice scores 85 on a math test (mean=75, sd=10) and 4.5 on an essay rubric (mean=4.0, sd=0.5).

Math z-score: $z = \frac{85 - 75}{10} = 1.00$
Essay z-score: $z = \frac{4.5 - 4.0}{0.5} = 1.00$

Both z-scores are equal, so her relative performance is the same on both tests.

Z-Scores and Normal Probabilities

print("Key z-score probabilities:")
print(f"P(Z < 1.645) = {norm.cdf(1.645):.4f}  (90th pct)")
print(f"P(Z < 1.960) = {norm.cdf(1.960):.4f}  (97.5th pct)")
print(f"P(Z < 2.326) = {norm.cdf(2.326):.4f}  (99th pct)")
print(f"P(-1.96<Z<1.96) = {norm.cdf(1.96)-norm.cdf(-1.96):.4f}  (95%)")

# Given probability -> find z
print(f"\nZ for 90th percentile: {norm.ppf(0.90):.4f}")
print(f"Z for 97.5th:          {norm.ppf(0.975):.4f}")
print(f"Z for 99th:            {norm.ppf(0.99):.4f}")

Key Probabilities

$P(-1.96 < Z < 1.96) \approx 95\%$ — foundation of 95% confidence intervals
$|z| > 3$ flags potential outliers in approximately normal data

Standardization for Machine Learning

X = np.array([
    [170, 70, 25],
    [180, 85, 30],
    [160, 55, 22],
    [175, 80, 35],
    [165, 60, 28],
], dtype=float)

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
print("Standardized features (z-scores):")
print(X_scaled.round(3))
print(f"Column means (should be ~0): {X_scaled.mean(axis=0).round(6)}")
print(f"Column stds  (should be ~1): {X_scaled.std(axis=0).round(4)}")

Z-Scores in Machine Learning

ML Application	Z-Score Usage	Why
StandardScaler	z = (x - μ) / σ	Neural networks need normalized input
Anomaly detection	\|z\| > 3 → outlier	Production monitoring
Feature normalization	Compare features on same scale	Distance-based algorithms
Batch normalization	Normalize layer activations	Stabilizes deep learning

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import LocalOutlierFactor

np.random.seed(42)

# Z-score based anomaly detection
data = np.random.randn(100, 3)
z_scores = np.abs((data - data.mean(axis=0)) / data.std(axis=0))
anomalies = (z_scores > 3).any(axis=1)
print(f"Data points with any |z| > 3: {anomalies.sum()}")

# StandardScaler = z-score normalization
scaler = StandardScaler()
scaled = scaler.fit_transform(data)
print(f"\nOriginal: mean={data.mean():.4f}, std={data.std():.4f}")
print(f"Scaled:   mean={scaled.mean():.4f}, std={scaled.std():.4f}")

Key Takeaways

The z-score formula z = (x−μ)/σ converts any value to units of standard deviations from the mean

Z-scores enable fair comparison across different scales — compare a math score to an essay rubric

P(−1.96 < Z < 1.96) ≈ 95% is the foundation of confidence intervals and hypothesis testing

|z| > 3 is a common rule of thumb for flagging potential outliers in approximately normal data

Z-scores are the bridge between raw data and the standard normal table — once you standardize, the entire normal distribution is at your fingertips.

Z-Scores — Standardization and the Normal Table

Z-Scores — Standardization and the Normal Table

Unlock the Power of Standardization

What is a Z-Score?

Definition

Z-Score Formula

Comparing Across Different Tests

Example: Comparing Scores

Z-Scores and Normal Probabilities

Standardization for Machine Learning

Z-Scores in Machine Learning

Key Takeaways

Premium Content

Need Expert Statistics Help?