
Z-Scores — Standardization and the Normal Table


Unlock the Power of Standardization

Z-scores re-express any value as a number of standard deviations from its mean. This puts values from completely different scales on a common footing, makes unusual values easy to spot, and, for approximately normal data, lets you read probabilities straight from the standard normal table.

Key things this concept helps with:

  • Cross-Scale Comparison — Compare a math test score to an essay rubric rating fairly
  • Outlier Detection — Identify values that fall unusually far from the mean
  • Probability Calculation — Convert any normal distribution into the standard normal for easy probability lookup
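  • Machine Learning Pipelines — Standardize features so that scale-sensitive algorithms (distance-based methods, gradient-based training) are not dominated by features with large numeric ranges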

Once you master z-scores, every normal distribution becomes as familiar as the standard normal table.


What is a Z-Score?

Definition

A z-score (standard score) indicates how many standard deviations a value lies above (positive z) or below (negative z) the mean. Because it is unit-free, it allows values measured on different scales to be compared.


Z-Score Formula

$$z = \frac{x - \mu}{\sigma}$$

Here,

  • $x$ = the raw data value
  • $\mu$ = the population mean
  • $\sigma$ = the population standard deviation
  • $z$ = the number of standard deviations from the mean
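import numpy as np
from scipy.stats import norm                      # used in the probability examples
from sklearn.preprocessing import StandardScaler  # used in the machine learning example

# Ten test scores
scores = np.array([72, 85, 91, 68, 77, 94, 83, 79, 88, 62])

# ddof=1 gives the sample standard deviation s; use ddof=0 to treat these
# scores as the whole population (the sigma in the formula above)
mean, std = scores.mean(), scores.std(ddof=1)
z = (scores - mean) / std  # vectorized z-score for every score at once

print(f"Mean={mean:.2f}, Std={std:.2f}\n")
for s, zz in zip(scores, z):
    direction = "above" if zz > 0 else "below" if zz < 0 else "at"
    print(f"  Score={s:3d}  z={zz:+.3f}  ({abs(zz):.1f}σ {direction} mean)")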
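A useful sanity check on any standardization: the resulting z-scores have mean 0 and standard deviation 1, and rearranging the formula as $x = \mu + z\sigma$ turns a z-score back into a raw value. The sketch below assumes ddof=0 (the population formula, matching σ in the definition); `standardize` is a hypothetical helper written for this illustration, not a library function.

```python
import numpy as np

def standardize(x, ddof=0):
    """Hypothetical helper: z-scores of x using its own mean and standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=ddof)

scores = np.array([72, 85, 91, 68, 77, 94, 83, 79, 88, 62])
z = standardize(scores)

# Standardized values have mean 0 and standard deviation 1
# (when the same ddof is used to compute and to check the spread).
print(f"mean of z = {z.mean():.6f}")
print(f"std of z  = {z.std(ddof=0):.6f}")

# Inverting the formula, x = mu + z * sigma, recovers the raw scores.
recovered = scores.mean() + z * scores.std(ddof=0)
print("round trip ok:", np.allclose(recovered, scores))
```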

Comparing Across Different Tests

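import math

# Alice's raw scores and each test's mean and standard deviation
alice_math,  mu_m, sd_m = 85,  75,  10    # Math test
alice_essay, mu_e, sd_e = 4.5, 4.0, 0.5   # Essay rubric (0-6)

z_math  = (alice_math  - mu_m) / sd_m
z_essay = (alice_essay - mu_e) / sd_e

print(f"Math z-score:  {z_math:.2f}")
print(f"Essay z-score: {z_essay:.2f}")

# Handle ties explicitly; a plain ">" would wrongly report "Essay" when both are equal
if math.isclose(z_math, z_essay):
    better = "Neither (equal z-scores)"
else:
    better = "Math" if z_math > z_essay else "Essay"
print(f"Better performance: {better}")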

Example: Comparing Scores

Alice scores 85 on a math test (mean=75, sd=10) and 4.5 on an essay rubric (mean=4.0, sd=0.5).

  • Math z-score: $z = \frac{85 - 75}{10} = 1.00$
  • Essay z-score: $z = \frac{4.5 - 4.0}{0.5} = 1.00$

Both z-scores equal 1.00: Alice scores exactly one standard deviation above the mean on each test, so her relative performance is the same on both.
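The normal table adds a concrete interpretation to a z-score. A minimal sketch, assuming (this is not stated in the example) that scores on both tests are approximately normally distributed: under that assumption a z-score of 1.00 places Alice at about the 84th percentile on each test.

```python
from scipy.stats import norm

# Alice's results from the example. Normality of each test's scores is an
# assumption made only for this illustration.
tests = {
    "Math":  {"x": 85,  "mu": 75,  "sd": 10},
    "Essay": {"x": 4.5, "mu": 4.0, "sd": 0.5},
}

percentiles = {}
for name, t in tests.items():
    z = (t["x"] - t["mu"]) / t["sd"]
    percentiles[name] = 100 * norm.cdf(z)  # share of scores expected below Alice's
    print(f"{name}: z = {z:.2f}, percentile ≈ {percentiles[name]:.1f}")
```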


Z-Scores and Normal Probabilities

from scipy.stats import norm

# Given z -> find the cumulative probability P(Z < z)
print("Key z-score probabilities:")
print(f"P(Z < 1.645) = {norm.cdf(1.645):.4f}  (95th pct)")
print(f"P(Z < 1.960) = {norm.cdf(1.960):.4f}  (97.5th pct)")
print(f"P(Z < 2.326) = {norm.cdf(2.326):.4f}  (99th pct)")
print(f"P(-1.96<Z<1.96) = {norm.cdf(1.96)-norm.cdf(-1.96):.4f}  (95%)")

# Given probability -> find z (ppf is the inverse of cdf)
print(f"\nZ for 90th percentile: {norm.ppf(0.90):.4f}")
print(f"Z for 97.5th:          {norm.ppf(0.975):.4f}")
print(f"Z for 99th:            {norm.ppf(0.99):.4f}")
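Standardizing is exactly what lets the standard normal table answer questions about any normal distribution. The sketch below assumes the math test scores from the comparison example follow $N(\mu = 75, \sigma = 10)$ and computes $P(X < 85)$ two ways: by converting to $z$ first, and by passing `loc` and `scale` directly to `scipy.stats.norm`. The two should agree. It then inverts the process to find the score at the 90th percentile.

```python
from scipy.stats import norm

mu, sigma = 75, 10   # assumed normal model for the math test
x = 85

# Route 1: standardize, then use the standard normal CDF (the "normal table").
z = (x - mu) / sigma
p_table = norm.cdf(z)

# Route 2: evaluate the CDF of N(mu, sigma) directly via loc/scale.
p_direct = norm.cdf(x, loc=mu, scale=sigma)

print(f"z = {z:.2f}, P(X < {x}) = {p_table:.4f} (via z), {p_direct:.4f} (direct)")

# Going the other way: find z for the 90th percentile, then x = mu + z * sigma.
x90 = mu + norm.ppf(0.90) * sigma
print(f"90th percentile score: {x90:.1f}")
```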

Key Probabilities

  • $P(-1.96 < Z < 1.96) \approx 95\%$: the foundation of 95% confidence intervals
  • $|z| > 3$ flags potential outliers in approximately normal data, since only about 0.27% of normal values fall more than 3 standard deviations from the mean
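Both rules of thumb come straight from the standard normal distribution. A short sketch: `z_critical` is a hypothetical helper that returns the two-sided critical value $z^*$ for a given confidence level (so 95% gives about 1.96), and `norm.sf` (the survival function, 1 minus the CDF) gives the upper-tail probability behind the $|z| > 3$ outlier rule.

```python
from scipy.stats import norm

def z_critical(level):
    """Hypothetical helper: z* such that P(-z* < Z < z*) = level."""
    return norm.ppf(1 - (1 - level) / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} -> z* = {z_critical(level):.3f}")

# Probability that a standard normal value lands beyond |z| = 3 (both tails).
p_beyond_3 = 2 * norm.sf(3)
print(f"P(|Z| > 3) = {p_beyond_3:.4f}")
```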

Standardization for Machine Learning

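import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy feature matrix: 5 samples (rows) x 3 features on different scales
X = np.array([
    [170, 70, 25],
    [180, 85, 30],
    [160, 55, 22],
    [175, 80, 35],
    [165, 60, 28],
], dtype=float)

# fit_transform learns each column's mean and std, then applies z = (x - mean) / std
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
print("Standardized features (z-scores):")
print(X_scaled.round(3))
# StandardScaler uses the population std (ddof=0), which matches np.std's default
print(f"Column means (should be ~0): {X_scaled.mean(axis=0).round(6)}")
print(f"Column stds  (should be ~1): {X_scaled.std(axis=0).round(4)}")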
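StandardScaler is the same z-score formula applied column by column. The sketch below checks that against a manual computation and shows the pattern that matters in practice: the mean and standard deviation are learned once with `fit`, stored in the `mean_` and `scale_` attributes, and reused to transform new rows. The new row `X_new` is made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([
    [170, 70, 25],
    [180, 85, 30],
    [160, 55, 22],
    [175, 80, 35],
    [165, 60, 28],
], dtype=float)

scaler = StandardScaler().fit(X)

# Manual z-scores per column with the population std (ddof=0).
manual = (X - X.mean(axis=0)) / X.std(axis=0)
print("matches manual:", np.allclose(scaler.transform(X), manual))
print("learned means:", scaler.mean_)
print("learned stds: ", scaler.scale_)

# New rows are standardized with the *training* mean and std, not their own.
X_new = np.array([[172.0, 75.0, 40.0]])
print(scaler.transform(X_new).round(3))
print(((X_new - scaler.mean_) / scaler.scale_).round(3))
```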

Z-Scores in Machine Learning

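| ML Application | Z-Score Usage | Why |
|---|---|---|
| StandardScaler | z = (x - μ) / σ per feature | Gradient-based models such as neural networks train more reliably on standardized inputs |
| Anomaly detection | \|z\| > 3 → outlier | Production monitoring |
| Feature normalization | Put features on the same scale | Distance-based algorithms (e.g., k-NN, k-means) are otherwise dominated by large-range features |
| Batch normalization | Normalize layer activations with the mini-batch mean and variance | Stabilizes deep learning training |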
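To make the batch normalization row concrete, here is a simplified, training-mode sketch in NumPy. It is an illustration rather than any framework's implementation: `batch_norm_forward` is a hypothetical name, `eps` is an assumed small constant for numerical stability, and real layers also track running statistics for use at inference time. At its core, each feature is z-scored with the current mini-batch's mean and variance, then scaled by `gamma` and shifted by `beta`.

```python
import numpy as np

def batch_norm_forward(activations, gamma, beta, eps=1e-5):
    """Hypothetical, simplified training-mode batch norm for a 2-D batch.

    Each column (feature) is z-scored with the mini-batch mean and variance,
    then rescaled by gamma and shifted by beta (learnable in a real network).
    """
    mean = activations.mean(axis=0)
    var = activations.var(axis=0)                  # population variance (ddof=0)
    z = (activations - mean) / np.sqrt(var + eps)  # eps guards against division by zero
    return gamma * z + beta

rng = np.random.default_rng(0)
batch = rng.normal(loc=5.0, scale=3.0, size=(64, 4))  # 64 samples, 4 features

# With gamma=1 and beta=0 the output is just the per-feature z-scores.
out = batch_norm_forward(batch, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0).round(6))
print(out.std(axis=0).round(4))
```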
import numpy as np
from sklearn.preprocessing import StandardScaler

np.random.seed(42)  # reproducible random data

# Z-score based anomaly detection on 100 rows x 3 standard-normal features
data = np.random.randn(100, 3)
z_scores = np.abs((data - data.mean(axis=0)) / data.std(axis=0))
anomalies = (z_scores > 3).any(axis=1)  # a row is flagged if any feature has |z| > 3
print(f"Data points with any |z| > 3: {anomalies.sum()}")

# StandardScaler = z-score normalization, applied per column
scaler = StandardScaler()
scaled = scaler.fit_transform(data)
print(f"\nOriginal: mean={data.mean():.4f}, std={data.std():.4f}")
print(f"Scaled:   mean={scaled.mean():.4f}, std={scaled.std():.4f}")
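For production monitoring, one common approach is to compute the baseline mean and standard deviation once from a reference window and score new observations against them, so a spike cannot inflate the spread it is measured against. A minimal sketch with simulated data: `zscore_alerts` is a hypothetical helper, the latency framing is only an example, and ddof=1 (sample standard deviation) is an assumed choice for the baseline.

```python
import numpy as np

def zscore_alerts(baseline, new_values, threshold=3.0):
    """Hypothetical helper: flag new observations with |z| > threshold.

    Mean and standard deviation come from the baseline window only.
    """
    baseline = np.asarray(baseline, dtype=float)
    mu, sigma = baseline.mean(), baseline.std(ddof=1)
    z = (np.asarray(new_values, dtype=float) - mu) / sigma
    return z, np.abs(z) > threshold

rng = np.random.default_rng(7)
baseline = rng.normal(loc=200.0, scale=15.0, size=500)  # simulated latencies in ms
incoming = np.array([205.0, 190.0, 280.0, 140.0])

z, flagged = zscore_alerts(baseline, incoming)
for value, zi, is_alert in zip(incoming, z, flagged):
    print(f"{value:6.1f}  z={zi:+.2f}  {'ALERT' if is_alert else 'ok'}")
```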

Key Takeaways

The z-score formula z = (x−μ)/σ converts any value to units of standard deviations from the mean

Z-scores enable fair comparison across different scales — compare a math score to an essay rubric

P(−1.96 < Z < 1.96) ≈ 95% is the foundation of confidence intervals and hypothesis testing

|z| > 3 is a common rule of thumb for flagging potential outliers in approximately normal data

Z-scores are the bridge between raw data and the standard normal table — once you standardize, the entire normal distribution is at your fingertips.
