Levels of Measurement

Measurement Theory

Why the Scale You Choose Changes Everything

In 1946, psychologist Stanley Stevens proposed four levels of measurement that determine which statistics are meaningful for a given variable. The level of measurement is not a technicality; it is a mathematical constraint. Taking the mean of ordinal ranks or the ratio of two interval values yields a number whose value depends on arbitrary coding choices, so it cannot be interpreted.

Here is what mastering levels of measurement helps you do:

Choose Valid Statistics — Know exactly which central tendency, spread, and correlation measures are mathematically appropriate for each scale.
Select the Right Test — Match your data to the correct hypothesis test: chi-square for nominal, Mann-Whitney for ordinal, t-test for interval/ratio.
Avoid Invalid Conclusions — Stop computing averages on categories, ratios on temperatures, and standard deviations on zip codes.
Communicate Precisely — Describe your variables with the exact terminology statisticians expect.

The level of measurement is not a label you attach after analysis — it is a property of the data itself that constrains everything you do.

Levels of Measurement

Definition


In 1946, psychologist Stanley Stevens proposed a taxonomy of four levels of measurement that has since become foundational in statistics. The level determines which statistical operations are mathematically valid.

The Four Levels

1. Nominal Scale

The weakest level. Data is placed into named categories with no meaningful order or distance between them.

Properties:

Identity: each value belongs to a distinct category
No order, no distance, no meaningful zero

Examples: Gender, blood type, nationality, color, product ID, political party

Valid statistics: Frequency, mode, chi-square test
Invalid: Mean, median, standard deviation

[Figure: Blood Type Distribution (Nominal Data)]

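import pandas as pd
from scipy.stats import chi2_contingency

# Nominal: blood type distribution
blood_types = pd.Series(['A', 'O', 'B', 'AB', 'O', 'A', 'O', 'A', 'B', 'O'])
print("Mode:", blood_types.mode()[0])
print(blood_types.value_counts())

# Chi-square test of independence (nominal vs nominal).
# The donor flag is made-up illustration data; with a sample this small the
# expected counts are too low for the p-value to be trusted.
donor = pd.Series(['yes', 'no', 'yes', 'no', 'yes', 'yes', 'no', 'no', 'yes', 'no'])
table = pd.crosstab(blood_types, donor)
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.3f}, p = {p:.3f}, dof = {dof}")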
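
One way to see why a mean is invalid for nominal data is that any numeric coding of the categories is arbitrary: recode the labels and the mean changes, while the mode does not. The sketch below illustrates this; the two coding dictionaries are hypothetical and chosen only for the demonstration.

```python
from collections import Counter
from statistics import mean

blood_types = ['A', 'O', 'B', 'AB', 'O', 'A', 'O', 'A', 'B', 'O']

# Two equally arbitrary numeric codings (hypothetical, for illustration only)
coding_1 = {'A': 0, 'B': 1, 'AB': 2, 'O': 3}
coding_2 = {'O': 0, 'AB': 1, 'B': 2, 'A': 3}

mean_1 = mean(coding_1[b] for b in blood_types)
mean_2 = mean(coding_2[b] for b in blood_types)

# The mode is computed on the labels themselves, so no coding is involved
mode_label = Counter(blood_types).most_common(1)[0][0]

print(f"Mean under coding 1: {mean_1}")
print(f"Mean under coding 2: {mean_2}")
print(f"Mode: {mode_label}")
```

The two means disagree even though the underlying data are identical, which is the sense in which a mean of nominal codes carries no information.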

2. Ordinal Scale

Categories have a meaningful order, but the intervals between categories are unknown or unequal.

Properties:

Identity + Order
No distance, no meaningful zero

Examples:

Survey Likert scales (Strongly Disagree -> Strongly Agree)
Education level (High School < Bachelor's < Master's < PhD)
Race finishing position (1st, 2nd, 3rd)
Socioeconomic status (Low, Middle, High)

Valid statistics: Median, IQR, percentiles, Spearman rank correlation, Mann-Whitney test
Invalid: Arithmetic mean (debated), standard deviation, Pearson r

[Figure: Survey Response Distribution (Ordinal Data)]

import numpy as np
from scipy.stats import spearmanr

# Ordinal: race positions
team_a = [1, 3, 5, 7]   # positions team A finished
team_b = [2, 4, 6, 8]   # positions team B finished

# Spearman correlation (rank-based — appropriate for ordinal)
rho, p = spearmanr(team_a, team_b)
print(f"Spearman ρ = {rho:.3f}, p = {p:.4f}")

# Median is appropriate for ordinal
satisfaction = [3, 4, 2, 5, 4, 3, 4, 5, 2, 4]  # 1–5 scale
print(f"Median satisfaction: {np.median(satisfaction)}")
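
The Mann-Whitney test listed among the valid statistics can be sketched as follows, assuming two independent groups of 1–5 ratings. Both groups are made-up illustration data, and `group_a` / `group_b` are hypothetical names.

```python
from scipy.stats import mannwhitneyu

# Hypothetical 1-5 satisfaction ratings from two independent groups
group_a = [3, 4, 2, 5, 4, 3, 4, 5, 2, 4]
group_b = [2, 3, 1, 3, 2, 4, 2, 3, 1, 2]

# Two-sided test on ranks; no equal-interval assumption is needed
u_stat, p_value = mannwhitneyu(group_a, group_b, alternative='two-sided')
print(f"U = {u_stat}, p = {p_value:.4f}")
```

Because the test works on ranks only, relabeling the scale with any order-preserving transformation (for example 1, 2, 3, 4, 5 to 1, 2, 4, 8, 16) would leave the result unchanged.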

3. Interval Scale

Equal intervals between values, but no true zero — zero is arbitrary, not the absence of the quantity.

Properties:

Identity + Order + Equal Intervals
No true zero (ratios meaningless)

Examples:

Temperature in Celsius or Fahrenheit (0°C ≠ "no temperature")
IQ scores (IQ 0 doesn't mean no intelligence)
Calendar years (Year 0 is arbitrary)
Likert scales (when treated as interval — common in practice)

Valid statistics: Mean, standard deviation, Pearson r, t-tests, ANOVA
Invalid: Ratios ("twice as hot" is not meaningful in Celsius)

[Figure: IQ Score Distribution (Interval Data)]

# Temperature conversion — shows why ratios fail for interval data
celsius_a = 20
celsius_b = 40

# It is NOT true that 40°C is "twice as hot" as 20°C
# Convert to Kelvin (ratio scale) to see why:
kelvin_a = celsius_a + 273.15  # 293.15 K
kelvin_b = celsius_b + 273.15  # 313.15 K

ratio_celsius = celsius_b / celsius_a        # 2.0 — misleading!
ratio_kelvin  = kelvin_b / kelvin_a          # 1.068 — true ratio

print(f"Celsius ratio: {ratio_celsius:.3f}  <- NOT meaningful")
print(f"Kelvin ratio:  {ratio_kelvin:.3f}  <- Meaningful thermodynamic ratio")
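
A complementary check, sketched under the assumption that we compare Celsius with Fahrenheit: ratios of interval values change when the unit changes, but ratios of differences do not. The helper `c_to_f` and the three sample temperatures are hypothetical.

```python
def c_to_f(c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32

# Three illustrative temperatures in Celsius
a_c, b_c, c_c = 10.0, 20.0, 40.0
a_f, b_f, c_f = (c_to_f(t) for t in (a_c, b_c, c_c))

# Ratios of values depend on where the arbitrary zero sits
ratio_c = c_c / b_c
ratio_f = c_f / b_f

# Ratios of differences survive the change of unit
diff_ratio_c = (c_c - b_c) / (b_c - a_c)
diff_ratio_f = (c_f - b_f) / (b_f - a_f)

print(f"Value ratio:      Celsius {ratio_c:.3f}, Fahrenheit {ratio_f:.3f}")
print(f"Difference ratio: Celsius {diff_ratio_c:.3f}, Fahrenheit {diff_ratio_f:.3f}")
```

This is why means, standard deviations, and differences are meaningful on an interval scale while "twice as hot" is not.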

4. Ratio Scale

The strongest level. Has all properties of interval scale plus a true absolute zero (zero means absence of the attribute).

Properties:

Identity + Order + Equal Intervals + True Zero

Examples:

Height, weight, length (0 kg = no mass)
Age, time duration
Income (0 = no income)
Temperature in Kelvin
Number of items (count data)

Valid statistics: All statistics including geometric mean, coefficient of variation, and ratio comparisons.

[Figure: Height Distribution (Ratio Data)]

import numpy as np

heights_m = np.array([1.65, 1.72, 1.80, 1.58, 1.90])

print(f"Mean: {np.mean(heights_m):.3f} m")
print(f"Ratio (tallest/shortest): {heights_m.max()/heights_m.min():.3f}")
print(f"Geometric mean: {np.exp(np.log(heights_m).mean()):.3f} m")
print(f"CV (coeff of variation): {(np.std(heights_m)/np.mean(heights_m)*100):.1f}%")
# All valid because height is ratio scale
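
As a quick sanity check of what a true zero buys you, the sketch below converts the same heights to centimetres and confirms that the ratio and the coefficient of variation are unchanged. It uses the sample standard deviation (`ddof=1`), which is an assumption of this sketch.

```python
import numpy as np

heights_m = np.array([1.65, 1.72, 1.80, 1.58, 1.90])
heights_cm = heights_m * 100  # same quantity, different unit

# Ratio of tallest to shortest in each unit
ratio_m = heights_m.max() / heights_m.min()
ratio_cm = heights_cm.max() / heights_cm.min()

# Coefficient of variation (sample standard deviation / mean) in each unit
cv_m = heights_m.std(ddof=1) / heights_m.mean()
cv_cm = heights_cm.std(ddof=1) / heights_cm.mean()

print(f"Ratio unchanged by unit: {np.isclose(ratio_m, ratio_cm)}")
print(f"CV unchanged by unit:    {np.isclose(cv_m, cv_cm)}")
```

On an interval scale such as Celsius, the same check would fail, because shifting the zero changes both the ratio and the mean in the CV's denominator.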

Summary Table

Level	Order	Equal Intervals	True Zero	Example	Appropriate Measure of Center
Nominal	❌	❌	❌	Eye color	Mode
Ordinal	✅	❌	❌	Satisfaction rating	Median
Interval	✅	✅	❌	Temperature (°C)	Arithmetic mean
Ratio	✅	✅	✅	Height, weight	Arithmetic mean (geometric mean also valid)

Choosing the Right Statistical Test

def suggest_test(level_of_measurement, n_groups, paired=False):
    """Suggest an appropriate statistical test based on measurement level."""
    if level_of_measurement == 'nominal':
        return "Chi-square test (categories) or Fisher's exact test (small samples)"
    elif level_of_measurement == 'ordinal':
        if n_groups == 1:
            # Previously fell through to the multi-group branch
            return "One-sample sign test"
        elif n_groups == 2:
            return "Wilcoxon signed-rank" if paired else "Mann-Whitney U"
        else:
            return "Friedman" if paired else "Kruskal-Wallis"
    elif level_of_measurement in ('interval', 'ratio'):
        if n_groups == 1:
            return "One-sample t-test"
        elif n_groups == 2:
            return "Independent t-test" if not paired else "Paired t-test"
        else:
            return "One-way ANOVA" if not paired else "Repeated measures ANOVA"
    # Fail loudly instead of silently returning None for an unknown level
    raise ValueError(f"Unknown level of measurement: {level_of_measurement!r}")

# Examples
print(suggest_test('nominal', 2))
print(suggest_test('ordinal', 2))
print(suggest_test('ratio', 2, paired=False))
print(suggest_test('ratio', 3))
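
The function above only returns the name of a test. A minimal sketch of actually running the chosen test for two independent groups might look like the following; `run_two_group_test` is a hypothetical helper, and the reaction-time data are invented for illustration.

```python
from scipy import stats

def run_two_group_test(level, x, y):
    """Hypothetical helper: run a two-independent-sample test chosen by level."""
    if level == 'ordinal':
        name = "Mann-Whitney U"
        result = stats.mannwhitneyu(x, y, alternative='two-sided')
    elif level in ('interval', 'ratio'):
        name = "Independent t-test"
        result = stats.ttest_ind(x, y)  # assumes equal variances by default
    else:
        raise ValueError(f"No two-sample location test for level {level!r}")
    return name, result.statistic, result.pvalue

# Invented reaction times in seconds (ratio scale)
control = [0.52, 0.61, 0.58, 0.49, 0.66, 0.55]
treated = [0.47, 0.50, 0.44, 0.53, 0.41, 0.48]

name, stat, p = run_two_group_test('ratio', control, treated)
print(f"{name}: statistic = {stat:.3f}, p = {p:.4f}")
```

Passing `'ordinal'` for the same data would run the rank-based test instead, which is the conservative choice when the equal-interval assumption is doubtful.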

Measurement Levels in Machine Learning & LLMs

Level	ML Encoding	LLM/Deep Learning	Example
Nominal	One-Hot, Label	Token embedding	Category: [1,0,0,0]
Ordinal	Ordinal encoding	Learned embedding	Rating: 1→0.2, 5→1.0
Interval	StandardScaler	Batch normalization	Temperature: (x-μ)/σ
Ratio	StandardScaler, Log	Log transform	Income: log(x+1)
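
The encodings in the table can be sketched with pandas and NumPy alone. The toy DataFrame, its column names, and the `size_order` mapping are all hypothetical; standardization is written out by hand here and is intended to mirror what a standard scaler does, not to reproduce any specific library call.

```python
import numpy as np
import pandas as pd

# Toy data with one column per measurement level (values are made up)
df = pd.DataFrame({
    'color': ['red', 'green', 'blue', 'green'],      # nominal
    'size': ['small', 'large', 'medium', 'small'],   # ordinal
    'temp_c': [18.0, 25.0, 21.0, 30.0],              # interval
    'income': [0.0, 42000.0, 58000.0, 120000.0],     # ratio
})

# Nominal: one-hot encoding, no order implied between categories
one_hot = pd.get_dummies(df['color'], prefix='color')

# Ordinal: integer codes that preserve the known order
size_order = {'small': 0, 'medium': 1, 'large': 2}
size_encoded = df['size'].map(size_order)

# Interval: standardize to zero mean and unit variance
temp_scaled = (df['temp_c'] - df['temp_c'].mean()) / df['temp_c'].std(ddof=0)

# Ratio: log(x + 1) is well-defined because the scale has a true zero
income_log = np.log1p(df['income'])

print(one_hot)
print(size_encoded.tolist())
print(temp_scaled.round(3).tolist())
print(income_log.round(3).tolist())
```

Note that the ordinal mapping must be written by hand: alphabetical label encoding would place 'large' before 'medium' and destroy the order.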

Example — LLM Tokenization (Nominal Data):

# LLMs treat every token as a nominal category
# Each token gets an embedding vector (learned representation)

# Simple example of token embedding
import numpy as np

# Vocabulary: each word is a nominal category
vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4}

# Embedding layer maps nominal IDs to dense vectors
# (In practice, learned during training)
embedding_dim = 4
np.random.seed(42)
embeddings = np.random.randn(5, embedding_dim)  # 5 tokens × 4 dimensions

# "cat" has token ID 1
cat_embedding = embeddings[vocab["cat"]]
print(f"Token 'cat' (ID={vocab['cat']}):")
print(f"Embedding vector: {cat_embedding.round(3)}")
print(f"Vector dimension: {len(cat_embedding)}")

# Similarity between tokens (cosine similarity)
from numpy.linalg import norm
sim_cat_sat = np.dot(embeddings[1], embeddings[2]) / (norm(embeddings[1]) * norm(embeddings[2]))
print(f"\nSimilarity(cat, sat): {sim_cat_sat:.3f}")

Output:

Token 'cat' (ID=1):
Embedding vector: [-0.234 -0.234  1.579  0.767]
Vector dimension: 4

Similarity(cat, sat): -0.636

Key Takeaways

Nominal data uses frequency, mode, and chi-square — never means or standard deviations.

ML encoding strategy depends entirely on measurement level — wrong encoding breaks models.

LLMs treat text as nominal data (token IDs) and learn embeddings for each token.

Always check: can I take ratios? Is there a true zero? This determines valid statistics.

"When in doubt, use the more conservative lower-level methods: they assume less about the scale, so their conclusions still hold if the intervals turn out to be unequal."

What to Learn Next

-> Types of Data: Qualitative vs quantitative, the foundation of all data classification.

-> Frequency Distributions: Organize raw data into tables and charts.

-> Mean, Median, Mode: Which measure of center is appropriate for each level?

-> Standard Deviation: Spread that works for interval and ratio data.

-> Correlation: Pearson r for interval/ratio, Spearman for ordinal.

-> Hypothesis Testing: Choose the right test based on your measurement level.
