Coefficient of Variation

Descriptive Statistics

Comparing Variability Across Any Scale

The coefficient of variation strips away units and magnitude, expressing spread relative to the mean so that datasets on very different scales can be compared directly.

Understanding CV helps you:

Compare risk — evaluate investment portfolios with different price levels
Benchmark precision — assess manufacturing quality across different target sizes
Normalize across scales — compare variability when means are drastically different
Detect measurement problems — flag datasets where spread is disproportionate to the mean

When means differ wildly, standard deviation alone lies. CV tells the real story.
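To make that claim concrete, here is a minimal sketch with made-up prices. The two series and their names (`penny_stock`, `blue_chip`) are hypothetical and chosen only so the arithmetic is easy to check by hand.

```python
import numpy as np

# Hypothetical closing prices, chosen only to keep the arithmetic simple
penny_stock = np.array([9.0, 10.0, 11.0, 10.0, 10.0])          # trades near 10
blue_chip = np.array([990.0, 1000.0, 1010.0, 1000.0, 1000.0])  # trades near 1000

# Sample standard deviation (ddof=1) of each series
sd_penny = np.std(penny_stock, ddof=1)
sd_blue = np.std(blue_chip, ddof=1)

# CV: standard deviation as a percentage of the mean of the same series
cv_penny = sd_penny / np.mean(penny_stock) * 100
cv_blue = sd_blue / np.mean(blue_chip) * 100

print(f"Penny stock: SD={sd_penny:.3f}, CV={cv_penny:.2f}%")
print(f"Blue chip:   SD={sd_blue:.3f}, CV={cv_blue:.2f}%")
```

In this toy data the blue chip has a standard deviation ten times larger than the penny stock, yet its CV is ten times smaller: relative to its price level, it is the steadier of the two.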

What is Coefficient of Variation?

Definition

The Coefficient of Variation (CV) is a dimensionless measure of relative dispersion. It answers: "How large is the standard deviation relative to the mean?"

Coefficient of Variation

$$CV = \frac{s}{\bar{x}} \times 100\%$$

Here,

$s$ = Standard deviation
$\bar{x}$ = Mean
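The formula translates almost directly into code. The helper below is a sketch, not a library function: the name `coefficient_of_variation`, the default `ddof=1` (sample standard deviation), and the choice to raise an error for a zero mean are all assumptions made for illustration.

```python
import numpy as np

def coefficient_of_variation(values, ddof=1):
    """Return the CV in percent: standard deviation divided by the mean.

    Hypothetical helper; ddof=1 gives the sample standard deviation s.
    """
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    if mean == 0:
        # The ratio s / x-bar has no meaning when the mean is zero
        raise ValueError("CV is undefined when the mean is zero")
    return values.std(ddof=ddof) / mean * 100

# Worked example: mean = 50, s = 2, so CV = 2 / 50 * 100 = 4%
sample = [48.0, 50.0, 52.0]
cv = coefficient_of_variation(sample)
print(f"CV = {cv:.1f}%")
```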


Investment Risk Comparison

import numpy as np
import pandas as pd

np.random.seed(42)

portfolios = {
    'Tech ETF':    np.random.normal(150, 35, 252),   # high CV
    'Bond Fund':   np.random.normal(105,  6, 252),   # low CV
    'Gold':        np.random.normal(180, 28, 252),   # moderate
}

print(f"{'Asset':<15} {'Mean':>8} {'Std Dev':>10} {'CV%':>8} {'Risk Level':>12}")
print("-"*55)
for name, prices in portfolios.items():
    # CV must use the mean and standard deviation of the same series (here, prices)
    mu = np.mean(prices)
    sd = np.std(prices, ddof=1)
    cv = sd / mu * 100
    # Illustrative cut-offs, not an industry standard
    risk = "High" if cv > 20 else ("Moderate" if cv > 10 else "Low")
    print(f"{name:<15} {mu:>8.2f} {sd:>10.4f} {cv:>7.2f}% {risk:>12}")

Quality Control: Machine Precision

# Two machines producing 50mm bolts
np.random.seed(1)
machine_a = np.random.normal(50.0, 0.3, 500)   # precise
machine_b = np.random.normal(50.0, 0.9, 500)   # less precise

for name, data in [('Machine A', machine_a), ('Machine B', machine_b)]:
    cv = np.std(data,ddof=1)/np.mean(data)*100
    out_of_spec = np.sum(np.abs(data-50)>1.0)/len(data)*100
    print(f"{name}: CV={cv:.4f}%, Out-of-spec: {out_of_spec:.2f}%")
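When both machines share the same 50 mm target, CV and standard deviation rank them identically. CV earns its keep when the targets differ. The sketch below assumes two hypothetical production lines (10 mm and 100 mm bolts) with invented tolerances; none of these numbers come from real equipment.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Hypothetical production lines with different target lengths (mm)
small_bolts = rng.normal(10.0, 0.10, 500)    # 10 mm target
large_bolts = rng.normal(100.0, 0.50, 500)   # 100 mm target

for name, data in [("10 mm line", small_bolts), ("100 mm line", large_bolts)]:
    sd = np.std(data, ddof=1)            # absolute spread, in mm
    cv = sd / np.mean(data) * 100        # spread relative to the target size
    print(f"{name}: SD={sd:.3f} mm, CV={cv:.3f}%")
```

The 100 mm line shows the larger standard deviation in millimetres, but relative to its target it is the more precise of the two, which is what the lower CV reports.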

Limitations

Limitation	When It Occurs
Undefined or unstable	Mean = 0 (division by zero) or mean ≈ 0 (CV blows up)
Meaningless	Interval scale data (temperature in °C, IQ)
Misleading	Bimodal or highly skewed distributions
Sensitive	Outliers affect both numerator and denominator
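The interval-scale row is easy to demonstrate. In the sketch below the readings are hypothetical; the same five temperatures are expressed in three units, and the CV changes with each one even though the physical variability is identical.

```python
import numpy as np

# Hypothetical temperature readings, converted between scales
celsius = np.array([10.0, 15.0, 20.0, 25.0, 30.0])
kelvin = celsius + 273.15
fahrenheit = celsius * 9 / 5 + 32

results = {}
for label, temps in [("°C", celsius), ("K", kelvin), ("°F", fahrenheit)]:
    cv = np.std(temps, ddof=1) / np.mean(temps) * 100
    results[label] = cv
    print(f"{label:>2}: mean={np.mean(temps):7.2f}, CV={cv:6.2f}%")
```

Because °C and °F place zero at an arbitrary point, shifting the scale changes the mean without changing the spread, so the ratio is not a property of the data. Only the Kelvin figure, measured from a true zero, is a ratio-scale quantity.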

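# CV breaks when the mean is zero or close to it
near_zero = np.array([-2, -1, 0, 1, 2])
with np.errstate(divide='ignore'):   # mean is exactly 0 here, so the division yields inf
    cv = np.std(near_zero, ddof=1) / near_zero.mean() * 100
print(f"Mean={near_zero.mean()}, CV={cv:.2f}% <- nonsense")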