Coefficient of Variation
Descriptive Statistics
Comparing Variability Across Any Scale
The coefficient of variation expresses spread relative to the mean, stripping away units and magnitude so that variability can be compared across datasets.
Understanding CV helps you:
- Compare risk — evaluate investment portfolios with different price levels
- Benchmark precision — assess manufacturing quality across different target sizes
- Normalize across scales — compare variability when means are drastically different
- Detect measurement problems — flag datasets where spread is disproportionate to the mean
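When means differ widely, standard deviation alone can mislead: a spread of 5 is large around a mean of 10 and negligible around a mean of 1,000. CV puts both on the same relative scale.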
What is Coefficient of Variation?
Definition
The Coefficient of Variation (CV) is a dimensionless measure of relative dispersion. It answers: "How large is the standard deviation relative to the mean?"
Coefficient of Variation
$$\mathrm{CV} = \frac{\sigma}{\mu} \times 100\%$$
Here,
- $\sigma$ = Standard deviation
- $\mu$ = Mean
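As a quick illustration of the definition, the sketch below computes CV for two small made-up samples measured in different units. The helper name `coefficient_of_variation` and the sample values are assumptions for this example, not part of any library. It also checks that converting centimetres to inches leaves CV unchanged, which is what "dimensionless" means in practice.

```python
import numpy as np

def coefficient_of_variation(values, ddof=1):
    """Return the sample CV in percent: std / mean * 100 (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    if mean <= 0:
        raise ValueError("CV needs a positive mean")
    return values.std(ddof=ddof) / mean * 100

# Made-up samples in different units
heights_cm = np.array([160.0, 165.0, 170.0, 175.0, 180.0])
weights_kg = np.array([55.0, 62.0, 70.0, 81.0, 92.0])

cv_height = coefficient_of_variation(heights_cm)
cv_weight = coefficient_of_variation(weights_kg)
cv_height_in = coefficient_of_variation(heights_cm / 2.54)  # same heights in inches

print(f"Height CV:          {cv_height:.2f}%")
print(f"Weight CV:          {cv_weight:.2f}%")
print(f"Height CV (inches): {cv_height_in:.2f}%")
```

The two height figures agree because rescaling multiplies the standard deviation and the mean by the same factor, which cancels in the ratio.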
Investment Risk Comparison
```python
import numpy as np

np.random.seed(42)

# Simulated price levels for 252 trading days (i.i.d. draws, not a realistic price path)
portfolios = {
    'Tech ETF': np.random.normal(150, 35, 252),   # high CV (35/150 ≈ 23%)
    'Bond Fund': np.random.normal(105, 6, 252),   # low CV (6/105 ≈ 6%)
    'Gold': np.random.normal(180, 28, 252),       # moderate CV (28/180 ≈ 16%)
}

print(f"{'Asset':<15} {'Mean':>8} {'Std Dev':>10} {'CV%':>8} {'Risk Level':>12}")
print("-" * 57)
for name, prices in portfolios.items():
    mu = np.mean(prices)
    sd = np.std(prices, ddof=1)
    cv = sd / mu * 100  # std and mean of the same series
    # Illustrative cutoffs, not an industry standard
    risk = "High" if cv > 20 else ("Moderate" if cv > 8 else "Low")
    print(f"{name:<15} {mu:>8.2f} {sd:>10.2f} {cv:>7.2f}% {risk:>12}")
```
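In finance, CV is usually read as volatility per unit of return, so it is computed on returns, not on price levels. The sketch below assumes two hypothetical funds with invented monthly returns (in percent) that both have a positive mean return; the fund names and numbers are for illustration only.

```python
import numpy as np

# Invented monthly returns in percent (hypothetical funds)
returns = {
    "Stock fund": np.array([2.1, -1.3, 3.4, 0.8, -0.5, 2.7]),
    "Bond fund": np.array([0.4, 0.3, 0.5, 0.2, 0.4, 0.6]),
}

cv_returns = {}
for name, r in returns.items():
    mean_ret = r.mean()
    vol = r.std(ddof=1)                      # sample volatility
    cv_returns[name] = vol / mean_ret * 100  # volatility per unit of mean return
    print(f"{name}: mean={mean_ret:.2f}%, vol={vol:.2f}%, CV={cv_returns[name]:.1f}%")
```

The ratio is only interpretable here because both mean returns are clearly positive. With a mean return near zero, the same calculation would swing wildly.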
Quality Control: Machine Precision
```python
import numpy as np

# Two machines producing 50 mm bolts
np.random.seed(1)
machine_a = np.random.normal(50.0, 0.3, 500)  # precise
machine_b = np.random.normal(50.0, 0.9, 500)  # less precise

for name, data in [('Machine A', machine_a), ('Machine B', machine_b)]:
    cv = np.std(data, ddof=1) / np.mean(data) * 100
    # Share of bolts more than 1.0 mm away from the 50 mm target
    out_of_spec = np.mean(np.abs(data - 50) > 1.0) * 100
    print(f"{name}: CV={cv:.4f}%, Out-of-spec: {out_of_spec:.2f}%")
```
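Both machines above share the same 50 mm target, so CV and standard deviation rank them identically. CV becomes more informative when target sizes differ. The sketch below assumes two hypothetical production lines (a 10 mm bolt and a 100 mm bolt, values invented for illustration) with the same absolute standard deviation; `cv_percent` is an illustrative helper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical lines: same absolute spread (0.5 mm), very different targets
small_bolts = rng.normal(10.0, 0.5, 500)    # 10 mm target
large_bolts = rng.normal(100.0, 0.5, 500)   # 100 mm target

def cv_percent(data):
    """Sample CV in percent (illustrative helper)."""
    return np.std(data, ddof=1) / np.mean(data) * 100

cv_small = cv_percent(small_bolts)
cv_large = cv_percent(large_bolts)

for name, data, cv in [("10 mm line", small_bolts, cv_small),
                       ("100 mm line", large_bolts, cv_large)]:
    print(f"{name}: SD={np.std(data, ddof=1):.3f} mm, CV={cv:.3f}%")
```

The standard deviations come out nearly equal, but relative to its target the 10 mm line is roughly ten times less precise.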
Limitations
| Limitation | When It Occurs |
|---|---|
| Undefined or unstable | Mean = 0 (division by zero) or mean ≈ 0 (CV blows up) |
| Meaningless | Interval scale data (temperature in °C, IQ) |
| Misleading | Bimodal or highly skewed distributions |
| Sensitive | Outliers affect both numerator and denominator |
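```python
import numpy as np

# CV breaks down when the mean is zero or close to it
near_zero = np.array([-2, -1, 0, 1, 2])
with np.errstate(divide="ignore"):  # the mean is exactly 0, so the ratio is infinite
    cv = np.std(near_zero, ddof=1) / near_zero.mean() * 100
print(f"Mean={near_zero.mean()}, CV={cv:.2f}%  <- nonsense")
```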
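The "Meaningless" row for interval scales can be seen by converting units: Celsius and Kelvin differ only by a shift of the zero point, yet the CV changes. The readings below are invented for illustration, and `cv_percent` is an illustrative helper.

```python
import numpy as np

# Made-up daily temperatures in degrees Celsius
temps_c = np.array([18.0, 20.0, 22.0, 24.0, 26.0])
temps_k = temps_c + 273.15  # the same temperatures in Kelvin

def cv_percent(data):
    """Sample CV in percent (illustrative helper)."""
    return np.std(data, ddof=1) / np.mean(data) * 100

cv_c = cv_percent(temps_c)
cv_k = cv_percent(temps_k)

print(f"Celsius CV: {cv_c:.2f}%")
print(f"Kelvin CV:  {cv_k:.2f}%")
```

The standard deviation is identical in both cases. Only the arbitrary zero of the Celsius scale differs, so the Celsius figure says nothing about relative variability.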
Coefficient of Variation in Machine Learning
| ML Application | CV Usage | Why |
|---|---|---|
| Feature comparison | Compare std across features with different units | Fair comparison |
| Model stability | CV of the metric across cross-validation folds | Lower CV = more stable model |
| Risk-adjusted returns | Risk per unit of return (std of returns / mean return) | Compare assets with different return levels |
| Hyperparameter tuning | CV of metric across search | Identify robust parameters |
```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=5, random_state=42)

# Compare model stability using the coefficient of variation of fold accuracies
models = {
    'Logistic': LogisticRegression(),
    'Random Forest': RandomForestClassifier(n_estimators=100, random_state=42),
    'SVM': SVC(),
}

for name, model in models.items():
    # Here cv=5 means 5-fold cross-validation, not coefficient of variation
    scores = cross_val_score(model, X, y, cv=5)
    sd = scores.std(ddof=1)        # sample std, as in the earlier examples
    coef_var = sd / scores.mean()  # coefficient of variation (as a fraction)
    print(f"{name:15s}: mean={scores.mean():.3f}, std={sd:.3f}, CV={coef_var:.4f}")

print("Lower CV = more stable model across folds")
```
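The "Feature comparison" row in the table above can be sketched in the same way. The example below assumes three synthetic features with positive means on very different scales (the feature names and distributions are invented) and ranks them by CV.

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic features on very different scales (illustrative only)
features = {
    "income_usd": rng.normal(60_000, 15_000, 1_000),
    "age_years": rng.normal(40, 4, 1_000),
    "height_cm": rng.normal(170, 8.5, 1_000),
}

# Sample CV in percent for each feature
cv_by_feature = {
    name: np.std(values, ddof=1) / np.mean(values) * 100
    for name, values in features.items()
}

# Print features from most to least relatively variable
for name, cv in sorted(cv_by_feature.items(), key=lambda item: item[1], reverse=True):
    sd = np.std(features[name], ddof=1)
    print(f"{name:<12} SD={sd:>10.2f}  CV={cv:>6.2f}%")
```

Raw standard deviations are not comparable across dollars, years, and centimetres, while the CV column puts all three features on one relative scale.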
Key Takeaways
- CV = SD / mean × 100% is dimensionless, so it can be compared across units and scales
- It requires ratio-scale data with a meaningful, positive mean
- In finance, CV is the volatility-to-return ratio: the higher the CV, the more risk per unit of return
- Avoid CV when the mean is zero or close to zero, or when values can be negative; the ratio becomes unstable or meaningless
"CV is the great equalizer — it lets you compare apples to oranges when it comes to variability."