
Coefficient of Variation — Relative Dispersion Across Different Scales


Comparing Variability Across Any Scale

The coefficient of variation strips away units and magnitude, revealing the true relative spread in any dataset.

Understanding CV helps you:

  • Compare risk — evaluate investment portfolios with different price levels
  • Benchmark precision — assess manufacturing quality across different target sizes
  • Normalize across scales — compare variability when means are drastically different
  • Detect measurement problems — flag datasets where spread is disproportionate to the mean

When means differ wildly, standard deviation alone lies. CV tells the real story.


What is Coefficient of Variation?

Definition

The Coefficient of Variation (CV) is a dimensionless measure of relative dispersion. It answers: "How large is the standard deviation relative to the mean?"

Coefficient of Variation

$$CV = \frac{s}{\bar{x}} \times 100\%$$

Here,

  • $s$ = standard deviation
  • $\bar{x}$ = mean
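
To make "dimensionless" concrete, here is a minimal sketch that applies the formula to the same measurements in two units. The helper name `coefficient_of_variation` and the height values are illustrative assumptions, not part of the lesson's own code.

```python
import numpy as np

def coefficient_of_variation(values):
    """Return the sample CV in percent: s / x_bar * 100 (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    if mean <= 0:
        # CV is only interpretable for a strictly positive mean
        raise ValueError("CV needs a strictly positive mean")
    return values.std(ddof=1) / mean * 100

# Made-up heights, first in centimetres, then converted to inches
heights_cm = np.array([150.0, 160.0, 170.0, 180.0, 190.0])
heights_in = heights_cm / 2.54

cv_cm = coefficient_of_variation(heights_cm)
cv_in = coefficient_of_variation(heights_in)

# The standard deviations differ with the unit, the CVs do not
print(f"SD: {heights_cm.std(ddof=1):.2f} cm vs {heights_in.std(ddof=1):.2f} in")
print(f"CV: {cv_cm:.2f}% vs {cv_in:.2f}%")
```

Rescaling multiplies the standard deviation and the mean by the same factor, so the factor cancels in the ratio.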

Investment Risk Comparison

import numpy as np

np.random.seed(42)

# Simulated daily prices for one trading year (252 days)
portfolios = {
    'Tech ETF':    np.random.normal(150, 35, 252),   # high CV
    'Bond Fund':   np.random.normal(105,  6, 252),   # low CV
    'Gold':        np.random.normal(180, 28, 252),   # moderate
}

print(f"{'Asset':<15} {'Mean':>8} {'Std Dev':>10} {'CV%':>8} {'Risk Level':>12}")
print("-"*55)
for name, prices in portfolios.items():
    mu = np.mean(prices)
    sd = np.std(prices, ddof=1)   # sample SD of the same series used for the mean
    cv = sd / mu * 100            # numerator and denominator must share units
    # Illustrative cutoffs only, chosen so the labels match the comments above
    risk = "High" if cv > 20 else ("Moderate" if cv > 10 else "Low")
    print(f"{name:<15} {mu:>8.2f} {sd:>10.2f} {cv:>7.2f}% {risk:>12}")
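
In finance the ratio is more often taken on returns than on price levels. The following is a minimal sketch under that assumption; the fund names and the monthly return figures are made up for illustration.

```python
import numpy as np

# Hypothetical monthly returns in percent (not real market data)
returns = {
    "Growth fund": np.array([1.2, -0.8, 0.5, 2.0, -1.5, 0.9]),
    "Income fund": np.array([0.3, 0.1, -0.1, 0.4, 0.2, 0.3]),
}

cv_by_fund = {}
for name, r in returns.items():
    mean_ret = r.mean()
    vol = r.std(ddof=1)  # sample standard deviation of returns
    # The ratio is only interpretable when the mean return is positive
    cv_by_fund[name] = vol / mean_ret if mean_ret > 0 else float("nan")
    print(f"{name:<12} mean={mean_ret:.3f}%  vol={vol:.3f}%  CV={cv_by_fund[name]:.2f}")

cv_a = cv_by_fund["Growth fund"]
cv_b = cv_by_fund["Income fund"]
```

A higher value means more volatility is carried per unit of average return. When the average return is near zero or negative the ratio stops being informative, which is why the sketch returns NaN in that case.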

Quality Control: Machine Precision

# Two machines producing 50mm bolts
np.random.seed(1)
machine_a = np.random.normal(50.0, 0.3, 500)   # precise
machine_b = np.random.normal(50.0, 0.9, 500)   # less precise

for name, data in [('Machine A', machine_a), ('Machine B', machine_b)]:
    cv = np.std(data,ddof=1)/np.mean(data)*100
    out_of_spec = np.sum(np.abs(data-50)>1.0)/len(data)*100
    print(f"{name}: CV={cv:.4f}%, Out-of-spec: {out_of_spec:.2f}%")
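
Both machines above share the same 50 mm target, so their standard deviations are already comparable. CV becomes necessary when the targets differ. A minimal sketch, assuming two hypothetical production lines with made-up process parameters:

```python
import numpy as np

rng = np.random.default_rng(7)  # assumed seed, for reproducibility only

# Hypothetical lines: (target diameter in mm, process SD in mm)
lines = {"5 mm screws": (5.0, 0.05), "50 mm bolts": (50.0, 0.30)}

stats = {}
for name, (target, sigma) in lines.items():
    sample = rng.normal(target, sigma, 5000)
    sd = sample.std(ddof=1)
    cv = sd / sample.mean() * 100
    stats[name] = (sd, cv)
    print(f"{name:<12} SD={sd:.3f} mm  CV={cv:.2f}%")

sd_screws, cv_screws = stats["5 mm screws"]
sd_bolts, cv_bolts = stats["50 mm bolts"]
```

The screw line has the smaller absolute spread but the larger spread relative to its target, so ranking by standard deviation and ranking by CV give opposite answers here.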

Limitations

| Limitation | When It Occurs |
|---|---|
| Undefined or unstable | Mean = 0 (division by zero) or mean ≈ 0 (CV blows up) |
| Meaningless | Interval-scale data (temperature in °C, IQ) |
| Misleading | Bimodal or highly skewed distributions |
| Sensitive | Outliers affect both numerator and denominator |

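# CV breaks down when the mean is zero: the division yields inf
near_zero = np.array([-2, -1, 0, 1, 2])
with np.errstate(divide='ignore'):   # silence NumPy's divide-by-zero warning
    cv = np.std(near_zero, ddof=1) / near_zero.mean() * 100
print(f"Mean={near_zero.mean()}, CV={cv:.2f}% <- nonsense")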
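
The interval-scale row can be checked the same way. This minimal sketch uses made-up temperature readings and a hypothetical helper `cv_percent` to express the same readings in three units:

```python
import numpy as np

def cv_percent(x):
    """Sample CV in percent (hypothetical helper)."""
    return np.std(x, ddof=1) / np.mean(x) * 100

temps_c = np.array([18.0, 20.0, 22.0, 24.0, 26.0])  # made-up readings in Celsius
temps_f = temps_c * 9 / 5 + 32                      # same readings in Fahrenheit
temps_k = temps_c + 273.15                          # same readings in Kelvin

cv_c, cv_f, cv_k = cv_percent(temps_c), cv_percent(temps_f), cv_percent(temps_k)
print(f"Celsius:    CV={cv_c:.2f}%")
print(f"Fahrenheit: CV={cv_f:.2f}%")
print(f"Kelvin:     CV={cv_k:.2f}%")
```

The three values disagree because Celsius and Fahrenheit place zero at an arbitrary point, and shifting the zero changes the mean without changing the spread. Only Kelvin has a true zero, so only the Kelvin figure is a ratio-scale CV.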

Coefficient of Variation in Machine Learning

| ML Application | CV Usage | Why |
|---|---|---|
| Feature comparison | Compare std across features with different units | Fair comparison |
| Model stability | CV of scores across folds | Lower CV = more stable model |
| Risk-adjusted returns | Risk per unit of return (the inverse of a return-per-risk ratio) | Financial ML |
| Hyperparameter tuning | CV of the metric across search trials | Identify robust parameters |

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=5, random_state=42)

# Compare model stability using the coefficient of variation of fold accuracies.
# Note: the cv=5 argument below means 5-fold cross-validation, a different "CV".
models = {
    'Logistic': LogisticRegression(),
    'Random Forest': RandomForestClassifier(n_estimators=100, random_state=42),
    'SVM': SVC()
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    sd = scores.std(ddof=1)          # sample SD, consistent with the earlier examples
    coef_var = sd / scores.mean()    # coefficient of variation (as a fraction)
    print(f"{name:15s}: mean={scores.mean():.3f}, std={sd:.3f}, CV={coef_var:.4f}")
print("Lower coefficient of variation = more stable accuracy across folds")
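
The "Feature comparison" row can be sketched in a similar way. The example below assumes a hypothetical feature table with made-up column names and distribution parameters, and computes one CV per column with pandas:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)  # assumed seed, for reproducibility only

# Hypothetical features measured in very different units
df = pd.DataFrame({
    "income_usd": rng.normal(60_000, 15_000, 1000),
    "age_years": rng.normal(40, 10, 1000),
    "height_cm": rng.normal(170, 8, 1000),
})

# DataFrame.std() uses the sample standard deviation (ddof=1) by default
summary = pd.DataFrame({
    "mean": df.mean(),
    "std": df.std(),
    "cv_pct": df.std() / df.mean() * 100,
})
print(summary.round(2))
```

Income has by far the largest standard deviation, yet its relative spread is about the same as that of age, because both were generated with a standard deviation equal to a quarter of the mean. Height is the least variable feature in relative terms.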

Key Takeaways

CV = SD/mean × 100% is dimensionless — compare across any units

Requires ratio-scale data with a meaningful, strictly positive mean

In finance, CV is the volatility-to-return ratio: the higher the CV, the more risk is taken per unit of return

Avoid CV when the mean is zero or close to zero, or when values can be negative: the ratio becomes unstable or meaningless

"CV is the great equalizer — it lets you compare apples to oranges when it comes to variability."
