Cramér's V
Effect Size for Any Contingency Table — Not Just 2×2
The chi-square test tells you whether an association exists, but not how strong it is. Cramér's V normalizes the chi-square statistic into a bounded measure from 0 to 1, giving you the effect size regardless of table dimensions.
Key things Cramér's V helps you understand:
- Universal categorical association — Works for tables of any size (2×2, 3×4, 5×5), unlike phi which is limited to 2×2.
- Effect size interpretation — Cohen's guidelines (0.1 small, 0.3 medium, 0.5 large) provide a quick rule of thumb.
- Dependence on table dimensions — The same V value means different things for a 2×2 table versus a 5×5 table; always report table size alongside V.
Chi-square asks "is there an association?" — Cramér's V answers "how strong is it?"
What is Cramér's V?
Definition
Cramér's V measures the strength of association between two categorical variables, based on the chi-square statistic.
Cramér's V is a measure of effect size for the chi-square test of independence. It normalizes the chi-square statistic to range from 0 (no association) to 1 (perfect association), and works for tables of any size.
Cramér's V Formula
$$V = \sqrt{\frac{\chi^2}{n \cdot \min(r-1,\; c-1)}}$$
Here,
- $\chi^2$ = Chi-square statistic from the test of independence
- $n$ = Total sample size
- $r$ = Number of rows in the contingency table
- $c$ = Number of columns in the contingency table
import numpy as np
from scipy import stats

# Create a contingency table (e.g., Hair Color vs Eye Color)
table = np.array([
    [68, 119, 26],
    [20, 84, 17],
    [15, 54, 14]
])

# Chi-square test of independence
chi2, p_value, dof, expected = stats.chi2_contingency(table)

# Normalize chi-square by n and by the smaller of (rows - 1, columns - 1)
n = table.sum()
r, c = table.shape
v = np.sqrt(chi2 / (n * min(r - 1, c - 1)))

print(f"Contingency table:\n{table}")
print(f"\nChi-square = {chi2:.4f}")
print(f"Cramér's V = {v:.4f}")
print(f"p-value = {p_value:.2e}")
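To make the formula less of a black box, the sketch below rebuilds the chi-square statistic from the expected counts and then applies the normalization by hand. It reuses the 3×3 table from the example above; the variable names (`row_totals`, `chi2_manual`, `v_manual`) are illustrative choices.

```python
import numpy as np
from scipy import stats

# Same 3x3 table as in the example above
table = np.array([
    [68, 119, 26],
    [20, 84, 17],
    [15, 54, 14],
])

n = table.sum()
row_totals = table.sum(axis=1, keepdims=True)  # shape (3, 1)
col_totals = table.sum(axis=0, keepdims=True)  # shape (1, 3)

# Expected count under independence: (row total * column total) / n
expected = row_totals @ col_totals / n

# Pearson chi-square: sum over cells of (observed - expected)^2 / expected
chi2_manual = ((table - expected) ** 2 / expected).sum()

# Normalize by n and by the smaller of (rows - 1, columns - 1)
k = min(table.shape) - 1
v_manual = np.sqrt(chi2_manual / (n * k))

# Cross-check against SciPy (no continuity correction is applied when df > 1)
chi2_scipy = stats.chi2_contingency(table)[0]
print(f"manual chi-square = {chi2_manual:.4f}, SciPy chi-square = {chi2_scipy:.4f}")
print(f"Cramér's V = {v_manual:.4f}")
```

The two chi-square values should agree to floating-point precision, which confirms that V is a rescaling of the test statistic.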
Interpretation Guidelines (Cohen's Rules)
| V Value | Interpretation |
|---|---|
| Below 0.10 | Negligible association |
| 0.10 to below 0.30 | Weak association |
| 0.30 to below 0.50 | Moderate association |
| 0.50 and above | Strong association |
Effect Size Dependence
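How V should be read depends on the table size. For a 2×2 table, V equals the absolute value of phi. V can reach 1.0 for a table of any size: it does so whenever the variable with fewer categories is completely determined by the other. What changes with table size is the benchmark scale. Cohen's 0.1, 0.3, and 0.5 thresholds are defined for the effect size w = V·√(min(r-1, c-1)), so they carry over to V unchanged only when min(r-1, c-1) = 1. For larger tables the equivalent V thresholds are smaller (for a 3×3 table, roughly 0.07, 0.21, and 0.35).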
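One way to make the dependence on table size concrete is to rescale Cohen's benchmarks per table. The sketch below assumes the standard relation between Cohen's w and Cramér's V, w = V·√(min(r-1, c-1)); the helper name `cohen_v_thresholds` is hypothetical and not part of any library.

```python
import math

# Cohen's benchmarks for the effect size w (small, medium, large)
COHEN_W = {"small": 0.10, "medium": 0.30, "large": 0.50}

def cohen_v_thresholds(n_rows, n_cols):
    """Rescale Cohen's w benchmarks to Cramér's V for an r x c table.

    Assumes w = V * sqrt(k), where k = min(r - 1, c - 1).
    """
    k = min(n_rows - 1, n_cols - 1)
    if k < 1:
        raise ValueError("table needs at least 2 rows and 2 columns")
    return {label: w / math.sqrt(k) for label, w in COHEN_W.items()}

for shape in [(2, 2), (3, 3), (5, 5)]:
    thresholds = cohen_v_thresholds(*shape)
    print(shape, {label: round(value, 3) for label, value in thresholds.items()})
```

Under this assumption, a V of 0.25 falls between small and medium for a 2×2 table but counts as large for a 5×5 table, which is why the table dimensions belong next to the reported V.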
Python Implementation
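import numpy as np
from scipy import stats

def cramers_v(table):
    """Calculate Cramér's V for any r x c contingency table of counts."""
    table = np.asarray(table)
    # correction=False skips Yates' continuity correction, which SciPy applies
    # to 2x2 tables by default and which would make V differ from phi
    chi2, _, _, _ = stats.chi2_contingency(table, correction=False)
    n = table.sum()
    r, c = table.shape
    return np.sqrt(chi2 / (n * min(r - 1, c - 1)))

# Test with different tables
table_2x2 = np.array([[50, 30], [20, 40]])
table_3x3 = np.array([[30, 10, 5], [15, 25, 10], [5, 10, 30]])
print(f"2x2 table V = {cramers_v(table_2x2):.4f}")
print(f"3x3 table V = {cramers_v(table_3x3):.4f}")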
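A hand-written function can also be cross-checked against SciPy's own implementation. The sketch below assumes SciPy 1.7 or later, where `scipy.stats.contingency.association` is available; it expects an integer table of counts and uses no continuity correction by default.

```python
import numpy as np
from scipy import stats
from scipy.stats.contingency import association

table_3x3 = np.array([[30, 10, 5], [15, 25, 10], [5, 10, 30]])

# Hand-rolled version, using the uncorrected chi-square statistic
chi2 = stats.chi2_contingency(table_3x3, correction=False)[0]
k = min(table_3x3.shape) - 1
v_manual = np.sqrt(chi2 / (table_3x3.sum() * k))

# Library version: method="cramer" selects Cramér's V
v_scipy = association(table_3x3, method="cramer")

print(f"manual V = {v_manual:.4f}")
print(f"SciPy V  = {v_scipy:.4f}")
```

If the two values differ, the usual suspect is a continuity correction applied on one side but not the other.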
Cramér's V vs Phi Coefficient
| Feature | Phi (φ) | Cramér's V |
|---|---|---|
| Table size | 2×2 only | Any r×c |
| Maximum value | 1.0 | 1.0 (for any table size) |
| Formula | φ = √(χ²/n) | V = √(χ²/(n·min(r-1,c-1))) |
| Use case | Binary vs binary | Any categorical vs categorical |
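The two formulas in the table can be compared directly. The sketch below, with illustrative helper names, applies both normalizations to a 2×2 table and to a 3×3 table in which each row maps to exactly one column. It uses the uncorrected chi-square statistic, since Yates' correction would otherwise change the 2×2 result.

```python
import numpy as np
from scipy import stats

def chi2_uncorrected(table):
    """Pearson chi-square without Yates' continuity correction."""
    return stats.chi2_contingency(table, correction=False)[0]

def sqrt_chi2_over_n(table):
    """phi-style normalization: sqrt(chi2 / n)."""
    return np.sqrt(chi2_uncorrected(table) / table.sum())

def cramers_v(table):
    """Cramér's V: sqrt(chi2 / (n * min(r - 1, c - 1)))."""
    k = min(table.shape) - 1
    return np.sqrt(chi2_uncorrected(table) / (table.sum() * k))

table_2x2 = np.array([[50, 30], [20, 40]])
# Each row maps to exactly one column
perfect_3x3 = np.diag([40, 40, 40])

for name, t in [("2x2", table_2x2), ("perfect 3x3", perfect_3x3)]:
    print(f"{name}: sqrt(chi2/n) = {sqrt_chi2_over_n(t):.4f}, V = {cramers_v(t):.4f}")
```

On the 2×2 table both numbers coincide. On the 3×3 table the phi-style value exceeds 1 while V stays on the 0 to 1 scale, which is the reason for dividing by min(r-1, c-1).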
Cramér's V in Machine Learning
| ML Application | Cramér's V Usage | Why |
|---|---|---|
| Feature selection | Score categorical features against a categorical target or against each other | Rank relevant features and spot redundant ones |
| NLP | Measure word co-occurrence strength | Feature engineering |
| Data validation | Check that expected relationships between categorical fields hold | Sanity checks |
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Simulate two independent categorical features (seeded for reproducibility)
rng = np.random.default_rng(42)
data = pd.DataFrame({
    'weather': rng.choice(['sunny', 'rainy', 'cloudy'], 200),
    'activity': rng.choice(['indoor', 'outdoor'], 200)
})

# Cross-tabulate the two columns into a 3x2 contingency table
contingency = pd.crosstab(data['weather'], data['activity'])
chi2, p, dof, expected = chi2_contingency(contingency)
n = contingency.sum().sum()
min_dim = min(contingency.shape) - 1
v = np.sqrt(chi2 / (n * min_dim))

# The columns are drawn independently, so V should come out close to 0
print(f"Chi-square: {chi2:.3f}, p-value: {p:.4f}")
print(f"Cramér's V: {v:.4f} (0=no association, 1=perfect)")
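For feature selection, the same calculation is usually repeated over every pair of categorical columns. The following is a minimal sketch under stated assumptions: the helper names (`cramers_v`, `cramers_v_matrix`) and the toy columns are hypothetical, and the data is simulated so that `activity` depends on `weather` while `day_type` does not.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def cramers_v(x, y):
    """Cramér's V between two categorical pandas Series."""
    table = pd.crosstab(x, y)
    k = min(table.shape) - 1
    if k < 1:
        return np.nan  # a constant column has no measurable association
    chi2 = chi2_contingency(table, correction=False)[0]
    return np.sqrt(chi2 / (table.to_numpy().sum() * k))

def cramers_v_matrix(df):
    """Symmetric matrix of pairwise Cramér's V for all columns of df."""
    cols = list(df.columns)
    # Diagonal is 1: a non-constant column is perfectly associated with itself
    out = pd.DataFrame(np.eye(len(cols)), index=cols, columns=cols)
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            out.loc[a, b] = out.loc[b, a] = cramers_v(df[a], df[b])
    return out

rng = np.random.default_rng(0)
weather = rng.choice(["sunny", "rainy", "cloudy"], 300)
# Toy data: outdoor activity happens only on (most) sunny days
activity = np.where(
    (weather == "sunny") & (rng.random(300) < 0.8), "outdoor", "indoor"
)
day_type = rng.choice(["weekday", "weekend"], 300)
df = pd.DataFrame({"weather": weather, "activity": activity, "day_type": day_type})

print(cramers_v_matrix(df).round(3))
```

Pairs with a high V are candidates for redundancy, while a feature with a low V against the target carries little categorical signal on its own.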
Key Takeaways
- Cramér's V measures association between any two categorical variables: it generalizes phi to tables of any size.
- V ranges from 0 to 1: 0 means no association, 1 means perfect association.
- Based on chi-square: V = √(χ²/(n·min(r-1, c-1))), linking the test of independence directly to effect size.
- Interpretation depends on table size: always report table dimensions alongside V, and use Cohen's guidelines as a starting point.
"The chi-square test tells you something is going on; Cramér's V tells you how much — and the table dimensions tell you how to interpret it."
Summary: Cramér's V
- Cramér's V measures association between any two categorical variables — generalizes phi to larger tables
- V ranges from 0 to 1 — 0 means no association, 1 means perfect association
- Based on chi-square — V = √(χ²/(n·min(r-1, c-1)))
- Interpretation depends on table size — same V value means different things for different table dimensions
- Use Cohen's guidelines: 0.1=small, 0.3=medium, 0.5=large
- Always report chi-square, df, and p-value alongside V for complete reporting
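The reporting advice above can be bundled into one helper so that V never travels without its context. This is a sketch only: the function name `association_report` and the dictionary keys are illustrative choices, and the chi-square statistic is computed without a continuity correction.

```python
import numpy as np
from scipy import stats

def association_report(table):
    """Return table shape, n, chi-square, df, p-value, and V in one dict."""
    table = np.asarray(table)
    chi2, p, dof, _ = stats.chi2_contingency(table, correction=False)
    r, c = table.shape
    v = np.sqrt(chi2 / (table.sum() * min(r - 1, c - 1)))
    return {"shape": (r, c), "n": int(table.sum()), "chi2": chi2,
            "df": dof, "p": p, "cramers_v": v}

rep = association_report([[30, 10, 5], [15, 25, 10], [5, 10, 30]])
print(
    f"{rep['shape'][0]}x{rep['shape'][1]} table, n = {rep['n']}: "
    f"chi2({rep['df']}) = {rep['chi2']:.2f}, p = {rep['p']:.3g}, "
    f"V = {rep['cramers_v']:.3f}"
)
```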