Cramér's V

Descriptive Statistics

Effect Size for Any Contingency Table — Not Just 2×2

The chi-square test tells you whether an association exists, but not how strong it is. Cramér's V rescales the chi-square statistic into a measure bounded between 0 and 1, giving an effect size that can be computed for a contingency table of any dimensions.

Key things Cramér's V helps you understand:

Universal categorical association — Works for tables of any size (2×2, 3×4, 5×5), unlike phi which is limited to 2×2.
Effect size interpretation — Cohen's guidelines (0.1 small, 0.3 medium, 0.5 large) provide a quick rule of thumb.
Dependence on table dimensions — The same V value means different things for a 2×2 table versus a 5×5 table; always report table size alongside V.

Chi-square asks "is there an association?" — Cramér's V answers "how strong is it?"

What is Cramér's V?

Definition

Cramér's V measures the strength of association between two categorical variables, based on the chi-square statistic.

Cramér's V

Cramér's V is a measure of effect size for the chi-square test of independence. It normalizes the chi-square statistic to range from 0 (no association) to 1 (perfect association), and works for tables of any size.

Cramér's V Formula

$$V = \sqrt{\frac{\chi^2}{n \cdot \min(r-1, c-1)}}$$

Here,

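$\chi^2$ = chi-square statistic from the test of independence
$n$ = total sample size
$r$ = number of rows in the contingency table
$c$ = number of columns in the contingency table
$\min(r-1, c-1)$ = the smaller table dimension minus one; $n \cdot \min(r-1, c-1)$ is the largest value $\chi^2$ can reach for the table, which is what keeps V between 0 and 1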
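To connect the symbols to numbers, here is the formula worked by hand for a small 2×2 table with rows (50, 30) and (20, 40). It uses the uncorrected Pearson $\chi^2$, where each expected count is $E_{ij} = R_i C_j / n$ with $R_i$ the row totals (80, 60), $C_j$ the column totals (70, 70), and $n = 140$:

$$
\begin{aligned}
E_{11} = E_{12} &= \frac{80 \cdot 70}{140} = 40, \qquad E_{21} = E_{22} = \frac{60 \cdot 70}{140} = 30 \\
\chi^2 &= \frac{(50-40)^2}{40} + \frac{(30-40)^2}{40} + \frac{(20-30)^2}{30} + \frac{(40-30)^2}{30} = \frac{35}{3} \approx 11.667 \\
V &= \sqrt{\frac{35/3}{140 \cdot \min(1, 1)}} = \sqrt{\frac{1}{12}} \approx 0.2887
\end{aligned}
$$

By the 0.1 / 0.3 / 0.5 rule of thumb, this is a weak association.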

import numpy as np
from scipy import stats

# Create a contingency table (e.g., Hair Color vs Eye Color)
table = np.array([
    [68, 119, 26],
    [20, 84, 17],
    [15, 54, 14]
])

chi2, p_value, dof, expected = stats.chi2_contingency(table)
n = table.sum()
r, c = table.shape
v = np.sqrt(chi2 / (n * (min(r-1, c-1))))

print(f"Contingency table:\n{table}")
print(f"\nChi-square = {chi2:.4f}")
print(f"Cramér's V = {v:.4f}")
print(f"p-value    = {p_value:.2e}")
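To see what `stats.chi2_contingency` does internally, the sketch below computes the expected counts, the Pearson chi-square, and V with plain NumPy, then checks the result against SciPy. The helper name `cramers_v_from_scratch` is hypothetical. The two results should agree here because SciPy only applies a continuity correction when the table has one degree of freedom, and this 3×3 table has four.

```python
import numpy as np
from scipy import stats

def cramers_v_from_scratch(table):
    """Cramér's V computed by hand: expected counts, Pearson chi-square, then V."""
    observed = np.asarray(table, dtype=float)
    n = observed.sum()
    # Expected count per cell under independence: row total * column total / n
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n
    # Pearson chi-square statistic (no continuity correction)
    chi2 = ((observed - expected) ** 2 / expected).sum()
    r, c = observed.shape
    return np.sqrt(chi2 / (n * min(r - 1, c - 1)))

table = np.array([
    [68, 119, 26],
    [20, 84, 17],
    [15, 54, 14]
])

# Cross-check against SciPy (dof = 4 here, so no Yates correction is applied)
chi2_scipy, _, _, _ = stats.chi2_contingency(table)
v_scipy = np.sqrt(chi2_scipy / (table.sum() * (min(table.shape) - 1)))

print(f"From scratch: {cramers_v_from_scratch(table):.4f}")
print(f"Via SciPy:    {v_scipy:.4f}")
```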

Interpretation Guidelines (Cohen's Rules)

V Value	Interpretation
0.00 – 0.10	Negligible association
0.10 – 0.30	Weak association
0.30 – 0.50	Moderate association
0.50+	Strong association

Effect Size Dependence

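The interpretation of V depends on the table size. For a 2×2 table, V equals |φ|. For larger tables, V can still reach 1.0 when the association is perfect (for example, a square table with every count on the diagonal), but a given V corresponds to a larger Cohen's w = √(χ²/n) = V·√(min(r−1, c−1)). Because Cohen's 0.1 / 0.3 / 0.5 cutoffs were defined for w, the equivalent cutoffs for V shrink by a factor of √(min(r−1, c−1)): for a 3×3 table they are roughly 0.07, 0.21, and 0.35.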
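One way to make this dependence concrete is to convert Cohen's cutoffs for w into cutoffs for V using w = V·√(min(r−1, c−1)). The sketch below assumes that scaling and maps Cohen's small / medium / large onto the Weak / Moderate / Strong labels of the interpretation table. The function name `interpret_cramers_v` is hypothetical, and the cutoffs remain rules of thumb rather than hard boundaries.

```python
import math

def interpret_cramers_v(v, r, c):
    """Label V for an r x c table using Cohen's w cutoffs (0.1, 0.3, 0.5)
    divided by sqrt(min(r-1, c-1)). Assumption: w = V * sqrt(min(r-1, c-1))."""
    scale = math.sqrt(min(r - 1, c - 1))
    small, medium, large = 0.1 / scale, 0.3 / scale, 0.5 / scale
    if v < small:
        return "negligible"
    if v < medium:
        return "weak"
    if v < large:
        return "moderate"
    return "strong"

# The same V reads differently depending on the table's dimensions
for r, c in [(2, 2), (3, 3), (5, 5)]:
    print(f"{r}x{c} table: V = 0.27 -> {interpret_cramers_v(0.27, r, c)}")
```

With the scaled cutoffs, V = 0.27 is weak for a 2×2 table, moderate for a 3×3 table, and strong for a 5×5 table.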

Python Implementation

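import numpy as np
from scipy import stats

def cramers_v(table):
    """Calculate Cramér's V for any r x c contingency table."""
    table = np.asarray(table)
    # correction=False: use the plain Pearson chi-square so that V equals |phi|
    # for 2x2 tables (SciPy applies Yates' correction by default when dof == 1)
    chi2, _, _, _ = stats.chi2_contingency(table, correction=False)
    n = table.sum()
    r, c = table.shape
    return np.sqrt(chi2 / (n * min(r - 1, c - 1)))

# Test with different tables
table_2x2 = np.array([[50, 30], [20, 40]])
table_3x3 = np.array([[30, 10, 5], [15, 25, 10], [5, 10, 30]])

print(f"2x2 table V = {cramers_v(table_2x2):.4f}")
print(f"3x3 table V = {cramers_v(table_3x3):.4f}")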

Cramér's V vs Phi Coefficient

Feature	Phi (φ)	Cramér's V
Table size	2×2 only	Any r×c
Maximum value	1.0	1.0 for any table size
Formula	φ = √(χ²/n)	V = √(χ²/(n·min(r-1,c-1)))
Use case	Binary vs binary	Any categorical vs categorical
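For a 2×2 table the two measures coincide: V equals the absolute value of phi, provided both come from the uncorrected Pearson χ². The sketch below checks this by computing phi directly from the cell counts, (ad − bc) / √(product of the four margins). It also shows a subtlety: `scipy.stats.chi2_contingency` applies Yates' continuity correction by default when the table has one degree of freedom, which shrinks χ² and therefore V.

```python
import numpy as np
from scipy import stats

table_2x2 = np.array([[50, 30], [20, 40]])
(a, b), (c, d) = table_2x2

# Phi from the cell counts: (ad - bc) / sqrt(product of row and column totals)
phi = (a * d - b * c) / np.sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Cramér's V from the uncorrected chi-square; min(r-1, c-1) = 1 for a 2x2 table
chi2, _, _, _ = stats.chi2_contingency(table_2x2, correction=False)
v = np.sqrt(chi2 / table_2x2.sum())

# SciPy's default (Yates-corrected) chi-square gives a smaller V
chi2_yates, _, _, _ = stats.chi2_contingency(table_2x2)
v_yates = np.sqrt(chi2_yates / table_2x2.sum())

print(f"phi = {phi:.4f}, V = {v:.4f}, V (Yates) = {v_yates:.4f}")
```

Unlike V, phi keeps a sign: it is negative when the off-diagonal cells dominate, so V = |φ|.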

Cramér's V in Machine Learning

ML Application	Cramér's V Usage	Why
Feature selection	Categorical feature vs categorical target (or feature vs feature)	Find features associated with the target and spot redundant pairs
NLP	Word co-occurrence strength	Feature engineering
Data validation	Check expected relationships between categorical columns	Sanity checks

import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Simulated data: weather and activity are drawn independently,
# so V should be small. A fixed seed makes the run reproducible.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    'weather': rng.choice(['sunny', 'rainy', 'cloudy'], 200),
    'activity': rng.choice(['indoor', 'outdoor'], 200)
})
contingency = pd.crosstab(data['weather'], data['activity'])

chi2, p, dof, expected = chi2_contingency(contingency)
n = contingency.sum().sum()
min_dim = min(contingency.shape) - 1
v = np.sqrt(chi2 / (n * min_dim))
print(f"Chi-square: {chi2:.3f}, p-value: {p:.4f}")
print(f"Cramér's V: {v:.4f} (0 = no association, 1 = perfect)")
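For the feature-selection use in the table, a simple pattern is to compute V between each categorical feature and a categorical target and rank the features by it. The sketch below does this on synthetic data; the column names `plan`, `region`, and `target` and the helper `cramers_v_series` are hypothetical. `plan` is built to depend on the target, while `region` is pure noise.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def cramers_v_series(x, y):
    """Cramér's V between two categorical pandas Series (uncorrected chi-square)."""
    table = pd.crosstab(x, y)
    k = min(table.shape) - 1
    if k == 0:
        return float("nan")  # a column with a single category has no defined V
    chi2, _, _, _ = chi2_contingency(table, correction=False)
    return np.sqrt(chi2 / (table.to_numpy().sum() * k))

# Synthetic data: 70% of 'plan' values follow the target, the rest are random
rng = np.random.default_rng(42)
n = 500
target = rng.choice(["churn", "stay"], n)
df = pd.DataFrame({
    "plan": np.where(rng.random(n) < 0.7,
                     np.where(target == "churn", "basic", "premium"),
                     rng.choice(["basic", "premium", "family"], n)),
    "region": rng.choice(["north", "south", "east", "west"], n),
    "target": target,
})

# Rank features by their association with the target
scores = {col: cramers_v_series(df[col], df["target"]) for col in ["plan", "region"]}
for col, v in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{col:>6}: V = {v:.3f}")
```

V only measures association, so a high score flags a candidate feature rather than proving it improves a model.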

Key Takeaways

Cramér's V measures association between any two categorical variables — generalizes phi to tables of any size.

V ranges from 0 to 1 — 0 means no association, 1 means perfect association.

Based on chi-square — V = √(χ²/(n·min(r-1, c-1))), linking the test of independence directly to effect size.

Interpretation depends on table size — always report table dimensions alongside V, and use Cohen's guidelines as a starting point.

"The chi-square test tells you something is going on; Cramér's V tells you how much — and the table dimensions tell you how to interpret it."

Summary: Cramér's V

Cramér's V measures association between any two categorical variables — generalizes phi to larger tables
V ranges from 0 to 1 — 0 means no association, 1 means perfect association
Based on chi-square — V = √(χ²/(n·min(r-1, c-1)))
Interpretation depends on table size — same V value means different things for different table dimensions
Use Cohen's guidelines (0.1 = small, 0.3 = medium, 0.5 = large) as stated when min(r−1, c−1) = 1, and divide them by √(min(r−1, c−1)) for larger tables
Always report chi-square, df, and p-value alongside V for complete reporting
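The reporting advice in the last point can be wrapped in a small helper. This is a sketch, assuming the uncorrected Pearson χ² and a "χ²(df, N = n) = …" layout common in social-science write-ups; the function name `report_chi2_with_v` is hypothetical. It includes the table dimensions so readers can judge V in context.

```python
import numpy as np
from scipy import stats

def report_chi2_with_v(table):
    """Summarize a chi-square test of independence together with Cramér's V."""
    table = np.asarray(table)
    chi2, p, dof, _ = stats.chi2_contingency(table, correction=False)
    n = table.sum()
    r, c = table.shape
    v = np.sqrt(chi2 / (n * min(r - 1, c - 1)))
    return (f"chi2({dof}, N = {n}) = {chi2:.2f}, p = {p:.3g}, "
            f"V = {v:.2f} ({r}x{c} table)")

print(report_chi2_with_v([[68, 119, 26], [20, 84, 17], [15, 54, 14]]))
```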
