Contingency Tables

Descriptive Statistics

Test Whether Your Categorical Variables Are Truly Independent

Contingency tables let you move beyond descriptive counts to formally test whether two categorical variables are associated — and how strongly.

Construct tables — Build frequency matrices that cross-tabulate categorical variables
Calculate expected frequencies — Determine what counts would look like under independence
Apply chi-square testing — Quantify whether observed deviations from independence are significant
Measure association strength — Use Cramér's V and Phi to assess how related your variables are

The chi-square test transforms a table of numbers into a verdict about independence.

What are Contingency Tables?

Definition

A contingency table displays the frequency distribution of two or more categorical variables to analyze their relationship.

Contingency Table

A contingency table (also called a cross-tabulation or crosstab) is a matrix-format table that displays the multivariate frequency distribution of variables. It helps analyze the relationship between two categorical variables.

Expected Frequency

$$E_{ij} = \frac{R_i \cdot C_j}{n}$$

Here,

$R_i$ = Row $i$ total
$C_j$ = Column $j$ total
$n$ = Grand total (all observations)
$E_{ij}$ = Expected frequency for cell $(i, j)$
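To make the formula concrete, here is a minimal sketch that computes the expected frequencies by hand with NumPy. The counts are the drug/placebo example used in this section, and the variable names are illustrative only.

```python
import numpy as np

# Observed counts: rows = Drug, Placebo; columns = Improved, Not Improved
observed = np.array([[35, 15],
                     [20, 30]])

row_totals = observed.sum(axis=1)   # R_i
col_totals = observed.sum(axis=0)   # C_j
n = observed.sum()                  # grand total

# E_ij = R_i * C_j / n for every cell, via an outer product
expected = np.outer(row_totals, col_totals) / n
print(expected)
```

The expected table keeps the same row and column totals as the observed table; only the way counts are spread across cells changes.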

import numpy as np
import pandas as pd
from scipy import stats

# Build a contingency table
data = pd.DataFrame({
    'Treatment': ['Drug']*50 + ['Placebo']*50,
    'Outcome': ['Improved']*35 + ['Not Improved']*15 + ['Improved']*20 + ['Not Improved']*30
})

ct = pd.crosstab(data['Treatment'], data['Outcome'])
print("Contingency Table:")
print(ct)

Chi-Square Test

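# Note: for 2x2 tables, chi2_contingency applies Yates' continuity
# correction by default (correction=True)
chi2, p_value, dof, expected = stats.chi2_contingency(ct)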

print(f"\nChi-square statistic = {chi2:.4f}")
print(f"p-value              = {p_value:.4f}")
print(f"Degrees of freedom   = {dof}")
print(f"\nExpected frequencies:")
print(pd.DataFrame(expected, index=ct.index, columns=ct.columns).round(2))

Component	Description
χ² statistic	Measures discrepancy between observed and expected frequencies
df	(rows - 1) × (columns - 1)
p-value	Probability of observing a χ² statistic at least this large if the variables are independent
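The components in the table can be reproduced by hand. The sketch below, which reuses the drug/placebo counts, sums (observed - expected)² / expected over all cells and reads the p-value from the chi-square distribution. It compares the result against `chi2_contingency` with `correction=False`, because the default Yates' correction for 2×2 tables would give a smaller statistic.

```python
import numpy as np
from scipy import stats

observed = np.array([[35, 15],
                     [20, 30]])

# Expected counts under independence
expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()

# Pearson chi-square statistic: sum of (O - E)^2 / E over all cells
chi2_manual = ((observed - expected) ** 2 / expected).sum()

# Degrees of freedom: (rows - 1) * (columns - 1)
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# p-value: upper-tail probability of the chi-square distribution
p_manual = stats.chi2.sf(chi2_manual, dof)

# Cross-check against SciPy without the continuity correction
chi2_scipy, p_scipy, _, _ = stats.chi2_contingency(observed, correction=False)

print(f"manual: chi2 = {chi2_manual:.4f}, dof = {dof}, p = {p_manual:.4f}")
print(f"scipy : chi2 = {chi2_scipy:.4f}, p = {p_scipy:.4f}")
```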

Fisher's Exact Test

When the sample is small enough that some expected frequencies fall below 5, the chi-square approximation becomes unreliable. Use Fisher's exact test instead:

# Small sample example
small_table = np.array([[5, 2], [1, 4]])

odds_ratio, p_fisher = stats.fisher_exact(small_table)
print(f"Small table:\n{small_table}")
print(f"Odds ratio = {odds_ratio:.4f}")
print(f"Fisher p   = {p_fisher:.4f}")
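For a 2×2 table with cells a, b (first row) and c, d (second row), the odds ratio reported by `fisher_exact` is the sample odds ratio (a·d) / (b·c). The short sketch below recomputes it by hand for the same small table; the variable names are illustrative.

```python
import numpy as np
from scipy import stats

small_table = np.array([[5, 2], [1, 4]])

# Unpack the four cells of the 2x2 table
(a, b), (c, d) = small_table

# Sample odds ratio: odds in row 1 divided by odds in row 2
manual_or = (a * d) / (b * c)

scipy_or, _ = stats.fisher_exact(small_table)
print(f"manual odds ratio = {manual_or:.4f}")
print(f"scipy odds ratio  = {scipy_or:.4f}")
```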

When to Use Fisher's Exact Test

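Use Fisher's exact test instead of chi-square when (1) any expected frequency is less than 5 or (2) the total sample size is small (n < 20). The test is most commonly applied to 2×2 tables, but a table being 2×2 is not by itself a reason to prefer it: with large counts, the chi-square test works well on 2×2 tables too.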
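The expected-frequency and sample-size criteria can be turned into a small check. The helper below is a hypothetical convenience function, not part of SciPy. It uses `scipy.stats.contingency.expected_freq` to get the expected counts, and its thresholds of 5 and 20 are the rules of thumb stated above.

```python
import numpy as np
from scipy.stats.contingency import expected_freq

def suggest_test(table, min_expected=5, min_n=20):
    """Suggest 'fisher' or 'chi-square' using common rules of thumb.

    Hypothetical helper for illustration; thresholds are conventions,
    not hard statistical limits.
    """
    table = np.asarray(table)
    expected = expected_freq(table)
    if (expected < min_expected).any() or table.sum() < min_n:
        return "fisher"
    return "chi-square"

print(suggest_test([[5, 2], [1, 4]]))      # small counts
print(suggest_test([[35, 15], [20, 30]]))  # drug/placebo example
```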

Measures of Association

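# Cramér's V for any table size
# Use the uncorrected statistic: chi2_contingency applies Yates' correction
# to 2x2 tables by default, which would shrink V and Phi
chi2_raw, _, _, _ = stats.chi2_contingency(ct, correction=False)
n = ct.sum().sum()
r, c = ct.shape
v = np.sqrt(chi2_raw / (n * min(r - 1, c - 1)))
print(f"Cramér's V = {v:.4f}")

# Phi for 2x2 tables (equals Cramér's V when the table is 2x2)
phi = np.sqrt(chi2_raw / n)
print(f"Phi (φ)    = {phi:.4f}")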
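As a cross-check, SciPy offers `scipy.stats.contingency.association`, which computes Cramér's V directly from a table of counts. The sketch below assumes SciPy 1.7 or newer and reuses the drug/placebo counts. Note that `association` does not apply a continuity correction by default, so the manual comparison uses the uncorrected chi-square statistic.

```python
import numpy as np
from scipy import stats
from scipy.stats.contingency import association

observed = np.array([[35, 15],
                     [20, 30]])

# Cramér's V straight from the counts
v_scipy = association(observed, method="cramer")

# Manual version from the uncorrected chi-square statistic
chi2_raw, _, _, _ = stats.chi2_contingency(observed, correction=False)
n = observed.sum()
r, c = observed.shape
v_manual = np.sqrt(chi2_raw / (n * min(r - 1, c - 1)))

print(f"association(): V = {v_scipy:.4f}")
print(f"manual       : V = {v_manual:.4f}")
```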