
Contingency Tables — Construction, Analysis, and Chi-Square



Test Whether Your Categorical Variables Are Truly Independent

Contingency tables let you move beyond descriptive counts to formally test whether two categorical variables are associated — and how strongly.

  • Construct tables — Build frequency matrices that cross-tabulate categorical variables
  • Calculate expected frequencies — Determine what counts would look like under independence
  • Apply chi-square testing — Quantify whether observed deviations from independence are significant
  • Measure association strength — Use Cramér's V and Phi to assess how related your variables are

The chi-square test transforms a table of numbers into a verdict about independence.


What are Contingency Tables?

Definition

A contingency table displays the frequency distribution of two or more categorical variables to analyze their relationship.

Contingency Table

A contingency table (also called a cross-tabulation or crosstab) is a matrix-format table that displays the multivariate frequency distribution of variables. It helps analyze the relationship between two categorical variables.

Expected Frequency

$E_{ij} = \frac{R_i \cdot C_j}{n}$

Here,

  • $R_i$ = total of row i
  • $C_j$ = total of column j
  • $n$ = grand total (all observations)
  • $E_{ij}$ = expected frequency for cell (i, j)

import numpy as np
import pandas as pd
from scipy import stats

# Build a contingency table
data = pd.DataFrame({
    'Treatment': ['Drug']*50 + ['Placebo']*50,
    'Outcome': ['Improved']*35 + ['Not Improved']*15 + ['Improved']*20 + ['Not Improved']*30
})

ct = pd.crosstab(data['Treatment'], data['Outcome'])
print("Contingency Table:")
print(ct)
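
The expected-frequency formula can be checked by hand on the same table. The sketch below is a minimal illustration, assuming the counts from the Drug/Placebo example are typed in directly as a NumPy array (the variable names are illustrative, not part of the lesson's code):

```python
import numpy as np

# Observed counts from the Drug/Placebo example
# rows: Drug, Placebo; columns: Improved, Not Improved
observed = np.array([[35, 15],
                     [20, 30]])

row_totals = observed.sum(axis=1)   # R_i
col_totals = observed.sum(axis=0)   # C_j
n = observed.sum()                  # grand total

# E_ij = R_i * C_j / n, computed for every cell at once
expected = np.outer(row_totals, col_totals) / n
print(expected)
```

Both rows have the same total (50), so each row gets the same expected counts: 27.5 improved and 22.5 not improved.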

Chi-Square Test

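# For 2x2 tables (dof = 1), chi2_contingency applies Yates' continuity
# correction by default; pass correction=False for the plain Pearson statistic
chi2, p_value, dof, expected = stats.chi2_contingency(ct)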

print(f"\nChi-square statistic = {chi2:.4f}")
print(f"p-value              = {p_value:.4f}")
print(f"Degrees of freedom   = {dof}")
print(f"\nExpected frequencies:")
print(pd.DataFrame(expected, index=ct.index, columns=ct.columns).round(2))

| Component | Description |
| --- | --- |
| χ² statistic | Measures the discrepancy between observed and expected frequencies |
| Degrees of freedom (df) | (rows - 1) × (columns - 1) |
| p-value | Probability of observing a χ² at least this large if the variables are independent |
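
To make the three components concrete, the statistic can be rebuilt from its definition. This is a sketch under the assumption that the same 2×2 counts are used and that the continuity correction is switched off, so the manual value and SciPy's value are directly comparable:

```python
import numpy as np
from scipy import stats
from scipy.stats.contingency import expected_freq

observed = np.array([[35, 15],
                     [20, 30]])
expected = expected_freq(observed)

# Pearson statistic: sum over cells of (O - E)^2 / E
chi2_manual = ((observed - expected) ** 2 / expected).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
p_manual = stats.chi2.sf(chi2_manual, dof)  # upper-tail probability

# SciPy without the continuity correction should agree
chi2_scipy, p_scipy, _, _ = stats.chi2_contingency(observed, correction=False)
print(f"manual: chi2 = {chi2_manual:.4f}, p = {p_manual:.4f}")
print(f"scipy : chi2 = {chi2_scipy:.4f}, p = {p_scipy:.4f}")
```

With the default `correction=True`, SciPy reports a smaller statistic for this table because Yates' correction shrinks each |O - E| by 0.5 before squaring.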

Fisher's Exact Test

For small sample sizes (expected frequencies < 5), use Fisher's exact test:

# Small sample example
small_table = np.array([[5, 2], [1, 4]])

odds_ratio, p_fisher = stats.fisher_exact(small_table)
print(f"Small table:\n{small_table}")
print(f"Odds ratio = {odds_ratio:.4f}")
print(f"Fisher p   = {p_fisher:.4f}")

When to Use Fisher's Exact Test

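Use Fisher's exact test instead of chi-square when (1) any expected frequency is less than 5 or (2) the total sample size is small (n < 20). It is most often applied to 2×2 tables, where the exact p-value is cheap to compute; a 2×2 table with large expected counts does not by itself call for it.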
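
The rule of thumb above can be sketched as a small helper. `choose_test` and its `min_expected` parameter are hypothetical names introduced for illustration. The fallback to chi-square for larger tables with small expected counts is a simplification, not a recommendation:

```python
import numpy as np
from scipy import stats
from scipy.stats.contingency import expected_freq

def choose_test(table, min_expected=5):
    """Hypothetical helper: pick Fisher or chi-square from expected counts."""
    table = np.asarray(table)
    expected = expected_freq(table)
    # Fisher's exact test for 2x2 tables with any small expected count
    if table.shape == (2, 2) and expected.min() < min_expected:
        _, p = stats.fisher_exact(table)
        return "fisher", p
    # Otherwise fall back to the chi-square test of independence
    _, p, _, _ = stats.chi2_contingency(table)
    return "chi-square", p

print(choose_test([[5, 2], [1, 4]]))      # expected counts of 3.5 and 2.5
print(choose_test([[35, 15], [20, 30]]))  # expected counts of 27.5 and 22.5
```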


Measures of Association

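# Cramér's V for any table size
# Use the uncorrected statistic: the chi2 computed earlier includes Yates'
# continuity correction (SciPy's default for 2x2 tables), which shrinks V and phi
chi2_raw = stats.chi2_contingency(ct, correction=False)[0]
n = ct.sum().sum()
r, c = ct.shape
v = np.sqrt(chi2_raw / (n * min(r-1, c-1)))
print(f"Cramér's V = {v:.4f}")

# Phi for 2x2 tables (equal to Cramér's V in the 2x2 case)
phi = np.sqrt(chi2_raw / n)
print(f"Phi (φ)    = {phi:.4f}")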
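
Taking the square root of χ²/n discards the direction of the association. For a 2×2 table, phi can also be computed directly from the cell counts, which keeps its sign. The sketch below assumes the Drug/Placebo counts, laid out as [[a, b], [c, d]]:

```python
import numpy as np

# 2x2 counts laid out as [[a, b], [c, d]]
a, b = 35, 15   # Drug: Improved, Not Improved
c, d = 20, 30   # Placebo: Improved, Not Improved

# phi = (ad - bc) / sqrt(product of the four marginal totals)
numerator = a * d - b * c
denominator = np.sqrt((a + b) * (c + d) * (a + c) * (b + d))
phi_signed = numerator / denominator
print(f"Signed phi = {phi_signed:.4f}")
```

The positive sign here means the Drug row is associated with the Improved column. The magnitude matches the square root of the uncorrected χ² divided by n.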

Contingency Tables in Machine Learning

| ML Application | Usage | Why |
| --- | --- | --- |
| Confusion matrix | Prediction vs actual | Core classification metric |
| Lift table | Predicted probability vs actual outcome | Model calibration |
| Feature analysis | Two categorical features | Relationship discovery |

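import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix

# Simulated labels: predictions are drawn independently of the true classes,
# so accuracy should sit near chance (about 0.38 for these class probabilities)
np.random.seed(42)
y_true = np.random.choice([0, 1, 2], 300, p=[0.5, 0.3, 0.2])
y_pred = np.random.choice([0, 1, 2], 300, p=[0.5, 0.3, 0.2])

# Rows are true classes, columns are predicted classes
cm = pd.DataFrame(confusion_matrix(y_true, y_pred),
                  columns=['Pred 0', 'Pred 1', 'Pred 2'],
                  index=['True 0', 'True 1', 'True 2'])
print("Confusion matrix (contingency table):")
print(cm)
# Accuracy = correct predictions (the diagonal) / all predictions
print(f"\nAccuracy: {np.trace(cm.values) / cm.values.sum():.3f}")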
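
Because a confusion matrix is a contingency table, its row and column totals give per-class metrics directly. The counts below are hypothetical, chosen only to illustrate the arithmetic, with rows as true classes and columns as predicted classes:

```python
import numpy as np

# Hypothetical 3-class confusion matrix: rows = true class, columns = predicted
cm = np.array([[50,  5,  5],
               [10, 30, 10],
               [ 5,  5, 30]])

correct = np.diag(cm)                  # correctly classified counts per class
recall = correct / cm.sum(axis=1)      # divide by row totals (true counts)
precision = correct / cm.sum(axis=0)   # divide by column totals (predicted counts)

for k in range(cm.shape[0]):
    print(f"class {k}: precision = {precision[k]:.3f}, recall = {recall[k]:.3f}")
```

Recall conditions on the row (the true class) and precision on the column (the predicted class), the same marginal totals used for expected frequencies.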

Key Takeaways

Contingency tables show joint frequency distributions of categorical variables in a matrix format.

Expected frequency = (row total × column total) / grand total — the count you'd expect if variables were independent.

The chi-square test assesses whether variables are independent, while Fisher's exact test is preferred for small samples.

Cramér's V quantifies the strength of association (0 = none, 1 = perfect), giving you a sense of practical significance beyond p-values.

Always check expected frequencies before interpreting chi-square results — the test's validity depends on having enough observations in each cell.
