Chi-Square Goodness-of-Fit Test

Does Your Data Match the Theory?

The chi-square goodness-of-fit test evaluates whether the observed frequencies of a categorical variable are consistent with the frequencies predicted by a theoretical model. It is widely used to check distributional assumptions and to test genetic models. Typical applications include:

  • Genetics: testing whether offspring ratios match Mendelian predictions
  • Manufacturing: verifying that product characteristics follow specified distributions
  • Marketing: analyzing whether customer preferences match expected market models

A goodness-of-fit test is often the first check when validating a theoretical model against count data.
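To make the genetics use case concrete, here is a minimal sketch that tests a 9:3:3:1 Mendelian ratio with `scipy.stats.chisquare`. The phenotype counts are hypothetical and chosen only for illustration; they are not taken from a real experiment.

```python
import numpy as np
from scipy import stats

# Hypothetical counts for four phenotype classes (illustrative, not real data)
observed = np.array([95, 28, 27, 10])

# A 9:3:3:1 ratio implies these expected counts for the same total
ratio = np.array([9, 3, 3, 1])
expected = observed.sum() * ratio / ratio.sum()  # 90, 30, 30, 10

chi2_stat, p_value = stats.chisquare(observed, expected)
print(f"chi2 = {chi2_stat:.4f}, p = {p_value:.4f}")
```

A large p-value here would mean the counts are compatible with the 9:3:3:1 model. It does not prove the model is correct.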


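The test checks whether observed frequencies match a set of expected (theoretical) frequencies. Under the null hypothesis that the model is correct, the statistic approximately follows a chi-square distribution, provided the expected counts are large enough.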

Chi-Square Goodness-of-Fit Statistic

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}, \quad df = k - 1$$

Here,

  • $\chi^2$ = the chi-square test statistic
  • $O_i$ = observed frequency in category i
  • $E_i$ = expected frequency in category i
  • $k$ = number of categories
  • $df$ = degrees of freedom
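The formula can be evaluated term by term. The sketch below does this with NumPy for the die-roll counts used in this lesson's worked example, then checks the result against `scipy.stats.chisquare`. The variable names are illustrative choices.

```python
import numpy as np
from scipy import stats

observed = np.array([8, 12, 9, 11, 13, 7])   # die-roll counts from the worked example
expected = np.full(6, observed.sum() / 6)     # 10 per face if the die is fair

# One (O_i - E_i)^2 / E_i term per category, then the sum
components = (observed - expected) ** 2 / expected
chi2_manual = components.sum()

# Upper-tail probability of a chi-square distribution with k - 1 degrees of freedom
df_manual = len(observed) - 1
p_manual = stats.chi2.sf(chi2_manual, df_manual)

# Cross-check against SciPy's built-in test
chi2_scipy, p_scipy = stats.chisquare(observed, expected)
print(f"manual: chi2 = {chi2_manual:.4f}, p = {p_manual:.4f}")
print(f"scipy:  chi2 = {chi2_scipy:.4f}, p = {p_scipy:.4f}")
```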

Worked Example: Are Die Rolls Fair?


import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# Example: Are die rolls fair?
# Observed 60 rolls: each face should appear 10 times
observed = np.array([8, 12, 9, 11, 13, 7])  # Observed counts
expected = np.array([10, 10, 10, 10, 10, 10])  # Equal probability

chi2, p_value = stats.chisquare(observed, expected)
df = len(observed) - 1

print("=== Chi-Square Goodness-of-Fit Test ===")
print("H₀: Die is fair (each face equally likely)")
print("H₁: Die is NOT fair")
print()
print(f"{'Face':<8} {'Observed':>10} {'Expected':>10} {'(O-E)²/E':>10}")
print("-" * 40)
for i, (o, e) in enumerate(zip(observed, expected), 1):
    component = (o-e)**2/e
    print(f"Face {i:<3} {o:>10} {e:>10.1f} {component:>10.4f}")
print("-" * 40)
print(f"{'Total':<8} {observed.sum():>10} {expected.sum():>10.1f} {chi2:>10.4f}")
print(f"χ²({df}) = {chi2:.4f}, p = {p_value:.4f}")
print(f"Decision: {'Reject H₀: die is biased' if p_value < 0.05 else 'Fail to reject H₀: die appears fair'}")

Visualization

# Visualize
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
faces = [f'Face {i}' for i in range(1, 7)]
x = np.arange(6)
axes[0].bar(x - 0.2, observed, 0.4, label='Observed', color='steelblue', alpha=0.8)
axes[0].bar(x + 0.2, expected, 0.4, label='Expected', color='coral', alpha=0.8)
axes[0].set_xticks(x)
axes[0].set_xticklabels(faces)
axes[0].set_title('Observed vs Expected Frequencies')
axes[0].legend()

# Chi-square distribution
x_chi = np.linspace(0, 20, 500)
axes[1].plot(x_chi, stats.chi2.pdf(x_chi, df=df), 'b-', linewidth=2)
axes[1].fill_between(x_chi, stats.chi2.pdf(x_chi, df=df), where=x_chi >= chi2, alpha=0.4, color='red')
axes[1].axvline(chi2, color='red', linewidth=2, linestyle='--', label=f'χ²={chi2:.3f}')
axes[1].axvline(stats.chi2.ppf(0.95, df=df), color='black', linewidth=1.5, linestyle=':',
               label=f'Critical value={stats.chi2.ppf(0.95,df=df):.3f}')
axes[1].set_title(f'χ²({df}) Distribution (p={p_value:.3f})')
axes[1].legend()
plt.tight_layout()
plt.savefig('chi_square_gof.png', dpi=150)
plt.show()
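The dotted line in the right-hand panel marks the critical value at the 5% level. As a small sketch, the block below shows that comparing the statistic to this critical value gives the same decision as comparing the p-value to alpha. The statistic is hard-coded as 2.8, the value obtained for the die counts above.

```python
from scipy import stats

alpha = 0.05
df = 5             # six faces minus one
chi2_stat = 2.8    # statistic for the die counts [8, 12, 9, 11, 13, 7]

critical = stats.chi2.ppf(1 - alpha, df)   # value cutting off the upper 5% tail
p_value = stats.chi2.sf(chi2_stat, df)     # upper-tail area beyond the statistic

reject_by_critical = chi2_stat > critical
reject_by_p = p_value < alpha
print(f"critical value = {critical:.3f}, p = {p_value:.4f}")
print(f"reject H0? critical rule: {reject_by_critical}, p-value rule: {reject_by_p}")
```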

Example: Testing Normality (Simulated Data)

Normality Test via Chi-Square

# Example: testing normality of simulated data
np.random.seed(42)
data = np.random.normal(50, 10, 200)
# Bin into 5 intervals and compare to expected normal proportions
bins = [-np.inf, 35, 45, 55, 65, np.inf]
obs, _ = np.histogram(data, bins=bins)
# Expected proportions under N(50,10)
probs = np.diff(stats.norm.cdf(bins, 50, 10))
exp = probs * len(data)
chi2_norm, p_norm = stats.chisquare(obs, exp)
print(f"Normality test via chi-square: χ²={chi2_norm:.3f}, p={p_norm:.4f}")
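The example above compares the data with a normal distribution whose mean and standard deviation are fixed in advance, so df = k − 1 applies. When the parameters are estimated from the same sample, a common textbook adjustment subtracts one degree of freedom per estimated parameter, which `stats.chisquare` supports through its `ddof` argument. The sketch below assumes that adjustment. The seed, the number of bins, and the equal-probability binning are illustrative choices, and the adjusted reference distribution is only approximate when the parameters are estimated from the ungrouped data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)        # arbitrary seed; the data are simulated
data = rng.normal(50, 10, 200)

# Estimate both parameters from the sample instead of assuming them
mu_hat, sigma_hat = data.mean(), data.std(ddof=1)

# Place bin edges at quantiles of the fitted normal so every bin
# has the same expected count (the outer edges are -inf and +inf)
k = 5
edges = stats.norm.ppf(np.linspace(0, 1, k + 1), mu_hat, sigma_hat)
obs, _ = np.histogram(data, bins=edges)
exp = np.full(k, len(data) / k)

# ddof=2 removes two extra degrees of freedom: df = k - 1 - 2
chi2_stat, p_value = stats.chisquare(obs, exp, ddof=2)
print(f"chi2 = {chi2_stat:.3f}, df = {k - 1 - 2}, p = {p_value:.4f}")
```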

Assumptions

  • Each expected frequency Eᵢ ≥ 5 (merge small categories if needed)
  • Observations are independent
  • Categories are mutually exclusive and exhaustive
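One way to satisfy the expected-count rule is to pool adjacent sparse categories before running the test. The sketch below uses a hypothetical helper, `merge_sparse_categories`, which is not part of SciPy. The observed counts and the Poisson mean of 1.2 are also made up for illustration, and the mean is assumed to be specified in advance rather than estimated.

```python
import numpy as np
from scipy import stats

def merge_sparse_categories(observed, expected, min_expected=5.0):
    """Pool adjacent categories until each pooled expected count >= min_expected."""
    obs_out, exp_out = [], []
    obs_acc, exp_acc = 0.0, 0.0
    for o, e in zip(observed, expected):
        obs_acc += o
        exp_acc += e
        if exp_acc >= min_expected:
            obs_out.append(obs_acc)
            exp_out.append(exp_acc)
            obs_acc, exp_acc = 0.0, 0.0
    # Fold any sparse leftover into the last pooled category
    if obs_acc or exp_acc:
        if exp_out:
            obs_out[-1] += obs_acc
            exp_out[-1] += exp_acc
        else:
            obs_out.append(obs_acc)
            exp_out.append(exp_acc)
    return np.array(obs_out), np.array(exp_out)

# Hypothetical counts of events per interval: 0, 1, 2, 3, 4, and 5 or more
observed = np.array([30, 35, 20, 10, 3, 2])
n = observed.sum()

# Expected counts under Poisson(1.2); the last class is the tail P(X >= 5)
probs = np.append(stats.poisson.pmf(np.arange(5), 1.2), stats.poisson.sf(4, 1.2))
expected = n * probs

obs_m, exp_m = merge_sparse_categories(observed, expected)
chi2_stat, p_value = stats.chisquare(obs_m, exp_m)
print("expected after merging:", np.round(exp_m, 2))
print(f"chi2 = {chi2_stat:.3f}, df = {len(obs_m) - 1}, p = {p_value:.4f}")
```

Pooling reduces the number of categories, so the degrees of freedom are computed from the merged table, not the original one.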

Key Takeaways

Summary: Chi-Square Goodness-of-Fit

  • Goodness-of-fit tests whether data match a theoretical distribution
  • df = k − 1 (k = number of categories); subtract one more for each parameter estimated from the data
  • Expected frequencies should be ≥ 5; merge bins if violated
  • Any fully specified theoretical distribution can be tested: normal, Poisson, binomial, uniform
  • Cohen's w = √(χ²/n) measures effect size (Cramér's V is the related measure for contingency tables)
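The quantity √(χ²/n) is commonly reported as Cohen's w for goodness-of-fit tests. As a brief sketch, the block below computes it for the die-roll counts from the worked example. The thresholds in the comment are Cohen's conventional benchmarks and should be read as rough guides, not fixed rules.

```python
import numpy as np
from scipy import stats

observed = np.array([8, 12, 9, 11, 13, 7])   # die-roll counts from the worked example
expected = np.full(6, 10.0)

chi2_stat, _ = stats.chisquare(observed, expected)
n = observed.sum()

# Cohen's w: roughly 0.1 = small, 0.3 = medium, 0.5 = large (conventional benchmarks)
w = np.sqrt(chi2_stat / n)
print(f"Cohen's w = {w:.4f}")
```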
⭐
