AIC and BIC — Information Criteria for Model Selection

Balancing Model Fit Against Complexity

Information criteria penalize a model for each additional parameter, discouraging overfitting while still rewarding good fit. AIC targets predictive accuracy; BIC targets identification of the true model, assuming it is among the candidates. Comparing the two shows whether added complexity is justified.

  • Time Series — Select the best ARIMA order from competing specifications

  • Epidemiology — Choose among risk factor models with different covariate sets

  • Ecology — Compare species distribution models with varying environmental predictors

Lower information criterion values indicate models that best balance simplicity and accuracy.


Information criteria balance model fit against complexity to select the best model among candidates. They provide a principled way to avoid overfitting.

Definition: Information Criterion

A metric that penalizes models for having more parameters, balancing goodness-of-fit with parsimony. Lower values indicate better models.


Akaike Information Criterion (AIC)

AIC

$$AIC = -2\ln(L) + 2k$$

Here,

  • $L$ = Maximized likelihood value
  • $k$ = Number of estimated parameters
  • $-2\ln(L)$ = Deviance (measure of lack of fit)

AIC Interpretation

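AIC estimates relative out-of-sample prediction error: up to a constant shared by all candidates, it approximates the expected Kullback-Leibler information lost when the model stands in for the data-generating process. Only differences in AIC between models fitted to the same data are meaningful. Among a set of models, the one with the lowest AIC is expected to have the best predictive performance.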
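As a minimal sketch of the formula above, the helper below computes AIC from a maximized log-likelihood. The function name `aic` and the two log-likelihood values are hypothetical, chosen only to illustrate the trade-off.

```python
def aic(log_likelihood: float, k: int) -> float:
    """Return AIC = -2 ln(L) + 2k, given the maximized log-likelihood ln(L)."""
    return -2.0 * log_likelihood + 2.0 * k

# Hypothetical maximized log-likelihoods, chosen only for illustration.
aic_simple = aic(log_likelihood=-120.0, k=2)    # 2-parameter model
aic_complex = aic(log_likelihood=-118.5, k=4)   # 4-parameter model

print(f"simple:  AIC = {aic_simple:.1f}")   # 244.0
print(f"complex: AIC = {aic_complex:.1f}")  # 245.0
```

In this illustration the larger model gains 1.5 log-likelihood units (3 deviance units) but pays 4 penalty units for its two extra parameters, so the simpler model has the lower AIC.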


Bayesian Information Criterion (BIC)

BIC

$$BIC = -2\ln(L) + k\ln(n)$$

Here,

  • $n$ = Sample size
  • $k$ = Number of parameters

AIC vs BIC

  • AIC: Optimizes predictive accuracy; tends to select larger models

  • BIC: Optimizes model identification (finds the true model); tends to select smaller models

  • BIC penalizes complexity more heavily than AIC when $n \geq 8$, because $\ln(n) > 2$ once $n > e^2 \approx 7.39$
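The penalty comparison can be checked numerically. This is a small sketch: the function name `bic` and the sample sizes in the loop are illustrative assumptions, not part of the lesson.

```python
import math

def bic(log_likelihood: float, k: int, n: int) -> float:
    """Return BIC = -2 ln(L) + k ln(n), given the maximized log-likelihood ln(L)."""
    return -2.0 * log_likelihood + k * math.log(n)

# Per-parameter penalty: AIC charges 2, BIC charges ln(n).
for n in (5, 7, 8, 100, 10_000):
    heavier = "BIC" if math.log(n) > 2 else "AIC"
    print(f"n={n:>6}: ln(n)={math.log(n):.3f} -> heavier penalty: {heavier}")

# Smallest integer sample size at which BIC's penalty exceeds AIC's.
crossover = next(n for n in range(1, 1000) if math.log(n) > 2)
print(crossover)  # 8
```

Because the BIC penalty grows with $\ln(n)$ while the AIC penalty stays fixed at 2 per parameter, the gap between the two criteria widens as the sample grows.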


Corrected AIC (AICc)

In small samples, AIC's penalty is too weak, so it tends to select overparameterized models (it overfits).

AICc

$$AICc = AIC + \frac{2k(k+1)}{n - k - 1}$$

Here,

  • $AICc$ = Corrected AIC
  • $n$ = Sample size
  • $k$ = Number of parameters

Use AICc When

Use AICc when $n/k < 40$. The correction term vanishes as $n \to \infty$, so AICc converges to AIC.
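To see how quickly the correction shrinks, here is a short sketch. The helper name `aicc`, the AIC value of 100, and the choice $k = 5$ are assumptions made for illustration.

```python
def aicc(aic_value: float, k: int, n: int) -> float:
    """Return AICc = AIC + 2k(k+1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return aic_value + 2.0 * k * (k + 1) / (n - k - 1)

aic_value, k = 100.0, 5  # hypothetical AIC and parameter count
for n in (20, 200, 20_000):
    correction = aicc(aic_value, k, n) - aic_value
    print(f"n={n:>6}, n/k={n / k:>7.1f}, correction={correction:.4f}")
```

With $n/k = 4$ the correction is larger than 4 AIC units, enough to change a model ranking, while at $n/k = 4000$ it is negligible.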


Deviance Information Criterion (DIC)

For Bayesian models:

DIC

$$DIC = D(\bar{\theta}) + 2p_D$$

Here,

  • $D(\bar{\theta})$ = Deviance at posterior mean
  • $p_D$ = Effective number of parameters
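The sketch below computes DIC from posterior draws, using the common definition $p_D = \bar{D} - D(\bar{\theta})$ (mean posterior deviance minus deviance at the posterior mean). Everything in the setup is an assumption for illustration: a normal model with known unit variance, a flat prior on the mean (so the posterior is available in closed form and can be sampled directly), and simulated data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data from N(1, 1); sigma = 1 is treated as known.
y = rng.normal(loc=1.0, scale=1.0, size=50)
n = y.size

# With a flat prior on mu, the posterior is N(mean(y), 1/n), so we can
# draw from it directly instead of running MCMC.
mu_draws = rng.normal(loc=y.mean(), scale=1.0 / np.sqrt(n), size=20_000)

def deviance(mu):
    """-2 * log-likelihood of y under N(mu, 1); mu is a scalar or 1-D array."""
    mu = np.atleast_1d(mu)
    resid = y[None, :] - mu[:, None]            # shape (draws, n)
    loglik = -0.5 * n * np.log(2 * np.pi) - 0.5 * (resid ** 2).sum(axis=1)
    return -2.0 * loglik

d_bar = deviance(mu_draws).mean()               # mean posterior deviance
d_at_mean = deviance(mu_draws.mean())[0]        # deviance at posterior mean
p_d = d_bar - d_at_mean                         # effective number of parameters
dic = d_at_mean + 2.0 * p_d

print(f"p_D = {p_d:.3f}, DIC = {dic:.2f}")
```

The model has one free parameter, so $p_D$ should come out close to 1, which gives a quick sanity check on the calculation.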

Comparing Models

Likelihood Ratio Test

For nested models:

Likelihood Ratio Test

$$\chi^2 = -2(\ln L_0 - \ln L_1)$$

Here,

  • $L_0$ = Likelihood of simpler (restricted) model
  • $L_1$ = Likelihood of more complex model
  • $\chi^2$ = Test statistic with $df = k_1 - k_0$, the difference in the number of parameters between the two models
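A minimal sketch of the test using `scipy.stats.chi2`. The function name `likelihood_ratio_test` and the log-likelihood values are hypothetical.

```python
import math
from scipy import stats

def likelihood_ratio_test(llf_restricted: float, llf_full: float, df: int):
    """Return the LR statistic -2(ln L0 - ln L1) and its chi-square p-value."""
    stat = -2.0 * (llf_restricted - llf_full)
    p_value = stats.chi2.sf(stat, df)  # upper-tail probability
    return stat, p_value

# Hypothetical log-likelihoods: the full model adds two parameters.
lr_stat, p_value = likelihood_ratio_test(-120.0, -118.5, df=2)
print(f"LR statistic = {lr_stat:.2f}, p = {p_value:.4f}")

# Sanity check: for df = 2 the chi-square upper tail equals exp(-x / 2).
print(f"closed form  = {math.exp(-lr_stat / 2):.4f}")
```

Here the statistic is 3.0 on 2 degrees of freedom, so the extra parameters are not supported at conventional significance levels.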

Information Criterion Comparison

Here $\Delta AIC_i = AIC_i - AIC_{\min}$ is the difference between a model's AIC and the lowest AIC in the candidate set. The thresholds are rules of thumb, not sharp cutoffs.

| $\Delta AIC$ | Interpretation |
|--------------|----------------|
| $\Delta AIC = 0$ | Best model; strongest support |
| $0 < \Delta AIC < 2$ | Substantial support |
| $2 < \Delta AIC < 4$ | Considerable support |
| $4 < \Delta AIC < 7$ | Much less support |
| $\Delta AIC > 10$ | Essentially no support |


Evidence Ratios

Akaike Weight

$$w_i = \frac{\exp(-\Delta AIC_i / 2)}{\sum_{j=1}^{R}\exp(-\Delta AIC_j / 2)}$$

Here,

  • $w_i$ = Akaike weight for model $i$ (probability of being best)
  • $\Delta AIC_i$ = Difference from best model
  • $R$ = Number of candidate models

Interpreting Weights

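An Akaike weight of 0.85 is read as an 85% probability that the model is the best in the candidate set, given the data and the criterion. Weights are relative: they sum to 1 across the candidates and change if models are added or removed.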
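The weight formula can be sketched in a few lines. The function name `akaike_weights` and the three AIC values are hypothetical; the evidence ratio is taken here as the ratio of two weights.

```python
import numpy as np

def akaike_weights(aic_values):
    """Return Akaike weights for a set of candidate-model AIC values."""
    aic_values = np.asarray(aic_values, dtype=float)
    delta = aic_values - aic_values.min()   # Delta AIC relative to best model
    rel_lik = np.exp(-delta / 2.0)          # relative likelihood of each model
    return rel_lik / rel_lik.sum()          # normalize so weights sum to 1

aics = [100.0, 102.0, 110.0]                # hypothetical AIC values
weights = akaike_weights(aics)
evidence_ratio = weights[0] / weights[1]    # best model vs. runner-up

print(np.round(weights, 3))
print(f"evidence ratio (model 1 vs 2): {evidence_ratio:.2f}")
```

A $\Delta AIC$ of 2 corresponds to an evidence ratio of $e^{1} \approx 2.72$, so the best model is less than three times as well supported as the runner-up.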


Python Implementation


import numpy as np
import statsmodels.api as sm
from scipy import stats

np.random.seed(42)

# Generate data: true model is quadratic
n = 100
X = np.random.uniform(-3, 3, n)
Y = 2 + 1.5*X - 0.8*X**2 + np.random.randn(n) * 1.5

# Fit polynomial models of increasing complexity (degree 1 to 5)
models = {}
for degree in range(1, 6):
    # Columns X, X^2, ..., X^degree; add_constant prepends the intercept column
    X_poly = np.column_stack([X**i for i in range(1, degree + 1)])
    X_poly = sm.add_constant(X_poly)
    models[degree] = sm.OLS(Y, X_poly).fit()

# Compare AIC, BIC, and AICc
print("Model Comparison:")
print(f"{'Degree':<10} {'AIC':<10} {'BIC':<10} {'AICc':<10} {'k':<5}")
print("-" * 45)
for deg, m in models.items():
    aic = m.aic
    bic = m.bic
    # k counts the regression coefficients including the intercept,
    # the same parameter count statsmodels uses for m.aic
    k = int(m.df_model) + 1
    aicc = aic + 2*k*(k+1)/(n - k - 1)
    print(f"{deg:<10} {aic:<10.1f} {bic:<10.1f} {aicc:<10.1f} {k:<5}")

# Akaike weights
aics = np.array([m.aic for m in models.values()])
delta_aics = aics - aics.min()
weights = np.exp(-delta_aics / 2)
weights = weights / weights.sum()

print("\nAkaike Weights:")
for deg, w in zip(models.keys(), weights):
    print(f"  Degree {deg}: {w:.3f}")

# Likelihood ratio test (nested models): degree 1 is nested in degree 2
lr_stat = -2 * (models[1].llf - models[2].llf)
lr_pval = stats.chi2.sf(lr_stat, df=1)  # one extra parameter
print(f"\nLR test (degree 1 vs 2): χ²={lr_stat:.2f}, p={lr_pval:.4f}")

Worked Example

Example: Variable Selection in Regression

Comparing models with different predictor sets:

| Model | Variables | k | AIC | BIC | AICc |
|-------|-----------|---|-----|-----|------|
| 1 | X1 | 2 | 452.1 | 458.3 | 452.4 |
| 2 | X1, X2 | 3 | 445.3 | 454.5 | 445.7 |
| 3 | X1, X2, X3 | 4 | 447.8 | 460.1 | 448.5 |
| 4 | X1, X2, X3, X4 | 5 | 450.2 | 465.5 | 451.2 |

AIC selects Model 2 (lowest AIC, 445.3)

BIC also selects Model 2 (lowest BIC, 454.5); its heavier penalty widens the gap to Models 3 and 4 but does not favor Model 1 here

Conclusion: Model 2 with X1 and X2 is the best model under AIC, BIC, and AICc.
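As a cross-check, the sketch below takes the AIC and BIC columns of the table at face value, picks the minimum of each, and computes Akaike weights. The variable names are illustrative.

```python
import math

# AIC and BIC values copied from the table, keyed by model number.
aic = {1: 452.1, 2: 445.3, 3: 447.8, 4: 450.2}
bic = {1: 458.3, 2: 454.5, 3: 460.1, 4: 465.5}

best_aic = min(aic, key=aic.get)   # model with the lowest AIC
best_bic = min(bic, key=bic.get)   # model with the lowest BIC

# Delta AIC and Akaike weights relative to the best-AIC model.
delta = {m: v - aic[best_aic] for m, v in aic.items()}
rel_lik = {m: math.exp(-d / 2) for m, d in delta.items()}
total = sum(rel_lik.values())
weights = {m: r / total for m, r in rel_lik.items()}

print(f"lowest AIC: Model {best_aic}, lowest BIC: Model {best_bic}")
for m in sorted(weights):
    print(f"  Model {m}: dAIC = {delta[m]:.1f}, weight = {weights[m]:.3f}")
```

With these numbers, Model 2 carries roughly 71% of the Akaike weight and Model 3 about 20%, so the preference for Model 2 is clear but not overwhelming.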


Key Takeaways

Summary: AIC and BIC

  • AIC = $-2\ln(L) + 2k$: optimizes predictive accuracy

  • BIC = $-2\ln(L) + k\ln(n)$: optimizes model identification; penalizes complexity more

  • AICc adds a correction for small samples

  • Lower is better for all information criteria

  • Use $\Delta AIC$ and Akaike weights for model comparison

  • AIC tends to select larger models; BIC selects smaller models

  • For nested models, the likelihood ratio test is also appropriate

  • Always report multiple criteria (AIC, BIC, AICc) for transparency

