AIC and BIC — Information Criteria for Model Selection


Balancing Model Fit Against Complexity

Information criteria penalize models for having too many parameters, preventing overfitting while rewarding good fit. AIC targets predictive accuracy; BIC targets the true model — their comparison reveals whether complexity is justified.

Time Series — Select the best ARIMA order from competing specifications
Epidemiology — Choose among risk factor models with different covariate sets
Ecology — Compare species distribution models with varying environmental predictors

Among candidate models fitted to the same data, the one with the lowest value of a given information criterion offers the best balance of fit and simplicity. This gives a principled way to choose a model while guarding against overfitting.

Definition: Information Criterion

A metric that penalizes models for having more parameters, balancing goodness-of-fit with parsimony. Lower values indicate better models.

Akaike Information Criterion (AIC)

AIC

$$AIC = -2\ln(L) + 2k$$

Here,

$L$ = Maximized likelihood value
$k$ = Number of estimated parameters
$-2\ln(L)$ = Deviance (measure of lack of fit)

AIC Interpretation

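AIC estimates the relative out-of-sample prediction error, that is, the information expected to be lost when the model stands in for the process that generated the data. Only differences in AIC between models fitted to the same data are meaningful. Among a set of models, the one with the lowest AIC is expected to have the best predictive performance.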
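To make the formula concrete, the sketch below computes AIC by hand for two Gaussian models fit by least squares. The data, the helper names `gaussian_loglik` and `aic`, and the choice to count the error variance as a parameter are illustrative assumptions, not part of the original text.

```python
import numpy as np

def gaussian_loglik(residuals):
    """Maximized Gaussian log-likelihood given least-squares residuals."""
    n = residuals.size
    sigma2_hat = np.sum(residuals**2) / n  # ML estimate of the error variance
    return -0.5 * n * (np.log(2 * np.pi * sigma2_hat) + 1)

def aic(loglik, k):
    """AIC = -2 ln(L) + 2k."""
    return -2.0 * loglik + 2.0 * k

# Hypothetical data: a straight line plus noise
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 50)
y = 1.0 + 2.0 * x + rng.normal(0, 1, 50)

# Residuals from an intercept-only model and from a straight-line model
resid_const = y - y.mean()
slope, intercept = np.polyfit(x, y, 1)
resid_line = y - (intercept + slope * x)

# k counts the mean parameters plus the error variance (an assumption; some
# software omits the variance, which shifts every AIC by the same constant)
aic_const = aic(gaussian_loglik(resid_const), k=2)
aic_line = aic(gaussian_loglik(resid_line), k=3)
print(f"AIC (constant): {aic_const:.1f}")
print(f"AIC (line):     {aic_line:.1f}")
```

Because the data were generated from a line, the straight-line model should come out with the lower AIC despite its extra parameter.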

Bayesian Information Criterion (BIC)

BIC

$$BIC = -2\ln(L) + k\ln(n)$$

Here,

$n$ = Sample size
$k$ = Number of parameters

AIC vs BIC

AIC: Optimizes predictive accuracy; tends to select larger models
BIC: Optimizes model identification (it tends to pick the true model as $n$ grows, provided the true model is among the candidates); tends to select smaller models
BIC penalizes complexity more heavily than AIC when $n \ge 8$, because $\ln(n) > 2$ once $n > e^2 \approx 7.39$
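The crossover between the two penalties can be checked numerically. This is a minimal sketch; the function names and the value of `k` are hypothetical.

```python
import math

def aic_penalty(k):
    """Complexity term of AIC: 2k."""
    return 2 * k

def bic_penalty(k, n):
    """Complexity term of BIC: k ln(n)."""
    return k * math.log(n)

k = 3  # hypothetical number of parameters
for n in (5, 7, 8, 50, 1000):
    a, b = aic_penalty(k), bic_penalty(k, n)
    heavier = "BIC" if b > a else "AIC"
    print(f"n={n:<5} AIC penalty={a:.2f}  BIC penalty={b:.2f}  heavier: {heavier}")
```

The switch happens between $n = 7$ and $n = 8$ regardless of $k$, since $k$ multiplies both penalties.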

Corrected AIC (AICc)

For small samples, the AIC penalty is too weak, so AIC tends to favor models with too many parameters (overfitting).

AICc

$$AICc = AIC + \frac{2k(k+1)}{n - k - 1}$$

Here,

$AICc$ = Corrected AIC
$n$ = Sample size
$k$ = Number of parameters

Use AICc When

Use AICc when $n/k < 40$. It converges to AIC as $n \to \infty$.
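To see how quickly the correction fades, the sketch below evaluates the extra term $2k(k+1)/(n-k-1)$ for a fixed $k$ and growing $n$. The helper name `aicc_correction` and the sample sizes are illustrative choices.

```python
def aicc_correction(n, k):
    """Extra penalty AICc adds on top of AIC: 2k(k+1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined unless n > k + 1")
    return 2 * k * (k + 1) / (n - k - 1)

k = 5  # hypothetical number of parameters
for n in (10, 20, 50, 200, 2000):
    print(f"n={n:<5} n/k={n / k:<6.0f} correction={aicc_correction(n, k):.3f}")
```

With $k = 5$ the correction is 15 AIC units at $n = 10$ but only about 0.3 at $n = 200$ (where $n/k = 40$), which is why the rule of thumb stops mattering for large samples.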

Deviance Information Criterion (DIC)

For Bayesian models:

DIC

$$DIC = D(\bar{\theta}) + 2p_D$$

Here,

$D(\bar{\theta})$ = Deviance at posterior mean
$p_D$ = Effective number of parameters
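As a rough illustration, DIC can be computed from posterior draws. The sketch assumes the common definition $p_D = \bar{D} - D(\bar{\theta})$ (posterior mean deviance minus deviance at the posterior mean), which the text above does not spell out, and a toy Normal model with known variance so that the posterior can be sampled directly. All data and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: y_i ~ Normal(mu, 1), with the variance treated as known
y = rng.normal(loc=3.0, scale=1.0, size=40)
n = y.size

# With a flat prior on mu, the posterior is Normal(mean(y), 1/n);
# draw from it directly instead of running MCMC
mu_draws = rng.normal(loc=y.mean(), scale=1.0 / np.sqrt(n), size=20000)

def deviance(mu):
    """D(mu) = -2 * log-likelihood of the Normal(mu, 1) model."""
    return np.sum((y - mu) ** 2) + n * np.log(2 * np.pi)

d_at_mean = deviance(mu_draws.mean())                 # D(theta_bar)
mean_dev = np.mean([deviance(m) for m in mu_draws])   # posterior mean deviance
p_d = mean_dev - d_at_mean                            # effective number of parameters
dic = d_at_mean + 2 * p_d

print(f"p_D = {p_d:.2f}, DIC = {dic:.1f}")
```

The model has a single free parameter, so $p_D$ should come out close to 1.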

Comparing Models

Likelihood Ratio Test

For nested models:

Likelihood Ratio Test

$$\chi^2 = -2(\ln L_0 - \ln L_1)$$

Here,

$L_0$ = Likelihood of simpler (restricted) model
$L_1$ = Likelihood of more complex model
$\chi^2$ = Test statistic, approximately chi-square distributed with $df = k_1 - k_0$ when the simpler model is correct
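The test can be wrapped in a small helper. This is a sketch that assumes SciPy is available; the function name `likelihood_ratio_test` and the log-likelihood values are made up for illustration.

```python
from scipy import stats

def likelihood_ratio_test(llf_restricted, llf_full, df):
    """Return the LR statistic and its chi-square p-value.

    llf_restricted, llf_full: maximized log-likelihoods of the nested models
    df: difference in the number of parameters (k_1 - k_0)
    """
    stat = -2.0 * (llf_restricted - llf_full)
    p_value = stats.chi2.sf(stat, df)  # upper-tail probability
    return stat, p_value

# Hypothetical log-likelihoods for two nested models differing by one parameter
stat, p = likelihood_ratio_test(-120.4, -115.1, df=1)
print(f"LR statistic = {stat:.2f}, p = {p:.4f}")
```

A small p-value suggests that the extra parameter improves the fit by more than chance alone would explain.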

Information Criterion Comparison

| $\Delta AIC$ (difference from the best model) | Interpretation |
|--------|---------------|
| $\Delta AIC = 0$ | Best model; strongest support |
| $0 < \Delta AIC < 2$ | Substantial support |
| $2 < \Delta AIC < 4$ | Considerable support |
| $4 < \Delta AIC < 7$ | Much less support |
| $\Delta AIC > 10$ | Essentially no support |

Evidence Ratios

Akaike Weight

$$w_i = \frac{\exp(-\Delta AIC_i / 2)}{\sum_{j=1}^{R}\exp(-\Delta AIC_j / 2)}$$

Here,

$w_i$ = Akaike weight for model $i$ (probability of being best)
$\Delta AIC_i$ = Difference from best model
$R$ = Number of candidate models

Interpreting Weights

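An Akaike weight of 0.85 means that, given the data, the model has an estimated 85% probability of being the best model in the candidate set as judged by AIC. Weights depend on which models are in the set, so they change if candidates are added or removed.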
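Akaike weights also give evidence ratios. The sketch below assumes the usual definition, the ratio of two weights $w_i / w_j$, and uses made-up AIC values; `akaike_weights` is an illustrative helper name.

```python
import numpy as np

def akaike_weights(aic_values):
    """Convert AIC values to Akaike weights that sum to 1."""
    aic_values = np.asarray(aic_values, dtype=float)
    delta = aic_values - aic_values.min()   # Delta AIC_i
    rel_lik = np.exp(-delta / 2.0)          # relative likelihood of each model
    return rel_lik / rel_lik.sum()

# Hypothetical AIC values for three candidate models
aic_values = [100.0, 102.0, 110.0]
w = akaike_weights(aic_values)

# Evidence ratio of the best model against each model: w_best / w_i
evidence_ratios = w.max() / w

for a, wi, er in zip(aic_values, w, evidence_ratios):
    print(f"AIC={a:6.1f}  weight={wi:.3f}  evidence ratio vs best={er:.1f}")
```

Since the normalizing sum cancels, the evidence ratio between the best model and model $i$ reduces to $\exp(\Delta AIC_i / 2)$, so a difference of 2 corresponds to a ratio of about 2.7.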

Python Implementation


```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

np.random.seed(42)

# Generate data: the true model is quadratic
n = 100
X = np.random.uniform(-3, 3, n)
Y = 2 + 1.5*X - 0.8*X**2 + np.random.randn(n) * 1.5

# Fit polynomial models of increasing complexity (degree 1 to 5)
models = {}
for degree in range(1, 6):
    # Columns X^1 ... X^degree; add_constant prepends the intercept column
    X_poly = np.column_stack([X**i for i in range(1, degree + 1)])
    X_poly = sm.add_constant(X_poly)
    model = sm.OLS(Y, X_poly).fit()
    models[degree] = model

# Compare AIC, BIC, and AICc
print("Model Comparison:")
print(f"{'Degree':<10} {'AIC':<10} {'BIC':<10} {'AICc':<10} {'k':<5}")
print("-" * 45)
for deg, m in models.items():
    aic = m.aic
    bic = m.bic
    # k counts the regression coefficients (slopes plus intercept);
    # the error variance is not counted here
    k = int(m.df_model) + 1
    aicc = aic + 2*k*(k+1)/(n - k - 1)
    print(f"{deg:<10} {aic:<10.1f} {bic:<10.1f} {aicc:<10.1f} {k:<5}")

# Akaike weights
aics = np.array([m.aic for m in models.values()])
delta_aics = aics - aics.min()
weights = np.exp(-delta_aics / 2)
weights = weights / weights.sum()

print("\nAkaike Weights:")
for deg, w in zip(models.keys(), weights):
    print(f"  Degree {deg}: {w:.3f}")

# Likelihood ratio test (nested models): linear vs. quadratic, 1 extra parameter
lr_stat = -2 * (models[1].llf - models[2].llf)
lr_pval = stats.chi2.sf(lr_stat, df=1)
print(f"\nLR test (degree 1 vs 2): χ²={lr_stat:.2f}, p={lr_pval:.4f}")
```

Worked Example

Example: Variable Selection in Regression

Comparing models with different predictor sets:

| Model | Variables | k | AIC | BIC | AICc |
|-------|----------|---|-----|-----|------|
| 1 | X1 | 2 | 452.1 | 458.3 | 452.4 |
| 2 | X1, X2 | 3 | 445.3 | 454.5 | 445.7 |
| 3 | X1, X2, X3 | 4 | 447.8 | 460.1 | 448.5 |
| 4 | X1, X2, X3, X4 | 5 | 450.2 | 465.5 | 451.2 |

AIC selects Model 2 (lowest AIC, 445.3)

BIC also selects Model 2 (lowest BIC, 454.5); its heavier penalty widens the gap to the larger Models 3 and 4

Conclusion: Model 2 with X1 and X2 is the best model under every criterion in the table.
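A quick way to read a table like this is to find the minimum of each criterion column and express the other rows as differences from it. The sketch copies the values from the table above; the variable names are illustrative.

```python
import numpy as np

models = ["Model 1", "Model 2", "Model 3", "Model 4"]
criteria = {
    "AIC":  np.array([452.1, 445.3, 447.8, 450.2]),
    "BIC":  np.array([458.3, 454.5, 460.1, 465.5]),
    "AICc": np.array([452.4, 445.7, 448.5, 451.2]),
}

best_by_criterion = {}
for name, values in criteria.items():
    best_by_criterion[name] = models[int(np.argmin(values))]
    deltas = np.round(values - values.min(), 1)  # difference from the best model
    print(f"{name:<5} best: {best_by_criterion[name]}  deltas: {deltas.tolist()}")
```

The differences, rather than the raw values, are what the $\Delta AIC$ scale and the Akaike weights are built on.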

Key Takeaways

Summary: AIC and BIC

AIC = $-2\ln(L) + 2k$ : optimizes predictive accuracy
BIC = $-2\ln(L) + k\ln(n)$ : optimizes model identification; penalizes complexity more
AICc adds a correction for small samples
Lower is better for all information criteria
Use $\Delta AIC$ and Akaike weights for model comparison
AIC tends to select larger models; BIC selects smaller models
For nested models, the likelihood ratio test is also appropriate
Always report multiple criteria (AIC, BIC, AICc) for transparency
