
ARIMA Models — Complete Guide


Combining Autoregression, Integration, and Moving Average

ARIMA models unify three powerful concepts — autoregressive dependence, differencing for stationarity, and moving average error correction — into a flexible framework for modeling and forecasting time series.

  • Financial Forecasting: Forecast asset prices and returns

  • Supply Chain: Forecast demand with seasonal patterns and trends

  • Energy: Model electricity consumption for grid management

The three orders (p, d, q) set the number of autoregressive lags, the number of differences applied to remove trend, and the number of lagged error terms in the model.


ARIMA (Autoregressive Integrated Moving Average) models combine autoregression, differencing, and moving average components to model and forecast time series data.

Definition: ARIMA(p,d,q) Model

An ARIMA(p,d,q) model is defined as:

ARIMA(p,d,q)

$$\phi(B)(1-B)^d Y_t = \theta(B)\varepsilon_t$$

Here,

  • $p$ = Order of the autoregressive (AR) part
  • $d$ = Degree of differencing
  • $q$ = Order of the moving average (MA) part
  • $B$ = Backshift operator: $BY_t = Y_{t-1}$
  • $\phi(B)$ = AR polynomial: $1 - \phi_1 B - \cdots - \phi_p B^p$
  • $\theta(B)$ = MA polynomial: $1 + \theta_1 B + \cdots + \theta_q B^q$
  • $\varepsilon_t$ = White noise error term
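
To make the differencing operator $(1-B)^d$ concrete, here is a minimal sketch using `numpy.diff`. The input series is a made-up example with a quadratic trend, not data from this lesson.

```python
import numpy as np

# Hypothetical example series with a quadratic trend: Y_t = t^2
y = np.array([1.0, 4.0, 9.0, 16.0, 25.0])

# (1 - B) Y_t = Y_t - Y_{t-1}: first difference (d = 1)
d1 = np.diff(y, n=1)

# (1 - B)^2 Y_t = Y_t - 2 Y_{t-1} + Y_{t-2}: second difference (d = 2)
d2 = np.diff(y, n=2)

print(d1)  # [3. 5. 7. 9.]
print(d2)  # [2. 2. 2.]
```

One difference leaves a linear trend in this series; the second difference is constant, which is why a quadratic trend would call for d = 2.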

Component Models

AR(p) — Autoregressive

AR(p) Model

$$Y_t = c + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \cdots + \phi_p Y_{t-p} + \varepsilon_t$$

Here,

  • $\phi_i$ = AR coefficient at lag $i$
  • $c$ = Constant (intercept); for a stationary process the mean is $c / (1 - \phi_1 - \cdots - \phi_p)$

Stationarity Condition

An AR(p) process is stationary if all roots of $\phi(z) = 1 - \phi_1 z - \cdots - \phi_p z^p$ lie outside the unit circle.
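
The root condition can be checked numerically. The sketch below is one possible way to do it with `numpy.roots`; the helper name `ar_is_stationary` and the example coefficients are illustrative assumptions.

```python
import numpy as np

def ar_is_stationary(phi):
    """Return True if all roots of 1 - phi_1 z - ... - phi_p z^p lie outside the unit circle."""
    phi = np.asarray(phi, dtype=float)
    # Coefficients in ascending powers of z: [1, -phi_1, ..., -phi_p]
    ascending = np.concatenate(([1.0], -phi))
    # np.roots expects the coefficient of the highest power first
    roots = np.roots(ascending[::-1])
    return bool(np.all(np.abs(roots) > 1.0))

print(ar_is_stationary([0.7]))       # AR(1) with phi_1 = 0.7: root at 1/0.7
print(ar_is_stationary([0.5, 0.3]))  # AR(2)
print(ar_is_stationary([1.0]))       # random walk: root exactly on the unit circle
```

Roots very close to the unit circle are sensitive to floating-point error, so in practice a small tolerance may be preferable to the strict comparison used here.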

MA(q) — Moving Average

MA(q) Model

$$Y_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \cdots + \theta_q \varepsilon_{t-q}$$

Here,

  • $\theta_i$ = MA coefficient at lag $i$
  • $\mu$ = Mean of the process

Invertibility

An MA(q) process is invertible if all roots of $\theta(z) = 1 + \theta_1 z + \cdots + \theta_q z^q$ lie outside the unit circle.
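
One way to see why invertibility matters: for an MA(1) process the lag-1 autocorrelation is $\theta_1 / (1 + \theta_1^2)$, so $\theta_1$ and $1/\theta_1$ produce the same autocorrelation. The sketch below illustrates this with hypothetical coefficient values; requiring invertibility picks out one of the two.

```python
def ma1_lag1_autocorr(theta):
    """Lag-1 autocorrelation of Y_t = eps_t + theta * eps_{t-1}: theta / (1 + theta^2)."""
    return theta / (1.0 + theta ** 2)

# theta = 0.5: the root of 1 + 0.5 z is z = -2 (outside the unit circle, invertible)
rho_invertible = ma1_lag1_autocorr(0.5)

# theta = 2.0: the root of 1 + 2 z is z = -0.5 (inside the unit circle, not invertible)
rho_noninvertible = ma1_lag1_autocorr(2.0)

print(rho_invertible, rho_noninvertible)  # both equal 0.4
```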

ARMA(p,q)

ARMA(p,q) Model

$$Y_t = c + \sum_{i=1}^{p}\phi_i Y_{t-i} + \varepsilon_t + \sum_{j=1}^{q}\theta_j \varepsilon_{t-j}$$

Here,

  • $p$ = Number of AR terms
  • $q$ = Number of MA terms
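
The ARMA recursion can be written out directly. The following is a minimal sketch, assuming pre-sample values of $Y$ and $\varepsilon$ are zero; the function name `simulate_arma` and the coefficient values are illustrative choices.

```python
import numpy as np

def simulate_arma(phi, theta, eps, c=0.0):
    """Generate Y_t = c + sum_i phi_i Y_{t-i} + eps_t + sum_j theta_j eps_{t-j}.

    Values before the start of the sample (Y and eps at t < 0) are taken as zero.
    """
    eps = np.asarray(eps, dtype=float)
    y = np.zeros(len(eps))
    for t in range(len(eps)):
        ar_part = sum(phi[i] * y[t - 1 - i] for i in range(len(phi)) if t - 1 - i >= 0)
        ma_part = sum(theta[j] * eps[t - 1 - j] for j in range(len(theta)) if t - 1 - j >= 0)
        y[t] = c + ar_part + eps[t] + ma_part
    return y

# Response to a single unit shock for ARMA(1,1) with phi_1 = 0.5, theta_1 = 0.4
impulse = simulate_arma(phi=[0.5], theta=[0.4], eps=[1.0, 0.0, 0.0, 0.0])
print(impulse)  # approximately [1, 0.9, 0.45, 0.225]

# A random sample path driven by Gaussian white noise
rng = np.random.default_rng(42)
path = simulate_arma(phi=[0.5], theta=[0.4], eps=rng.standard_normal(300))
```

The impulse response shows both parts at work: the MA term affects only the first period after the shock, while the AR term makes the effect decay geometrically afterwards.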

Model Identification Steps

| Step | Action |
|------|--------|
| 1 | Plot the series: look for trend and seasonality |
| 2 | Test stationarity: ADF and KPSS tests |
| 3 | Difference if needed: determine d |
| 4 | Examine ACF/PACF: identify p and q |
| 5 | Estimate model parameters |
| 6 | Diagnostics: check residuals |
| 7 | Forecast and evaluate |

AR vs MA

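  • ACF cuts off after lag q while PACF tails off -> MA(q) model; read q from the last significant ACF lag

  • PACF cuts off after lag p while ACF tails off -> AR(p) model; read p from the last significant PACF lag

  • Both tail off -> ARMA; use AIC/BIC to compare candidate orders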
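
These cut-off patterns can be verified on theoretical autocorrelations. The sketch below computes the PACF from a given ACF with the Durbin-Levinson recursion; the helper `pacf_from_acf` and the coefficients 0.7 and 0.5 are illustrative assumptions.

```python
import numpy as np

def pacf_from_acf(rho):
    """Partial autocorrelations at lags 1..K from autocorrelations rho[0..K] (rho[0] = 1),
    computed with the Durbin-Levinson recursion."""
    K = len(rho) - 1
    pacf = []
    prev = []  # coefficients of the best linear predictor of order k - 1
    for k in range(1, K + 1):
        if k == 1:
            phi_kk = rho[1]
        else:
            num = rho[k] - sum(prev[j] * rho[k - 1 - j] for j in range(k - 1))
            den = 1.0 - sum(prev[j] * rho[j + 1] for j in range(k - 1))
            phi_kk = num / den
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return np.array(pacf)

lags = np.arange(6)

# AR(1) with phi_1 = 0.7: the ACF tails off as 0.7^k
acf_ar1 = 0.7 ** lags
pacf_ar1 = pacf_from_acf(acf_ar1)

# MA(1) with theta_1 = 0.5: the ACF cuts off after lag 1
acf_ma1 = np.array([1.0, 0.5 / (1 + 0.5 ** 2), 0.0, 0.0, 0.0, 0.0])
pacf_ma1 = pacf_from_acf(acf_ma1)

print("AR(1) PACF:", np.round(pacf_ar1, 4))  # nonzero at lag 1 only
print("MA(1) PACF:", np.round(pacf_ma1, 4))  # decays gradually with alternating sign
```

With sample data the same patterns appear only approximately, so cut-offs are judged against the confidence bands on the ACF/PACF plots.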


Information Criteria

AIC and BIC

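$$AIC = -2\ln(L) + 2k$$

$$BIC = -2\ln(L) + k\ln(n)$$

Here,

  • $L$ = Maximized likelihood value
  • $k$ = Number of parameters
  • $n$ = Sample size

Lower AIC/BIC indicates a better model among candidates fit to the same data. BIC charges $\ln(n)$ per parameter instead of 2, so it penalizes complexity more heavily whenever $n \ge 8$.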
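
As a small illustration, the sketch below computes both criteria using the standard definitions $AIC = -2\ln(L) + 2k$ and $BIC = -2\ln(L) + k\ln(n)$. The log-likelihood values are made up for the example and do not come from a fitted model.

```python
import math

def aic(log_likelihood, k):
    """Akaike information criterion."""
    return -2.0 * log_likelihood + 2.0 * k

def bic(log_likelihood, k, n):
    """Bayesian information criterion."""
    return -2.0 * log_likelihood + k * math.log(n)

# Hypothetical fits to the same n = 200 observations (log-likelihoods are invented)
n = 200
small = {"loglik": -225.0, "k": 2}
large = {"loglik": -222.0, "k": 4}

for name, m in [("small", small), ("large", large)]:
    print(name, "AIC:", aic(m["loglik"], m["k"]), "BIC:", round(bic(m["loglik"], m["k"], n), 2))
```

In this invented case AIC favors the larger model (452 vs. 454) while BIC favors the smaller one, which shows the heavier complexity penalty in action.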


Residual Diagnostics

After fitting, check that residuals are white noise:

  1. Ljung-Box test: No autocorrelation in residuals

  2. Normality test: Residuals approximately normal

  3. Plot: No patterns in residuals vs. fitted values

Ljung-Box Statistic

$$Q = T(T+2)\sum_{k=1}^{h}\frac{\hat{\rho}_k^2}{T-k}$$

Here,

  • $T$ = Sample size
  • $h$ = Number of lags tested
  • $\hat{\rho}_k$ = Sample ACF at lag $k$
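
The statistic can be computed by hand from the sample ACF. This is a minimal NumPy sketch with hypothetical helper names (`sample_acf`, `ljung_box_q`); it returns only $Q$, which would then be compared against a chi-square distribution (with $h - p - q$ degrees of freedom when applied to ARMA(p,q) residuals).

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations at lags 1..max_lag."""
    x = np.asarray(x, dtype=float)
    dev = x - x.mean()
    denom = np.sum(dev ** 2)
    return np.array([np.sum(dev[k:] * dev[:-k]) / denom for k in range(1, max_lag + 1)])

def ljung_box_q(x, h):
    """Ljung-Box statistic Q = T (T + 2) * sum_{k=1..h} rho_k^2 / (T - k)."""
    T = len(x)
    rho = sample_acf(x, h)
    lags = np.arange(1, h + 1)
    return T * (T + 2) * np.sum(rho ** 2 / (T - lags))

# Tiny hand-checkable case: rho_1 = 0.25, so Q = 4 * 6 * 0.0625 / 3
q_small = ljung_box_q([1.0, 2.0, 3.0, 4.0], h=1)
print(q_small)  # 0.5, up to floating-point rounding
```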

Python Implementation


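import numpy as np
import matplotlib.pyplot as plt
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import adfuller
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

np.random.seed(42)

# Simulate AR(1) process: Y_t = 0.7 * Y_{t-1} + eps_t
n = 300
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.7 * y[t-1] + np.random.randn()

# Stationarity test (null hypothesis: unit root; a small p-value supports stationarity)
adf = adfuller(y)
print(f"ADF p-value: {adf[1]:.4f}")

# ACF/PACF plots for identification: for AR(1) the PACF should cut off after lag 1
fig, axes = plt.subplots(1, 2, figsize=(10, 3))
plot_acf(y, lags=20, ax=axes[0])
plot_pacf(y, lags=20, ax=axes[1])
plt.show()

# Fit ARIMA(1,0,0) = AR(1)
model = ARIMA(y, order=(1, 0, 0))
results = model.fit()
print(results.summary())

# Diagnostics: Ljung-Box test on the residuals at lag 10
# (recent statsmodels versions return a DataFrame with columns lb_stat and lb_pvalue)
lb = acorr_ljungbox(results.resid, lags=[10])
print(f"\nLjung-Box p-value: {lb['lb_pvalue'].iloc[0]:.4f}")

# Forecast the next 10 steps
forecast = results.forecast(steps=10)
print(f"\n10-step forecast: {np.round(forecast, 3)}")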

Worked Example

Example: Fitting ARIMA(1,1,1)

Given a non-stationary time series with 200 observations:

  1. ADF test: p-value = 0.45 -> cannot reject a unit root, so treat the series as non-stationary

  2. First difference $\Delta Y_t$: ADF p-value = 0.001 -> stationary, so d = 1

  3. ACF of differenced series: Significant spike at lag 1, then cuts off -> q = 1

  4. PACF of differenced series: Exponential decay -> consistent with an MA(1) term, so ARIMA(0,1,1) is the baseline; ARIMA(1,1,1) is kept as a candidate in case an AR term helps

  5. Fit ARIMA(1,1,1): AIC = 450.2

  6. Compare with ARIMA(0,1,1): AIC = 455.8 -> ARIMA(1,1,1) is preferred (lower AIC)

  7. Ljung-Box test on residuals: p-value = 0.35 -> no evidence of remaining autocorrelation

  8. Conclusion: ARIMA(1,1,1) provides a good fit


Forecasting

Point Forecast

$$\hat{Y}_{T+h|T} = E[Y_{T+h} \mid Y_1, \ldots, Y_T]$$

Here,

  • $h$ = Forecast horizon
  • $T$ = Last observation time
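
For an AR(1) model the conditional expectation has a simple recursive form: each future shock is replaced by its mean of zero, giving $\hat{Y}_{T+h|T} = c + \phi_1 \hat{Y}_{T+h-1|T}$ with $\hat{Y}_{T|T} = Y_T$. The sketch below assumes known parameters; the function name and input values are illustrative.

```python
def ar1_forecasts(c, phi, y_last, horizon):
    """Point forecasts for Y_t = c + phi * Y_{t-1} + eps_t, with E[eps_t] = 0."""
    forecasts = []
    prev = y_last
    for _ in range(horizon):
        prev = c + phi * prev  # future shocks are replaced by their mean, zero
        forecasts.append(prev)
    return forecasts

# With c = 0 and phi = 0.7 the forecasts decay geometrically toward the mean of zero
print(ar1_forecasts(c=0.0, phi=0.7, y_last=2.0, horizon=3))  # roughly 1.4, 0.98, 0.686
```

This is why forecasts from a stationary model revert to the process mean as the horizon grows.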

Forecast accuracy is measured by:

| Metric | Formula | Interpretation |
|--------|---------|----------------|
| MAE | $\frac{1}{h}\sum \lvert Y_t - \hat{Y}_t \rvert$ | Average absolute error |
| RMSE | $\sqrt{\frac{1}{h}\sum (Y_t - \hat{Y}_t)^2}$ | Penalizes large errors |
| MAPE | $\frac{100}{h}\sum \left\lvert \frac{Y_t - \hat{Y}_t}{Y_t} \right\rvert$ | Average absolute percentage error; undefined when $Y_t = 0$ |
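
The three metrics are straightforward to compute. Here is a minimal NumPy sketch; the function name `forecast_errors` and the sample values are made up for illustration.

```python
import numpy as np

def forecast_errors(actual, predicted):
    """Return MAE, RMSE, and MAPE (in percent) over the forecast periods."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = actual - predicted
    return {
        "MAE": np.mean(np.abs(err)),
        "RMSE": np.sqrt(np.mean(err ** 2)),
        "MAPE": 100.0 * np.mean(np.abs(err / actual)),  # undefined if any actual value is 0
    }

# Hypothetical actual values and forecasts over h = 3 periods
metrics = forecast_errors([100.0, 200.0, 300.0], [110.0, 190.0, 330.0])
print(metrics)
```

RMSE is at least as large as MAE on the same errors, and the gap widens when a few errors dominate (here the error of 30 in the last period).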


Key Takeaways

Summary: ARIMA Models

  • ARIMA(p,d,q) combines AR, differencing, and MA components

  • Use ACF/PACF to identify p and q; use ADF/KPSS to determine d

  • AR(p): PACF cuts off at lag p

  • MA(q): ACF cuts off at lag q

  • Always check residual diagnostics (Ljung-Box, normality)

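  • Use AIC/BIC to compare candidate models, nested or not, fit to the same data with the same d

  • Over-differencing adds an artificial, non-invertible MA term (differencing white noise gives an MA(1) with $\theta_1 = -1$)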

