Path Analysis

Tracing Cause and Effect Through Systems

Path analysis decomposes relationships among observed variables into direct, indirect, and total effects using path diagrams and structural equations. It reveals how variables influence each other through chains of causation.

  • Social science: Quantify how education affects income both directly and through occupational prestige
  • Epidemiology: Trace how risk factors contribute to disease through mediating biological pathways
  • Organizational behavior: Map how leadership styles influence employee performance through motivation

Path analysis reveals not just whether variables are related, but how they influence each other.


What Is Path Analysis?

Definition: Path Analysis

Path analysis is a multivariate technique for modeling direct and indirect causal relationships among a set of observed variables. It is a special case of structural equation modeling in which all variables are observed (no latent variables). Path analysis allows decomposition of correlations into components attributable to direct causation, indirect causation (mediation), and spurious association.

Sewall Wright developed path analysis in the 1920s as a method for analyzing causal systems in genetics and economics. It remains foundational in psychology, education, and the social sciences.


Path Diagrams

Definition: Path Diagram

A path diagram is a graphical representation of a path model:

  • Boxes represent observed variables
  • Single-headed arrows (→) represent direct causal effects (path coefficients)
  • Double-headed arrows (↔) represent correlations or covariances
  • Residual arrows (→) represent unexplained variance in endogenous variables

The direction of arrows encodes the assumed causal order: variables at the "top" of the causal chain are exogenous (their causes are not modeled); variables at the "bottom" are endogenous (their causes are within the model).
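As a minimal sketch of how a path diagram can be held in code, the single-headed arrows can be stored as (cause, effect) pairs, and the exogenous/endogenous split then follows from which variables receive an arrow. The edge list and the helper name `classify_variables` are hypothetical and chosen only for illustration.

```python
# Hypothetical path diagram, stored as (cause, effect) pairs for each single-headed arrow.
edges = [("X1", "M"), ("X2", "M"), ("X1", "Y"), ("X2", "Y"), ("M", "Y")]

def classify_variables(edges):
    """Split variables into exogenous (no incoming arrow) and endogenous (at least one)."""
    nodes = {variable for edge in edges for variable in edge}
    endogenous = {effect for _, effect in edges}
    exogenous = nodes - endogenous
    return sorted(exogenous), sorted(endogenous)

exogenous, endogenous = classify_variables(edges)
print("Exogenous:", exogenous)
print("Endogenous:", endogenous)
```

Double-headed arrows (correlations) are deliberately left out of this sketch, since they do not change which variables are endogenous.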


Decomposition of Effects

Theorem: Path Coefficient Decomposition

For a recursive path model with standardized observed variables, the correlation between a predictor $X_j$ and an outcome $Y$ decomposes as:

$$r_{jY} = p_{jY} + \sum_{k \neq j} r_{jk} \, p_{kY}$$

where:

  • $p_{jY}$ is the direct effect (path coefficient from $X_j$ to $Y$)
  • $r_{jk} \, p_{kY}$ is the part of the correlation transmitted through $X_k$, another direct cause of $Y$; it is an indirect effect when $X_k$ mediates $X_j$, and a spurious or unanalyzed component otherwise
  • The sum runs over all other direct causes $X_k$ of $Y$
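The decomposition can be checked numerically. The sketch below simulates two predictors of $Y$ (the coefficients and variable names are assumptions made up for this illustration), estimates standardized path coefficients by ordinary least squares on z-scored data, and compares $p_{1Y} + r_{12}\,p_{2Y}$ with the observed correlation $r_{1Y}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Hypothetical data: X1 causes X2, and both are direct causes of Y.
x1 = rng.normal(size=n)
x2 = 0.3 * x1 + rng.normal(size=n)
y = 0.5 * x1 + 0.4 * x2 + rng.normal(size=n)

# Standardize every variable so regression slopes are standardized path coefficients.
data = np.column_stack([x1, x2, y])
z = (data - data.mean(axis=0)) / data.std(axis=0)
predictors, outcome = z[:, :2], z[:, 2]

# p[0] = path X1 -> Y, p[1] = path X2 -> Y (no intercept needed: data are centered).
p, *_ = np.linalg.lstsq(predictors, outcome, rcond=None)

corr = np.corrcoef(z, rowvar=False)
r_12, r_1y = corr[0, 1], corr[0, 2]

# Direct component plus the component transmitted through X2.
reconstructed = p[0] + r_12 * p[1]
print(f"r_1Y observed      = {r_1y:.6f}")
print(f"r_1Y reconstructed = {reconstructed:.6f}")
```

The two numbers agree up to floating-point error because the least-squares normal equations on standardized data are exactly this decomposition.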

More generally, for the full system $\mathbf{y} = B\mathbf{y} + \Gamma\mathbf{x} + \zeta$:

$$\text{Total effect matrix} = (I - B)^{-1}\Gamma$$
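A small numerical sketch of the matrix formula, assuming two endogenous variables ordered as (M, Y) and two exogenous variables ordered as (X1, X2). The coefficient values are illustrative assumptions that mirror the simulated model used in the Python implementation later in this lesson.

```python
import numpy as np

# Rows = equations for [M, Y]; columns of B = effects of [M, Y] on each endogenous variable.
B = np.array([[0.0, 0.0],    # M is not affected by M or Y
              [0.6, 0.0]])   # Y is affected by M
# Columns of Gamma = effects of [X1, X2].
Gamma = np.array([[0.5, 0.2],    # X1 -> M, X2 -> M
                  [0.3, 0.4]])   # X1 -> Y, X2 -> Y (direct)

identity = np.eye(B.shape[0])

# Solve (I - B) T = Gamma rather than forming the inverse explicitly.
total = np.linalg.solve(identity - B, Gamma)
indirect = total - Gamma

print("Total effects of [X1, X2] on [M, Y]:\n", total)
print("Indirect effects:\n", indirect)
```

The bottom row gives the effects on Y: for X1 the total is 0.3 + 0.5 × 0.6 = 0.6, and for X2 it is 0.4 + 0.2 × 0.6 = 0.52.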

Direct, Indirect, and Total Effects

$$\text{Total}_{j \to Y} = \text{Direct}_{j \to Y} + \sum_{k \in \text{mediators}} \text{Indirect}_{j \to k \to Y}$$

Here,

  • $\text{Direct}$ = the path coefficient for the direct arrow from $X_j$ to $Y$
  • $\text{Indirect}$ = the product of path coefficients along the mediated pathway
  • $\text{Total}$ = the sum of the direct effect and all indirect effects
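When a model has more than one mediated pathway, the same rule applies to each pathway separately. The following sketch enumerates every directed path from a source to a target in an acyclic model and multiplies the coefficients along each one. The coefficients and the helper name `trace_effects` are hypothetical.

```python
# Hypothetical path coefficients, keyed by (cause, effect).
paths = {
    ("X", "M1"): 0.5,
    ("M1", "M2"): 0.4,
    ("M2", "Y"): 0.6,
    ("M1", "Y"): 0.2,
    ("X", "Y"): 0.3,
}

def trace_effects(paths, source, target):
    """Return (direct, indirect) effects of source on target in an acyclic path model."""
    children = {}
    for (cause, effect), coef in paths.items():
        children.setdefault(cause, []).append((effect, coef))

    def walk(node, product, length):
        # Yield (product of coefficients, number of arrows) for each path reaching target.
        if node == target and length > 0:
            yield product, length
            return
        for next_node, coef in children.get(node, []):
            yield from walk(next_node, product * coef, length + 1)

    found = list(walk(source, 1.0, 0))
    direct = sum(prod for prod, arrows in found if arrows == 1)
    indirect = sum(prod for prod, arrows in found if arrows > 1)
    return direct, indirect

direct, indirect = trace_effects(paths, "X", "Y")
total = direct + indirect
print(f"Direct = {direct:.2f}, Indirect = {indirect:.2f}, Total = {total:.2f}")
```

Here the indirect effect is 0.5 × 0.2 (via M1) plus 0.5 × 0.4 × 0.6 (via M1 and M2). The recursion assumes there are no feedback loops; with a cycle it would not terminate.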

Identification Rules

Theorem: Identification Conditions for Path Models

A path model is identified (has a unique solution for the path coefficients) when:

  1. Recursive models (no feedback loops): always identified if there is at least one exogenous variable predicting each endogenous variable

  2. Order condition (necessary): for each endogenous variable, there must be at least as many exogenous variables excluded from its equation as there are other endogenous variables included in its equation as predictors

  3. Rank condition (necessary and sufficient): for each equation, the matrix of coefficients that the other equations place on the variables excluded from that equation must have rank equal to the number of endogenous variables in the system minus one

For the standard recursive path model with $p$ endogenous variables and $q$ exogenous variables, the model is always identified when:

  • All exogenous variables are correlated (their covariances are freely estimated)
  • Each endogenous variable has at least one exogenous predictor

Order Condition Explained

For endogenous variable $Y_k$, let $m_k$ = number of included endogenous variables (including $Y_k$) and $r_k$ = number of excluded exogenous variables. The order condition requires $r_k \geq m_k - 1$. If $r_k = m_k - 1$ exactly, the equation is just-identified. If $r_k > m_k - 1$, it is over-identified and can be tested for fit.
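The counting rule above is easy to mechanize. This is a minimal sketch with a hypothetical helper, `order_condition`, that classifies a single equation from the two counts. Because the order condition is only necessary, a passing result still has to be confirmed with the rank condition.

```python
def order_condition(m_k, r_k):
    """Classify one equation by the order condition.

    m_k: number of endogenous variables in the equation, including the outcome itself.
    r_k: number of exogenous variables in the model that are excluded from the equation.
    """
    required = m_k - 1
    if r_k < required:
        return "under-identified"
    if r_k == required:
        return "just-identified"
    return "over-identified"

# Hypothetical equation: Y1 depends on Y2 and X1, while X2 and X3 are excluded.
print(order_condition(m_k=2, r_k=2))  # two exclusions, one required
# Same equation in a model whose only other exogenous variable is X2.
print(order_condition(m_k=2, r_k=1))
# No exogenous variable is excluded at all.
print(order_condition(m_k=2, r_k=0))
```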


Recursive vs. Non-Recursive Models

Definition: Recursive Path Model

A recursive path model has no feedback loops: the causal flow is unidirectional. All structural errors are uncorrelated (or at least uncorrelated with all predictors of a given endogenous variable). Recursive models are always identified under standard conditions.
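Whether the arrows of a model contain a feedback loop can be checked with a topological sort. The sketch below uses the standard-library `graphlib` module (Python 3.9+); the helper name `is_recursive` and the example edge lists are hypothetical. It inspects only the single-headed arrows, so the requirement of uncorrelated structural errors still has to be checked separately.

```python
from graphlib import CycleError, TopologicalSorter

def is_recursive(edges):
    """Return True if the (cause, effect) arrows form no feedback loop."""
    sorter = TopologicalSorter()
    for cause, effect in edges:
        sorter.add(effect, cause)  # the effect depends on its cause
    try:
        # static_order() raises CycleError when the graph contains a cycle.
        list(sorter.static_order())
        return True
    except CycleError:
        return False

chain = [("X", "M"), ("M", "Y"), ("X", "Y")]
loop = [("X", "Y"), ("Y", "X")]
print(is_recursive(chain))
print(is_recursive(loop))
```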

Definition: Non-Recursive Path Model

A non-recursive model contains feedback loops (e.g., $X \to Y \to X$) or simultaneous equations. These models require additional identification conditions:

  • The order condition must be satisfied
  • The rank condition must be satisfied
  • Instrumental variables or exclusion restrictions may be needed
  • Estimation typically requires 2SLS, 3SLS, or full information ML
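To illustrate why ordinary regression is not enough for a feedback loop, here is a hedged two-stage least squares (2SLS) sketch written with plain NumPy. The two-equation system, its coefficient values, and the helper `ols` are assumptions invented for this example: each equation excludes one exogenous variable, which serves as the instrument for the other equation.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000
b12, b21 = 0.4, 0.3   # hypothetical feedback coefficients (Y2 -> Y1, Y1 -> Y2)
g1, g2 = 0.8, 0.7     # hypothetical effects of X1 on Y1 and X2 on Y2

x1, x2 = rng.normal(size=n), rng.normal(size=n)
e1, e2 = rng.normal(size=n), rng.normal(size=n)

# Structural system:  y1 = b12*y2 + g1*x1 + e1,   y2 = b21*y1 + g2*x2 + e2
# Solved for its reduced form so the data satisfy both equations at once.
det = 1.0 - b12 * b21
y1 = (g1 * x1 + b12 * g2 * x2 + e1 + b12 * e2) / det
y2 = (b21 * g1 * x1 + g2 * x2 + b21 * e1 + e2) / det

def ols(X, y):
    """Least-squares coefficients for y regressed on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

const = np.ones(n)
instruments = np.column_stack([const, x1, x2])  # all exogenous variables

# Stage 1: replace the endogenous regressor y2 with its projection on the instruments.
y2_hat = instruments @ ols(instruments, y2)

# Stage 2: regress y1 on the fitted y2 and the exogenous variable in its own equation.
b12_2sls = ols(np.column_stack([const, y2_hat, x1]), y1)[1]

# Naive OLS for comparison: biased because y2 is correlated with e1.
b12_ols = ols(np.column_stack([const, y2, x1]), y1)[1]

print(f"True b12 = {b12}, 2SLS = {b12_2sls:.3f}, OLS = {b12_ols:.3f}")
```

This sketch returns point estimates only. Standard errors taken from the second-stage regression would be incorrect, so a dedicated estimation routine is the safer choice in practice.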

Mediation Analysis via Path Analysis

Definition: Mediation

Variable $M$ mediates the effect of $X$ on $Y$ if:

  1. $X$ affects $M$ (path $a$)
  2. $M$ affects $Y$ controlling for $X$ (path $b$)
  3. The total effect of $X$ on $Y$ is partially or fully transmitted through $M$

The indirect effect is $a \times b$. Mediation is present when $a \times b \neq 0$.
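The three paths can be estimated with two regressions, and a third regression of $Y$ on $X$ alone gives the total effect. The sketch below uses simulated data with assumed coefficients and a hypothetical helper `ols_slopes`; it also shows that, for ordinary least squares, the total effect equals the direct effect plus $a \times b$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000

# Hypothetical data-generating model: a = 0.5, b = 0.6, direct effect = 0.3.
x = rng.normal(size=n)
m = 0.5 * x + rng.normal(size=n)
y = 0.3 * x + 0.6 * m + rng.normal(size=n)

def ols_slopes(predictors, outcome):
    """Fit outcome ~ intercept + predictors and return the slopes only."""
    design = np.column_stack([np.ones(len(outcome)), predictors])
    return np.linalg.lstsq(design, outcome, rcond=None)[0][1:]

a = ols_slopes(x, m)[0]                                 # path a: M ~ X
c_prime, b = ols_slopes(np.column_stack([x, m]), y)     # Y ~ X + M
c = ols_slopes(x, y)[0]                                 # total effect: Y ~ X

indirect = a * b
print(f"a = {a:.4f}, b = {b:.4f}, direct c' = {c_prime:.4f}")
print(f"indirect a*b = {indirect:.4f}, total c = {c:.4f}")
print(f"c' + a*b     = {c_prime + indirect:.4f}")
```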

Theorem: Sobel Test for Indirect Effects

The classic test for mediation uses the Sobel statistic:

$$z = \frac{\hat{a}\hat{b}}{\sqrt{\hat{b}^2 \, \text{SE}_a^2 + \hat{a}^2 \, \text{SE}_b^2}}$$

Under $H_0$ (no indirect effect), $z$ is approximately $N(0,1)$ in large samples. Modern practice prefers bootstrap confidence intervals for the indirect effect, which do not assume normality of the sampling distribution of the product $\hat{a}\hat{b}$.
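The Sobel statistic needs only the two estimated coefficients and their standard errors. The sketch below is a minimal implementation using the standard library; the function name `sobel_test` and the input values are hypothetical.

```python
import math

def sobel_test(a, se_a, b, se_b):
    """Return the Sobel z statistic and its two-sided normal p-value."""
    se_ab = math.sqrt(b**2 * se_a**2 + a**2 * se_b**2)
    z = (a * b) / se_ab
    # Two-sided p-value under N(0, 1): 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical estimates: a = 0.5 (SE 0.1), b = 0.6 (SE 0.1).
z, p = sobel_test(a=0.5, se_a=0.1, b=0.6, se_b=0.1)
print(f"z = {z:.4f}, p = {p:.6f}")
```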


Python Implementation

Path Analysis with semopy

import numpy as np
import pandas as pd
from semopy import Model, calc_stats

np.random.seed(42)
n = 500

# True path model:
# X1 → M → Y
# X2 → M (direct)
# X2 → Y (direct)
# X1 → Y (direct)
# X1 ↔ X2 (correlated exogenous)

x1 = np.random.normal(0, 1, n)
x2 = np.random.normal(0, 1, n)
# X1 and X2 correlated
x2 = 0.3 * x1 + np.sqrt(1 - 0.3**2) * x2

# M = 0.5*X1 + 0.2*X2 + error
m = 0.5 * x1 + 0.2 * x2 + np.random.normal(0, 0.8, n)

# Y = 0.3*X1 + 0.4*X2 + 0.6*M + error
y = 0.3 * x1 + 0.4 * x2 + 0.6 * m + np.random.normal(0, 0.7, n)

df = pd.DataFrame({'X1': x1, 'X2': x2, 'M': m, 'Y': y})

# Define path model
spec = """
# Structural equations
M ~ X1 + X2
Y ~ X1 + X2 + M

# Covariance among exogenous variables
X1 ~~ X2
"""

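# semopy takes the model description in the constructor and the data in fit()
model = Model(spec)
model.fit(df)

# Parameter estimates
# inspect() returns the columns lval, op, rval, Estimate, Std. Err, z-value, p-value
estimates = model.inspect()
print("Path Coefficients:")
print(estimates[['lval', 'op', 'rval', 'Estimate', 'Std. Err', 'p-value']])

# Calculate effects manually from path coefficients
params = estimates.set_index(['op', 'lval', 'rval'])['Estimate']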

# Direct effects on Y
direct_x1_y = params[('~', 'Y', 'X1')]
direct_x2_y = params[('~', 'Y', 'X2')]
direct_m_y = params[('~', 'Y', 'M')]

# Direct effects on M
direct_x1_m = params[('~', 'M', 'X1')]
direct_x2_m = params[('~', 'M', 'X2')]

# Indirect effects (through M)
indirect_x1_y = direct_x1_m * direct_m_y
indirect_x2_y = direct_x2_m * direct_m_y

# Total effects
total_x1_y = direct_x1_y + indirect_x1_y
total_x2_y = direct_x2_y + indirect_x2_y

print("\n=== Effect Decomposition ===")
print(f"X1 → Y: Direct = {direct_x1_y:.4f}, Indirect = {indirect_x1_y:.4f}, Total = {total_x1_y:.4f}")
print(f"X2 → Y: Direct = {direct_x2_y:.4f}, Indirect = {indirect_x2_y:.4f}, Total = {total_x2_y:.4f}")

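# Model fit
# Note: every possible path is estimated here, so the model should be
# just-identified (0 degrees of freedom) and the fit indices are uninformative;
# they become meaningful once some paths are constrained to zero.
# Assumption: calc_stats reports chi2, DoF, CFI, TLI, RMSEA, AIC, BIC and others,
# but not SRMR, so SRMR is not printed.
stats = calc_stats(model)
print(f"\nChi-square: {stats['chi2'].values[0]:.4f} (df = {stats['DoF'].values[0]})")
print(f"CFI: {stats['CFI'].values[0]:.4f}")
print(f"RMSEA: {stats['RMSEA'].values[0]:.4f}")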

Bootstrap Mediation Test

import numpy as np

def bootstrap_indirect_effect(x, m, y, n_boot=5000):
    """Bootstrap test for mediation: a*b indirect effect."""
    n = len(x)
    a_coefs, b_coefs, indirects = [], [], []

    for _ in range(n_boot):
        idx = np.random.choice(n, size=n, replace=True)
        x_b, m_b, y_b = x[idx], m[idx], y[idx]

        # Path a: M ~ X
        a = np.polyfit(x_b, m_b, 1)[0]
        # Path b: Y ~ M (controlling for X)
        X_bm = np.column_stack([x_b, m_b, np.ones(n)])
        b_path = np.linalg.lstsq(X_bm, y_b, rcond=None)[0][1]

        a_coefs.append(a)
        b_coefs.append(b_path)
        indirects.append(a * b_path)

    ci = np.percentile(indirects, [2.5, 97.5])
    return np.mean(indirects), np.std(indirects), ci, indirects

np.random.seed(42)
n = 300
x = np.random.normal(0, 1, n)
m = 0.5 * x + np.random.normal(0, 1, n)
y = 0.6 * m + 0.3 * x + np.random.normal(0, 1, n)

mean_ind, se_ind, ci, indirects = bootstrap_indirect_effect(x, m, y)
print(f"Mean indirect effect (a*b): {mean_ind:.4f}")
print(f"Bootstrap SE: {se_ind:.4f}")
print(f"95% CI: [{ci[0]:.4f}, {ci[1]:.4f}]")
print(f"Significant mediation: {'Yes' if ci[0] > 0 or ci[1] < 0 else 'No'}")

Key Takeaways

Summary: Path Analysis

  • Path analysis models direct and indirect causal relationships among observed variables
  • Path diagrams encode causal assumptions: arrows = direct effects; double-headed arrows = correlations
  • Total effect = Direct effect + Indirect effects; each indirect effect is the product of the path coefficients along its pathway
  • Recursive models (no feedback loops) are always identified under standard conditions
  • Non-recursive models (feedback loops) require exclusion restrictions or instrumental variables
  • Mediation is tested via the indirect effect $a \times b$; bootstrap CIs are preferred over the Sobel test
  • Path analysis is a special case of SEM with no latent variables; use full SEM when measurement error is a concern
  • Always assess model fit (CFI, RMSEA, SRMR) and compare alternative path specifications
