Spatial Statistics

Advanced Statistical Methods

Analyzing Data Where Location Matters

Spatial statistics accounts for geographic proximity and autocorrelation in data, using tools like Moran's I, variograms, and kriging to model spatial dependencies that standard methods ignore.

Epidemiology — Map disease clusters and identify environmental risk factors by location
Precision agriculture — Optimize fertilizer application using spatially-varying soil measurements
Real estate — Model housing prices with spatial lag and spatial error models

Spatial statistics ensure your analysis respects the first law of geography: everything is related to nearby things.

Spatial data arise whenever observations are indexed by geographical location. Unlike classical statistics, spatial observations are typically not independent: nearby locations tend to exhibit similar values—a phenomenon known as spatial autocorrelation. Ignoring this structure leads to invalid standard errors, inflated Type I error rates, and inefficient predictions. Spatial statistics provides the formal apparatus for modeling, estimating, and predicting under dependence.

Spatial Autocorrelation

Spatial autocorrelation measures the degree to which nearby observations are similar (positive) or dissimilar (negative).

DfSpatial Autocorrelation

Given observations $\{z(\mathbf{s}_i)\}_{i=1}^{n}$ at locations $\mathbf{s}_i \in \mathcal{D} \subset \mathbb{R}^d$ , spatial autocorrelation quantifies the statistical relationship between $z(\mathbf{s}_i)$ and $z(\mathbf{s}_j)$ as a function of their separation vector $\mathbf{s}_i - \mathbf{s}_j$ .

Moran's I

Moran's I Statistic

I = \frac{n}{\sum_{i}\sum_{j} w_{ij}} \cdot \frac{\sum_{i}\sum_{j} w_{ij}\,(z_i - \bar{z})(z_j - \bar{z})}{\sum_{i}(z_i - \bar{z})^2}

where $w_{ij}$ are spatial weights and $\bar{z} = \frac{1}{n}\sum_i z_i$ . Under the null hypothesis of no spatial autocorrelation, $\mathbb{E}[I] \approx -\frac{1}{n-1}$ , and for large $n$ , $I$ is approximately standard normal after standardization.

Geary's C

Geary's C Statistic

C = \frac{(n-1)\sum_{i}\sum_{j} w_{ij}\,(z_i - z_j)^2}{2\,\sum_{i}\sum_{j} w_{ij} \cdot \sum_{i}(z_i - \bar{z})^2}

Values $C < 1$ indicate positive spatial autocorrelation; $C > 1$ indicates negative autocorrelation. Unlike Moran's I (which is correlation-like), Geary's C is based on squared differences and is more sensitive to local variability.

Spatial Weights Matrix

The spatial weights matrix $\mathbf{W} = [w_{ij}]$ encodes the neighborhood structure. Common constructions include:

Contiguity-based: $w_{ij} = 1$ if $i$ and $j$ share a boundary (rook or queen)
Distance-based: $w_{ij} = 1$ if $\|\mathbf{s}_i - \mathbf{s}_j\| < d$ for some cutoff $d$
K-nearest neighbors: $w_{ij} = 1$ if $j$ is among the $k$ closest to $i$
Inverse distance: $w_{ij} = \|\mathbf{s}_i - \mathbf{s}_j\|^{-\alpha}$

Rows are typically row-standardized: $w_{ij}^* = w_{ij} / \sum_k w_{ik}$ .

Testing Moran's I

For a dataset of $n = 100$ soil samples with $I = 0.42$ , the expected value under $H_0$ is $\mathbb{E}[I] = -1/99 = -0.0101$ . The variance of $I$ under normality is:

\text{Var}(I) = \frac{1}{S_0}\left[\frac{(n^2 - 3n + 3)S_1 - nS_2 + 3S_0}{(n^2 - 1)S_0}\right] - \frac{1}{(n-1)^2}

where $S_0 = \sum_{ij}w_{ij}$ , $S_1 = \frac{1}{2}\sum_{ij}(w_{ij}+w_{ji})^2$ , $S_2 = \sum_i(\sum_j w_{ij} + \sum_j w_{ji})^2$ . The z-score is $z = (I - \mathbb{E}[I])/\sqrt{\text{Var}(I)}$ , giving $z \approx 12.4$ —strong evidence of positive spatial autocorrelation.

Variogram Estimation

The variogram is the fundamental tool for quantifying spatial continuity.

DfSemivariogram

The theoretical semivariogram is defined as:

\gamma(\mathbf{h}) = \frac{1}{2}\,\text{Var}\big[Z(\mathbf{s} + \mathbf{h}) - Z(\mathbf{s})\big]

for a second-order stationary process $\{Z(\mathbf{s})\}$ with $\text{Var}[Z(\mathbf{s})] = c(0)$ .

For an intrinsically stationary process (where $\mathbb{E}[Z(\mathbf{s}+\mathbf{h}) - Z(\mathbf{s})] = 0$ and $\text{Var}[Z(\mathbf{s}+\mathbf{h}) - Z(\mathbf{s})] = 2\gamma(\mathbf{h})$ ), the semivariogram exists even when the process is non-stationary in the strict sense.

Empirical Semivariogram

\hat{\gamma}(\mathbf{h}) = \frac{1}{2|N(\mathbf{h})|}\sum_{(i,j) \in N(\mathbf{h})} \big(z(\mathbf{s}_i) - z(\mathbf{s}_j)\big)^2

where $N(\mathbf{h}) = \{(i,j) : \mathbf{s}_i - \mathbf{s}_j \approx \mathbf{h}\}$ is the set of pairs within a tolerance window of the lag vector $\mathbf{h}$ .

Parametric Variogram Models

Common Variogram Models

Spherical: $\gamma(h) = c_0 + c_1\left[\frac{3h}{2a} - \frac{1}{2}\left(\frac{h}{a}\right)^3\right]$ for $h \leq a$ , $\gamma(h) = c_0 + c_1$ otherwise
Exponential: $\gamma(h) = c_0 + c_1\left[1 - \exp\left(-\frac{3h}{a}\right)\right]$
Gaussian: $\gamma(h) = c_0 + c_1\left[1 - \exp\left(-\frac{3h^2}{a^2}\right)\right]$
Matérn: $\gamma(h) = c_0 + c_1\left[1 - \frac{2^{1-\nu}}{\Gamma(\nu)}\left(\frac{\sqrt{2\nu}\,h}{a}\right)^\nu K_\nu\left(\frac{\sqrt{2\nu}\,h}{a}\right)\right]$

where $c_0$ is the nugget, $c_0 + c_1$ is the sill, and $a$ is the range parameter. The Matérn family parameterized by $\nu$ nests the exponential ( $\nu = 0.5$ ) and Gaussian ( $\nu \to \infty$ ) models.

DfAnisotropy

Geometric anisotropy: The range varies with direction. Addressed by applying a coordinate transformation (affine rotation/scaling) to achieve isotropy.
Zonal anisotropy: The sill varies with direction while the range remains constant. Cannot be corrected by simple transformation; requires separate variogram models per direction.
Geometric + Zonal: Combined structure, common in practice.

Kriging

Kriging is the best linear unbiased prediction (BLUP) for spatially correlated data.

Simple Kriging

Simple Kriging Predictor

\hat{Z}(\mathbf{s}_0) = \mu + \mathbf{c}^\top \mathbf{C}^{-1}(\mathbf{Z} - \mu\mathbf{1})

where $\mu = \mathbb{E}[Z(\mathbf{s})]$ is known, $\mathbf{C} = [c(\mathbf{s}_i - \mathbf{s}_j)]$ is the covariance matrix, and $\mathbf{c} = [c(\mathbf{s}_0 - \mathbf{s}_i)]$ is the covariance between the target and observed locations. The kriging variance is $\sigma^2_{SK} = c(0) - \mathbf{c}^\top \mathbf{C}^{-1}\mathbf{c}$ .

Ordinary Kriging

When the mean is unknown but assumed constant, ordinary kriging jointly estimates $\mu$ and $\hat{Z}(\mathbf{s}_0)$ :

Ordinary Kriging System

\begin{pmatrix} \mathbf{C} & \mathbf{1} \\ \mathbf{1}^\top & 0 \end{pmatrix} \begin{pmatrix} \boldsymbol{\lambda} \\ \psi \end{pmatrix} = \begin{pmatrix} \mathbf{c} \\ 1 \end{pmatrix}

The weights $\boldsymbol{\lambda}$ satisfy $\sum_i \lambda_i = 1$ (unbiasedness), and the kriging variance is $\sigma^2_{OK} = c(0) - \mathbf{c}^\top\boldsymbol{\lambda}$ .

Universal Kriging

When the mean is modeled as a spatial trend $m(\mathbf{s}) = \sum_{k=0}^{K} f_k(\mathbf{s})\beta_k$ , universal kriging accounts for this drift:

Universal Kriging

\hat{Z}(\mathbf{s}_0) = \mathbf{f}(\mathbf{s}_0)^\top\hat{\boldsymbol{\beta}} + \mathbf{c}_0^\top \mathbf{\Psi}^{-1}(\mathbf{Z} - \mathbf{F}\hat{\boldsymbol{\beta}})

where $\hat{\boldsymbol{\beta}} = (\mathbf{F}^\top\mathbf{\Psi}^{-1}\mathbf{F})^{-1}\mathbf{F}^\top\mathbf{\Psi}^{-1}\mathbf{Z}$ is the generalized least squares estimator and $\mathbf{\Psi}$ is the residual covariance matrix.

Spatial Regression

Spatial Lag Model

y = \rho W y + X\beta + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2 I_n)

Spatial Error Model

y = X\beta + u, \quad u = \lambda W u + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2 I_n)

Omitted Variable Bias

Ignoring spatial dependence in the error terms leads to underestimation of standard errors by a factor of approximately $(1 - \hat{\rho})^{-2}$ , inflating t-statistics and increasing false positive rates. Lagrange multiplier tests (LM-lag and LM-error) help diagnose the appropriate model specification.

Point Process Models

For data recorded as event locations $\{\mathbf{s}_1, \dots, \mathbf{s}_n\}$ in a study region $\mathcal{D}$ , point process models characterize the stochastic mechanism generating these events.

DfPoisson Point Process

A homogeneous Poisson point process on $\mathcal{D}$ with intensity $\lambda > 0$ satisfies:

For any Borel set $B \subset \mathcal{D}$ , $N(B) \sim \text{Poisson}(\lambda |B|)$
For disjoint $B_1, B_2$ , $N(B_1)$ and $N(B_2)$ are independent

The inhomogeneous extension allows $\lambda(\mathbf{s})$ to vary over space, with $N(B) \sim \text{Poisson}\left(\int_B \lambda(\mathbf{s})\,d\mathbf{s}\right)$ .

Quadrat Analysis

Divide $\mathcal{D}$ into $m$ quadrats of area $a$ . Under complete spatial randomness (CSR), $N_i \sim \text{Poisson}(\lambda a)$ for each quadrat $i$ . The variance-to-mean ratio (VMR) is:

\text{VMR} = \frac{s^2}{\bar{N}} = \frac{\frac{1}{m-1}\sum_i(N_i - \bar{N})^2}{\frac{1}{m}\sum_i N_i}

VMR = 1 indicates CSR; VMR > 1 suggests clustering; VMR < 1 suggests regularity. The chi-squared statistic $\chi^2 = (m-1) \cdot \text{VMR}$ tests against the Poisson assumption.

Python Implementation

import numpy as np
import pandas as pd
from scipy import stats
from sklearn.neighbors import NearestNeighbors
import matplotlib.pyplot as plt

np.random.seed(42)

# --- Generate spatial data with autocorrelation ---
n = 200
coords = np.column_stack([
    np.random.uniform(0, 10, n),
    np.random.uniform(0, 10, n)
])

# True surface with spatial trend
def true_surface(x, y):
    return 3 * np.sin(x / 3) + 2 * np.cos(y / 4) + np.random.normal(0, 0.3, len(x))

z = true_surface(coords[:, 0], coords[:, 1])

# --- Spatial weights matrix (KNN) ---
k = 8
nn = NearestNeighbors(n_neighbors=k + 1).fit(coords)
distances, indices = nn.kneighbors(coords)

W = np.zeros((n, n))
for i in range(n):
    for j_idx in range(1, k + 1):
        j = indices[i, j_idx]
        W[i, j] = 1.0 / max(distances[i, j_idx], 1e-10)

# Row-standardize
row_sums = W.sum(axis=1, keepdims=True)
W = W / row_sums

# --- Moran's I ---
z_centered = z - z.mean()
numerator = n * z_centered @ W @ z_centered
denominator = W.sum() * (z_centered ** 2).sum()
morans_i = numerator / denominator

# Expected value and variance under H0
S0 = W.sum()
S1 = 0.5 * ((W + W.T) ** 2).sum()
S2 = ((W.sum(axis=1) + W.sum(axis=0)) ** 2).sum()
EI = -1.0 / (n - 1)
VI = ((n**2 - 3*n + 3)*S1 - n*S2 + 3*S0) / ((n**2 - 1)*S0) - 1.0/(n-1)**2
z_moran = (morans_i - EI) / np.sqrt(VI)

print(f"Moran's I: {morans_i:.4f}")
print(f"Expected I: {EI:.4f}")
print(f"z-score: {z_moran:.4f}")
print(f"p-value: {2 * (1 - stats.norm.cdf(abs(z_moran))):.6f}")

# --- Empirical Semivariogram ---
def empirical_variogram(coords, z, n_bins=15, max_dist=5.0):
    from scipy.spatial.distance import pdist, squareform
    D = squareform(pdist(coords))
    pairs = np.triu_indices(len(z), k=1)
    dists = D[pairs]
    diffs_sq = (z[pairs[0]] - z[pairs[1]]) ** 2

    bins = np.linspace(0, max_dist, n_bins + 1)
    bin_centers = 0.5 * (bins[:-1] + bins[1:])
    gamma = np.zeros(n_bins)

    for i in range(n_bins):
        mask = (dists >= bins[i]) & (dists < bins[i + 1])
        if mask.sum() > 0:
            gamma[i] = 0.5 * diffs_sq[mask].mean()
        else:
            gamma[i] = np.nan

    return bin_centers, gamma

h, gamma_hat = empirical_variogram(coords, z)
valid = ~np.isnan(gamma_hat)

# Fit exponential variogram via least squares
def exponential_variogram(h, nugget, sill, range_param):
    return nugget + sill * (1 - np.exp(-3 * h / range_param))

from scipy.optimize import curve_fit
popt, pcov = curve_fit(
    exponential_variogram, h[valid], gamma_hat[valid],
    p0=[0.1, 2.0, 2.0], maxfev=10000
)
print(f"\nFitted variogram: nugget={popt[0]:.3f}, sill={popt[1]:.3f}, range={popt[2]:.3f}")

# --- Simple Kriging prediction at a new point ---
def sk_predict(coords, z, new_point, cov_func, mu, **kwargs):
    from scipy.spatial.distance import cdist
    c_0 = cov_func(0, **kwargs)
    C = cov_func(cdist(coords, coords), **kwargs)
    c = cov_func(cdist(new_point, coords), **kwargs)

    C_inv = np.linalg.inv(C + 1e-8 * np.eye(len(C)))
    weights = C_inv @ c
    pred = mu + weights @ (z - mu)
    variance = c_0 - c @ C_inv @ c
    return pred, variance

def cov_exponential(dist, sill, range_param):
    return sill * np.exp(-3 * dist / range_param)

new_pt = np.array([[5.0, 5.0]])
mu_hat = z.mean()
pred, var = sk_predict(coords, z, new_pt, cov_exponential, mu_hat,
                        sill=popt[1], range_param=popt[2])
print(f"\nSimple Kriging prediction at (5,5): {pred[0]:.3f} ± {np.sqrt(max(var, 0)):.3f}")

Summary

Key Takeaways: Spatial Statistics

Spatial autocorrelation invalidates classical independence assumptions; test with Moran's I or Geary's C before proceeding.
The semivariogram quantifies spatial continuity: nugget, sill, and range are the three key parameters. The Matérn family provides flexible, nested modeling.
Kriging is the BLUP under Gaussian assumptions: simple kriging (known mean), ordinary kriging (constant unknown mean), and universal kriging (trend + residual).
Spatial regression models (lag and error) account for spatial dependence in outcome or residuals; omitting it inflates Type I error.
Point process models and quadrat analysis characterize event locations; the Poisson process is the null model for complete spatial randomness.

Spatial Statistics

Spatial Statistics

Analyzing Data Where Location Matters

Spatial Autocorrelation

DfSpatial Autocorrelation

Moran's I

Moran's I Statistic

Geary's C

Geary's C Statistic

Testing Moran's I

Variogram Estimation

DfSemivariogram

Empirical Semivariogram

Parametric Variogram Models

DfAnisotropy

Kriging

Simple Kriging

Simple Kriging Predictor

Ordinary Kriging

Ordinary Kriging System

Universal Kriging

Universal Kriging

Spatial Regression

Spatial Lag Model

Spatial Error Model

Point Process Models

DfPoisson Point Process

Quadrat Analysis

Quadrat Analysis

Python Implementation

Summary

Key Takeaways: Spatial Statistics

Premium Content

Need Expert Statistics Help?