🎉 75% of content is free forever — Unlock Premium from $10/mo →
CW
Search courses…
💼 Servicesℹ️ About✉️ ContactView Pricing Plansfrom $10

Geostatistics

Advanced Statistical MethodsSpatial Methods🟢 Free Lesson

Advertisement

Geostatistics

Advanced Statistical Methods

Estimating Resources From Sparse Spatial Samples

Geostatistics provides the theory of regionalized variables, using semivariogram modeling and kriging variants to produce optimal spatial predictions with quantified uncertainty from limited sample data.

  • Mining — Estimate ore grades and reserves from drill hole samples for mine planning
  • Petroleum — Model reservoir properties across a field for well placement optimization
  • Environmental science — Map pollutant concentrations from sparse monitoring station data

Geostatistics turns scattered samples into continuous, uncertainty-quantified resource maps.


Geostatistics originated in mining and petroleum exploration (Matheron, 1963) and has since expanded to hydrology, soil science, ecology, and atmospheric science. The central object is the regionalized variable—a random function Z(s)Z(\mathbf{s}) whose realizations are spatially correlated but deterministic at any fixed location.


Regionalized Variables

DfRegionalized Variable

A regionalized variable is a realization of a random function {Z(s):sD}\{Z(\mathbf{s}): \mathbf{s} \in \mathcal{D}\} observed at nn locations {s1,,sn}\{\mathbf{s}_1, \dots, \mathbf{s}_n\}. The observed values {z(si)}\{z(\mathbf{s}_i)\} are treated as a partial realization of the underlying stochastic process.

Core Assumptions

Geostatistical analysis relies on two key regularity conditions:

  • Intrinsic stationarity: E[Z(s)]=μ\mathbb{E}[Z(\mathbf{s})] = \mu (constant mean) and Var[Z(s+h)Z(s)]=2γ(h)\text{Var}[Z(\mathbf{s}+\mathbf{h}) - Z(\mathbf{s})] = 2\gamma(\mathbf{h}) depends only on h\mathbf{h}.
  • Second-order stationarity (weak stationarity): E[Z(s)]=μ\mathbb{E}[Z(\mathbf{s})] = \mu and Cov[Z(s),Z(s+h)]=C(h)\text{Cov}[Z(\mathbf{s}), Z(\mathbf{s}+\mathbf{h})] = C(\mathbf{h}) depends only on h\mathbf{h}, with C(0)<C(0) < \infty.

Semivariogram Theory

DfTheoretical Semivariogram

For an intrinsically stationary process:

γ(h)=12Var[Z(s+h)Z(s)]=12E[(Z(s+h)Z(s))2]\gamma(\mathbf{h}) = \frac{1}{2}\,\text{Var}\big[Z(\mathbf{s}+\mathbf{h}) - Z(\mathbf{s})\big] = \frac{1}{2}\,\mathbb{E}\big[\big(Z(\mathbf{s}+\mathbf{h}) - Z(\mathbf{s})\big)^2\big]

ThLink Between Semivariogram and Covariance

Under second-order stationarity, γ(h)=C(0)C(h)\gamma(\mathbf{h}) = C(0) - C(\mathbf{h}). This equivalence allows switching between the two representations, though the semivariogram framework is more robust when the process is intrinsically but not strictly stationary.

Properties of Valid Semivariograms

A function γ(h)\gamma(\mathbf{h}) is a valid semivariogram if and only if it satisfies:

  1. γ(0)=0\gamma(\mathbf{0}) = 0
  2. γ(h)=γ(h)\gamma(-\mathbf{h}) = \gamma(\mathbf{h}) (symmetry)
  3. γ(h)-\gamma(\mathbf{h}) is conditionally negative definite: for any set of locations and weights {wi}\{w_i\} with iwi=0\sum_i w_i = 0, ijwiwjγ(sisj)0\sum_i \sum_j w_i w_j \gamma(\mathbf{s}_i - \mathbf{s}_j) \leq 0.

Valid Variogram Models

  • Spherical (aa = range): γ(h)=c0+c1[3h2a12(ha)3]\gamma(h) = c_0 + c_1\left[\frac{3h}{2a} - \frac{1}{2}\left(\frac{h}{a}\right)^3\right] for hah \leq a. Linear behavior near the origin; reaches sill exactly at h=ah = a.
  • Exponential: γ(h)=c0+c1[1e3h/a]\gamma(h) = c_0 + c_1\left[1 - e^{-3h/a}\right]. Asymptotically approaches sill; practical range is 3a3a (95% of sill).
  • Gaussian: γ(h)=c0+c1[1e3h2/a2]\gamma(h) = c_0 + c_1\left[1 - e^{-3h^2/a^2}\right]. Parabolic behavior near origin (very smooth); practical range is a3a\sqrt{3}.
  • Matérn: γ(h)=c0+c1[121νΓ(ν)(2νha)νKν(2νha)]\gamma(h) = c_0 + c_1\left[1 - \frac{2^{1-\nu}}{\Gamma(\nu)}\left(\frac{\sqrt{2\nu}\,h}{a}\right)^\nu K_\nu\left(\frac{\sqrt{2\nu}\,h}{a}\right)\right]. Controls smoothness via ν\nu: ν=0.5\nu = 0.5 (exponential), ν\nu \to \infty (Gaussian).
  • Power: γ(h)=c0+αhβ\gamma(h) = c_0 + \alpha h^\beta for 0<β<20 < \beta < 2. No sill; models unbounded variability (e.g., fractional Brownian fields).

Parameter Estimation

Weighted Least Squares Variogram Estimation

θ^=argminθk=1KN(hk)2[γ^(hk)γ(hk;θ)]2wk\hat{\theta} = \arg\min_\theta \sum_{k=1}^{K} \frac{N(\mathbf{h}_k)}{2}\left[\hat{\gamma}(\mathbf{h}_k) - \gamma(\mathbf{h}_k; \theta)\right]^2 \cdot w_k

Common weighting schemes: wk=1w_k = 1 (equal weights), wk=N(hk)w_k = N(\mathbf{h}_k) (proportional to pair count), or wk=1/Var[γ^(hk)]w_k = 1/\text{Var}[\hat{\gamma}(\mathbf{h}_k)] (inverse variance). Maximum likelihood estimation is asymptotically more efficient but requires distributional assumptions.


Kriging Variants

Simple Kriging (SK)

Simple Kriging

Z^SK(s0)=μ+cC1(Zμ1)\hat{Z}_{SK}(\mathbf{s}_0) = \mu + \mathbf{c}^\top \mathbf{C}^{-1}(\mathbf{Z} - \mu\mathbf{1})

Assumes known mean μ\mu. The kriging variance σSK2(s0)=C(0)cC1c\sigma^2_{SK}(\mathbf{s}_0) = C(0) - \mathbf{c}^\top \mathbf{C}^{-1}\mathbf{c} is independent of observed values—it depends only on geometry.

Ordinary Kriging (OK)

Ordinary Kriging

(C110)(λψ)=(c1)\begin{pmatrix} \mathbf{C} & \mathbf{1} \\ \mathbf{1}^\top & 0 \end{pmatrix} \begin{pmatrix} \boldsymbol{\lambda} \\ \psi \end{pmatrix} = \begin{pmatrix} \mathbf{c} \\ 1 \end{pmatrix}

The unbiasedness constraint iλi=1\sum_i \lambda_i = 1 eliminates the need for a known mean. The Lagrange multiplier ψ\psi equals σOK2σSK2\sigma^2_{OK} - \sigma^2_{SK} when the true mean equals μ\mu.

Indicator Kriging

For categorical or non-Gaussian data, indicator kriging works with transformed values:

Indicator Kriging

I(s;zk)={1if Z(s)zk0otherwiseI(\mathbf{s}; z_k) = \begin{cases} 1 & \text{if } Z(\mathbf{s}) \leq z_k \\ 0 & \text{otherwise} \end{cases}

Each threshold zkz_k yields a separate kriging system, producing the conditional cumulative distribution function (CCDF) at each unsampled location.


Block Kriging

Block kriging estimates the average value over a support VV (a block or panel) rather than at a point:

Block Kriging

Zˉ(V)=1VVZ(s)ds\bar{Z}(V) = \frac{1}{|V|}\int_V Z(\mathbf{s})\,d\mathbf{s}

The block kriging variance satisfies σOK2(V)σOK2(s0)\sigma^2_{OK}(V) \leq \sigma^2_{OK}(\mathbf{s}_0) for any point s0V\mathbf{s}_0 \in V—the support effect reduces estimation variance. The block covariance is:

Cˉ(V,V)=1V2VVC(ss)dsds\bar{C}(V, V) = \frac{1}{|V|^2}\int_V\int_V C(\mathbf{s} - \mathbf{s}')\,d\mathbf{s}\,d\mathbf{s}'

Change of Support

Converting point-support estimates to block-support estimates (and vice versa) is one of the most important and challenging problems in geostatistics. Log-normal and affine corrections provide approximations, but model-based approaches using Gaussian random field simulations are preferred for complex geometries.


Cross-Validation and Model Validation

Kriging Cross-Validation Score

Leave-one-out cross-validation (LOOCV) computes, for each observation ii:

ei=z(si)Z^(i)(si),si2=σOK,(i)2(si)e_i = z(\mathbf{s}_i) - \hat{Z}_{(-i)}(\mathbf{s}_i), \quad s_i^2 = \sigma^2_{OK,(-i)}(\mathbf{s}_i)

Under a correctly specified model:

  • Standardized errors ei/sie_i / s_i should be approximately N(0,1)N(0, 1)
  • The mean error eˉ0\bar{e} \approx 0 (unbiasedness)
  • The mean squared standardized error 1ni(ei/si)21\frac{1}{n}\sum_i (e_i/s_i)^2 \approx 1
  • No correlation between eie_i and spatial location

Non-Stationary Geostatistics

Df Locally Stationary Model

Partition D\mathcal{D} into overlapping neighborhoods. Within each neighborhood, stationarity is assumed and a local variogram is estimated. The parameters vary smoothly over space:

γ(s;h)=γ(h;θ(s))\gamma(\mathbf{s}; \mathbf{h}) = \gamma(\mathbf{h}; \theta(\mathbf{s}))

Alternatively, the external drift kriging model incorporates auxiliary fields:

External Drift Kriging

Z(s)=μ(s)+ε(s),μ(s)=k=1Kfk(s)βkZ(\mathbf{s}) = \mu(\mathbf{s}) + \varepsilon(\mathbf{s}), \quad \mu(\mathbf{s}) = \sum_{k=1}^{K} f_k(\mathbf{s})\beta_k

where the drift is specified by external variables fk(s)f_k(\mathbf{s}) (e.g., elevation, temperature) observed everywhere.


Resource Estimation

Volume–Mean Relationship

For a mineral deposit divided into mm panels ViV_i of equal volume, the total resource is:

Q^=Vi=1mZ^OK(Vi)\hat{Q} = |V|\sum_{i=1}^{m}\hat{Z}_{OK}(V_i)

with variance Var[Q^]=V2ijCˉ(Vi,Vj)(1λi(j)λj(i))\text{Var}[\hat{Q}] = |V|^2 \sum_i \sum_j \bar{C}(V_i, V_j)(1 - \lambda_i^{(j)} - \lambda_j^{(i)}).

The discrete Gaussian model (DGM) provides the conditional distribution of block grades for resource classification:

G(s0)=Z(s0)Z^OK(s0)σOK(s0)N(0,1)G(\mathbf{s}_0) = \frac{Z(\mathbf{s}_0) - \hat{Z}_{OK}(\mathbf{s}_0)}{\sigma_{OK}(\mathbf{s}_0)} \sim N(0, 1)

This standardized variable is used for local uncertainty propagation via Turning Bands simulation.


Python Implementation

import numpy as np
import pandas as pd
from scipy.optimize import minimize
from scipy.spatial.distance import pdist, squareform, cdist
import matplotlib.pyplot as plt

np.random.seed(42)

# --- Simulate regionalized data ---
n = 150
coords = np.column_stack([
    np.random.uniform(0, 100, n),
    np.random.uniform(0, 100, n)
])

def matern_cov(h, sill, range_param, nu):
    from scipy.special import kv, gamma as gamma_func
    h = np.maximum(h, 1e-10)
    coeff = (2 ** (1 - nu)) / gamma_func(nu)
    arg = np.sqrt(2 * nu) * h / range_param
    return sill * coeff * (arg ** nu) * kv(nu, arg)

# Generate from Matérn covariance
D = squareform(pdist(coords))
C = matern_cov(D, sill=3.0, range_param=20.0, nu=1.5)
C += 1e-6 * np.eye(n)  # nugget for stability
L = np.linalg.cholesky(C)
z = L @ np.random.standard_normal(n)

# --- Empirical variogram ---
def compute_variogram(coords, z, n_bins=20, max_dist=50.0):
    D = squareform(pdist(coords))
    pairs = np.triu_indices(len(z), k=1)
    dists, diffs_sq = D[pairs], (z[pairs[0]] - z[pairs[1]]) ** 2

    bins = np.linspace(0, max_dist, n_bins + 1)
    centers = 0.5 * (bins[:-1] + bins[1:])
    gamma, counts = np.zeros(n_bins), np.zeros(n_bins)

    for i in range(n_bins):
        mask = (dists >= bins[i]) & (dists < bins[i + 1])
        counts[i] = mask.sum()
        if counts[i] > 0:
            gamma[i] = 0.5 * diffs_sq[mask].mean()
        else:
            gamma[i] = np.nan
    return centers, gamma, counts

h_vals, gamma_emp, counts = compute_variogram(coords, z)
valid = ~np.isnan(gamma_emp)

# --- Fit Matérn variogram via weighted least squares ---
def matern_variogram(h, nugget, sill, range_param, nu):
    from scipy.special import kv, gamma as gamma_func
    h = np.maximum(h, 1e-10)
    coeff = (2 ** (1 - nu)) / gamma_func(nu)
    arg = np.sqrt(2 * nu) * h / range_param
    full_sill = nugget + sill
    cov = sill * coeff * (arg ** nu) * kv(nu, arg)
    return full_sill - cov

def fit_variogram_wls(h, gamma, counts, model_func):
    weights = counts / counts.sum()
    def objective(params):
        nugget, sill, range_p, nu = params
        if any(p < 0 for p in params):
            return 1e10
        fitted = model_func(h, nugget, sill, range_p, nu)
        return np.sum(weights * (gamma - fitted) ** 2)
    result = minimize(objective, [0.1, 3.0, 20.0, 1.0],
                      method='Nelder-Mead', options={'maxiter': 5000})
    return result.x

popt = fit_variogram_wls(h_vals[valid], gamma_emp[valid], counts[valid], matern_variogram)
print(f"Matérn fit: nugget={popt[0]:.3f}, sill={popt[1]:.3f}, range={popt[2]:.3f}, nu={popt[3]:.3f}")

# --- Ordinary Kriging ---
def ordinary_kriging(coords, z, new_point, cov_func, **cov_params):
    from scipy.spatial.distance import cdist
    C = cov_func(squareform(pdist(coords)), **cov_params)
    c = cov_func(cdist(new_point, coords).ravel(), **cov_params)
    c0 = cov_func(np.array([0.0]), **cov_params)[0]

    n = len(z)
    C_ext = np.block([[C, np.ones((n, 1))], [np.ones((1, n)), 0]])
    rhs = np.append(c, 1.0)

    try:
        sol = np.linalg.solve(C_ext + 1e-8 * np.eye(n + 1), rhs)
    except np.linalg.LinAlgError:
        sol = np.linalg.lstsq(C_ext, rhs, rcond=None)[0]

    weights = sol[:n]
    pred = weights @ z
    variance = c0 - weights @ c
    return pred, variance

def cov_func(dist, nugget, sill, range_param, nu):
    from scipy.special import kv, gamma as gamma_func
    dist = np.maximum(dist, 1e-10)
    coeff = (2 ** (1 - nu)) / gamma_func(nu)
    arg = np.sqrt(2 * nu) * dist / range_param
    return nugget + sill * coeff * (arg ** nu) * kv(nu, arg)

new_pt = np.array([[50.0, 50.0]])
pred, var = ordinary_kriging(coords, z, new_pt, cov_func,
                              nugget=popt[0], sill=popt[1],
                              range_param=popt[2], nu=popt[3])
print(f"\nOK prediction at (50,50): {pred:.3f} ± {np.sqrt(max(var, 0)):.3f}")

# --- Cross-validation ---
errors, sse = [], []
for i in range(n):
    mask = np.arange(n) != i
    pred_i, var_i = ordinary_kriging(coords[mask], z[mask], coords[i:i+1], cov_func,
                                      nugget=popt[0], sill=popt[1],
                                      range_param=popt[2], nu=popt[3])
    errors.append(z[i] - pred_i)
    sse.append(errors[-1] ** 2)

errors = np.array(errors)
print(f"\nLOOCV Mean Error: {errors.mean():.4f}")
print(f"LOOCV RMSE: {np.sqrt(np.mean(sse)):.4f}")
print(f"LOOCV MAD: {np.mean(np.abs(errors)):.4f}")

Summary

Key Takeaways: Geostatistics

  1. Regionalized variables bridge deterministic fields and random processes; intrinsic stationarity is the minimal assumption for variogram-based analysis.
  2. The semivariogram must be conditionally negative definite; the Matérn family provides a flexible, interpretable parameterization with explicit smoothness control.
  3. Kriging variants (SK, OK, indicator, block) each address specific modeling needs; block kriging captures the support effect and reduces estimation variance.
  4. Cross-validation via LOOCV with standardized errors is essential for model validation; a mean squared standardized error near 1 indicates correct specification.
  5. Non-stationary geostatistics (local neighborhoods, external drift) extends the framework to spatially varying mean and covariance structures.

Premium Content

Geostatistics

Unlock this lesson and 900+ advanced tutorials with a Premium plan.

🎯End-to-end Projects
💼Interview Prep
📜Certificates
🤝Community Access

Already a member? Log in

Need Expert Statistics Help?

Get personalized tutoring, project support, or professional consulting.

Advertisement