Geostatistics
Advanced Statistical Methods
Estimating Resources From Sparse Spatial Samples
Geostatistics provides the theory of regionalized variables, using semivariogram modeling and kriging variants to produce optimal spatial predictions with quantified uncertainty from limited sample data.
- Mining — Estimate ore grades and reserves from drill hole samples for mine planning
- Petroleum — Model reservoir properties across a field for well placement optimization
- Environmental science — Map pollutant concentrations from sparse monitoring station data
Geostatistics turns scattered samples into continuous, uncertainty-quantified resource maps.
Geostatistics originated in mining and petroleum exploration (Matheron, 1963) and has since expanded to hydrology, soil science, ecology, and atmospheric science. The central object is the regionalized variable—a random function whose realizations are spatially correlated but deterministic at any fixed location.
Regionalized Variables
DfRegionalized Variable
A regionalized variable is a realization of a random function observed at locations . The observed values are treated as a partial realization of the underlying stochastic process.
Core Assumptions
Geostatistical analysis relies on two key regularity conditions:
- Intrinsic stationarity: (constant mean) and depends only on .
- Second-order stationarity (weak stationarity): and depends only on , with .
Semivariogram Theory
DfTheoretical Semivariogram
For an intrinsically stationary process:
ThLink Between Semivariogram and Covariance
Under second-order stationarity, . This equivalence allows switching between the two representations, though the semivariogram framework is more robust when the process is intrinsically but not strictly stationary.
Properties of Valid Semivariograms
A function is a valid semivariogram if and only if it satisfies:
- (symmetry)
- is conditionally negative definite: for any set of locations and weights with , .
Valid Variogram Models
- Spherical ( = range): for . Linear behavior near the origin; reaches sill exactly at .
- Exponential: . Asymptotically approaches sill; practical range is (95% of sill).
- Gaussian: . Parabolic behavior near origin (very smooth); practical range is .
- Matérn: . Controls smoothness via : (exponential), (Gaussian).
- Power: for . No sill; models unbounded variability (e.g., fractional Brownian fields).
Parameter Estimation
Weighted Least Squares Variogram Estimation
Common weighting schemes: (equal weights), (proportional to pair count), or (inverse variance). Maximum likelihood estimation is asymptotically more efficient but requires distributional assumptions.
Kriging Variants
Simple Kriging (SK)
Simple Kriging
Assumes known mean . The kriging variance is independent of observed values—it depends only on geometry.
Ordinary Kriging (OK)
Ordinary Kriging
The unbiasedness constraint eliminates the need for a known mean. The Lagrange multiplier equals when the true mean equals .
Indicator Kriging
For categorical or non-Gaussian data, indicator kriging works with transformed values:
Indicator Kriging
Each threshold yields a separate kriging system, producing the conditional cumulative distribution function (CCDF) at each unsampled location.
Block Kriging
Block kriging estimates the average value over a support (a block or panel) rather than at a point:
Block Kriging
The block kriging variance satisfies for any point —the support effect reduces estimation variance. The block covariance is:
Change of Support
Converting point-support estimates to block-support estimates (and vice versa) is one of the most important and challenging problems in geostatistics. Log-normal and affine corrections provide approximations, but model-based approaches using Gaussian random field simulations are preferred for complex geometries.
Cross-Validation and Model Validation
Kriging Cross-Validation Score
Leave-one-out cross-validation (LOOCV) computes, for each observation :
Under a correctly specified model:
- Standardized errors should be approximately
- The mean error (unbiasedness)
- The mean squared standardized error
- No correlation between and spatial location
Non-Stationary Geostatistics
Df Locally Stationary Model
Partition into overlapping neighborhoods. Within each neighborhood, stationarity is assumed and a local variogram is estimated. The parameters vary smoothly over space:
Alternatively, the external drift kriging model incorporates auxiliary fields:
External Drift Kriging
where the drift is specified by external variables (e.g., elevation, temperature) observed everywhere.
Resource Estimation
Volume–Mean Relationship
For a mineral deposit divided into panels of equal volume, the total resource is:
with variance .
The discrete Gaussian model (DGM) provides the conditional distribution of block grades for resource classification:
This standardized variable is used for local uncertainty propagation via Turning Bands simulation.
Python Implementation
import numpy as np
import pandas as pd
from scipy.optimize import minimize
from scipy.spatial.distance import pdist, squareform, cdist
import matplotlib.pyplot as plt
np.random.seed(42)
# --- Simulate regionalized data ---
n = 150
coords = np.column_stack([
np.random.uniform(0, 100, n),
np.random.uniform(0, 100, n)
])
def matern_cov(h, sill, range_param, nu):
from scipy.special import kv, gamma as gamma_func
h = np.maximum(h, 1e-10)
coeff = (2 ** (1 - nu)) / gamma_func(nu)
arg = np.sqrt(2 * nu) * h / range_param
return sill * coeff * (arg ** nu) * kv(nu, arg)
# Generate from Matérn covariance
D = squareform(pdist(coords))
C = matern_cov(D, sill=3.0, range_param=20.0, nu=1.5)
C += 1e-6 * np.eye(n) # nugget for stability
L = np.linalg.cholesky(C)
z = L @ np.random.standard_normal(n)
# --- Empirical variogram ---
def compute_variogram(coords, z, n_bins=20, max_dist=50.0):
D = squareform(pdist(coords))
pairs = np.triu_indices(len(z), k=1)
dists, diffs_sq = D[pairs], (z[pairs[0]] - z[pairs[1]]) ** 2
bins = np.linspace(0, max_dist, n_bins + 1)
centers = 0.5 * (bins[:-1] + bins[1:])
gamma, counts = np.zeros(n_bins), np.zeros(n_bins)
for i in range(n_bins):
mask = (dists >= bins[i]) & (dists < bins[i + 1])
counts[i] = mask.sum()
if counts[i] > 0:
gamma[i] = 0.5 * diffs_sq[mask].mean()
else:
gamma[i] = np.nan
return centers, gamma, counts
h_vals, gamma_emp, counts = compute_variogram(coords, z)
valid = ~np.isnan(gamma_emp)
# --- Fit Matérn variogram via weighted least squares ---
def matern_variogram(h, nugget, sill, range_param, nu):
from scipy.special import kv, gamma as gamma_func
h = np.maximum(h, 1e-10)
coeff = (2 ** (1 - nu)) / gamma_func(nu)
arg = np.sqrt(2 * nu) * h / range_param
full_sill = nugget + sill
cov = sill * coeff * (arg ** nu) * kv(nu, arg)
return full_sill - cov
def fit_variogram_wls(h, gamma, counts, model_func):
weights = counts / counts.sum()
def objective(params):
nugget, sill, range_p, nu = params
if any(p < 0 for p in params):
return 1e10
fitted = model_func(h, nugget, sill, range_p, nu)
return np.sum(weights * (gamma - fitted) ** 2)
result = minimize(objective, [0.1, 3.0, 20.0, 1.0],
method='Nelder-Mead', options={'maxiter': 5000})
return result.x
popt = fit_variogram_wls(h_vals[valid], gamma_emp[valid], counts[valid], matern_variogram)
print(f"Matérn fit: nugget={popt[0]:.3f}, sill={popt[1]:.3f}, range={popt[2]:.3f}, nu={popt[3]:.3f}")
# --- Ordinary Kriging ---
def ordinary_kriging(coords, z, new_point, cov_func, **cov_params):
from scipy.spatial.distance import cdist
C = cov_func(squareform(pdist(coords)), **cov_params)
c = cov_func(cdist(new_point, coords).ravel(), **cov_params)
c0 = cov_func(np.array([0.0]), **cov_params)[0]
n = len(z)
C_ext = np.block([[C, np.ones((n, 1))], [np.ones((1, n)), 0]])
rhs = np.append(c, 1.0)
try:
sol = np.linalg.solve(C_ext + 1e-8 * np.eye(n + 1), rhs)
except np.linalg.LinAlgError:
sol = np.linalg.lstsq(C_ext, rhs, rcond=None)[0]
weights = sol[:n]
pred = weights @ z
variance = c0 - weights @ c
return pred, variance
def cov_func(dist, nugget, sill, range_param, nu):
from scipy.special import kv, gamma as gamma_func
dist = np.maximum(dist, 1e-10)
coeff = (2 ** (1 - nu)) / gamma_func(nu)
arg = np.sqrt(2 * nu) * dist / range_param
return nugget + sill * coeff * (arg ** nu) * kv(nu, arg)
new_pt = np.array([[50.0, 50.0]])
pred, var = ordinary_kriging(coords, z, new_pt, cov_func,
nugget=popt[0], sill=popt[1],
range_param=popt[2], nu=popt[3])
print(f"\nOK prediction at (50,50): {pred:.3f} ± {np.sqrt(max(var, 0)):.3f}")
# --- Cross-validation ---
errors, sse = [], []
for i in range(n):
mask = np.arange(n) != i
pred_i, var_i = ordinary_kriging(coords[mask], z[mask], coords[i:i+1], cov_func,
nugget=popt[0], sill=popt[1],
range_param=popt[2], nu=popt[3])
errors.append(z[i] - pred_i)
sse.append(errors[-1] ** 2)
errors = np.array(errors)
print(f"\nLOOCV Mean Error: {errors.mean():.4f}")
print(f"LOOCV RMSE: {np.sqrt(np.mean(sse)):.4f}")
print(f"LOOCV MAD: {np.mean(np.abs(errors)):.4f}")
Summary
Key Takeaways: Geostatistics
- Regionalized variables bridge deterministic fields and random processes; intrinsic stationarity is the minimal assumption for variogram-based analysis.
- The semivariogram must be conditionally negative definite; the Matérn family provides a flexible, interpretable parameterization with explicit smoothness control.
- Kriging variants (SK, OK, indicator, block) each address specific modeling needs; block kriging captures the support effect and reduces estimation variance.
- Cross-validation via LOOCV with standardized errors is essential for model validation; a mean squared standardized error near 1 indicates correct specification.
- Non-stationary geostatistics (local neighborhoods, external drift) extends the framework to spatially varying mean and covariance structures.