Optimal Experimental Design

Advanced Statistical Methods

Maximizing Information From Every Experiment

Optimal experimental design uses criteria like D-optimality, A-optimality, and E-optimality to construct designs that extract the most information per experimental unit. The information matrix guides the search for best configurations.

Drug development — Minimize the number of patients needed to detect treatment effects
Industrial R&D — Optimize material composition experiments with expensive ingredients
Computer experiments — Design efficient space-filling configurations for simulation studies

Optimal design ensures not a single experimental unit is wasted on uninformative trials.

Optimal experimental design (OED) provides a principled framework for allocating experimental resources to maximize the information gained about model parameters. Rather than relying on classical designs (factorial, Latin square), OED uses mathematical optimization to construct designs tailored to specific models, objectives, and constraints. This lesson develops the theory of optimal design criteria, information-theoretic foundations, and computational algorithms.

Information Matrix

DfLinear Regression Model

Consider the linear regression model $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ where $\mathbf{X}$ is the $n \times p$ design matrix, $\boldsymbol{\beta}$ is the parameter vector, and $\boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2 \mathbf{I}_n)$ . The information matrix is:

\mathbf{M} = \mathbf{X}^\top \mathbf{X} = \sum_{i=1}^{n} \mathbf{x}_i \mathbf{x}_i^\top

The covariance matrix of the least squares estimator $\hat{\boldsymbol{\beta}} = (\mathbf{X}^\top \mathbf{X})^{-1}\mathbf{X}^\top \mathbf{y}$ is $\text{Cov}(\hat{\boldsymbol{\beta}}) = \sigma^2 \mathbf{M}^{-1}$ . Thus, a larger information matrix $\mathbf{M}$ yields more precise parameter estimates.

DfDesign Measure

A design $\xi$ on a design space $\mathcal{X} \subseteq \mathbb{R}^p$ is a probability measure assigning weights $w_i \geq 0$ to support points $\mathbf{x}_i \in \mathcal{X}$ with $\sum_{i=1}^{n} w_i = 1$ . The information matrix of design $\xi$ is:

\mathbf{M}(\xi) = \sum_{i=1}^{k} w_i \, \mathbf{x}_i \mathbf{x}_i^\top = \int_{\mathcal{X}} \mathbf{x}\mathbf{x}^\top \, d\xi(\mathbf{x})

where $k$ is the number of support points. A design with $k \leq p + 1$ support points is exact; the continuous relaxation with $k > p + 1$ is the approximate design.

Optimal Design Criteria

The three classical optimality criteria are matrix orderings of $\mathbf{M}^{-1}$ , each optimizing a different scalar function of the parameter covariance.

D-Optimality

D-optimal designs maximize the determinant of the information matrix:

\max_{\xi} \det(\mathbf{M}(\xi)) \quad \Longleftrightarrow \quad \min_{\xi} \det(\text{Cov}(\hat{\boldsymbol{\beta}})) = \min_{\xi} (\sigma^{2p} / \det(\mathbf{M}(\xi)))

D-optimality minimizes the volume of the confidence ellipsoid for $\boldsymbol{\beta}$ . The D-efficiency of design $\xi$ relative to $\xi^*$ is:

\text{D-eff}(\xi) = \left(\frac{\det(\mathbf{M}(\xi))}{\det(\mathbf{M}(\xi^*))}\right)^{1/p}

D-optimal designs are invariant to non-singular linear transformations of the design space.

A-Optimality

A-optimal designs minimize the trace of the inverse information matrix:

\min_{\xi} \text{tr}(\mathbf{M}(\xi)^{-1}) = \min_{\xi} \sum_{j=1}^{p} \text{Var}(\hat{\beta}_j) / \sigma^2

A-optimality minimizes the average variance of the parameter estimates. Unlike D-optimality, A-optimality is not invariant to reparameterization.

E-Optimality

E-optimal designs maximize the minimum eigenvalue of the information matrix:

\max_{\xi} \lambda_{\min}(\mathbf{M}(\xi)) \quad \Longleftrightarrow \quad \min_{\xi} \frac{1}{\lambda_{\min}(\mathbf{M}(\xi))} = \min_{\xi} \max_{\|\mathbf{c}\|=1} \text{Var}(\mathbf{c}^\top \hat{\boldsymbol{\beta}}) / \sigma^2

E-optimality minimizes the worst-case variance over all linear combinations $\mathbf{c}^\top \boldsymbol{\beta}$ with $\|\mathbf{c}\| = 1$ . It ensures robustness against poorly estimated directions in parameter space.

Criterion Comparison

Criterion	Optimizes	Invariance	Interpretation
D-optimal	$\det(\mathbf{M})$	Non-singular transforms	Confidence ellipsoid volume
A-optimal	$\text{tr}(\mathbf{M}^{-1})$	Parameterization-dependent	Average prediction variance
E-optimal	$\lambda_{\min}(\mathbf{M})$	Orthogonal transforms	Worst-case prediction variance
G-optimal	$\max_{\mathbf{x}} \mathbf{x}^\top\mathbf{M}^{-1}\mathbf{x}$	Same as D	Maximum prediction variance

Kiefer-Wolfowitz equivalence theorem: D-optimal designs minimize the maximum prediction variance over $\mathcal{X}$ (G-optimality), establishing D = G equivalence for continuous designs.

Classical Optimal Designs

D-Optimal Design for Polynomial Regression

For polynomial regression $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \dots + \beta_{p-1} x^{p-1}$ on $x \in [-1, 1]$ , the D-optimal design places mass at the Chebyshev points:

\xi^* = \text{Uniform}\left(\left\{\cos\left(\frac{(2k-1)\pi}{2p}\right)\right\}_{k=1}^{p}\right)

For example, with $p = 2$ (linear model), the D-optimal design places equal mass at $x = -1$ and $x = 1$ . With $p = 3$ (quadratic), equal mass at $x = -1, 0, +1$ .

D-Optimal Design for Two-Level Factorial

For a $2^k$ factorial design with $k$ factors at levels $\{-1, +1\}$ , the D-optimal design is the full factorial with equal weights. For a fractional factorial with $n < 2^k$ runs, the D-optimal design selects the $n$ rows of the $\mathbf{X}$ matrix that maximize $\det(\mathbf{X}^\top \mathbf{X})$ .

Sequential Design

DfSequential Design

Sequential design constructs experiments iteratively, choosing each new design point based on all previous observations. This is optimal when:

The number of experiments is not known in advance
Model parameters can be updated after each experiment
Exploration vs exploitation tradeoff is important (Bayesian optimization)

The sequential D-optimal procedure adds the point $\mathbf{x}_{n+1}$ that maximizes:

\mathbf{x}_{n+1} = \arg\max_{\mathbf{x} \in \mathcal{X}} \det(\mathbf{M}_n + \mathbf{x}\mathbf{x}^\top)

This is equivalent to maximizing the predictive variance $\mathbf{x}^\top \mathbf{M}_n^{-1} \mathbf{x}$ at each step.

Bayesian Optimization

Bayesian optimal design places a Gaussian process prior on the response surface and chooses design points to maximize the expected information gain. The Expected Improvement (EI) acquisition function balances:

\text{EI}(\mathbf{x}) = \mathbb{E}[\max(f(\mathbf{x}) - f^+, 0)] = \sigma(\mathbf{x})[\Phi(z) - z\Phi(z) - \phi(z)]

where $z = (\mu(\mathbf{x}) - f^+)/\sigma(\mathbf{x})$ , $f^+$ is the best observed value, and $\Phi$ , $\phi$ are the standard normal CDF and PDF. Bayesian optimization is sample-efficient for expensive black-box functions.

Bayesian Optimal Design

DfBayesian Design

A Bayesian design incorporates prior information about model parameters. The optimal design maximizes the expected information gain over the prior:

\xi^* = \arg\max_{\xi} \mathbb{E}_{\boldsymbol{\beta}} \left[ I(\xi, \boldsymbol{\beta}) \right]

where $I(\xi, \boldsymbol{\beta})$ is the information (e.g., log-likelihood) under design $\xi$ and parameter value $\boldsymbol{\beta}$ . For linear models with prior $\boldsymbol{\beta} \sim N(\boldsymbol{\mu}_0, \boldsymbol{\Sigma}_0)$ , the posterior covariance under design $\xi$ with $n$ observations is:

\text{Cov}(\boldsymbol{\beta} \mid \mathbf{y}, \xi) = \left(\mathbf{M}(\xi)/\sigma^2 + \boldsymbol{\Sigma}_0^{-1}\right)^{-1}

The Bayesian D-optimal design maximizes $\det(\boldsymbol{\Sigma}_0^{-1} + \mathbf{M}(\xi)/\sigma^2)$ .

Bayesian vs Classical

Aspect	Classical (Fixed- $\boldsymbol{\beta}$ )	Bayesian
Prior on $\boldsymbol{\beta}$	None	$\boldsymbol{\beta} \sim N(\boldsymbol{\mu}_0, \boldsymbol{\Sigma}_0)$
Objective	Minimize worst-case variance	Minimize expected posterior variance
Design points	Often at boundary of $\mathcal{X}$	Can be interior points
Robustness	May be sensitive to model misspecification	Incorporates parameter uncertainty
Computation	Convex optimization (for D-optimal)	Often requires MCMC

Coordinate Exchange Algorithm

DfCoordinate Exchange

The coordinate exchange algorithm (Meyer & Nachtsheim, 1985) is the standard algorithm for constructing exact D-optimal designs. It iterates over each of the $n$ design points, optimizing one point at a time while holding the others fixed.

Coordinate Exchange Procedure

Algorithm: Coordinate Exchange

Initialize: Start with an initial design $\mathbf{X}^{(0)} = [\mathbf{x}_1^{(0)}, \dots, \mathbf{x}_n^{(0)}]^\top$
For each design point $i = 1, \dots, n$ :
- Compute $\mathbf{M}_{-i} = \sum_{j \neq i} \mathbf{x}_j \mathbf{x}_j^\top$ (information without point $i$ )
- Find $\mathbf{x}_i^* = \arg\max_{\mathbf{x} \in \mathcal{X}} \det(\mathbf{M}_{-i} + \mathbf{x}\mathbf{x}^\top)$
- If $\det(\mathbf{M}_{-i} + \mathbf{x}_i^*\mathbf{x}_i^{*\top}) > \det(\mathbf{M})$ , replace $\mathbf{x}_i \leftarrow \mathbf{x}_i^*$
Repeat steps 1–2 until no further improvement (convergence)

For continuous design spaces, the inner maximization $\mathbf{x}_i^* = \arg\max_{\mathbf{x}} \mathbf{x}^\top \mathbf{M}_{-i}^{-1} \mathbf{x}$ is solved by evaluating over a fine grid or using gradient methods.

Algorithm Limitations

Coordinate exchange finds local optima; multiple random restarts are essential
The algorithm can be slow for large $n$ or high-dimensional $\mathcal{X}$
Swaps (interchanging points between design and candidate sets) improve solutions
Fedorov's algorithm is an alternative that swaps points between the current design and a candidate set
For large problems, tabu search, simulated annealing, or genetic algorithms can escape local optima

Design Diagnostics

Prediction Variance Profile

The scaled prediction variance at any point $\mathbf{x} \in \mathcal{X}$ under design $\xi$ is:

\text{SPV}(\mathbf{x}, \xi) = n \cdot \mathbf{x}^\top \mathbf{M}(\xi)^{-1} \mathbf{x}

The maximum prediction variance over $\mathcal{X}$ is the G-optimality criterion. By the Kiefer-Wolfowitz equivalence theorem, for a D-optimal continuous design:

\max_{\mathbf{x} \in \mathcal{X}} \text{SPV}(\mathbf{x}, \xi^*) = p

where $p$ is the number of parameters. The prediction variance profile visualizes the design's efficiency across the design space.

Lack-of-Fit Diagnostics

For a design with replicate points, the pure error estimate is:

\hat{\sigma}_{\text{PE}}^2 = \frac{\sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2}{\sum_{i=1}^{k}(n_i - 1)}

where $k$ is the number of distinct design points and $n_i$ is the number of replicates at point $i$ . The lack-of-fit F-test compares the residual sum of squares to the pure error:

F_{\text{LOF}} = \frac{(\text{RSS} - \text{SSE}_{\text{PE}}) / (p - k)}{\text{SSE}_{\text{PE}} / (n - k)} \sim F_{p-k, n-k}

A significant $F$ -statistic indicates model inadequacy (missing terms, nonlinearity).

Design Diagnostics Checklist

D-efficiency: $\left(\det(\mathbf{M})\right)^{1/p}$ — overall parameter precision
Prediction variance profile: $\text{SPV}(\mathbf{x}, \xi)$ across $\mathcal{X}$ — uniform precision
Alias structure: For fractional factorials, identify aliased effects via $\mathbf{X}^\top\mathbf{X}$ block structure
Rotatability: The prediction variance depends only on distance from center, not direction
Sensitivity: How the design changes under perturbations of the model or design space

Python Implementation

import numpy as np
from scipy.optimize import minimize
from itertools import product
import matplotlib.pyplot as plt

np.random.seed(42)

# --- Design Space ---
# 2-factor design space: x1, x2 in [-1, 1]
candidate_set = np.array(list(product(np.linspace(-1, 1, 20), repeat=2)))
n_factors = 2

# --- Model: Quadratic Regression ---
# y = b0 + b1*x1 + b2*x2 + b11*x1^2 + b22*x2^2 + b12*x1*x2 (p=6 parameters)
def design_matrix(X):
    """Build quadratic regression design matrix."""
    n = X.shape[0]
    return np.column_stack([
        np.ones(n), X[:, 0], X[:, 1],
        X[:, 0]**2, X[:, 1]**2, X[:, 0]*X[:, 1]
    ])

def d_criterion(X, n_factors=2):
    """Negative D-criterion (for minimization)."""
    X_design = design_matrix(X)
    M = X_design.T @ X_design
    sign, logdet = np.linalg.slogdet(M)
    if sign <= 0:
        return 1e10
    return -logdet

# --- Coordinate Exchange Algorithm ---
def coordinate_exchange(n_runs, candidate_set, n_iter=100, n_restarts=10):
    """Find D-optimal design via coordinate exchange."""
    best_design = None
    best_det = -np.inf

    for restart in range(n_restarts):
        # Random initialization
        idx = np.random.choice(len(candidate_set), n_runs, replace=False)
        design = candidate_set[idx].copy()

        for iteration in range(n_iter):
            improved = False
            for i in range(n_runs):
                # Remove point i
                X_rest = np.delete(design, i, axis=0)
                M_rest = design_matrix(X_rest).T @ design_matrix(X_rest)

                # Find best replacement
                best_x = design[i]
                best_det_local = np.linalg.slogdet(M_rest + design_matrix(design[i:i+1]).T @ design_matrix(design[i:i+1]))[1]

                for j, x_cand in enumerate(candidate_set):
                    X_new = design_matrix(x_cand.reshape(1, -1))
                    det_new = np.linalg.slogdet(M_rest + X_new.T @ X_new)[1]
                    if det_new > best_det_local:
                        best_det_local = det_new
                        best_x = x_cand
                        improved = True

                design[i] = best_x

            if not improved:
                break

        # Evaluate final design
        X_final = design_matrix(design)
        det_final = np.linalg.slogdet(X_final.T @ X_final)[1]

        if det_final > best_det:
            best_det = det_final
            best_design = design.copy()

    return best_design, best_det

# --- Generate D-Optimal Design ---
n_runs = 12  # More than p=6 for lack-of-fit test
optimal_design, d_crit = coordinate_exchange(n_runs, candidate_set)

print("=== D-Optimal Design (Quadratic Regression) ===")
print(f"Number of runs: {n_runs}")
print(f"Number of parameters: 6")
print(f"D-criterion (log det M): {d_crit:.4f}")
print(f"\nDesign points:")
X_opt = design_matrix(optimal_design)
M_opt = X_opt.T @ X_opt
print(np.array2string(optimal_design, precision=3, suppress_small=True))

# --- D-Efficiency ---
def d_efficiency(design, reference_det, p=6):
    """Compute D-efficiency relative to reference."""
    X = design_matrix(design)
    det_current = np.linalg.slogdet(X.T @ X)[1]
    return np.exp((det_current - reference_det) / p)

print(f"\nD-efficiency: {d_efficiency(optimal_design, d_crit):.4f}")

# --- Prediction Variance Profile ---
x1_grid = np.linspace(-1, 1, 50)
x2_grid = np.linspace(-1, 1, 50)
X1, X2 = np.meshgrid(x1_grid, x2_grid)
spv = np.zeros_like(X1)

M_inv = np.linalg.inv(M_opt)
for i in range(50):
    for j in range(50):
        x_test = np.array([1, X1[i,j], X2[i,j], X1[i,j]**2, X2[i,j]**2, X1[i,j]*X2[i,j]])
        spv[i, j] = n_runs * x_test @ M_inv @ x_test

print(f"\n=== Prediction Variance ===")
print(f"Min SPV: {spv.min():.2f}")
print(f"Max SPV: {spv.max():.2f}")
print(f"Mean SPV: {spv.mean():.2f}")
print(f"Theoretical max (G-opt): p = 6")

# --- Visualization ---
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# Design points
axes[0].scatter(optimal_design[:, 0], optimal_design[:, 1], s=100, c='red', 
                edgecolors='black', zorder=5, label='Design points')
axes[0].scatter(candidate_set[:, 0], candidate_set[:, 1], s=10, c='gray', alpha=0.3, label='Candidates')
axes[0].set_xlabel('x₁')
axes[0].set_ylabel('x₂')
axes[0].set_title(f'D-Optimal Design (n={n_runs}, log det = {d_crit:.2f})')
axes[0].legend()
axes[0].set_xlim([-1.1, 1.1])
axes[0].set_ylim([-1.1, 1.1])
axes[0].grid(True, alpha=0.3)

# Prediction variance profile
im = axes[1].contourf(X1, X2, spv, levels=20, cmap='viridis')
plt.colorbar(im, ax=axes[1], label='SPV(x)')
axes[1].scatter(optimal_design[:, 0], optimal_design[:, 1], s=60, c='red', 
                edgecolors='white', zorder=5)
axes[1].set_xlabel('x₁')
axes[1].set_ylabel('x₂')
axes[1].set_title('Scaled Prediction Variance Profile')

plt.tight_layout()
plt.savefig("optimal_design.png", dpi=150, bbox_inches="tight")
plt.show()

# --- Comparison: Random vs Optimal ---
n_trials = 100
random_dets = []
for _ in range(n_trials):
    idx = np.random.choice(len(candidate_set), n_runs, replace=False)
    X_rand = design_matrix(candidate_set[idx])
    det_val = np.linalg.slogdet(X_rand.T @ X_rand)[1]
    random_dets.append(det_val)

print(f"\n=== Design Comparison ===")
print(f"D-optimal log det: {d_crit:.4f}")
print(f"Random designs: mean={np.mean(random_dets):.4f}, std={np.std(random_dets):.4f}")
print(f"D-efficiency of random vs optimal: {np.exp((np.mean(random_dets) - d_crit)/6)*100:.1f}%")

Summary

Key Takeaways: Optimal Experimental Design

Information matrix $\mathbf{M} = \mathbf{X}^\top\mathbf{X}$ determines parameter precision; $\text{Cov}(\hat{\boldsymbol{\beta}}) = \sigma^2\mathbf{M}^{-1}$ . Optimal designs maximize $\mathbf{M}$ according to a scalar criterion.
D-optimal maximizes $\det(\mathbf{M})$ (confidence ellipsoid volume); A-optimal minimizes $\text{tr}(\mathbf{M}^{-1})$ (average variance); E-optimal maximizes $\lambda_{\min}(\mathbf{M})$ (worst-case variance). D-optimality is the most commonly used criterion.
Classical results: D-optimal designs for polynomial regression use Chebyshev points; $2^k$ factorials are D-optimal for linear models. The Kiefer-Wolfowitz equivalence theorem connects D- and G-optimality.
Coordinate exchange iteratively optimizes each design point; multiple restarts are essential to avoid local optima. Fedorov's algorithm and metaheuristics (tabu, simulated annealing) offer alternatives.
Diagnostics — D-efficiency, prediction variance profiles, and alias structure validate design quality. Sequential and Bayesian designs adapt to accumulating data and prior information, essential for expensive experiments.

Optimal Experimental Design

Optimal Experimental Design

Maximizing Information From Every Experiment

Information Matrix

DfLinear Regression Model

DfDesign Measure

Optimal Design Criteria

D-Optimality

A-Optimality

E-Optimality

Classical Optimal Designs

D-Optimal Design for Polynomial Regression

D-Optimal Design for Two-Level Factorial

Sequential Design

DfSequential Design

Bayesian Optimal Design

DfBayesian Design

Coordinate Exchange Algorithm

DfCoordinate Exchange

Coordinate Exchange Procedure

Design Diagnostics

Prediction Variance Profile

Lack-of-Fit Diagnostics

Python Implementation

Summary

Key Takeaways: Optimal Experimental Design

Premium Content

Need Expert Statistics Help?