Optimal Experimental Design
Advanced Statistical Methods
Maximizing Information From Every Experiment
Optimal experimental design uses criteria like D-optimality, A-optimality, and E-optimality to construct designs that extract the most information per experimental unit. The information matrix guides the search for best configurations.
- Drug development β Minimize the number of patients needed to detect treatment effects
- Industrial R&D β Optimize material composition experiments with expensive ingredients
- Computer experiments β Design efficient space-filling configurations for simulation studies
Optimal design ensures not a single experimental unit is wasted on uninformative trials.
Optimal experimental design (OED) provides a principled framework for allocating experimental resources to maximize the information gained about model parameters. Rather than relying on classical designs (factorial, Latin square), OED uses mathematical optimization to construct designs tailored to specific models, objectives, and constraints. This lesson develops the theory of optimal design criteria, information-theoretic foundations, and computational algorithms.
Information Matrix
DfLinear Regression Model
Consider the linear regression model where is the design matrix, is the parameter vector, and . The information matrix is:
The covariance matrix of the least squares estimator is . Thus, a larger information matrix yields more precise parameter estimates.
DfDesign Measure
A design on a design space is a probability measure assigning weights to support points with . The information matrix of design is:
where is the number of support points. A design with support points is exact; the continuous relaxation with is the approximate design.
Optimal Design Criteria
The three classical optimality criteria are matrix orderings of , each optimizing a different scalar function of the parameter covariance.
D-Optimality
D-optimal designs maximize the determinant of the information matrix:
D-optimality minimizes the volume of the confidence ellipsoid for . The D-efficiency of design relative to is:
D-optimal designs are invariant to non-singular linear transformations of the design space.
A-Optimality
A-optimal designs minimize the trace of the inverse information matrix:
A-optimality minimizes the average variance of the parameter estimates. Unlike D-optimality, A-optimality is not invariant to reparameterization.
E-Optimality
E-optimal designs maximize the minimum eigenvalue of the information matrix:
E-optimality minimizes the worst-case variance over all linear combinations with . It ensures robustness against poorly estimated directions in parameter space.
Criterion Comparison
| Criterion | Optimizes | Invariance | Interpretation |
|---|---|---|---|
| D-optimal | Non-singular transforms | Confidence ellipsoid volume | |
| A-optimal | Parameterization-dependent | Average prediction variance | |
| E-optimal | Orthogonal transforms | Worst-case prediction variance | |
| G-optimal | Same as D | Maximum prediction variance |
Kiefer-Wolfowitz equivalence theorem: D-optimal designs minimize the maximum prediction variance over (G-optimality), establishing D = G equivalence for continuous designs.
Classical Optimal Designs
D-Optimal Design for Polynomial Regression
For polynomial regression on , the D-optimal design places mass at the Chebyshev points:
For example, with (linear model), the D-optimal design places equal mass at and . With (quadratic), equal mass at .
D-Optimal Design for Two-Level Factorial
For a factorial design with factors at levels , the D-optimal design is the full factorial with equal weights. For a fractional factorial with runs, the D-optimal design selects the rows of the matrix that maximize .
Sequential Design
DfSequential Design
Sequential design constructs experiments iteratively, choosing each new design point based on all previous observations. This is optimal when:
- The number of experiments is not known in advance
- Model parameters can be updated after each experiment
- Exploration vs exploitation tradeoff is important (Bayesian optimization)
The sequential D-optimal procedure adds the point that maximizes:
This is equivalent to maximizing the predictive variance at each step.
Bayesian Optimization
Bayesian optimal design places a Gaussian process prior on the response surface and chooses design points to maximize the expected information gain. The Expected Improvement (EI) acquisition function balances:
where , is the best observed value, and , are the standard normal CDF and PDF. Bayesian optimization is sample-efficient for expensive black-box functions.
Bayesian Optimal Design
DfBayesian Design
A Bayesian design incorporates prior information about model parameters. The optimal design maximizes the expected information gain over the prior:
where is the information (e.g., log-likelihood) under design and parameter value . For linear models with prior , the posterior covariance under design with observations is:
The Bayesian D-optimal design maximizes .
Bayesian vs Classical
| Aspect | Classical (Fixed-) | Bayesian |
|---|---|---|
| Prior on | None | |
| Objective | Minimize worst-case variance | Minimize expected posterior variance |
| Design points | Often at boundary of | Can be interior points |
| Robustness | May be sensitive to model misspecification | Incorporates parameter uncertainty |
| Computation | Convex optimization (for D-optimal) | Often requires MCMC |
Coordinate Exchange Algorithm
DfCoordinate Exchange
The coordinate exchange algorithm (Meyer & Nachtsheim, 1985) is the standard algorithm for constructing exact D-optimal designs. It iterates over each of the design points, optimizing one point at a time while holding the others fixed.
Coordinate Exchange Procedure
Algorithm: Coordinate Exchange
-
Initialize: Start with an initial design
-
For each design point :
- Compute (information without point )
- Find
- If , replace
-
Repeat steps 1β2 until no further improvement (convergence)
For continuous design spaces, the inner maximization is solved by evaluating over a fine grid or using gradient methods.
Algorithm Limitations
- Coordinate exchange finds local optima; multiple random restarts are essential
- The algorithm can be slow for large or high-dimensional
- Swaps (interchanging points between design and candidate sets) improve solutions
- Fedorov's algorithm is an alternative that swaps points between the current design and a candidate set
- For large problems, tabu search, simulated annealing, or genetic algorithms can escape local optima
Design Diagnostics
Prediction Variance Profile
The scaled prediction variance at any point under design is:
The maximum prediction variance over is the G-optimality criterion. By the Kiefer-Wolfowitz equivalence theorem, for a D-optimal continuous design:
where is the number of parameters. The prediction variance profile visualizes the design's efficiency across the design space.
Lack-of-Fit Diagnostics
For a design with replicate points, the pure error estimate is:
where is the number of distinct design points and is the number of replicates at point . The lack-of-fit F-test compares the residual sum of squares to the pure error:
A significant -statistic indicates model inadequacy (missing terms, nonlinearity).
Design Diagnostics Checklist
- D-efficiency: β overall parameter precision
- Prediction variance profile: across β uniform precision
- Alias structure: For fractional factorials, identify aliased effects via block structure
- Rotatability: The prediction variance depends only on distance from center, not direction
- Sensitivity: How the design changes under perturbations of the model or design space
Python Implementation
import numpy as np
from scipy.optimize import minimize
from itertools import product
import matplotlib.pyplot as plt
np.random.seed(42)
# --- Design Space ---
# 2-factor design space: x1, x2 in [-1, 1]
candidate_set = np.array(list(product(np.linspace(-1, 1, 20), repeat=2)))
n_factors = 2
# --- Model: Quadratic Regression ---
# y = b0 + b1*x1 + b2*x2 + b11*x1^2 + b22*x2^2 + b12*x1*x2 (p=6 parameters)
def design_matrix(X):
"""Build quadratic regression design matrix."""
n = X.shape[0]
return np.column_stack([
np.ones(n), X[:, 0], X[:, 1],
X[:, 0]**2, X[:, 1]**2, X[:, 0]*X[:, 1]
])
def d_criterion(X, n_factors=2):
"""Negative D-criterion (for minimization)."""
X_design = design_matrix(X)
M = X_design.T @ X_design
sign, logdet = np.linalg.slogdet(M)
if sign <= 0:
return 1e10
return -logdet
# --- Coordinate Exchange Algorithm ---
def coordinate_exchange(n_runs, candidate_set, n_iter=100, n_restarts=10):
"""Find D-optimal design via coordinate exchange."""
best_design = None
best_det = -np.inf
for restart in range(n_restarts):
# Random initialization
idx = np.random.choice(len(candidate_set), n_runs, replace=False)
design = candidate_set[idx].copy()
for iteration in range(n_iter):
improved = False
for i in range(n_runs):
# Remove point i
X_rest = np.delete(design, i, axis=0)
M_rest = design_matrix(X_rest).T @ design_matrix(X_rest)
# Find best replacement
best_x = design[i]
best_det_local = np.linalg.slogdet(M_rest + design_matrix(design[i:i+1]).T @ design_matrix(design[i:i+1]))[1]
for j, x_cand in enumerate(candidate_set):
X_new = design_matrix(x_cand.reshape(1, -1))
det_new = np.linalg.slogdet(M_rest + X_new.T @ X_new)[1]
if det_new > best_det_local:
best_det_local = det_new
best_x = x_cand
improved = True
design[i] = best_x
if not improved:
break
# Evaluate final design
X_final = design_matrix(design)
det_final = np.linalg.slogdet(X_final.T @ X_final)[1]
if det_final > best_det:
best_det = det_final
best_design = design.copy()
return best_design, best_det
# --- Generate D-Optimal Design ---
n_runs = 12 # More than p=6 for lack-of-fit test
optimal_design, d_crit = coordinate_exchange(n_runs, candidate_set)
print("=== D-Optimal Design (Quadratic Regression) ===")
print(f"Number of runs: {n_runs}")
print(f"Number of parameters: 6")
print(f"D-criterion (log det M): {d_crit:.4f}")
print(f"\nDesign points:")
X_opt = design_matrix(optimal_design)
M_opt = X_opt.T @ X_opt
print(np.array2string(optimal_design, precision=3, suppress_small=True))
# --- D-Efficiency ---
def d_efficiency(design, reference_det, p=6):
"""Compute D-efficiency relative to reference."""
X = design_matrix(design)
det_current = np.linalg.slogdet(X.T @ X)[1]
return np.exp((det_current - reference_det) / p)
print(f"\nD-efficiency: {d_efficiency(optimal_design, d_crit):.4f}")
# --- Prediction Variance Profile ---
x1_grid = np.linspace(-1, 1, 50)
x2_grid = np.linspace(-1, 1, 50)
X1, X2 = np.meshgrid(x1_grid, x2_grid)
spv = np.zeros_like(X1)
M_inv = np.linalg.inv(M_opt)
for i in range(50):
for j in range(50):
x_test = np.array([1, X1[i,j], X2[i,j], X1[i,j]**2, X2[i,j]**2, X1[i,j]*X2[i,j]])
spv[i, j] = n_runs * x_test @ M_inv @ x_test
print(f"\n=== Prediction Variance ===")
print(f"Min SPV: {spv.min():.2f}")
print(f"Max SPV: {spv.max():.2f}")
print(f"Mean SPV: {spv.mean():.2f}")
print(f"Theoretical max (G-opt): p = 6")
# --- Visualization ---
fig, axes = plt.subplots(1, 2, figsize=(14, 6))
# Design points
axes[0].scatter(optimal_design[:, 0], optimal_design[:, 1], s=100, c='red',
edgecolors='black', zorder=5, label='Design points')
axes[0].scatter(candidate_set[:, 0], candidate_set[:, 1], s=10, c='gray', alpha=0.3, label='Candidates')
axes[0].set_xlabel('xβ')
axes[0].set_ylabel('xβ')
axes[0].set_title(f'D-Optimal Design (n={n_runs}, log det = {d_crit:.2f})')
axes[0].legend()
axes[0].set_xlim([-1.1, 1.1])
axes[0].set_ylim([-1.1, 1.1])
axes[0].grid(True, alpha=0.3)
# Prediction variance profile
im = axes[1].contourf(X1, X2, spv, levels=20, cmap='viridis')
plt.colorbar(im, ax=axes[1], label='SPV(x)')
axes[1].scatter(optimal_design[:, 0], optimal_design[:, 1], s=60, c='red',
edgecolors='white', zorder=5)
axes[1].set_xlabel('xβ')
axes[1].set_ylabel('xβ')
axes[1].set_title('Scaled Prediction Variance Profile')
plt.tight_layout()
plt.savefig("optimal_design.png", dpi=150, bbox_inches="tight")
plt.show()
# --- Comparison: Random vs Optimal ---
n_trials = 100
random_dets = []
for _ in range(n_trials):
idx = np.random.choice(len(candidate_set), n_runs, replace=False)
X_rand = design_matrix(candidate_set[idx])
det_val = np.linalg.slogdet(X_rand.T @ X_rand)[1]
random_dets.append(det_val)
print(f"\n=== Design Comparison ===")
print(f"D-optimal log det: {d_crit:.4f}")
print(f"Random designs: mean={np.mean(random_dets):.4f}, std={np.std(random_dets):.4f}")
print(f"D-efficiency of random vs optimal: {np.exp((np.mean(random_dets) - d_crit)/6)*100:.1f}%")
Summary
Key Takeaways: Optimal Experimental Design
- Information matrix determines parameter precision; . Optimal designs maximize according to a scalar criterion.
- D-optimal maximizes (confidence ellipsoid volume); A-optimal minimizes (average variance); E-optimal maximizes (worst-case variance). D-optimality is the most commonly used criterion.
- Classical results: D-optimal designs for polynomial regression use Chebyshev points; factorials are D-optimal for linear models. The Kiefer-Wolfowitz equivalence theorem connects D- and G-optimality.
- Coordinate exchange iteratively optimizes each design point; multiple restarts are essential to avoid local optima. Fedorov's algorithm and metaheuristics (tabu, simulated annealing) offer alternatives.
- Diagnostics β D-efficiency, prediction variance profiles, and alias structure validate design quality. Sequential and Bayesian designs adapt to accumulating data and prior information, essential for expensive experiments.