Design of Experiments (DOE)
Advanced Statistical Methods
Extracting Maximum Information From Minimum Trials
Design of experiments structures trials to efficiently estimate factor effects and interactions while minimizing resource use. Factorial and fractional factorial designs reveal which factors truly matter.
- Pharmaceutical development: Optimize drug formulations by testing multiple factors simultaneously
- Agriculture: Compare crop varieties and fertilizer combinations with controlled field trials
- Semiconductor manufacturing: Identify critical process parameters affecting chip yield
Good experimental design asks the right questions with the fewest possible experiments.
Design of Experiments (DOE) is a systematic, rigorous methodology for planning controlled experiments that enable efficient estimation of factor effects and their interactions. Unlike observational studies, DOE allows researchers to establish causal relationships by systematically varying experimental factors while controlling extraneous sources of variation. The mathematical foundations of DOE draw from linear algebra, combinatorics, and the theory of orthogonal arrays, providing a framework for optimal information extraction with minimal experimental effort.
Fundamental Principles
Definition: Experimental Design
An experimental design is a specification of the experimental conditions (runs) to be conducted, defined by:
- Factors ($x_1, x_2, \ldots, x_k$): Controllable input variables
- Levels: Discrete values each factor can take
- Response ($y$): Measured output variable
- Randomization: Random assignment of experimental conditions to experimental units
- Blocking: Grouping homogeneous experimental units to reduce nuisance variation
- Replication: Repeated observations under identical conditions
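The randomization, blocking, and replication elements listed above can be sketched as a run sheet. The example below is a minimal illustration, assuming a hypothetical two-factor study in which each block (here a day) holds one full replicate and the run order is shuffled independently within each block. The function name `randomized_block_plan`, the block labels, and the seed are illustrative choices, not part of any library.

```python
import numpy as np
import pandas as pd
from itertools import product

def randomized_block_plan(factor_levels, blocks, seed=0):
    """Return a run sheet with every treatment once per block, in random order.

    factor_levels: dict mapping factor name -> list of levels
    blocks: list of block labels (each block is one replicate in this sketch)
    """
    rng = np.random.default_rng(seed)
    names = list(factor_levels)
    treatments = list(product(*factor_levels.values()))
    rows = []
    for block in blocks:
        # Randomization: shuffle the treatment order separately in each block
        order = rng.permutation(len(treatments))
        for position, idx in enumerate(order, start=1):
            rows.append((block, position, *treatments[idx]))
    return pd.DataFrame(rows, columns=["Block", "RunOrder", *names])

plan = randomized_block_plan(
    {"Temperature": [-1, 1], "Catalyst": [-1, 1]},
    blocks=["Day1", "Day2", "Day3"],
)
print(plan)
```

Because every block contains all four treatment combinations, day-to-day differences affect all treatments equally and do not bias comparisons between them.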
Theorem: Fisher's Principles of Experimental Design
R.A. Fisher established three fundamental principles:
- Randomization: Random allocation of treatments to experimental units eliminates systematic bias and provides a valid basis for statistical inference
- Replication: Independent repetitions under each treatment condition enable estimation of experimental error
- Local Control (Blocking): Grouping similar experimental units reduces uncontrolled variation, increasing precision of treatment comparisons
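The role of replication in estimating experimental error can be illustrated numerically. The sketch below simulates a replicated $2^2$ design (all numbers are hypothetical), pools the within-cell variances into a pure-error estimate, and uses it to attach a standard error and a t-test to one effect. It relies on the fact that an effect is a difference of two means of $N/2$ observations each, so its variance is $4\sigma^2/N$ for $N$ total observations.

```python
import numpy as np
import pandas as pd
from itertools import product
from scipy import stats

rng = np.random.default_rng(1)
n_rep = 4  # replicates per treatment combination (assumed)

# Replicated 2^2 design with simulated responses (all numbers hypothetical)
cells = pd.DataFrame(list(product([-1, 1], repeat=2)), columns=["A", "B"])
data = cells.loc[cells.index.repeat(n_rep)].reset_index(drop=True)
data["y"] = 50 + 4 * data["A"] + 1 * data["B"] + rng.normal(0, 2, len(data))

# Pure-error variance: pool the within-cell variances (equal cell sizes)
cell_var = data.groupby(["A", "B"])["y"].var(ddof=1)
s2 = cell_var.mean()
df_error = len(cells) * (n_rep - 1)

# Effect of A and its standard error: Var(effect) = 4 * sigma^2 / N
N = len(data)
effect_A = data.loc[data["A"] == 1, "y"].mean() - data.loc[data["A"] == -1, "y"].mean()
se_effect = np.sqrt(4 * s2 / N)
t_stat = effect_A / se_effect
p_value = 2 * stats.t.sf(abs(t_stat), df_error)

print(f"effect A = {effect_A:.2f}, SE = {se_effect:.2f}, "
      f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

Without replication the within-cell variances cannot be computed, and the error estimate would have to come from assumptions such as negligible higher-order interactions.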
Factorial Designs
Factorial designs are the cornerstone of DOE, simultaneously investigating multiple factors and their interactions.
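Definition: Factorial Design Structure
A full factorial design at 2 levels examines all possible combinations of factor levels. For $k$ factors each at 2 levels ($-1$ and $+1$), the design requires $2^k$ experimental runs. The model for a $2^k$ factorial is:
$$y_i = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij} + \sum_{j<l} \beta_{jl}\, x_{ij} x_{il} + \cdots + \varepsilon_i$$
where $x_{ij}$ represents the coded level of factor $j$ in run $i$.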
Effect Estimation in $2^k$ Factorials
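The main effect of factor $j$ is the average change in response when $x_j$ moves from $-1$ to $+1$, averaged over all other factors:
$$E_j = \frac{1}{2^{k-1}} \sum_{i=1}^{2^k} x_{ij}\, y_i = \bar{y}_{x_j = +1} - \bar{y}_{x_j = -1}$$
where $x_{ij}$ is the sign ($\pm 1$) of factor $j$ in run $i$ and $y_i$ is the response in run $i$. Similarly, the interaction effect $E_{jl}$ measures how the effect of $x_j$ depends on the level of $x_l$:
$$E_{jl} = \frac{1}{2^{k-1}} \sum_{i=1}^{2^k} x_{ij}\, x_{il}\, y_i$$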
Two-Factor Factorial Design
Problem: Investigate effects of temperature ($A$) and catalyst concentration ($B$) on chemical yield. Each factor has 2 levels: Temperature: 150°C ($-1$) and 200°C ($+1$); Concentration: 2% ($-1$) and 5% ($+1$).
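Design Matrix and Results (runs in standard order, $A$ varying fastest):
| Run | A | B | AB | Yield (%) |
|---|---|---|---|---|
| 1 | $-1$ | $-1$ | $+1$ | 72 |
| 2 | $+1$ | $-1$ | $-1$ | 78 |
| 3 | $-1$ | $+1$ | $-1$ | 76 |
| 4 | $+1$ | $+1$ | $+1$ | 90 |
Effect Calculations:
$$A = \frac{78 + 90}{2} - \frac{72 + 76}{2} = 84 - 74 = 10$$
$$B = \frac{76 + 90}{2} - \frac{72 + 78}{2} = 83 - 75 = 8$$
$$AB = \frac{72 + 90}{2} - \frac{78 + 76}{2} = 81 - 77 = 4$$
The positive interaction effect indicates synergy: the temperature effect is larger at high concentration ($90 - 76 = 14$) than at low concentration ($78 - 72 = 6$).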
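The effect calculations for this example can be checked with a short script. The sketch below assumes the four runs are listed in standard order ($A$ varying fastest), computes each effect as a contrast, and then refits the same data by least squares to show that each regression coefficient is half the corresponding effect. The helper name `contrast_effect` is illustrative.

```python
import numpy as np

# Assumed run assignment (standard order, A varying fastest):
# run 1: (-,-), run 2: (+,-), run 3: (-,+), run 4: (+,+)
A = np.array([-1, 1, -1, 1])
B = np.array([-1, -1, 1, 1])
y = np.array([72.0, 78.0, 76.0, 90.0])

def contrast_effect(signs, y):
    """Effect = mean response at +1 minus mean response at -1."""
    return 2.0 * np.dot(signs, y) / len(y)

effects = {
    "A": contrast_effect(A, y),
    "B": contrast_effect(B, y),
    "AB": contrast_effect(A * B, y),
}
print(effects)

# Same information from least squares: each coefficient is half the effect
X = np.column_stack([np.ones(4), A, B, A * B])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(dict(zip(["mean", "A", "B", "AB"], np.round(beta, 6))))
```

With four runs and four model terms the fit is exact, so there are no degrees of freedom left for error; replicates would be needed to test significance.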
Confounding and Resolution
When experimental resources are limited, fractional factorial designs confound (alias) some effects, assuming higher-order interactions are negligible.
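Confounding can be seen directly in the design columns. The sketch below builds a $2^{4-1}$ half fraction from a full $2^3$ design using the generator $D = ABC$ (chosen here for illustration) and checks that the contrast column for $AB$ is identical to the one for $CD$, so the two effects cannot be separated from these eight runs.

```python
import pandas as pd
from itertools import product

# Base design: full 2^3 factorial in A, B, C
base = pd.DataFrame(list(product([-1, 1], repeat=3)), columns=["A", "B", "C"])

# Generator (assumed for illustration): D = ABC, giving a 2^(4-1) half fraction
half = base.copy()
half["D"] = half["A"] * half["B"] * half["C"]

# Aliased effects have identical contrast columns in the fraction
ab = half["A"] * half["B"]
cd = half["C"] * half["D"]
a_vs_bcd = half["A"] == half["B"] * half["C"] * half["D"]

print("runs:", len(half), "instead of", 2 ** 4)
print("AB column equals CD column:", bool((ab == cd).all()))
print("A column equals BCD column:", bool(a_vs_bcd.all()))
```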
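Definition: Confounding and Alias Structure
In a $2^{k-p}$ fractional factorial design (with $p$ generators), the alias structure is determined by the defining relation $I = G_1 = G_2 = \cdots = G_p$ together with all products of these words, where $G_1, \ldots, G_p$ are the generator words (products of factor columns equal to the identity column $I$). Two effects $E_1$ and $E_2$ are aliased if their product $E_1 E_2$ is a word in the defining relation, meaning they cannot be estimated separately from the same experiment.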
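The alias structure can be derived symbolically by multiplying effect words, with any squared letter cancelling to the identity. The sketch below does this for a hypothetical $2^{5-2}$ design with generators $D = AB$ and $E = AC$, that is, generator words $ABD$ and $ACE$. The function names are illustrative, and the generators are assumed to be independent.

```python
from itertools import combinations

def multiply(word1, word2):
    """Multiply two effect words; squared letters cancel (A*A = I)."""
    letters = set(word1) ^ set(word2)
    return "".join(sorted(letters))

def defining_relation(generator_words):
    """All non-identity words obtained as products of the generator words."""
    words = set()
    for r in range(1, len(generator_words) + 1):
        for combo in combinations(generator_words, r):
            product_word = ""
            for w in combo:
                product_word = multiply(product_word, w)
            words.add(product_word)
    return sorted(words, key=lambda w: (len(w), w))

def aliases(effect, generator_words):
    """Effects aliased with `effect`: its product with each defining word."""
    return [multiply(effect, w) for w in defining_relation(generator_words)]

# Hypothetical 2^(5-2) design: D = AB and E = AC, i.e. I = ABD = ACE
gens = ["ABD", "ACE"]
print("I =", " = ".join(defining_relation(gens)))
for effect in ["A", "B", "BC"]:
    print(effect, "=", " = ".join(aliases(effect, gens)))
```

Each effect appears with $2^p - 1 = 3$ aliases here, so the 31 effects of five factors fall into alias groups of four.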
Definition: Design Resolution
The resolution of a fractional factorial design is the minimum word length in the defining relation. Properties by resolution:
- Resolution III: No main effects aliased with each other, but main effects aliased with 2-factor interactions
- Resolution IV: No main effects aliased with each other or with 2-factor interactions; 2-factor interactions may be aliased with each other
- Resolution V: No main effects or 2-factor interactions aliased with each other; 2-factor interactions aliased with 3-factor interactions
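Since resolution is the length of the shortest word in the defining relation, it can be computed from the generator words alone. The following sketch enumerates all products of the generator words and reports the minimum length; the function name and the example designs are illustrative.

```python
from itertools import combinations

def design_resolution(generator_words):
    """Length of the shortest word in the defining relation."""
    shortest = None
    for r in range(1, len(generator_words) + 1):
        for combo in combinations(generator_words, r):
            letters = set()
            for word in combo:
                letters ^= set(word)  # letters appearing twice cancel
            if shortest is None or len(letters) < shortest:
                shortest = len(letters)
    return shortest

examples = {
    "2^(3-1), I = ABC": ["ABC"],
    "2^(4-1), I = ABCD": ["ABCD"],
    "2^(5-1), I = ABCDE": ["ABCDE"],
    "2^(5-2), I = ABD = ACE": ["ABD", "ACE"],
}
for label, gens in examples.items():
    print(label, "-> resolution", design_resolution(gens))
```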
Taguchi Orthogonal Arrays
Taguchi methods provide practical fractional factorial designs using orthogonal arrays $L_N(s^m)$, where $N$ is the number of runs, $s$ is the number of levels, and $m$ is the number of factors. Common arrays include $L_4(2^3)$, $L_8(2^7)$, $L_9(3^4)$, and $L_{16}(2^{15})$. These designs balance estimation efficiency with practical constraints.
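The defining property of an orthogonal array is that every pair of columns contains each combination of levels equally often. The sketch below checks that property for a 9-run, 4-column, 3-level array built with arithmetic modulo 3. This is one known construction with levels coded 0, 1, 2, and its row and column ordering may differ from published $L_9$ tables. The function name `is_orthogonal_array` is illustrative.

```python
import numpy as np
from itertools import combinations, product

def is_orthogonal_array(array, levels):
    """True if every pair of columns shows each level pair equally often."""
    array = np.asarray(array)
    n_runs, n_cols = array.shape
    expected = n_runs / levels ** 2
    for c1, c2 in combinations(range(n_cols), 2):
        for a, b in product(range(levels), repeat=2):
            count = np.sum((array[:, c1] == a) & (array[:, c2] == b))
            if count != expected:
                return False
    return True

# Rows indexed by (a, b); columns a, b, a+b, a+2b (all modulo 3)
l9 = np.array([[a, b, (a + b) % 3, (a + 2 * b) % 3]
               for a in range(3) for b in range(3)])
print(l9)
print("orthogonal:", is_orthogonal_array(l9, levels=3))
```

Nine runs cover four three-level factors here, compared with $3^4 = 81$ runs for the full factorial.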
Advanced Design Structures
Box-Behnken Designs
Box-Behnken designs are three-level designs that do not contain extreme points (corner points of the cube), making them suitable for second-order response surface modeling.
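Definition: Box-Behnken Design
A Box-Behnken design for $k$ factors requires $N = 2k(k-1) + n_c$ runs, where $n_c$ is the number of center points (this count holds for $k = 3, 4, 5$; designs for more factors use different constructions and run counts). The design consists of:
- Points at the midpoints of the edges of the factorial cube
- Center points for curvature estimation
For $k = 3$ factors, the design requires 15 runs (12 edge midpoints + 3 center points).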
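The edge-midpoint structure can be generated by running a $2^2$ factorial on each pair of factors while holding the remaining factors at 0. The sketch below uses this pairwise construction, which gives $2k(k-1)$ non-center runs; it is restricted to $k = 3, 4, 5$ as an assumption of this example. The function name `box_behnken` is illustrative and not a library call.

```python
import pandas as pd
from itertools import combinations, product

def box_behnken(k, n_center=3):
    """Box-Behnken points via 2^2 factorials on each factor pair.

    Gives 2k(k-1) + n_center runs; limited to k = 3, 4, 5 in this sketch.
    """
    if k not in (3, 4, 5):
        raise ValueError("pairwise construction intended for k = 3, 4, 5")
    runs = []
    for i, j in combinations(range(k), 2):
        for a, b in product([-1, 1], repeat=2):
            point = [0] * k
            point[i], point[j] = a, b
            runs.append(point)
    runs += [[0] * k for _ in range(n_center)]
    return pd.DataFrame(runs, columns=[f"X{m + 1}" for m in range(k)])

bb3 = box_behnken(3)
print(len(bb3), "runs")

# No run sits at a cube corner (all factors at +/-1 simultaneously)
has_corner = bool((bb3.abs() == 1).all(axis=1).any())
print("corner points present:", has_corner)
```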
Central Composite Design (CCD)
Definition: Central Composite Design
A CCD for $k$ factors consists of three distinct sets of points:
- Factorial points: $2^k$ points (or a fraction) at levels $\pm 1$
- Axial (star) points: $2k$ points at $(\pm\alpha, 0, \ldots, 0)$ and permutations
- Center points: $n_c$ replicates at the origin
The total number of runs:
$$N = 2^k + 2k + n_c$$
The parameter $\alpha$ determines design properties:
- Rotatable: $\alpha = (2^k)^{1/4}$ ensures constant prediction variance at equal distances from center
- Face-centered: $\alpha = 1$ places axial points on the faces of the factorial cube
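The three point sets and the run count can be made concrete with a small generator. The sketch below builds a CCD with a full $2^k$ factorial portion for any $k$, taking $\alpha = (2^k)^{1/4}$ for the rotatable option and $\alpha = 1$ for the face-centered option. The function name, the `type` column, and the default of four center points are illustrative assumptions.

```python
import pandas as pd
from itertools import product

def central_composite(k, n_center=4, alpha="rotatable"):
    """Full-factorial CCD: 2^k cube points, 2k axial points, n_center centers."""
    if alpha == "rotatable":
        alpha = (2 ** k) ** 0.25
    elif alpha == "face":
        alpha = 1.0
    cube = [list(p) for p in product([-1.0, 1.0], repeat=k)]
    axial = []
    for j in range(k):
        for sign in (-1.0, 1.0):
            point = [0.0] * k
            point[j] = sign * alpha  # one factor at +/-alpha, the rest at 0
            axial.append(point)
    center = [[0.0] * k for _ in range(n_center)]
    design = pd.DataFrame(cube + axial + center,
                          columns=[f"X{m + 1}" for m in range(k)])
    design["type"] = (["cube"] * len(cube) + ["axial"] * len(axial)
                      + ["center"] * n_center)
    return design

ccd3 = central_composite(3)
alpha3 = ccd3.loc[ccd3["type"] == "axial", "X1"].abs().max()
print(ccd3["type"].value_counts())
print("total runs:", len(ccd3), " alpha:", round(alpha3, 4))
```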
Second-Order Response Surface Model
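The quadratic model fitted from CCD or Box-Behnken designs:
$$y = \beta_0 + \sum_{j=1}^{k} \beta_j x_j + \sum_{j=1}^{k} \beta_{jj} x_j^2 + \sum_{j<l} \beta_{jl}\, x_j x_l + \varepsilon$$
In matrix notation: $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, where $\mathbf{X}$ includes linear, quadratic, and cross-product terms.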
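Fitting the second-order model amounts to building the model matrix and solving a least-squares problem. The sketch below does this for a rotatable two-factor CCD with three center points, using noise-free data generated from a hypothetical coefficient vector so that the recovered estimates can be compared with known values. The coefficient values and the helper name `quadratic_model_matrix` are assumptions made for illustration.

```python
import numpy as np

def quadratic_model_matrix(X):
    """Columns: 1, x1, x2, x1^2, x2^2, x1*x2 (two-factor case)."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x2, x1**2, x2**2, x1 * x2])

# Rotatable two-factor CCD: 4 cube + 4 axial + 3 center points
a = np.sqrt(2)
X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1],
              [-a, 0], [a, 0], [0, -a], [0, a],
              [0, 0], [0, 0], [0, 0]], dtype=float)

# Hypothetical true surface, used only to generate noise-free test data
beta_true = np.array([80.0, 3.0, -2.0, -4.0, -1.5, 1.0])
M = quadratic_model_matrix(X)
y = M @ beta_true

# Least-squares estimate of the coefficient vector
beta_hat, *_ = np.linalg.lstsq(M, y, rcond=None)
print(np.round(beta_hat, 6))
```

The center points matter for estimability: without them every run of this design has $x_1^2 + x_2^2 = 2$, and the quadratic columns would be collinear with the intercept.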
Python Implementation
import numpy as np
import pandas as pd
from itertools import product
from scipy import stats
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

# Generate 2^k factorial design
def factorial_design(k, factors=None):
    """Generate a full 2^k factorial design."""
    if factors is None:
        factors = [f'X{i+1}' for i in range(k)]
    # Generate all combinations of -1 and +1
    levels = list(product([-1, 1], repeat=k))
    design = pd.DataFrame(levels, columns=factors)
    # Add interaction columns
    for i in range(k):
        for j in range(i+1, k):
            col_name = f'{factors[i]}*{factors[j]}'
            design[col_name] = design[factors[i]] * design[factors[j]]
    return design

# Effect estimation
def estimate_effects(design, response):
    """Estimate main effects and two-factor interactions."""
    # Accept lists, Series, or arrays; boolean-mask indexing needs an array
    response = np.asarray(response, dtype=float)
    effects = {}
    k = len([col for col in design.columns if '*' not in col])
    # Main effects: mean response at +1 minus mean response at -1
    for j in range(k):
        factor = design.columns[j]
        high = (design[factor] == 1).to_numpy()
        effects[factor] = response[high].mean() - response[~high].mean()
    # Two-factor interactions: same contrast applied to the product column
    for i in range(k):
        for j in range(i+1, k):
            f1, f2 = design.columns[i], design.columns[j]
            positive = (design[f1] * design[f2] == 1).to_numpy()
            effects[f'{f1}*{f2}'] = response[positive].mean() - \
                response[~positive].mean()
    return effects

# Generate 2^3 factorial design
design_2k3 = factorial_design(3, ['Temperature', 'Pressure', 'Catalyst'])
print("2^3 Factorial Design:")
print(design_2k3.head(8))

# Simulated yield data
np.random.seed(42)
n_replicates = 3
design_expanded = pd.DataFrame(np.repeat(design_2k3.values, n_replicates, axis=0),
                               columns=design_2k3.columns)

# True effects (each regression coefficient is half the corresponding effect)
true_effects = {'Temperature': 8.0, 'Pressure': 5.0, 'Catalyst': 3.0,
                'Temperature*Pressure': -2.0, 'Temperature*Catalyst': 1.5,
                'Pressure*Catalyst': 0.8}

# Generate response with effects and noise
yield_data = []
for _, row in design_expanded.iterrows():
    y = 50  # Grand mean
    y += (row['Temperature'] * true_effects['Temperature'] / 2)
    y += (row['Pressure'] * true_effects['Pressure'] / 2)
    y += (row['Catalyst'] * true_effects['Catalyst'] / 2)
    y += (row['Temperature'] * row['Pressure'] * true_effects['Temperature*Pressure'] / 2)
    y += (row['Temperature'] * row['Catalyst'] * true_effects['Temperature*Catalyst'] / 2)
    y += (row['Pressure'] * row['Catalyst'] * true_effects['Pressure*Catalyst'] / 2)
    y += np.random.normal(0, 1.5)
    yield_data.append(y)
design_expanded['Yield'] = yield_data

# Estimate effects from the mean yield of each treatment combination
# (replicates occupy consecutive rows, so reshape groups them per run)
mean_yield = design_expanded['Yield'].to_numpy().reshape(-1, n_replicates).mean(axis=1)
effects = estimate_effects(design_2k3, mean_yield)
print("\nEstimated Effects:")
for effect, value in effects.items():
    print(f"  {effect}: {value:.3f}")

# Fractional factorial 2^(3-1)
def fractional_factorial_2k1():
    """Generate 2^(3-1) fractional factorial design."""
    # Generator: C = AB
    design = pd.DataFrame({
        'A': [-1, -1, 1, 1],
        'B': [-1, 1, -1, 1],
        'C': [1, -1, -1, 1]  # C = A*B
    })
    return design

design_frac = fractional_factorial_2k1()
print("\n2^(3-1) Fractional Factorial:")
print(design_frac)
print("Alias structure: A aliased with BC, B aliased with AC, C aliased with AB")

# Box-Behnken design for 3 factors
def box_behnken_3factor():
    """Generate Box-Behnken design for 3 factors."""
    # Edge midpoints of the cube
    runs = [
        [-1, -1, 0], [-1, 1, 0], [1, -1, 0], [1, 1, 0],   # X1-X2 pair, X3 = 0
        [-1, 0, -1], [-1, 0, 1], [1, 0, -1], [1, 0, 1],   # X1-X3 pair, X2 = 0
        [0, -1, -1], [0, -1, 1], [0, 1, -1], [0, 1, 1],   # X2-X3 pair, X1 = 0
        [0, 0, 0], [0, 0, 0], [0, 0, 0]                   # Center points
    ]
    return pd.DataFrame(runs, columns=['X1', 'X2', 'X3'])

bb_design = box_behnken_3factor()
print("\nBox-Behnken Design (3 factors):")
print(bb_design)

# Central Composite Design for 2 factors
def central_composite_2factor(alpha=None):
    """Generate CCD for 2 factors."""
    if alpha is None:
        alpha = np.sqrt(2)  # Rotatable: (2^k)^(1/4) with k = 2
    factorial = [[-1, -1], [-1, 1], [1, -1], [1, 1]]
    axial = [[-alpha, 0], [alpha, 0], [0, -alpha], [0, alpha]]
    center = [[0, 0], [0, 0]]
    design = pd.DataFrame(factorial + axial + center,
                          columns=['X1', 'X2'])
    return design

ccd_design = central_composite_2factor()
print("\nCentral Composite Design (2 factors, rotatable):")
print(ccd_design)

# Visualize CCD (rows 0-3 factorial, 4-7 axial, 8+ center)
fig, ax = plt.subplots(figsize=(8, 8))
factorial_pts = ccd_design.iloc[:4]
ax.scatter(factorial_pts['X1'], factorial_pts['X2'], s=100,
           c='blue', label='Factorial', zorder=3)
ax.scatter(ccd_design.iloc[4:8]['X1'], ccd_design.iloc[4:8]['X2'], s=100,
           c='red', label='Axial', zorder=3)
ax.scatter(ccd_design.iloc[8:]['X1'], ccd_design.iloc[8:]['X2'], s=100,
           c='green', label='Center', zorder=3)
ax.axhline(0, color='gray', linestyle='--', alpha=0.5)
ax.axvline(0, color='gray', linestyle='--', alpha=0.5)
ax.set_xlabel('X1')
ax.set_ylabel('X2')
ax.set_title('Central Composite Design')
ax.legend()
ax.set_aspect('equal')
plt.grid(True, alpha=0.3)
plt.savefig('ccd_design.png', dpi=150)
plt.show()
Key Design Properties
Summary: Design of Experiments (DOE)
- Factorial Efficiency: $2^k$ factorial designs estimate all $2^k - 1$ effects (including interactions) using $2^k$ runs, achieving maximum information per run
- Aliasing Principle: In $2^{k-p}$ fractional factorials, $2^p$ effects share each alias group; resolution determines the severity of confounding
- Resolution Hierarchy: Resolution III $<$ Resolution IV $<$ Resolution V in terms of estimability; higher resolution separates low-order effects more cleanly but requires more runs for the same number of factors
- Design Selection: CCD for rotatable second-order designs; Box-Behnken for designs avoiding extreme points; both support quadratic response surface modeling
- Core Trade-off: Information gained versus resources expended; fractional factorials sacrifice effect estimability for reduced experimental cost, relying on the sparsity-of-effects principle