
Design of Experiments (DOE)


Extracting Maximum Information From Minimum Trials

Design of experiments structures trials to efficiently estimate factor effects and interactions while minimizing resource use. Factorial and fractional factorial designs reveal which factors truly matter.

  • Pharmaceutical development: Optimize drug formulations by testing multiple factors simultaneously
  • Agriculture: Compare crop varieties and fertilizer combinations with controlled field trials
  • Semiconductor manufacturing: Identify critical process parameters affecting chip yield

Good experimental design asks the right questions with the fewest possible experiments.


Design of Experiments (DOE) is a systematic, rigorous methodology for planning controlled experiments that enable efficient estimation of factor effects and their interactions. Unlike observational studies, DOE allows researchers to establish causal relationships by systematically varying experimental factors while controlling extraneous sources of variation. The mathematical foundations of DOE draw from linear algebra, combinatorics, and the theory of orthogonal arrays, providing a framework for optimal information extraction with minimal experimental effort.

Fundamental Principles

Definition: Experimental Design

An experimental design is a specification of the experimental conditions (runs) to be conducted, defined by:

  • Factors ($X_1, X_2, \ldots, X_k$): Controllable input variables
  • Levels: Discrete values each factor can take
  • Response ($Y$): Measured output variable
  • Randomization: Random assignment of experimental conditions to experimental units
  • Blocking: Grouping homogeneous experimental units to reduce nuisance variation
  • Replication: Repeated observations under identical conditions

Theorem: Fisher's Principles of Experimental Design

R.A. Fisher established three fundamental principles:

  1. Randomization: Random allocation of treatments to experimental units guards against systematic bias from unknown nuisance factors and provides a valid basis for statistical inference
  2. Replication: Independent repetitions under each treatment condition enable estimation of experimental error
  3. Local Control (Blocking): Grouping similar experimental units reduces uncontrolled variation, increasing precision of treatment comparisons
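
As a rough illustration of how the three principles show up in a run sheet, the sketch below places every treatment once in each block and shuffles the run order inside each block. The function name `randomized_block_plan`, the treatment labels, and the fixed seed are hypothetical choices made for this example only.

```python
import random

def randomized_block_plan(treatments, n_blocks, seed=0):
    """Return a list of (block, run_order, treatment) tuples.

    Every block contains each treatment once (replication across blocks,
    local control within blocks), and the run order inside each block is
    shuffled independently (randomization).
    """
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible
    plan = []
    for block in range(1, n_blocks + 1):
        order = list(treatments)
        rng.shuffle(order)
        for run_order, treatment in enumerate(order, start=1):
            plan.append((block, run_order, treatment))
    return plan

# Hypothetical labels for the four treatments of a 2^2 factorial
treatments = ["(-1,-1)", "(+1,-1)", "(-1,+1)", "(+1,+1)"]
plan = randomized_block_plan(treatments, n_blocks=3)
for block, run_order, treatment in plan:
    print(f"block {block}, run {run_order}: {treatment}")
```

In practice a block might be a day, a batch of raw material, or an operator. The point of the layout is that block-to-block differences cancel out of the treatment comparisons.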

Factorial Designs

Factorial designs are the cornerstone of DOE, simultaneously investigating multiple factors and their interactions.

Definition: Factorial Design Structure

A full factorial design at 2 levels examines all possible combinations of factor levels. For $k$ factors each at 2 levels ($-1$ and $+1$), the design requires $2^k$ experimental runs. The model for a $2^k$ factorial is:

$$Y_{i} = \beta_0 + \sum_{j=1}^{k} \beta_j X_{ij} + \sum_{j<l} \beta_{jl} X_{ij} X_{il} + \cdots + \beta_{12\ldots k} X_{i1} X_{i2} \cdots X_{ik} + \varepsilon_i$$

where $X_{ij} \in \{-1, +1\}$ represents the coded level of factor $j$ in run $i$.
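
One way to see why this model is convenient is to build its model matrix and check that the columns are mutually orthogonal. The sketch below does this for $k = 3$; the helper name `model_matrix` is an assumption for illustration.

```python
import numpy as np
from itertools import product, combinations

def model_matrix(k):
    """Full model matrix of a 2^k factorial: intercept, main effects, all interactions."""
    runs = np.array(list(product([-1, 1], repeat=k)))   # shape (2^k, k)
    columns = [np.ones(len(runs))]                       # intercept column
    for order in range(1, k + 1):
        for subset in combinations(range(k), order):
            # An interaction column is the elementwise product of its factor columns
            columns.append(np.prod(runs[:, list(subset)], axis=1))
    return np.column_stack(columns)

X = model_matrix(3)
print(X.shape)                               # (8, 8): 8 runs, 8 model terms
print(np.allclose(X.T @ X, 8 * np.eye(8)))   # True: columns are orthogonal
```

Because $X^\top X = N I$ with $N = 2^k$, the least-squares estimate reduces to $\hat{\boldsymbol{\beta}} = X^\top \mathbf{Y} / N$, so each coefficient is estimated independently of the others.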

Effect Estimation in $2^k$ Factorials

The main effect of factor $A$ is the average change in response when $A$ moves from $-1$ to $+1$, averaged over all other factors:

$$E_A = \bar{Y}_{A+} - \bar{Y}_{A-} = \frac{1}{n \cdot 2^{k-1}} \sum_{i} Y_i \cdot A_i$$

where $A_i$ is the sign ($\pm 1$) of factor $A$ in observation $i$, $n$ is the number of replicates per run, and the sum runs over all $n \cdot 2^k$ observations. Similarly, the interaction effect $AB$ measures how the effect of $A$ depends on the level of $B$:

$$E_{AB} = \frac{1}{n \cdot 2^{k-1}} \sum_{i} Y_i \cdot A_i \cdot B_i$$
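
The contrast form and the difference-of-means form can be checked against each other numerically. The sketch below assumes a replicated $2^2$ design with made-up responses; the coefficients, noise level, and seed are illustrative, not data from this lesson.

```python
import numpy as np
from itertools import product

k, n = 2, 3                                    # 2 factors, 3 replicates per run (illustrative)
base = np.array(list(product([-1, 1], repeat=k)))
signs = np.repeat(base, n, axis=0)             # one row per observation: n * 2^k rows
A, B = signs[:, 0], signs[:, 1]

rng = np.random.default_rng(0)
# Made-up responses with known structure plus noise
y = 50 + 4 * A + 2 * B + 1 * A * B + rng.normal(0, 1, size=len(A))

def contrast_effect(sign_column, y, n, k):
    """Effect = (sum of sign * response) / (n * 2^(k-1))."""
    return np.sum(sign_column * y) / (n * 2 ** (k - 1))

E_A = contrast_effect(A, y, n, k)
E_AB = contrast_effect(A * B, y, n, k)

# The contrast form agrees with the difference of level means
diff_A = y[A == 1].mean() - y[A == -1].mean()
print(round(E_A, 3), round(diff_A, 3))
print(round(E_AB, 3))
```

The interaction uses the same formula as a main effect. Only the sign column changes, from $A_i$ to the product $A_i B_i$.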

Two-Factor Factorial Design

Problem: Investigate the effects of temperature ($A$) and catalyst concentration ($B$) on chemical yield. Each factor has 2 levels: Temperature: 150°C ($-1$) and 200°C ($+1$); Concentration: 2% ($-1$) and 5% ($+1$).

Design Matrix and Results:

| Run | A  | B  | AB | Yield (%) |
|-----|----|----|----|-----------|
| 1   | -1 | -1 | +1 | 72        |
| 2   | +1 | -1 | -1 | 78        |
| 3   | -1 | +1 | -1 | 76        |
| 4   | +1 | +1 | +1 | 90        |

Effect Calculations:

$$E_A = \frac{(78 + 90) - (72 + 76)}{2} = \frac{168 - 148}{2} = 10.0$$

$$E_B = \frac{(76 + 90) - (72 + 78)}{2} = \frac{166 - 150}{2} = 8.0$$

$$E_{AB} = \frac{(72 + 90) - (78 + 76)}{2} = \frac{162 - 154}{2} = 4.0$$

The interaction effect indicates synergy: the temperature effect is larger at high concentration.
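
The arithmetic above can be reproduced with the sign-column formula, and the interaction can be read off from the temperature effect at each concentration. This is a minimal check using only the four yields from the example; the variable names are illustrative.

```python
import numpy as np

A = np.array([-1, 1, -1, 1])      # temperature, coded
B = np.array([-1, -1, 1, 1])      # catalyst concentration, coded
y = np.array([72, 78, 76, 90])    # yields from the example

half = len(y) / 2                 # n * 2^(k-1) with n = 1, k = 2
E_A = np.sum(y * A) / half
E_B = np.sum(y * B) / half
E_AB = np.sum(y * A * B) / half
print(E_A, E_B, E_AB)             # 10.0 8.0 4.0

# Temperature effect at each concentration level
temp_effect_low_B = y[(A == 1) & (B == -1)][0] - y[(A == -1) & (B == -1)][0]
temp_effect_high_B = y[(A == 1) & (B == 1)][0] - y[(A == -1) & (B == 1)][0]
print(temp_effect_low_B, temp_effect_high_B)   # 6 14
```

Raising the temperature adds 6 points of yield at 2% catalyst and 14 points at 5%. Half the difference between those two numbers is the interaction effect of 4.0.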

Confounding and Resolution

When experimental resources are limited, fractional factorial designs confound (alias) some effects, assuming higher-order interactions are negligible.

Definition: Confounding and Alias Structure

In a $2^{k-p}$ fractional factorial design (with $p$ independent generators), the alias structure is determined by the defining relation $I = G_1 = G_2 = \cdots = G_p$, extended to include every product of the generator words $G_i$, for $2^p - 1$ words in total. Two effects $E_1$ and $E_2$ are aliased if their product $E_1 \times E_2$ equals a word in the defining relation, meaning they cannot be estimated separately from the same experiment.
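
Alias relationships can be worked out mechanically, because multiplying two effect words cancels any letter that appears in both (a column times itself is $I$). The sketch below applies this to the half fraction with generator $C = AB$; the helpers `multiply` and `aliases` are hypothetical names.

```python
def multiply(word_a, word_b):
    """Multiply two effect words; repeated letters cancel because X * X = I."""
    letters = set(word_a.replace("I", "")) ^ set(word_b.replace("I", ""))
    return "".join(sorted(letters)) or "I"

def aliases(effect, defining_words):
    """Effects aliased with `effect`, one per word in the defining relation."""
    return [multiply(effect, word) for word in defining_words]

# 2^(3-1) design with generator C = AB, so the defining relation is I = ABC
defining_words = ["ABC"]
for effect in ["A", "B", "C"]:
    print(effect, "=", aliases(effect, defining_words))
```

Each main effect is aliased with the two-factor interaction of the other two factors, so the design can only separate main effects if those interactions are negligible.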

Definition: Design Resolution

The resolution $R$ of a fractional factorial design is the minimum word length in the defining relation. Properties by resolution:

  • Resolution III: No main effect is aliased with another main effect, but main effects are aliased with 2-factor interactions
  • Resolution IV: No main effect is aliased with another main effect or with any 2-factor interaction, but 2-factor interactions are aliased with each other
  • Resolution V: No main effect or 2-factor interaction is aliased with any other main effect or 2-factor interaction, but 2-factor interactions are aliased with 3-factor interactions
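
Resolution can be computed directly from the generators by expanding the full defining relation and taking the shortest word. The following is a small sketch under that definition; the function names and the example generators are illustrative choices.

```python
from itertools import combinations

def multiply(word_a, word_b):
    """Product of two effect words; letters appearing in both cancel."""
    return "".join(sorted(set(word_a) ^ set(word_b)))

def defining_relation(generator_words):
    """All 2^p - 1 words: every product of one or more generator words."""
    words = set()
    for r in range(1, len(generator_words) + 1):
        for subset in combinations(generator_words, r):
            word = ""
            for g in subset:
                word = multiply(word, g)
            words.add(word)
    return sorted(words, key=lambda w: (len(w), w))

def resolution(generator_words):
    """Length of the shortest word in the defining relation."""
    return min(len(w) for w in defining_relation(generator_words))

# 2^(5-2) design with generators D = AB and E = AC, i.e. I = ABD = ACE
print(defining_relation(["ABD", "ACE"]))   # ['ABD', 'ACE', 'BCDE']
print(resolution(["ABD", "ACE"]))          # 3
# 2^(4-1) design with generator D = ABC, i.e. I = ABCD
print(resolution(["ABCD"]))                # 4
# 2^(5-1) design with generator E = ABCD, i.e. I = ABCDE
print(resolution(["ABCDE"]))               # 5
```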

Taguchi Orthogonal Arrays

Taguchi methods provide practical fractional factorial designs using orthogonal arrays $L_N(s^k)$, where $N$ is the number of runs, $s$ is the number of levels, and $k$ is the number of factors. Common arrays include $L_4(2^3)$, $L_8(2^7)$, $L_9(3^4)$, and $L_{16}(2^{15})$. These designs balance estimation efficiency with practical constraints.
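
The defining property of an orthogonal array is that every pair of columns contains each combination of levels equally often. The sketch below builds a 9-run, 4-column, 3-level array with modulo-3 arithmetic and checks that property. This is one common construction, with levels coded 0, 1, 2; the row and column order may differ from published $L_9$ tables.

```python
from itertools import product, combinations
from collections import Counter

# Two free columns (a, b) plus two columns derived by arithmetic modulo 3
L9 = [(a, b, (a + b) % 3, (a + 2 * b) % 3) for a, b in product(range(3), repeat=2)]

def is_orthogonal(array, levels):
    """True if every pair of columns shows each level combination equally often."""
    n_cols = len(array[0])
    expected = len(array) // levels ** 2
    for i, j in combinations(range(n_cols), 2):
        counts = Counter((row[i], row[j]) for row in array)
        if len(counts) != levels ** 2 or set(counts.values()) != {expected}:
            return False
    return True

for row in L9:
    print(row)
print(is_orthogonal(L9, levels=3))   # True
```

Nine runs cover four three-level factors here, whereas the full factorial would need $3^4 = 81$ runs. The price is that interactions are confounded with main effects.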

Advanced Design Structures

Box-Behnken Designs

Box-Behnken designs are three-level designs for second-order response surface modeling. They contain no corner points of the cube, which is useful when running all factors at their extremes simultaneously is infeasible or unsafe.

Definition: Box-Behnken Design

A Box-Behnken design for $k$ factors requires $N = 2k(k-1) + c_0$ runs, where $c_0$ is the number of center points. This count holds for $k = 3, 4, 5$; the standard designs for larger $k$ use a different construction with fewer runs. The design consists of:

  • Points at the midpoints of the edges of the factorial cube
  • Center points for curvature estimation

For $k = 3$ factors, the design requires 15 runs (12 edge midpoints + 3 center points).
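
The edge-midpoint structure can be generated by running a $2^2$ factorial on every pair of factors while holding the remaining factors at 0. The sketch below assumes this pairwise construction, which matches the run count above for $k = 3, 4, 5$; the function name `box_behnken` and the default of three center points are illustrative.

```python
import numpy as np
from itertools import combinations, product

def box_behnken(k, n_center=3):
    """Pairwise construction: a 2^2 factorial on each factor pair, other factors at 0.

    Matches the run count 2k(k-1) + c_0 for k = 3, 4, 5. Published designs
    for larger k use a different block structure, so this is only a sketch.
    """
    runs = []
    for i, j in combinations(range(k), 2):
        for a, b in product([-1, 1], repeat=2):
            run = [0] * k
            run[i], run[j] = a, b
            runs.append(run)
    runs.extend([[0] * k for _ in range(n_center)])
    return np.array(runs)

for k in (3, 4, 5):
    design = box_behnken(k)
    print(k, len(design), 2 * k * (k - 1) + 3)   # run count vs. formula with c_0 = 3
```

No run sets more than two factors to $\pm 1$ at once, which is how the design avoids the corners of the cube.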

Central Composite Design (CCD)

Definition: Central Composite Design

A CCD for $k$ factors consists of three distinct sets of points:

  1. Factorial points: $2^k$ points (or a fraction) at $\pm 1$ levels
  2. Axial (star) points: $2k$ points at $(\pm \alpha, 0, \ldots, 0)$ and permutations
  3. Center points: $c_0$ replicates at the origin $(0, 0, \ldots, 0)$

The total number of runs: $N = 2^k + 2k + c_0$

The parameter $\alpha$ determines design properties:

  • Rotatable: $\alpha = (2^k)^{1/4}$ ensures constant prediction variance at equal distances from the center
  • Face-centered: $\alpha = 1$ places the axial points at the centers of the cube faces
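
The three point sets can be assembled for any $k$ in a few lines. The sketch below assumes a full $2^k$ factorial core and defaults to the rotatable $\alpha$; the function name `central_composite` and the default of two center points are illustrative.

```python
import numpy as np
from itertools import product

def central_composite(k, alpha=None, n_center=2):
    """CCD with a full 2^k factorial core, 2k axial points and n_center center runs."""
    if alpha is None:
        alpha = (2 ** k) ** 0.25             # rotatable choice
    factorial = np.array(list(product([-1.0, 1.0], repeat=k)))
    axial = np.vstack([alpha * np.eye(k), -alpha * np.eye(k)])
    center = np.zeros((n_center, k))
    return np.vstack([factorial, axial, center])

ccd3 = central_composite(3)
print(ccd3.shape)                            # (16, 3): 8 + 6 + 2 runs
print(round((2 ** 3) ** 0.25, 3))            # 1.682, the rotatable alpha for k = 3
face_centered = central_composite(3, alpha=1.0)
```

With a rotatable $\alpha$ each factor is run at five levels ($\pm\alpha$, $\pm 1$, 0). The face-centered variant needs only three levels but gives up rotatability.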

Second-Order Response Surface Model

The quadratic model fitted from CCD or Box-Behnken designs:

$$Y = \beta_0 + \sum_{j=1}^{k} \beta_j X_j + \sum_{j=1}^{k} \beta_{jj} X_j^2 + \sum_{j<l} \beta_{jl} X_j X_l + \varepsilon$$

In matrix notation: $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ where $\mathbf{X}$ includes linear, quadratic, and cross-product terms.
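
Fitting this model is ordinary least squares once the model matrix has been expanded with squared and cross-product columns. The sketch below fits it to a rotatable two-factor CCD with simulated responses; the "true" coefficients, noise level, and seed are made up for illustration.

```python
import numpy as np
from itertools import combinations

def quadratic_model_matrix(X):
    """Columns: intercept, linear terms, squared terms, then cross-products."""
    n, k = X.shape
    cols = [np.ones(n)]
    cols += [X[:, j] for j in range(k)]
    cols += [X[:, j] ** 2 for j in range(k)]
    cols += [X[:, j] * X[:, l] for j, l in combinations(range(k), 2)]
    return np.column_stack(cols)

# Rotatable CCD for 2 factors with 3 center runs
a = np.sqrt(2)
X = np.array([[-1, -1], [1, -1], [-1, 1], [1, 1],
              [-a, 0], [a, 0], [0, -a], [0, a],
              [0, 0], [0, 0], [0, 0]])

# Made-up coefficients in the order b0, b1, b2, b11, b22, b12
beta_true = np.array([80.0, 3.0, -2.0, -4.0, -1.5, 1.0])
rng = np.random.default_rng(1)
M = quadratic_model_matrix(X)
y = M @ beta_true + rng.normal(0, 0.1, size=len(X))

# Least-squares estimate of the quadratic model coefficients
beta_hat, *_ = np.linalg.lstsq(M, y, rcond=None)
print(np.round(beta_hat, 2))
```

A two-level factorial alone could not fit this model, because the squared columns would be constant and indistinguishable from the intercept. The axial and center points are what make the quadratic terms estimable.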

Python Implementation

import numpy as np
import pandas as pd
from itertools import product
import matplotlib.pyplot as plt

# Generate 2^k factorial design
def factorial_design(k, factors=None):
    """Generate a full 2^k factorial design."""
    if factors is None:
        factors = [f'X{i+1}' for i in range(k)]
    
    # Generate all combinations of -1 and +1
    levels = list(product([-1, 1], repeat=k))
    design = pd.DataFrame(levels, columns=factors)
    
    # Add interaction columns
    for i in range(k):
        for j in range(i+1, k):
            col_name = f'{factors[i]}*{factors[j]}'
            design[col_name] = design[factors[i]] * design[factors[j]]
    
    return design

# Effect estimation
def estimate_effects(design, response):
    """Estimate main effects and two-factor interactions.

    `response` holds one value per design row (e.g. the replicate mean).
    """
    # Convert to an array so boolean masks work even when a plain list is passed
    response = np.asarray(response, dtype=float)
    effects = {}
    factors = [col for col in design.columns if '*' not in col]
    
    # Main effects: mean response at +1 minus mean response at -1
    for factor in factors:
        high = (design[factor] == 1).to_numpy()
        effects[factor] = response[high].mean() - response[~high].mean()
    
    # Two-factor interactions: same contrast on the product of the two columns
    for i in range(len(factors)):
        for j in range(i+1, len(factors)):
            f1, f2 = factors[i], factors[j]
            high = (design[f1] * design[f2] == 1).to_numpy()
            effects[f'{f1}*{f2}'] = response[high].mean() - response[~high].mean()
    
    return effects

# Generate 2^3 factorial design
design_2k3 = factorial_design(3, ['Temperature', 'Pressure', 'Catalyst'])
print("2^3 Factorial Design:")
print(design_2k3.head(8))

# Simulated yield data
np.random.seed(42)
n_replicates = 3
design_expanded = pd.DataFrame(np.repeat(design_2k3.values, n_replicates, axis=0),
                               columns=design_2k3.columns)

# True effects
true_effects = {'Temperature': 8.0, 'Pressure': 5.0, 'Catalyst': 3.0,
                'Temperature*Pressure': -2.0, 'Temperature*Catalyst': 1.5,
                'Pressure*Catalyst': 0.8}

# Generate response with effects and noise
yield_data = []
for _, row in design_expanded.iterrows():
    y = 50  # Grand mean
    y += (row['Temperature'] * true_effects['Temperature'] / 2)
    y += (row['Pressure'] * true_effects['Pressure'] / 2)
    y += (row['Catalyst'] * true_effects['Catalyst'] / 2)
    y += (row['Temperature'] * row['Pressure'] * true_effects['Temperature*Pressure'] / 2)
    y += (row['Temperature'] * row['Catalyst'] * true_effects['Temperature*Catalyst'] / 2)
    y += (row['Pressure'] * row['Catalyst'] * true_effects['Pressure*Catalyst'] / 2)
    y += np.random.normal(0, 1.5)
    yield_data.append(y)

design_expanded['Yield'] = yield_data

# Estimate effects
effects = estimate_effects(design_2k3, 
                          [design_expanded['Yield'].iloc[i:i+n_replicates].mean() 
                           for i in range(0, len(design_expanded), n_replicates)])

print("\nEstimated Effects:")
for effect, value in effects.items():
    print(f"  {effect}: {value:.3f}")

# Fractional factorial 2^(3-1)
def fractional_factorial_2k1():
    """Generate 2^(3-1) fractional factorial design."""
    # Generator: C = AB
    design = pd.DataFrame({
        'A': [-1, -1, 1, 1],
        'B': [-1, 1, -1, 1],
        'C': [1, -1, -1, 1]  # C = A*B
    })
    return design

design_frac = fractional_factorial_2k1()
print("\n2^(3-1) Fractional Factorial:")
print(design_frac)
print("Alias structure: A aliased with BC, B aliased with AC, C aliased with AB")

# Box-Behnken design for 3 factors
def box_behnken_3factor():
    """Generate Box-Behnken design for 3 factors."""
    # Edge midpoints of the cube
    runs = [
        [-1, -1, 0], [-1, 1, 0], [1, -1, 0], [1, 1, 0],  # X1, X2 varied; X3 at center
        [-1, 0, -1], [-1, 0, 1], [1, 0, -1], [1, 0, 1],  # X1, X3 varied; X2 at center
        [0, -1, -1], [0, -1, 1], [0, 1, -1], [0, 1, 1],  # X2, X3 varied; X1 at center
        [0, 0, 0], [0, 0, 0], [0, 0, 0]  # Center points
    ]
    return pd.DataFrame(runs, columns=['X1', 'X2', 'X3'])

bb_design = box_behnken_3factor()
print("\nBox-Behnken Design (3 factors):")
print(bb_design)

# Central Composite Design for 2 factors
def central_composite_2factor(alpha=None):
    """Generate CCD for 2 factors."""
    if alpha is None:
        alpha = np.sqrt(2)  # Rotatable
    
    factorial = [[-1, -1], [-1, 1], [1, -1], [1, 1]]
    axial = [[-alpha, 0], [alpha, 0], [0, -alpha], [0, alpha]]
    center = [[0, 0], [0, 0]]
    
    design = pd.DataFrame(factorial + axial + center, 
                          columns=['X1', 'X2'])
    return design

ccd_design = central_composite_2factor()
print("\nCentral Composite Design (2 factors, rotatable):")
print(ccd_design)

# Visualize CCD
fig, ax = plt.subplots(figsize=(8, 8))
factorial_pts = ccd_design.iloc[:4]
ax.scatter(factorial_pts['X1'], factorial_pts['X2'], s=100, 
          c='blue', label='Factorial', zorder=3)
ax.scatter(ccd_design.iloc[4:8]['X1'], ccd_design.iloc[4:8]['X2'], s=100, 
          c='red', label='Axial', zorder=3)
ax.scatter(ccd_design.iloc[8:]['X1'], ccd_design.iloc[8:]['X2'], s=100, 
          c='green', label='Center', zorder=3)
ax.axhline(0, color='gray', linestyle='--', alpha=0.5)
ax.axvline(0, color='gray', linestyle='--', alpha=0.5)
ax.set_xlabel('X1')
ax.set_ylabel('X2')
ax.set_title('Central Composite Design')
ax.legend()
ax.set_aspect('equal')
plt.grid(True, alpha=0.3)
plt.savefig('ccd_design.png', dpi=150)
plt.show()

Key Design Properties

Summary: Design of Experiments (DOE)

  1. Factorial Efficiency: $2^k$ factorial designs estimate the grand mean and all $2^k - 1$ effects (main effects and interactions) using $2^k$ runs, achieving maximum information per run
  2. Aliasing Principle: In $2^{k-p}$ fractional factorials, $2^p$ effects share each alias group; resolution determines the severity of confounding
  3. Resolution Hierarchy: Resolution III $\subset$ Resolution IV $\subset$ Resolution V in terms of estimability; higher resolution gives less confounded estimates but generally requires more runs for the same number of factors
  4. Design Selection: CCD for rotatable second-order designs; Box-Behnken for designs avoiding extreme points; both support quadratic response surface modeling
  5. Core Trade-off: Information gained versus resources expended; fractional factorials sacrifice effect estimability for reduced experimental cost, relying on the sparsity-of-effects principle
