Survey Sampling and Weighting
Advanced Statistical Methods
Getting Population Answers From Sample Data
Survey sampling uses design-based inference to generalize from samples to populations, with weighting procedures like raking and the Horvitz-Thompson estimator correcting for unequal selection probabilities.
- Public opinion polling: Produce nationally representative estimates from stratified samples
- Census operations: Adjust for non-response and undercoverage in population counts
- Health surveys: Estimate disease prevalence from complex multi-stage sampling designs
Survey weighting ensures every voice counts proportionally, not just the ones that were easy to reach.
Survey statistics provides the mathematical framework for drawing inferences about populations from samples selected through known, probabilistic mechanisms. Unlike model-based inference, design-based inference treats the population values as fixed; all randomness arises from the sampling process. This lesson develops the theory and practice of survey sampling, weighting, and variance estimation for complex survey designs.
Design-Based Inference
Definition: Finite Population Framework
Let $U = \{1, \dots, N\}$ be a finite population of size $N$. Each unit $i \in U$ has an unknown value $y_i$ for the variable of interest. A sampling design $p(\cdot)$ assigns a probability $p(s)$ to every possible sample $s \subseteq U$ such that $\sum_{s \subseteq U} p(s) = 1$. The design determines which samples are possible and their selection probabilities.
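Definition: Inclusion Probability
The first-order inclusion probability $\pi_i$ is the probability that unit $i$ is included in the sample:
$$\pi_i = P(i \in s) = \sum_{s \ni i} p(s)$$
The second-order inclusion probability $\pi_{ij}$ is the probability that both units $i$ and $j$ are included:
$$\pi_{ij} = P(i \in s,\ j \in s) = \sum_{s \ni i,j} p(s)$$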
A sampling design is first-order balanced if for all .
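The inclusion probabilities can be computed directly from a design by summing $p(s)$ over the relevant samples. The sketch below does this for a toy design that is an assumption of this example (simple random sampling without replacement of $n = 2$ units from $N = 4$); the variable names are illustrative only.

```python
from itertools import combinations
from fractions import Fraction

# Hypothetical toy design: SRS without replacement of n = 2 units from N = 4.
N, n = 4, 2
samples = list(combinations(range(N), n))
p = {s: Fraction(1, len(samples)) for s in samples}  # p(s); sums to 1

# First-order inclusion probabilities: sum p(s) over samples containing i.
pi = [sum(p[s] for s in samples if i in s) for i in range(N)]

# Second-order inclusion probabilities: samples containing both i and j.
pi2 = {(i, j): sum(p[s] for s in samples if i in s and j in s)
       for i in range(N) for j in range(N) if i < j}

print(pi)           # each entry equals n/N
print(pi2[(0, 1)])  # equals n(n-1) / (N(N-1))
```

Exact fractions are used so the results can be compared with the closed forms without rounding error.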
Horvitz-Thompson Estimator
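The Horvitz-Thompson (HT) estimator of the population total $Y = \sum_{i \in U} y_i$ is:
$$\hat{Y}_{HT} = \sum_{i \in s} \frac{y_i}{\pi_i} = \sum_{i \in s} w_i y_i$$
where $w_i = 1/\pi_i$ is the design weight. The HT estimator is unbiased for any sampling design with $\pi_i > 0$ for all $i \in U$:
$$E[\hat{Y}_{HT}] = Y$$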
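Theorem: Horvitz-Thompson Unbiasedness
Proof: Let $I_i$ be the inclusion indicator ($I_i = 1$ if $i \in s$, 0 otherwise). Then $E[I_i] = \pi_i$ and:
$$E[\hat{Y}_{HT}] = E\left[\sum_{i \in U} I_i \frac{y_i}{\pi_i}\right] = \sum_{i \in U} \frac{y_i}{\pi_i} E[I_i] = \sum_{i \in U} y_i = Y$$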
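The unbiasedness argument can be checked numerically by enumerating every possible sample and averaging the HT total over the design. The sketch below assumes Poisson sampling (units enter independently with their own inclusion probability); the population values and probabilities are made up for illustration.

```python
from itertools import product

# Hypothetical population values and unequal inclusion probabilities.
y = [3.0, 8.0, 1.0, 12.0]
pi = [0.2, 0.5, 0.4, 0.9]

expected = 0.0
for incl in product([0, 1], repeat=len(y)):   # every possible sample
    # Poisson sampling: units enter independently with probability pi_i.
    p_s = 1.0
    for I, p in zip(incl, pi):
        p_s *= p if I else (1 - p)
    # HT total for this sample: sum of y_i / pi_i over included units.
    ht_total = sum(I * yi / p for I, yi, p in zip(incl, y, pi))
    expected += p_s * ht_total

print(expected, sum(y))  # the two agree up to floating-point error
```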
HT Variance
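The variance of the HT estimator is:
$$V(\hat{Y}_{HT}) = \sum_{i \in U} \sum_{j \in U} (\pi_{ij} - \pi_i \pi_j) \frac{y_i}{\pi_i} \frac{y_j}{\pi_j}, \qquad \pi_{ii} = \pi_i$$
For stratified sampling with independent strata (simple random sampling of $n_h$ from $N_h$ units in stratum $h$):
$$V(\hat{Y}_{st}) = \sum_{h} N_h^2 \left(1 - \frac{n_h}{N_h}\right) \frac{S_h^2}{n_h}$$
where $S_h^2$ is the population variance in stratum $h$.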
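As a consistency check, the general double-sum variance should reduce to the familiar closed form $N^2 (1 - n/N) S^2 / n$ under simple random sampling without replacement. The sketch below verifies this on a small made-up population; the data and sample size are assumptions of the example.

```python
import numpy as np

# Hypothetical population; SRS without replacement of n from N.
y = np.array([4.0, 7.0, 1.0, 9.0, 5.0, 10.0])
N, n = len(y), 3

pi = np.full(N, n / N)                               # pi_i
pi2 = np.full((N, N), n * (n - 1) / (N * (N - 1)))   # pi_ij for i != j
np.fill_diagonal(pi2, pi)                            # pi_ii = pi_i

# General formula: sum_i sum_j (pi_ij - pi_i pi_j) (y_i/pi_i)(y_j/pi_j)
e = y / pi
var_general = float(e @ (pi2 - np.outer(pi, pi)) @ e)

# SRS closed form: N^2 (1 - n/N) S^2 / n
var_srs = N**2 * (1 - n / N) * y.var(ddof=1) / n

print(var_general, var_srs)
```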
HT Estimator Properties
- Unbiased for any design with $\pi_i > 0$ for all $i$, regardless of the population distribution
- Design-consistent: $\hat{Y}_{HT}/N - Y/N \to 0$ in probability as $n \to \infty$ under appropriate conditions
- Sensitivity: Large weights (small $\pi_i$) lead to high variance; extreme weights can dominate the estimate
- Non-negativity: The HT estimator can produce negative estimates for totals when some $y_i < 0$ and weights are large
Design Effects
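Definition: Design Effect (DEFF)
The design effect compares the variance of an estimator under the complex design to its variance under simple random sampling (SRS) of the same size:
$$\mathrm{DEFF} = \frac{V_{\text{design}}(\hat{\theta})}{V_{\text{SRS}}(\hat{\theta})}$$
- $\mathrm{DEFF} > 1$: Complex design is less efficient than SRS
- $\mathrm{DEFF} < 1$: Complex design is more efficient (e.g., optimal allocation)
- $\mathrm{DEFF} = 1$: Equivalent to SRS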
Effective Sample Size
The effective sample size adjusts the actual sample size for design effects:
$$n_{\text{eff}} = \frac{n}{\mathrm{DEFF}}$$
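For a proportion $\hat{p}$ estimated with design weights $w_i$:
$$\widehat{V}(\hat{p}) \approx \frac{\hat{p}(1 - \hat{p})}{n_{\text{eff}}}$$
A common approximation for stratified sampling with unequal weights is $\mathrm{DEFF} \approx 1 + \mathrm{CV}_w^2$ where $\mathrm{CV}_w$ is the coefficient of variation of the weights.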
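One widely used way to turn a set of weights into an effective sample size is Kish's approximation, $n_{\text{eff}} = (\sum w_i)^2 / \sum w_i^2$, which corresponds to a design effect of $1 + \mathrm{CV}_w^2$. The sketch below applies it to made-up weights; the helper name `kish_effective_n` is hypothetical.

```python
import numpy as np

def kish_effective_n(weights):
    """Kish's approximation: n_eff = (sum w)^2 / sum(w^2)."""
    w = np.asarray(weights, dtype=float)
    return w.sum() ** 2 / np.sum(w ** 2)

# Hypothetical weights: 600 respondents with weight 10, 400 with weight 5.
w = np.concatenate([np.full(600, 10.0), np.full(400, 5.0)])
n_eff = kish_effective_n(w)
deff_w = len(w) / n_eff          # equals 1 + CV(w)^2

print(round(n_eff, 1), round(deff_w, 3))
```

This approximation reflects only the loss from unequal weighting; it does not capture clustering or gains from stratification.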
Sources of Design Effects
- Clustering: Respondents within clusters are correlated; $\mathrm{DEFF} \approx 1 + (m - 1)\rho$ where $m$ is cluster size and $\rho$ is the intra-class correlation
- Stratification: Reduces variance when strata are internally homogeneous; $\mathrm{DEFF} < 1$ when strata differ in means
- Unequal selection probabilities: Increase variance unless compensated by weighting
- Post-stratification: Can reduce variance if auxiliary information is correlated with outcomes
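To give a sense of scale for the clustering term, the sketch below evaluates $1 + (m - 1)\rho$ and the implied effective sample size. The cluster size, intra-class correlation, and sample size are hypothetical values chosen for illustration.

```python
def cluster_deff(m, rho):
    """Approximate design effect for equal-sized clusters: 1 + (m - 1) * rho."""
    return 1.0 + (m - 1) * rho

n = 2000            # hypothetical number of respondents
m, rho = 20, 0.05   # hypothetical cluster size and intra-class correlation
deff = cluster_deff(m, rho)
n_eff = n / deff

print(deff, round(n_eff))
```

Even a modest intra-class correlation roughly halves the effective sample size here, because each cluster contributes 20 correlated respondents.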
Weighting Procedures
Survey weights adjust for unequal selection probabilities and non-response. The final analysis weight is typically a product of several components.
Design Weight
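The base weight (design weight) is the inverse of the inclusion probability:
$$w_i^{(d)} = \frac{1}{\pi_i}$$
For a multi-stage design with $K$ stages, the overall inclusion probability is the product of the conditional selection probabilities at each stage, $\pi_i = \prod_{k=1}^{K} \pi_i^{(k)}$, so the design weight is the product of the inverse conditional selection probabilities.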
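A minimal sketch of the multi-stage calculation, assuming a two-stage design in which schools are sampled first and students are then sampled within selected schools. The column names and probabilities are hypothetical.

```python
import pandas as pd

# Hypothetical two-stage design: PSUs (schools) sampled first,
# then students sampled within each selected school.
sample = pd.DataFrame({
    "school":      ["A", "A", "B", "B", "B"],
    "p_school":    [0.10, 0.10, 0.25, 0.25, 0.25],  # stage-1 selection prob.
    "p_in_school": [0.20, 0.20, 0.05, 0.05, 0.05],  # stage-2 conditional prob.
})

# Overall inclusion probability is the product of the stage probabilities;
# the design weight is its inverse.
sample["pi"] = sample["p_school"] * sample["p_in_school"]
sample["design_weight"] = 1.0 / sample["pi"]

print(sample[["school", "pi", "design_weight"]])
```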
Non-Response Adjustment
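After accounting for the design, weights are adjusted for non-response. The non-response adjustment divides the design weight of each respondent by the estimated response probability:
$$w_i^{(nr)} = \frac{w_i^{(d)}}{\hat{\phi}_i}$$
where $\hat{\phi}_i = \hat{P}(R_i = 1 \mid \mathbf{x}_i)$ is the estimated probability of response given covariates $\mathbf{x}_i$. It is commonly estimated via logistic regression on design strata and auxiliary variables. Alternatively, non-response adjustment cells group units with similar response propensities and use the response rate within each cell.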
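The adjustment-cell variant can be sketched in a few lines: the estimated response probability is the weighted response rate in each cell, and respondents' weights are divided by it. The data frame, cell labels, and column names below are assumptions of this example.

```python
import pandas as pd

# Hypothetical sampled units with design weights, an adjustment cell,
# and a response indicator.
df = pd.DataFrame({
    "cell":      ["urban"] * 4 + ["rural"] * 4,
    "w_design":  [10.0] * 4 + [20.0] * 4,
    "responded": [1, 1, 1, 0, 1, 1, 0, 0],
})

# Weighted response rate per cell = estimated response probability.
weighted_resp = (df["w_design"] * df["responded"]).groupby(df["cell"]).transform("sum")
weighted_all = df.groupby("cell")["w_design"].transform("sum")
df["phi_hat"] = weighted_resp / weighted_all

# Respondents carry the weight of the non-respondents in their cell.
resp = df[df["responded"] == 1].copy()
resp["w_nr"] = resp["w_design"] / resp["phi_hat"]

print(resp[["cell", "w_design", "phi_hat", "w_nr"]])
```

After adjustment, the respondents' weights sum to the total design weight of the full sample, which is the property the adjustment is meant to preserve.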
Post-Stratification
Post-stratification calibrates weights so that weighted totals match known population counts from census or administrative data:
$$w_i^{(ps)} = w_i^{(nr)} \cdot \frac{N_g}{\sum_{j \in s_g} w_j^{(nr)}}, \qquad i \in s_g$$
where $N_g$ is the known population count in post-stratum $g$ and $s_g$ is the sample in that post-stratum. This adjusts for coverage and non-response bias when the post-stratification variables are correlated with outcomes.
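The post-stratification step amounts to one ratio adjustment per group. The following sketch assumes a single categorical post-stratifier and made-up census counts; `post_stratify` is a hypothetical helper, not a library function.

```python
import numpy as np
import pandas as pd

def post_stratify(weights, groups, pop_counts):
    """Scale weights so each post-stratum's weighted total equals its known count."""
    w = pd.Series(np.asarray(weights, dtype=float))
    g = pd.Series(np.asarray(groups))
    weighted_totals = w.groupby(g).transform("sum")  # weight total in the unit's post-stratum
    targets = g.map(pop_counts)                      # known N_g for the unit's post-stratum
    return (w * targets / weighted_totals).to_numpy()

# Hypothetical respondents and census counts.
groups = ["18-34", "18-34", "35-54", "55+", "55+", "55+"]
w_in = [100.0, 120.0, 300.0, 80.0, 90.0, 70.0]
census = {"18-34": 280, "35-54": 380, "55+": 340}

w_ps = post_stratify(w_in, groups, census)
print(pd.Series(w_ps).groupby(pd.Series(groups)).sum())
```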
Raking (Iterative Proportional Fitting)
Definition: Raking
Raking (iterative proportional fitting, IPF) calibrates weights to match known marginal distributions for multiple categorical variables simultaneously. Given $K$ classification variables with known population margins $N^{(k)}_c$, the update for variable $k$ is:
$$w_i \leftarrow w_i \cdot \frac{N^{(k)}_{c_k(i)}}{\sum_{j \in s:\, c_k(j) = c_k(i)} w_j}$$
where $c_k(i)$ is the category of unit $i$ on variable $k$. Raking iterates through each margin, adjusting weights proportionally until convergence.
Raking Convergence
Raking is most useful when the full cross-classification has too many cells for the sample to support direct post-stratification, so only the margins are matched. It generally converges when:
- The marginal totals are consistent (sum to the same population total across all variables)
- Every marginal category with a positive population total contains at least one sampled unit
- Each unit belongs to exactly one category of every classification variable
Convergence is guaranteed when the log-linear model implied by the margins is faithful. In practice, convergence typically occurs in 10 to 30 iterations. Weights can become extreme when sample sizes in marginal cells are small.
Calibration Estimator
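Raking is a special case of calibration estimation. Given auxiliary variables $\mathbf{x}_i$ with known population total $\mathbf{T}_x = \sum_{i \in U} \mathbf{x}_i$ and base weights $d_i$, the calibrated weights $w_i$ minimize:
$$\sum_{i \in s} G(w_i, d_i) \quad \text{subject to} \quad \sum_{i \in s} w_i \mathbf{x}_i = \mathbf{T}_x$$
where $G$ is a distance function. Common choices:
- Raking: $G(w, d) = w \log(w/d) - w + d$ (Kullback-Leibler divergence)
- Linear calibration: $G(w, d) = (w - d)^2 / (2d)$ (chi-square distance)
- Logit trimming: Bounds weights to $[L\,d_i,\ U\,d_i]$ and re-calibrates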
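For the chi-square distance, the constrained minimization has a closed-form solution, $w_i = d_i (1 + \mathbf{x}_i^\top \boldsymbol{\lambda})$, with $\boldsymbol{\lambda}$ solving a small linear system. The sketch below implements that case only; the function name, base weights, auxiliary variable, and totals are all assumptions made for illustration.

```python
import numpy as np

def linear_calibrate(d, X, totals):
    """Chi-square-distance calibration: w_i = d_i * (1 + x_i' lam)."""
    d = np.asarray(d, dtype=float)
    X = np.asarray(X, dtype=float)
    # Solve (sum_i d_i x_i x_i') lam = T_x - sum_i d_i x_i
    A = (X * d[:, None]).T @ X
    lam = np.linalg.solve(A, np.asarray(totals, dtype=float) - d @ X)
    return d * (1.0 + X @ lam)

rng = np.random.default_rng(0)
n = 200
d = np.full(n, 50.0)                           # hypothetical base weights
X = np.column_stack([np.ones(n),               # intercept calibrates to N
                     rng.normal(40, 10, n)])   # hypothetical auxiliary (age)
T_x = np.array([10_000.0, 10_000.0 * 41.0])    # assumed known totals

w_cal = linear_calibrate(d, X, T_x)
print(w_cal @ X)   # reproduces T_x up to rounding
```

Unlike raking, linear calibration can produce negative weights when the sample is far from the targets, which is one reason bounded variants exist.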
Variance Estimation for Complex Surveys
Complex survey designs violate the independence assumption underlying simple variance formulas. Specialized variance estimation methods account for the design.
Taylor Series Linearization
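The Taylor series method approximates the variance of a nonlinear estimator $\hat{\theta} = f(\hat{\mathbf{Y}})$ by linearizing around the population value:
$$V(\hat{\theta}) \approx V\left(\sum_{i \in s} w_i z_i\right), \qquad z_i = \nabla f(\mathbf{Y})^\top \mathbf{y}_i$$
where $\nabla f$ is the gradient and $z_i$ is the linearized variable. For a weighted mean $\bar{y}_w = \sum_{i \in s} w_i y_i / \sum_{i \in s} w_i$:
$$z_i = \frac{y_i - \bar{y}_w}{\sum_{j \in s} w_j}$$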
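The same recipe applies to a ratio of two estimated totals, where the linearized variable is $z_i = (y_i - \hat{R} x_i)/\hat{X}$. The sketch below assumes units are drawn independently with replacement (so no stratum or cluster terms appear) and uses simulated data; the function name is hypothetical.

```python
import numpy as np

def ratio_linearization_var(y, x, w):
    """Linearization variance of R = sum(w*y) / sum(w*x), treating units as
    drawn independently with replacement (an assumption of this sketch)."""
    y, x, w = (np.asarray(a, dtype=float) for a in (y, x, w))
    n = len(y)
    X_hat = np.sum(w * x)
    R_hat = np.sum(w * y) / X_hat
    # Weighted contributions w_i * z_i with z_i = (y_i - R x_i) / X_hat
    u = w * (y - R_hat * x) / X_hat
    var_R = n / (n - 1) * np.sum((u - u.mean()) ** 2)
    return R_hat, var_R

rng = np.random.default_rng(1)
x = rng.uniform(1, 5, 500)                 # hypothetical auxiliary values
y = 2.0 * x + rng.normal(0, 1, 500)        # hypothetical outcomes
w = rng.choice([10.0, 25.0], size=500)     # hypothetical weights

R_hat, var_R = ratio_linearization_var(y, x, w)
print(R_hat, np.sqrt(var_R))
```

Setting `x` to a vector of ones recovers the weighted-mean case given above.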
Jackknife Variance Estimation
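The jackknife estimates variance by systematically deleting portions of the sample. For a stratified design with $n_h$ PSUs in stratum $h$:
- Delete PSU $j$ from stratum $h$: compute the estimate $\hat{\theta}_{(hj)}$ using the remaining PSUs, with the weights of the other PSUs in stratum $h$ multiplied by $n_h/(n_h - 1)$
- Create the pseudo-replicate estimates for every $(h, j)$ and combine them:
$$\hat{V}_{JK}(\hat{\theta}) = \sum_{h} \frac{n_h - 1}{n_h} \sum_{j=1}^{n_h} \left(\hat{\theta}_{(hj)} - \hat{\theta}\right)^2$$
The jackknife is model-free and works well for smooth statistics (means, ratios, regression coefficients). For non-smooth statistics such as medians and quantiles, the delete-one jackknife can be inconsistent, so the bootstrap is usually preferred there.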
Bootstrap Variance Estimation
The bootstrap for complex surveys resamples PSUs within strata:
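- For replicate $b = 1, \dots, B$: draw $m_h$ PSUs with replacement from the $n_h$ sampled PSUs in stratum $h$ (commonly $m_h = n_h - 1$, as in the Rao-Wu rescaling bootstrap)
- Compute $\hat{\theta}^{*(b)}$ from the bootstrap sample
- Estimate variance: $\hat{V}_{boot}(\hat{\theta}) = \frac{1}{B - 1} \sum_{b=1}^{B} \left(\hat{\theta}^{*(b)} - \bar{\theta}^{*}\right)^2$, where $\bar{\theta}^{*}$ is the mean of the replicates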
The bootstrap handles unequal probabilities by weighting resampled units by where is the resampling probability.
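A minimal sketch of a PSU-level bootstrap for a weighted mean follows. It assumes the Rao-Wu variant, in which $n_h - 1$ PSUs are drawn with replacement per stratum and weights are rescaled by $n_h/(n_h - 1)$ times the number of times each PSU was drawn. The function name and the simulated strata, PSUs, and weights are hypothetical.

```python
import numpy as np

def rao_wu_bootstrap_var(y, w, stratum, psu, B=500, seed=0):
    """Bootstrap variance of a weighted mean, resampling n_h - 1 PSUs
    with replacement within each stratum (Rao-Wu rescaling)."""
    rng = np.random.default_rng(seed)
    y, w = np.asarray(y, float), np.asarray(w, float)
    stratum, psu = np.asarray(stratum), np.asarray(psu)
    reps = np.empty(B)
    for b in range(B):
        w_b = np.zeros_like(w)
        for h in np.unique(stratum):
            in_h = stratum == h
            psus = np.unique(psu[in_h])
            n_h = len(psus)
            draws = rng.choice(psus, size=n_h - 1, replace=True)
            for j in psus:
                hits = np.sum(draws == j)        # times PSU j was drawn
                sel = in_h & (psu == j)
                w_b[sel] = w[sel] * n_h / (n_h - 1) * hits
        reps[b] = np.sum(w_b * y) / np.sum(w_b)
    return reps.var(ddof=1)

# Hypothetical data: 3 strata x 8 PSUs x 10 units each.
rng = np.random.default_rng(42)
stratum = np.repeat(np.arange(3), 80)
psu = np.tile(np.repeat(np.arange(8), 10), 3)   # PSU ids are nested within strata
psu_effect = rng.normal(0, 2, size=(3, 8))[stratum, psu]
y = 50 + 5 * stratum + psu_effect + rng.normal(0, 5, 240)
w = np.where(stratum == 0, 30.0, 15.0)

var_boot = rao_wu_bootstrap_var(y, w, stratum, psu)
print(np.sqrt(var_boot))
```

Resampling whole PSUs rather than individual units is what lets the replicates reflect within-cluster correlation.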
Variance Estimation Comparison
| Method | Pros | Cons | Best For |
|---|---|---|---|
| Taylor series | Fast, analytic | Linearization error | Smooth estimators |
| Jackknife | Replication-based, no derivatives needed | Needs PSUs; unreliable for non-smooth statistics | Ratios, regression coefficients |
| Bootstrap | Flexible, handles complex designs | Computationally expensive | Small samples, extreme statistics |
Python Implementation
```python
import numpy as np
import pandas as pd

np.random.seed(42)

# --- Simulate Complex Survey Data ---
N = 10000        # population size
n_strata = 4     # number of strata

# Population
strata = np.random.choice(n_strata, N, p=[0.3, 0.3, 0.25, 0.15])
y = 50 + 10 * strata + np.random.randn(N) * 15  # outcome correlated with stratum
x = 2 * strata + np.random.randn(N) * 5         # auxiliary variable (not used below)

# Unequal-probability (Poisson) sampling: each unit enters the sample
# independently with probability pi_i, so inclusion probabilities are exact.
probs = np.where(strata < 2, 0.1, 0.2)          # oversample the two smaller strata
sample_idx = np.flatnonzero(np.random.rand(N) < probs)
design_weights = 1.0 / probs[sample_idx]        # w_i = 1 / pi_i
```
```python
# --- Horvitz-Thompson Estimator ---
y_sample = y[sample_idx]
w = design_weights
n = len(y_sample)

# Weighted mean: the HT total divided by the HT estimate of N (Hajek form)
y_ht = np.sum(w * y_sample) / np.sum(w)
y_unweighted = np.mean(y_sample)   # ignores the design, so biased here

print("=== Horvitz-Thompson Estimator ===")
print(f"Population mean (true): {np.mean(y):.2f}")
print(f"HT estimate: {y_ht:.2f}")
print(f"Unweighted sample mean: {y_unweighted:.2f}")

# --- Design Effect ---
# Variance of the mean under SRS of the same size
var_srs = np.var(y_sample, ddof=1) / n

# Approximate variance with Taylor linearization (with-replacement form)
z_i = w * (y_sample - y_ht) / np.sum(w)
var_ht = np.sum(z_i**2) * n / (n - 1)
deff = var_ht / var_srs
n_eff = n / deff

print("\n=== Design Effects ===")
print(f"DEFF: {deff:.2f}")
print(f"Effective sample size: {n_eff:.0f}")
```
```python
# --- Raking / Iterative Proportional Fitting ---
def rake_weights(weights, sample_data, population_margins, max_iter=50, tol=1e-6):
    """Rake weights to match population margins for multiple variables.

    `population_margins` maps each variable to {category: population share};
    shares are scaled by the total base weight to obtain target totals.
    """
    w = np.asarray(weights, dtype=float).copy()
    total = w.sum()
    for iteration in range(max_iter):
        w_old = w.copy()
        for var_name, margin in population_margins.items():
            categories = sample_data[var_name].to_numpy()
            for cat in np.unique(categories):
                mask = categories == cat
                w[mask] *= margin[cat] * total / np.sum(w[mask])
        # Check convergence after a full pass over all margins
        if np.max(np.abs(w - w_old)) < tol:
            print(f"  Raking converged after {iteration + 1} iterations")
            break
    else:
        print(f"  Raking did not converge within {max_iter} iterations")
    return w

# Post-stratification variables (simulated independently of y for illustration)
age_group = np.random.choice(['18-34', '35-54', '55+'], len(sample_idx), p=[0.3, 0.4, 0.3])
gender = np.random.choice(['M', 'F'], len(sample_idx), p=[0.48, 0.52])

# Known population margins (shares)
pop_margins = {
    'age': {'18-34': 0.28, '35-54': 0.38, '55+': 0.34},
    'gender': {'M': 0.49, 'F': 0.51}
}

sample_df = pd.DataFrame({'age': age_group, 'gender': gender})
w_raked = rake_weights(w, sample_df, pop_margins)
y_raked = np.sum(w_raked * y_sample) / np.sum(w_raked)

print("\n=== Raked Estimate ===")
print(f"Raked mean: {y_raked:.2f}")
print(f"Weight range: [{w_raked.min():.2f}, {w_raked.max():.2f}]")
```
```python
# --- Jackknife Variance Estimation ---
# Simulate a PSU structure (artificial: assigned independently of the actual
# sampling design, purely to illustrate the mechanics)
n_strata_sample = 4
psu_per_stratum = 25
total_psus = n_strata_sample * psu_per_stratum

# Assign each sampled unit to a PSU, and each PSU to a stratum
psu_ids = np.arange(len(y_sample)) % total_psus   # global PSU id 0..99
stratum_ids = psu_ids // psu_per_stratum          # stratum of each unit's PSU

def theta_hat(y_vals, w_vals):
    """Weighted mean."""
    return np.sum(w_vals * y_vals) / np.sum(w_vals)

# Delete-one-PSU jackknife
var_jack = 0.0
for h in range(n_strata_sample):
    in_h = stratum_ids == h
    n_h = psu_per_stratum
    for j in range(h * psu_per_stratum, (h + 1) * psu_per_stratum):
        w_rep = w.copy()
        w_rep[in_h] *= n_h / (n_h - 1)   # reweight the stratum's remaining PSUs
        w_rep[psu_ids == j] = 0.0        # drop PSU j
        theta_jk = theta_hat(y_sample, w_rep)
        var_jack += (n_h - 1) / n_h * (theta_jk - y_ht) ** 2

se_jack = np.sqrt(var_jack)

print("\n=== Jackknife Variance ===")
print(f"Estimate: {y_ht:.2f}")
print(f"SE (jackknife): {se_jack:.2f}")
print(f"95% CI: [{y_ht - 1.96*se_jack:.2f}, {y_ht + 1.96*se_jack:.2f}]")
```
Summary
Key Takeaways: Survey Sampling and Weighting
- Design-based inference treats population values as fixed; randomness comes from the sampling mechanism. The Horvitz-Thompson estimator is unbiased for any design with known inclusion probabilities $\pi_i > 0$.
- Design effects (DEFF) quantify the efficiency loss from complex designs relative to SRS. Clustering inflates DEFF by a factor of roughly $1 + (m - 1)\rho$; stratification reduces it. The effective sample size is $n_{\text{eff}} = n / \mathrm{DEFF}$.
- Weighting proceeds in stages: design weights ($1/\pi_i$), non-response adjustment (divide by estimated response probability), and post-stratification (calibrate to known margins). Raking (IPF) iteratively adjusts to multiple marginal distributions.
- Calibration estimation provides a unified framework: minimize distance to base weights subject to margin-matching constraints. Raking uses Kullback-Leibler; linear calibration uses chi-square distance.
- Variance estimation: Taylor series linearization is fast for smooth estimators; the jackknife and bootstrap are replication methods that need no derivatives, with the bootstrap better suited to non-smooth statistics (medians, quantiles). For complex surveys, always use design-based variance estimates, not naive formulas.