πŸŽ‰ 75% of content is free forever β€” Unlock Premium from $10/mo β†’
CW
Search courses…
πŸ’Ό Servicesℹ️ Aboutβœ‰οΈ ContactView Pricing Plansfrom $10

Survey Sampling and Weighting

Advanced Statistical MethodsSurvey Methods🟒 Free Lesson

Advertisement

Survey Sampling and Weighting

Advanced Statistical Methods

Getting Population Answers From Sample Data

Survey sampling uses design-based inference to generalize from samples to populations, with weighting procedures like raking and the Horvitz-Thompson estimator correcting for unequal selection probabilities.

  • Public opinion polling β€” Produce nationally representative estimates from stratified samples
  • Census operations β€” Adjust for non-response and undercoverage in population counts
  • Health surveys β€” Estimate disease prevalence from complex multi-stage sampling designs

Survey weighting ensures every voice counts proportionally, not just the ones that were easy to reach.


Survey statistics provides the mathematical framework for drawing inferences about populations from samples selected through known, probabilistic mechanisms. Unlike model-based inference, design-based inference treats population values as fixed and randomness arises solely from the sampling process. This lesson develops the theory and practice of survey sampling, weighting, and variance estimation for complex survey designs.


Design-Based Inference

DfFinite Population Framework

Let U={1,2,…,N}U = \{1, 2, \dots, N\} be a finite population of size NN. Each unit ii has an unknown value yiy_i for the variable of interest. A sampling design p(s)p(s) assigns a probability to every possible sample sβŠ†Us \subseteq U such that βˆ‘sβŠ†Up(s)=1\sum_{s \subseteq U} p(s) = 1. The design determines which samples are possible and their selection probabilities.

DfInclusion Probability

The first-order inclusion probability Ο€i\pi_i is the probability that unit ii is included in the sample:

Ο€i=P(i∈s)=βˆ‘sβˆ‹ip(s)\pi_i = P(i \in s) = \sum_{s \ni i} p(s)

The second-order inclusion probability Ο€ij\pi_{ij} is the probability that both units ii and jj are included:

Ο€ij=P(i∈sΒ andΒ j∈s)=βˆ‘sβˆ‹i,jp(s)\pi_{ij} = P(i \in s \text{ and } j \in s) = \sum_{s \ni i, j} p(s)

A sampling design is first-order balanced if Ο€i>0\pi_i > 0 for all ii.


Horvitz-Thompson Estimator

Horvitz-Thompson Estimator

The Horvitz-Thompson (HT) estimator of the population total Y=βˆ‘i∈UyiY = \sum_{i \in U} y_i is:

Y^HT=βˆ‘i∈syiΟ€i=βˆ‘i∈swiyi\hat{Y}_{\text{HT}} = \sum_{i \in s} \frac{y_i}{\pi_i} = \sum_{i \in s} w_i y_i

where wi=1/Ο€iw_i = 1/\pi_i is the design weight. The HT estimator is unbiased for any sampling design with Ο€i>0\pi_i > 0:

Ep[Y^HT]=βˆ‘i∈Uyi=Y\mathbb{E}_p[\hat{Y}_{\text{HT}}] = \sum_{i \in U} y_i = Y

ThHorvitz-Thompson Unbiasedness

Proof: Let IiI_i be the inclusion indicator (Ii=1I_i = 1 if i∈si \in s, 0 otherwise). Then E[Ii]=Ο€i\mathbb{E}[I_i] = \pi_i and:

E[βˆ‘i∈UIiyiΟ€i]=βˆ‘i∈UyiΟ€iβ‹…Ο€i=βˆ‘i∈Uyi=Yβ–‘\mathbb{E}\left[\sum_{i \in U} \frac{I_i y_i}{\pi_i}\right] = \sum_{i \in U} \frac{y_i}{\pi_i} \cdot \pi_i = \sum_{i \in U} y_i = Y \quad \square

HT Variance

The variance of the HT estimator is:

V(Y^HT)=βˆ‘i∈Uβˆ‘j∈U(Ο€ijβˆ’Ο€iΟ€j)Ο€iΟ€jyiyjV(\hat{Y}_{\text{HT}}) = \sum_{i \in U} \sum_{j \in U} \frac{(\pi_{ij} - \pi_i \pi_j)}{\pi_i \pi_j} y_i y_j

For stratified sampling with independent strata:

V(Y^HT)=βˆ‘h=1HVh(Y^h)=βˆ‘h=1HNh2(1βˆ’nhNh)Sh2nhV(\hat{Y}_{\text{HT}}) = \sum_{h=1}^{H} V_h\left(\hat{Y}_h\right) = \sum_{h=1}^{H} N_h^2 \left(1 - \frac{n_h}{N_h}\right) \frac{S_h^2}{n_h}

where Sh2=1Nhβˆ’1βˆ‘i∈Uh(yiβˆ’YΛ‰h)2S_h^2 = \frac{1}{N_h - 1}\sum_{i \in U_h}(y_i - \bar{Y}_h)^2 is the population variance in stratum hh.

HT Estimator Properties

  • Unbiased for any Ο€i>0\pi_i > 0, regardless of the population distribution
  • Design-consistent: V(Y^HT)/Y2β†’0V(\hat{Y}_{\text{HT}})/Y^2 \to 0 as nβ†’βˆžn \to \infty under appropriate conditions
  • Sensitivity: Large weights (small Ο€i\pi_i) lead to high variance; extreme weights can dominate the estimate
  • Non-negativity: The HT estimator can produce negative estimates for totals when yi<0y_i < 0 and weights are large

Design Effects

DfDesign Effect (DEFF)

The design effect compares the variance of an estimator under the complex design to its variance under simple random sampling (SRS) of the same size:

DEFF(ΞΈ^)=Vcomplex(ΞΈ^)VSRS(ΞΈ^)\text{DEFF}(\hat{\theta}) = \frac{V_{\text{complex}}(\hat{\theta})}{V_{\text{SRS}}(\hat{\theta})}
  • DEFF>1\text{DEFF} > 1: Complex design is less efficient than SRS
  • DEFF<1\text{DEFF} < 1: Complex design is more efficient (e.g., optimal allocation)
  • DEFF=1\text{DEFF} = 1: Equivalent to SRS

Effective Sample Size

The effective sample size adjusts the actual sample size for design effects:

neff=nDEFFn_{\text{eff}} = \frac{n}{\text{DEFF}}

For a proportion p^\hat{p} with design weight wiw_i:

DEFF=1+CV2(w)⋅wˉ⋅(something specific to the design)\text{DEFF} = 1 + \text{CV}^2(w) \cdot \bar{w} \cdot (\text{something specific to the design})

A common approximation for stratified sampling with optimal allocation is DEFFβ‰ˆ1βˆ’βˆ‘hWh2\text{DEFF} \approx 1 - \sum_h W_h^2 where Wh=Nh/NW_h = N_h/N.

Sources of Design Effects

  1. Clustering: Respondents within clusters are correlated; DEFFclusterβ‰ˆ1+(mβˆ’1)ρ\text{DEFF}_{\text{cluster}} \approx 1 + (m-1)\rho where mm is cluster size and ρ\rho is the intra-class correlation
  2. Stratification: Reduces variance when strata are homogeneous; DEFFstrat<1\text{DEFF}_{\text{strat}} < 1 when strata differ in means
  3. Unequal selection probabilities: Increase variance unless compensated by weighting
  4. Post-stratification: Can reduce variance if auxiliary information is correlated with outcomes

Weighting Procedures

Survey weights adjust for unequal selection probabilities and non-response. The final analysis weight is typically a product of several components.

Design Weight

The base weight (design weight) is the inverse of the inclusion probability:

widesign=1Ο€iw_i^{\text{design}} = \frac{1}{\pi_i}

For a multi-stage design with kk stages, the design weight is the product of conditional selection probabilities at each stage.

Non-Response Adjustment

After accounting for design, weights are adjusted for non-response. The non-response adjustment divides the weight by the estimated response probability:

wiNR=widesignP^(Ri=1∣xi)w_i^{\text{NR}} = \frac{w_i^{\text{design}}}{\hat{P}(R_i = 1 \mid \mathbf{x}_i)}

where P^(Ri=1∣xi)\hat{P}(R_i = 1 \mid \mathbf{x}_i) is the estimated probability of response given covariates xi\mathbf{x}_i. This is estimated via logistic regression on design strata and auxiliary variables. Non-response adjustment cells group units with similar response propensities.

Post-Stratification

Post-stratification calibrates weights so that weighted totals of known population counts match census or administrative data:

wiPS=wiNRβ‹…Ncβˆ‘j∈scwjNRw_i^{\text{PS}} = w_i^{\text{NR}} \cdot \frac{N_c}{\sum_{j \in s_c} w_j^{\text{NR}}}

where NcN_c is the known population count in post-stratum cc and scs_c is the sample in that post-stratum. This adjusts for coverage and non-response bias when the post-stratification variables are correlated with outcomes.


Raking (Iterative Proportional Fitting)

DfRaking

Raking (iterative proportional fitting, IPF) calibrates weights to match known marginal distributions for multiple categorical variables simultaneously. Given KK classification variables with known population margins {Nk(1),…,Nk(K)}\{N_k^{(1)}, \dots, N_k^{(K)}\}:

wi(t+1)=wi(t)β‹…βˆk=1KNk(cik)βˆ‘j∈swj(t)β‹…1(cjk=cik)w_i^{(t+1)} = w_i^{(t)} \cdot \prod_{k=1}^{K} \frac{N_k^{(c_{ik})}}{\sum_{j \in s} w_j^{(t)} \cdot \mathbb{1}(c_{jk} = c_{ik})}

where cikc_{ik} is the category of unit ii on variable kk. Raking iterates through each margin, adjusting weights proportionally until convergence.

Raking Convergence

Raking converges when:

  • The number of cells in the cross-classification exceeds the sample size (so exact calibration to all cells is impossible)
  • The marginal totals are consistent (sum to the same population total across all variables)
  • Each unit belongs to a unique cell in the cross-classification

Convergence is guaranteed when the log-linear model implied by the margins is faithful. In practice, convergence typically occurs in 10–30 iterations. Weights can become extreme when sample sizes in marginal cells are small.

Calibration Estimator

Raking is a special case of calibration estimation. Given auxiliary variables xi\mathbf{x}_i with known population total X=βˆ‘i∈Uxi\mathbf{X} = \sum_{i \in U} \mathbf{x}_i, the calibrated weight minimizes:

min⁑wβˆ‘i∈sg(wi)subjectΒ toβˆ‘i∈swixi=X\min_{\mathbf{w}} \sum_{i \in s} g(w_i) \quad \text{subject to} \quad \sum_{i \in s} w_i \mathbf{x}_i = \mathbf{X}

where g(w)g(w) is a distance function. Common choices:

  • Raking: g(w)=wlog⁑wg(w) = w \log w (Kullback-Leibler divergence)
  • Linear calibration: g(w)=(wβˆ’wold)2g(w) = (w - w^{\text{old}})^2 (chi-square distance)
  • Logit trimming: Bounds weights to [a,b][a, b] and re-calibrates

Variance Estimation for Complex Surveys

Complex survey designs violate the independence assumption underlying simple variance formulas. Specialized variance estimation methods account for the design.

Taylor Series Linearization

The Taylor series method approximates the variance of a nonlinear estimator ΞΈ^=f(Y^)\hat{\theta} = f(\hat{\mathbf{Y}}) by linearizing around the population value:

V(ΞΈ^)β‰ˆβˆ‘h=1Hβˆ‘i=1nhβˆ‘j=1nhΞ”hΞ”hβ€²β‹…(zhiβˆ’zΛ‰h)(zhjβˆ’zΛ‰h)β€²V(\hat{\theta}) \approx \sum_{h=1}^{H} \sum_{i=1}^{n_h} \sum_{j=1}^{n_h} \Delta_h \Delta_h' \cdot (z_{hi} - \bar{z}_h)(z_{hj} - \bar{z}_h)'

where Ξ”h=βˆ‚fβˆ‚Y^h∣Y^h=Yh\Delta_h = \frac{\partial f}{\partial \hat{\mathbf{Y}}_h}\bigg|_{\hat{\mathbf{Y}}_h = \mathbf{Y}_h} is the gradient and zhiz_{hi} is the linearized variable. For a weighted mean ΞΈ^=βˆ‘wiyi/βˆ‘wi\hat{\theta} = \sum w_i y_i / \sum w_i:

zi=wi(yiβˆ’ΞΈ^)/βˆ‘jwjz_i = w_i(y_i - \hat{\theta}) / \sum_j w_j

Jackknife Variance Estimation

The jackknife estimates variance by systematically deleting portions of the sample. For a stratified design with nhn_h PSUs per stratum:

  1. Delete PSU jj from stratum hh: compute estimate ΞΈ^h(βˆ’j)\hat{\theta}_{h(-j)} using remaining PSUs
  2. Create pseudo-replicate: ΞΈ^h(βˆ’j)βˆ—=nhΞΈ^βˆ’(nhβˆ’1)ΞΈ^h(βˆ’j)\hat{\theta}_{h(-j)}^{*} = n_h \hat{\theta} - (n_h - 1)\hat{\theta}_{h(-j)}
VJK(ΞΈ^)=1H(nhβˆ’1)βˆ‘h=1Hβˆ‘j=1nh(ΞΈ^h(βˆ’j)βˆ—βˆ’ΞΈ^)2V_{\text{JK}}(\hat{\theta}) = \frac{1}{H(n_h - 1)} \sum_{h=1}^{H} \sum_{j=1}^{n_h} (\hat{\theta}_{h(-j)}^{*} - \hat{\theta})^2

The jackknife is model-free and handles any statistic (medians, quantiles, regression coefficients).

Bootstrap Variance Estimation

The bootstrap for complex surveys resamples PSUs within strata:

  1. For replicate b=1,…,Bb = 1, \dots, B: draw nhn_h PSUs with replacement from stratum hh
  2. Compute ΞΈ^βˆ—(b)\hat{\theta}^{*(b)} from the bootstrap sample
  3. Estimate variance:
Vboot(ΞΈ^)=1Bβˆ’1βˆ‘b=1B(ΞΈ^βˆ—(b)βˆ’ΞΈΛ‰βˆ—)2V_{\text{boot}}(\hat{\theta}) = \frac{1}{B-1} \sum_{b=1}^{B} (\hat{\theta}^{*(b)} - \bar{\theta}^*)^2

The bootstrap handles unequal probabilities by weighting resampled units by 1/piβˆ—1/p_i^{*} where piβˆ—p_i^{*} is the resampling probability.

Variance Estimation Comparison

MethodProsConsBest For
Taylor seriesFast, analyticLinearization errorSmooth estimators
JackknifeExact, handles any statisticNeeds PSUsMedians, quantiles
BootstrapFlexible, handles complex designsComputationally expensiveSmall samples, extreme statistics

Python Implementation

import numpy as np
import pandas as pd
from scipy import stats

np.random.seed(42)

# --- Simulate Complex Survey Data ---
N = 10000  # population size
n_strata = 4  # number of strata
n_psu_per_stratum = 25  # PSUs per stratum

# Population
strata = np.random.choice(n_strata, N, p=[0.3, 0.3, 0.25, 0.15])
y = 50 + 10 * strata + np.random.randn(N) * 15  # outcome correlated with stratum
x = 2 * strata + np.random.randn(N) * 5  # auxiliary variable

# Inverse probability sampling (higher prob in larger strata)
probs = np.where(strata < 2, 0.1, 0.2)  # oversample small strata
sample_idx = np.random.choice(N, size=1000, replace=False, p=probs / probs.sum())
design_weights = 1.0 / probs[sample_idx]

# --- Horvitz-Thompson Estimator ---
y_sample = y[sample_idx]
w = design_weights

y_ht = np.sum(w * y_sample) / np.sum(w)
y_srs = np.mean(y_sample)

print("=== Horvitz-Thompson Estimator ===")
print(f"Population mean (true): {np.mean(y):.2f}")
print(f"HT estimate: {y_ht:.2f}")
print(f"SRS mean: {y_srs:.2f}")

# --- Design Effect ---
var_srs = np.var(y_sample, ddof=1) / len(y_sample)
# Approximate variance with Taylor linearization
z_i = w * (y_sample - y_ht) / np.sum(w)
var_ht = np.sum(z_i**2) * len(y_sample) / (len(y_sample) - 1)
deff = var_ht / var_srs
n_eff = len(y_sample) / deff

print(f"\n=== Design Effects ===")
print(f"DEFF: {deff:.2f}")
print(f"Effective sample size: {n_eff:.0f}")

# --- Raking / Iterative Proportional Fitting ---
def rake_weights(weights, sample_data, population_margins, max_iter=50, tol=1e-6):
    """Rake weights to match population margins for multiple variables."""
    w = weights.copy()
    for iteration in range(max_iter):
        w_old = w.copy()
        for var_name, margin in population_margins.items():
            categories = sample_data[var_name]
            unique_cats = np.unique(categories)
            for cat in unique_cats:
                mask = categories == cat
                w[mask] *= margin[cat] / np.sum(w[mask])
        if np.max(np.abs(w - w_old)) < tol:
            print(f"  Raking converged after {iteration + 1} iterations")
            break
    return w

# Post-stratification variables
age_group = np.random.choice(['18-34', '35-54', '55+'], len(sample_idx), p=[0.3, 0.4, 0.3])
gender = np.random.choice(['M', 'F'], len(sample_idx), p=[0.48, 0.52])

# Known population margins
pop_margins = {
    'age': {'18-34': 0.28, '35-54': 0.38, '55+': 0.34},
    'gender': {'M': 0.49, 'F': 0.51}
}

sample_df = pd.DataFrame({'age': age_group, 'gender': gender})
w_raked = rake_weights(w, sample_df, pop_margins)
y_raked = np.sum(w_raked * y_sample) / np.sum(w_raked)

print(f"\n=== Raked Estimate ===")
print(f"Raked mean: {y_raked:.2f}")
print(f"Weight range: [{w_raked.min():.2f}, {w_raked.max():.2f}]")

# --- Jackknife Variance Estimation ---
# Simulate PSU structure
n_strata_sample = 4
psu_per_stratum = 25
total_sample = n_strata_sample * psu_per_stratum

# Create fake PSU assignments
psu_ids = np.repeat(np.arange(total_sample), len(y_sample) // total_sample + 1)[:len(y_sample)]
stratum_ids = np.repeat(np.arange(n_strata_sample), psu_per_stratum)[:len(y_sample)]

def theta_hat(y_vals, w_vals):
    return np.sum(w_vals * y_vals) / np.sum(w_vals)

# Delete-one-PSU jackknife
estimates = []
for h in range(n_strata_sample):
    for j in range(psu_per_stratum):
        mask = ~((stratum_ids == h) & (psu_ids == j))
        if mask.sum() > 10:
            theta_jk = theta_hat(y_sample[mask], w[mask])
            estimates.append(theta_jk)

estimates = np.array(estimates)
var_jack = np.var(estimates, ddof=1)
se_jack = np.sqrt(var_jack)

print(f"\n=== Jackknife Variance ===")
print(f"Estimate: {y_ht:.2f}")
print(f"SE (jackknife): {se_jack:.2f}")
print(f"95% CI: [{y_ht - 1.96*se_jack:.2f}, {y_ht + 1.96*se_jack:.2f}]")

Summary

Key Takeaways: Survey Sampling and Weighting

  1. Design-based inference treats population values as fixed; randomness comes from the sampling mechanism. The Horvitz-Thompson estimator Y^HT=βˆ‘i∈syi/Ο€i\hat{Y}_{\text{HT}} = \sum_{i \in s} y_i/\pi_i is unbiased for any design with known inclusion probabilities Ο€i>0\pi_i > 0.
  2. Design effects (DEFF) quantify the efficiency loss from complex designs relative to SRS. Clustering inflates DEFF by factor 1+(mβˆ’1)ρ1 + (m-1)\rho; stratification reduces it. The effective sample size neff=n/DEFFn_{\text{eff}} = n/\text{DEFF}.
  3. Weighting proceeds in stages: design weights (1/Ο€i1/\pi_i), non-response adjustment (divide by estimated response probability), and post-stratification (calibrate to known margins). Raking (IPF) iteratively adjusts to multiple marginal distributions.
  4. Calibration estimation provides a unified framework: minimize distance to base weights subject to margin-matching constraints. Raking uses Kullback-Leibler; linear calibration uses chi-square distance.
  5. Variance estimation β€” Taylor series linearization is fast for smooth estimators; jackknife and bootstrap handle arbitrary statistics (medians, quantiles). For complex surveys, always use design-based variance estimates, not naive formulas.
⭐

Premium Content

Survey Sampling and Weighting

Unlock this lesson and 900+ advanced tutorials with a Premium plan.

🎯End-to-end Projects
πŸ’ΌInterview Prep
πŸ“œCertificates
🀝Community Access

Already a member? Log in

Need Expert Statistics Help?

Get personalized tutoring, project support, or professional consulting.

Advertisement