Copulas — Modeling Dependence
Advanced Statistical Methods
Separating Marginal Behavior From Joint Dependence
Copulas model the dependence structure between variables independently of their marginal distributions, thanks to Sklar's theorem. Gaussian, t, and Archimedean copulas capture different tail dependence patterns.
- Finance — Model joint extreme losses across assets for portfolio risk management
- Insurance — Correlate claim amounts across multiple coverage lines for solvency modeling
- Hydrology — Link rainfall and river flow distributions for flood risk assessment
Copulas let you model how variables move together without being constrained by their individual distributions.
Definition: Copula
A copula is a multivariate cumulative distribution function with uniform marginal distributions on [0,1]. Formally, a d-dimensional copula C: [0,1]^d → [0,1] satisfies:
- Uniform margins: C(1,...,1,u_i,1,...,1) = u_i for all i and all u_i ∈ [0,1]
- Grounded: C(u₁,...,u_d) = 0 if any u_i = 0
- d-increasing: For any a ≤ b in [0,1]^d, the C-volume of the box [a,b] is non-negative:
$$V_C([a,b]) = \sum_{v} (-1)^{\,d-|v|}\, C(v_1,\dots,v_d) \;\ge\; 0$$
where the sum is over all vertices v of the box (each v_j ∈ {a_j, b_j}), and |v| is the number of components where v_j = b_j.
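The d-increasing condition can be checked numerically by summing the copula over the 2^d vertices of a box with alternating signs. The sketch below is illustrative only: the helper names `c_volume`, `independence`, and `clayton` are hypothetical, and the boxes are arbitrary.

```python
from itertools import product

def c_volume(C, a, b):
    """C-volume of the box [a, b]: signed sum of C over its 2^d vertices."""
    d = len(a)
    total = 0.0
    for picks in product((0, 1), repeat=d):      # 1 -> take b_j, 0 -> take a_j
        vertex = [b[j] if picks[j] else a[j] for j in range(d)]
        sign = (-1) ** (d - sum(picks))          # |v| = number of b_j components
        total += sign * C(vertex)
    return total

def independence(u):
    """Independence copula: product of the coordinates."""
    out = 1.0
    for x in u:
        out *= x
    return out

def clayton(u, theta=2.0):
    """Bivariate Clayton copula, theta > 0."""
    return (u[0] ** -theta + u[1] ** -theta - 1.0) ** (-1.0 / theta)

# For the independence copula the volume is the product of the side lengths.
vol_indep = c_volume(independence, [0.2, 0.1, 0.5], [0.6, 0.4, 0.9])
vol_clayton = c_volume(clayton, [0.2, 0.3], [0.7, 0.8])
print(vol_indep, vol_clayton)
```

A non-negative volume for every box is what makes C assign valid probabilities to rectangles.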
Sklar's Theorem
Sklar's theorem (1959): For any d-dimensional joint distribution function H with marginal distributions F₁, F₂, ..., F_d, there exists a copula C such that:
$$H(x_1,\dots,x_d) = C\big(F_1(x_1),\dots,F_d(x_d)\big)$$
If the marginals are continuous, the copula C is unique. Conversely, given any copula C and any univariate marginals F₁, ..., F_d (continuous or not), the function H defined above is a joint distribution with the specified margins.
Implication: Copulas separate the modeling of marginal distributions from the modeling of dependence structure. This decomposition is fundamental for:
- Modeling margins with different distributions (e.g., normal, t, gamma)
- Changing dependence structure independently of margins
- Constructing flexible multivariate models
Density form: If H and the margins have densities h and f_i:
$$h(x_1,\dots,x_d) = c\big(F_1(x_1),\dots,F_d(x_d)\big)\,\prod_{i=1}^{d} f_i(x_i)$$
where c(u₁,...,u_d) = ∂^d C/∂u₁...∂u_d is the copula density and u_i = F_i(x_i).
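The separation of margins and dependence can be illustrated by sampling: draw uniforms from a copula, then push each coordinate through any quantile function. This is a minimal sketch; the Gaussian copula with ρ = 0.7 and the gamma and Student t margins are arbitrary choices made for the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
rho = 0.7
R = np.array([[1.0, rho], [rho, 1.0]])

# Step 1 (dependence): sample from a Gaussian copula, giving uniform margins.
Z = rng.standard_normal((5000, 2)) @ np.linalg.cholesky(R).T
U = stats.norm.cdf(Z)

# Step 2 (margins): apply an arbitrary quantile function to each coordinate.
X1 = stats.gamma.ppf(U[:, 0], a=2.0)   # gamma margin (illustrative choice)
X2 = stats.t.ppf(U[:, 1], df=4)        # Student t margin (illustrative choice)

# Rank-based dependence is untouched by the monotone marginal transforms.
tau_U, _ = stats.kendalltau(U[:, 0], U[:, 1])
tau_X, _ = stats.kendalltau(X1, X2)
print(tau_U, tau_X)
```

Swapping the margins changes the distribution of each variable but leaves Kendall's tau, and the copula, unchanged.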
Gaussian Copula
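Definition: Gaussian Copula
The Gaussian copula is constructed from the multivariate normal distribution. Given a correlation matrix R ∈ ℝ^{d×d}:
$$C_R(u_1,\dots,u_d) = \Phi_R\big(\Phi^{-1}(u_1),\dots,\Phi^{-1}(u_d)\big)$$
where Φ_R is the standard multivariate normal CDF with correlation matrix R, and Φ⁻¹ is the standard normal quantile function.
Density:
$$c_R(u_1,\dots,u_d) = |R|^{-1/2}\exp\!\Big(-\tfrac{1}{2}\, z^{\top}\big(R^{-1} - I\big)\, z\Big)$$
where z = (z₁,...,z_d)ᵀ and z_i = Φ⁻¹(u_i).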
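As a sanity check, the Gaussian copula density can be compared with the ratio "joint normal density over product of marginal normal densities" implied by Sklar's density form. The function name `gaussian_copula_density` and the matrix R below are illustrative assumptions.

```python
import numpy as np
from scipy import stats

def gaussian_copula_density(u, R):
    """c_R(u) = |R|^{-1/2} exp(-0.5 z^T (R^{-1} - I) z), with z = Phi^{-1}(u)."""
    z = stats.norm.ppf(np.asarray(u, dtype=float))
    A = np.linalg.inv(R) - np.eye(len(z))
    return np.exp(-0.5 * z @ A @ z) / np.sqrt(np.linalg.det(R))

R = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.5],
              [0.3, 0.5, 1.0]])
u = np.array([0.2, 0.7, 0.9])

# Cross-check: joint normal pdf divided by the product of the marginal pdfs.
z = stats.norm.ppf(u)
ratio = stats.multivariate_normal(mean=np.zeros(3), cov=R).pdf(z) / np.prod(stats.norm.pdf(z))
density = gaussian_copula_density(u, R)
print(density, ratio)
```

With R equal to the identity matrix the density is 1 everywhere, which is the independence copula.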
Properties:
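- Tail dependence: λ_L = λ_U = 0 whenever |R_{ij}| < 1 (no asymptotic tail dependence)
- Linear correlation: R_{ij} is the Pearson correlation of the normal scores Φ⁻¹(U_i) and Φ⁻¹(U_j); it equals Pearson's ρ of the original variables only when the margins are normal
- Radial symmetry: C(u₁, u₂) = u₁ + u₂ - 1 + C(1-u₁, 1-u₂) for d = 2, so the upper and lower tails behave identically
- Comonotonicity: As R_{ij} → 1, C → M (comonotonic copula)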
Gaussian Copula Limitations
The Gaussian copula's lack of tail dependence makes it inadequate for modeling extreme co-movements:
- Financial crisis: The Gaussian copula underestimates the probability of simultaneous extreme losses
- Credit risk: The mispricing of CDOs in the run-up to the 2008 crisis was partly due to Gaussian copula assumptions
- Insurance: Catastrophic events (earthquakes, hurricanes) exhibit stronger tail dependence than Gaussian allows
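Symmetric tail dependence: For the Gaussian copula with correlation ρ:
$$\lambda_U = \lambda_L = 2\lim_{x\to\infty}\left[1 - \Phi\!\left(x\sqrt{\frac{1-\rho}{1+\rho}}\right)\right] = 0$$
which is zero for every |ρ| < 1: joint exceedances become asymptotically independent however strong the correlation.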
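The vanishing tail dependence can be seen numerically by evaluating λ_L(q) = C(q, q)/q at shrinking thresholds q. The sketch below computes C(q, q) by integrating the conditional normal CDF rather than calling a bivariate CDF routine. The function name and the choice ρ = 0.7 are illustrative.

```python
import numpy as np
from scipy import integrate, stats

def gaussian_lower_tail_ratio(q, rho):
    """lambda_L(q) = C(q, q) / q for the bivariate Gaussian copula."""
    zq = stats.norm.ppf(q)
    scale = np.sqrt(1.0 - rho**2)

    # C(q, q) = integral over s in (0, q) of P(U2 <= q | U1 = s)
    def conditional_cdf(s):
        return stats.norm.cdf((zq - rho * stats.norm.ppf(s)) / scale)

    value, _ = integrate.quad(conditional_cdf, 0.0, q, epsabs=0.0)
    return value / q

rho = 0.7
ratios = [gaussian_lower_tail_ratio(q, rho) for q in (0.1, 0.01, 0.001)]
print(ratios)
```

The ratio keeps falling as q shrinks, but only slowly, which is why a Gaussian copula can look adequate at moderate quantiles and still understate joint extremes.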
Student's t-Copula
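Definition: t-Copula
The Student's t-copula is constructed from the multivariate t-distribution with ν degrees of freedom and correlation matrix R:
$$C_{R,\nu}(u_1,\dots,u_d) = t_{R,\nu}\big(t_\nu^{-1}(u_1),\dots,t_\nu^{-1}(u_d)\big)$$
where t_{R,ν} is the standard multivariate t-CDF and t_ν⁻¹ is the univariate t quantile function.
Density:
$$c_{R,\nu}(u_1,\dots,u_d) = \frac{f_{R,\nu}(t_1,\dots,t_d)}{\prod_{i=1}^{d} f_\nu(t_i)}$$
where t_i = t_ν⁻¹(u_i), f_{R,ν} is the multivariate t density, and f_ν is the univariate t density.
Tail dependence: The t-copula exhibits symmetric tail dependence:
$$\lambda_U = \lambda_L = 2\, t_{\nu+1}\!\left(-\sqrt{\frac{(\nu+1)(1-\rho)}{1+\rho}}\right)$$
For finite ν and ρ > -1, λ_U > 0, capturing extreme co-movements. As ν → ∞, the t-copula converges to the Gaussian copula.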
t-Copula Tail Dependence Coefficients
The tail dependence coefficients for the bivariate t-copula with correlation ρ and ν degrees of freedom:
$$\lambda_U = \lambda_L = 2\, t_{\nu+1}\!\left(-\sqrt{\frac{(\nu+1)(1-\rho)}{1+\rho}}\right)$$
Key values:
- ν = 1 (Cauchy): λ = 1 - √((1-ρ)/2), the heaviest-tailed integer case
- ν = 5: λ_U ranges from about 0.21 (ρ=0.5) to about 0.59 (ρ=0.9)
- ν = 30: λ_U ranges from about 0.003 (ρ=0.5) to about 0.21 (ρ=0.9)
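The coefficient is easy to evaluate directly from the univariate t CDF with ν + 1 degrees of freedom. A short sketch follows; the function name `t_tail_dependence` is hypothetical.

```python
import numpy as np
from scipy import stats

def t_tail_dependence(rho, nu):
    """lambda = 2 * t_{nu+1}( -sqrt((nu + 1)(1 - rho) / (1 + rho)) )."""
    arg = -np.sqrt((nu + 1.0) * (1.0 - rho) / (1.0 + rho))
    return 2.0 * stats.t.cdf(arg, df=nu + 1.0)

# Tabulate lambda over a small grid of (nu, rho).
for nu in (1, 5, 30):
    row = [round(float(t_tail_dependence(r, nu)), 3) for r in (0.0, 0.5, 0.9)]
    print(f"nu={nu}: {row}")
```

Tail dependence is positive even at ρ = 0 for small ν, a feature the Gaussian copula cannot reproduce.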
Parameter estimation: The t-copula parameters (R, ν) can be estimated via:
- Inversion of Kendall's tau: R_{ij} = sin(πτ_{ij}/2)
- Maximum likelihood on pseudo-observations
- Minimum distance estimation
The degrees of freedom ν controls tail dependence: smaller ν → heavier tails → stronger tail dependence.
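The Kendall's tau inversion estimator can be sketched as follows. The function name `correlation_from_kendall` and the synthetic bivariate t sample (ν = 4, ρ = 0.6) are assumptions made for the example.

```python
import numpy as np
from scipy import stats

def correlation_from_kendall(data):
    """Estimate R entrywise via R_ij = sin(pi * tau_ij / 2)."""
    data = np.asarray(data, dtype=float)
    d = data.shape[1]
    R = np.eye(d)
    for i in range(d):
        for j in range(i + 1, d):
            tau, _ = stats.kendalltau(data[:, i], data[:, j])
            R[i, j] = R[j, i] = np.sin(np.pi * tau / 2.0)
    return R

# Synthetic check: bivariate t sample built as normal / sqrt(chi2 / nu).
rng = np.random.default_rng(1)
true_R = np.array([[1.0, 0.6], [0.6, 1.0]])
nu = 4
Z = rng.standard_normal((3000, 2)) @ np.linalg.cholesky(true_R).T
W = rng.chisquare(nu, size=(3000, 1)) / nu
X = Z / np.sqrt(W)
R_hat = correlation_from_kendall(X)
print(R_hat)
```

Because it is rank-based, the estimate does not depend on the margins or on ν. In higher dimensions the entrywise estimate may fail to be positive semi-definite and may need a correction step.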
Archimedean Copulas
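Theorem: Archimedean Copula Construction
A d-dimensional Archimedean copula is constructed from a generator function ψ: [0,∞] → [0,1] that is continuous, decreasing, convex, with ψ(0) = 1 and ψ(∞) = 0:
$$C(u_1,\dots,u_d) = \psi\big(\psi^{-1}(u_1) + \dots + \psi^{-1}(u_d)\big)$$
where ψ⁻¹ is the (pseudo-)inverse of ψ. Convexity is enough for d = 2; for the construction to give a valid copula in every dimension d ≥ 3, it is sufficient that ψ be completely monotone:
$$(-1)^k\,\frac{d^k \psi(t)}{dt^k} \;\ge\; 0 \quad \text{for all } k \ge 0 \text{ and } t > 0$$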
Major families:
| Family | Generator ψ(t) | Parameter θ | Parameter range |
|---|---|---|---|
| Clayton | (1 + θt)^{-1/θ} | θ > 0 | (0, ∞) |
| Gumbel | exp(-t^{1/θ}) | θ ≥ 1 | [1, ∞) |
| Frank | -(1/θ) log(1 - (1 - e^{-θ})e^{-t}) | θ ≠ 0 | ℝ \ {0} |
| Joe | 1 - (1 - e^{-t})^{1/θ} | θ ≥ 1 | [1, ∞) |
- Clayton copula: C(u,v) = (u^{-θ} + v^{-θ} - 1)^{-1/θ}, strong lower tail dependence
- Gumbel copula: C(u,v) = exp(-((-log u)^θ + (-log v)^θ)^{1/θ}), strong upper tail dependence
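The generator construction can be written once and reused for every family. In the sketch below each family is given as an explicit pair (ψ, ψ⁻¹) with ψ(0) = 1, and the result is compared with the closed-form Clayton and Gumbel CDFs. The helper name `archimedean` is hypothetical.

```python
import numpy as np

def archimedean(psi, psi_inv):
    """Return C(u_1, ..., u_d) = psi(psi_inv(u_1) + ... + psi_inv(u_d))."""
    def C(*u):
        return psi(sum(psi_inv(np.asarray(x, dtype=float)) for x in u))
    return C

theta = 2.0
clayton = archimedean(lambda t: (1 + theta * t) ** (-1 / theta),   # psi
                      lambda u: (u ** -theta - 1) / theta)         # psi inverse
gumbel = archimedean(lambda t: np.exp(-t ** (1 / theta)),          # psi
                     lambda u: (-np.log(u)) ** theta)              # psi inverse

u, v = 0.3, 0.8
clayton_closed = (u ** -theta + v ** -theta - 1) ** (-1 / theta)
gumbel_closed = np.exp(-((-np.log(u)) ** theta + (-np.log(v)) ** theta) ** (1 / theta))
print(float(clayton(u, v)), clayton_closed)
print(float(gumbel(u, v)), gumbel_closed)
```

The same function accepts more than two arguments, which gives the exchangeable d-dimensional version, for example `clayton(0.3, 0.8, 0.5)`.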
Archimedean Dependence Measures
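For bivariate Archimedean copulas, Kendall's tau and the tail dependence coefficients can be written directly in terms of the generator or the parameter θ:
Kendall's tau:
$$\tau = 1 + 4\int_0^1 \frac{\psi^{-1}(t)}{(\psi^{-1})'(t)}\,dt$$
| Family | Kendall's τ |
|---|---|
| Clayton | θ/(θ+2) |
| Gumbel | 1 - 1/θ |
| Frank | 1 - 4/θ + 4/θ² ∫₀^θ t/(e^t-1) dt |
Spearman's rho: most Archimedean families have no simple closed form; it is obtained from the general formula ρ_S = 12 ∫₀¹∫₀¹ C(u,v) du dv - 3, typically by numerical integration.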
Tail dependence:
- Clayton: λ_L = 2^{-1/θ}, λ_U = 0
- Gumbel: λ_L = 0, λ_U = 2 - 2^{1/θ}
- Frank: λ_L = λ_U = 0
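These expressions translate directly into code; only Frank's tau needs a numerical integral. The function names below are illustrative, and `tau_frank` is written for θ > 0.

```python
import numpy as np
from scipy import integrate

def tau_clayton(theta):
    return theta / (theta + 2.0)

def tau_gumbel(theta):
    return 1.0 - 1.0 / theta

def tau_frank(theta):
    """1 - 4/theta + (4/theta^2) * integral_0^theta t / (e^t - 1) dt, theta > 0."""
    def integrand(t):
        return 1.0 if t == 0.0 else t / np.expm1(t)   # limit at t = 0 is 1
    integral, _ = integrate.quad(integrand, 0.0, theta)
    return 1.0 - 4.0 / theta + 4.0 * integral / theta**2

def clayton_lower_tail(theta):
    return 2.0 ** (-1.0 / theta)

def gumbel_upper_tail(theta):
    return 2.0 - 2.0 ** (1.0 / theta)

# Three parameter values that all correspond to tau of roughly 0.5.
print(tau_clayton(2.0), tau_gumbel(2.0), tau_frank(5.736))
print(clayton_lower_tail(2.0), gumbel_upper_tail(2.0))
```

Matching the families on Kendall's tau in this way makes their different tail behavior directly comparable.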
Dependence Measures
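Definition: Kendall's Tau and Spearman's Rho
Kendall's tau measures concordance. For continuous random variables (X₁, Y₁) and (X₂, Y₂) i.i.d.:
$$\tau = P\big[(X_1 - X_2)(Y_1 - Y_2) > 0\big] - P\big[(X_1 - X_2)(Y_1 - Y_2) < 0\big]$$
For a copula C:
$$\tau = 4\int_0^1\!\!\int_0^1 C(u,v)\,dC(u,v) - 1$$
Spearman's rho is the Pearson correlation of transformed variables:
$$\rho_S = \operatorname{Corr}\big(F_X(X), F_Y(Y)\big) = 12\int_0^1\!\!\int_0^1 C(u,v)\,du\,dv - 3$$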
Properties:
- Both are invariant under monotone transformations
- τ ∈ [-1, 1], ρ_S ∈ [-1, 1]
- τ = 1 iff comonotonic; τ = -1 iff countermonotonic
- For Gaussian copula: τ = (2/π)arcsin(ρ), ρ_S = (6/π)arcsin(ρ/2)
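Sample estimators:
$$\hat\tau = \frac{c - d}{\binom{n}{2}}, \qquad \hat\rho_S = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$$
where c is the number of concordant pairs, d the discordant pairs, and d_i = rank(x_i) - rank(y_i). Both formulas assume there are no ties.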
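The two sample estimators can be implemented in a few lines and compared with SciPy's `kendalltau` and `spearmanr`. This is a sketch assuming continuous data without ties; the function names and the synthetic data are illustrative.

```python
import numpy as np
from scipy import stats

def kendall_tau_sample(x, y):
    """(c - d) / C(n, 2) from concordant and discordant pairs (no ties assumed)."""
    x, y = np.asarray(x), np.asarray(y)
    n = len(x)
    i, j = np.triu_indices(n, k=1)                    # all pairs with i < j
    signs = np.sign((x[i] - x[j]) * (y[i] - y[j]))    # +1 concordant, -1 discordant
    return signs.sum() / (n * (n - 1) / 2)

def spearman_rho_sample(x, y):
    """1 - 6 * sum(d_i^2) / (n (n^2 - 1)), d_i = rank difference (no ties assumed)."""
    diff = stats.rankdata(x) - stats.rankdata(y)
    n = len(diff)
    return 1.0 - 6.0 * np.sum(diff**2) / (n * (n**2 - 1))

rng = np.random.default_rng(2)
x = rng.normal(size=200)
y = 0.5 * x + rng.normal(size=200)

tau_ref, _ = stats.kendalltau(x, y)
rho_ref, _ = stats.spearmanr(x, y)
print(kendall_tau_sample(x, y), tau_ref)
print(spearman_rho_sample(x, y), rho_ref)
```

The vectorized pair enumeration needs O(n²) memory, so library routines are preferable for large samples.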
Vine Copulas
Definition: Vine Copula (R-Vine)
A vine copula decomposes a d-dimensional copula into a cascade of bivariate copulas using a graphical model (vine) structure. The regular vine (R-vine) is a nested set of trees T₁, T₂, ..., T_{d-1} satisfying:
- T₁ has nodes {1,...,d} and edges E₁
- T_{k+1} has nodes E_k and edges E_{k+1}
- Proximity condition: If two nodes of T_{k+1} are joined by an edge, the corresponding edges of T_k share a node
Canonical vine (C-vine): Each tree has a central node (root), suitable when one variable is dominant.
Drawable vine (D-vine): Each tree is a path, suitable for time series or sequential data.
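The density factorizes as:
$$f(x_1,\dots,x_d) = \prod_{i=1}^{d} f_i(x_i)\;\prod_{k=1}^{d-1}\,\prod_{e\in E_k} c_{a(e),b(e)\mid D(e)}\Big(F\big(x_{a(e)}\mid x_{D(e)}\big),\,F\big(x_{b(e)}\mid x_{D(e)}\big)\Big)$$
where a(e) and b(e) are the conditioned variables and D(e) is the conditioning set.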
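A three-dimensional D-vine makes the factorization concrete: tree 1 holds the pairs (1,2) and (2,3), and tree 2 holds the pair (1,3 | 2). The sketch below assumes Gaussian pair-copulas, for which the vine with the partial correlation in tree 2 should reproduce the trivariate Gaussian copula density. All function names and correlation values are illustrative.

```python
import numpy as np
from scipy import stats

def pair_density(u, v, rho):
    """Bivariate Gaussian copula density."""
    x, y = stats.norm.ppf(u), stats.norm.ppf(v)
    expo = -(rho**2 * (x**2 + y**2) - 2 * rho * x * y) / (2 * (1 - rho**2))
    return np.exp(expo) / np.sqrt(1 - rho**2)

def h(u, v, rho):
    """Conditional CDF F(u | v) under a bivariate Gaussian copula (h-function)."""
    x, y = stats.norm.ppf(u), stats.norm.ppf(v)
    return stats.norm.cdf((x - rho * y) / np.sqrt(1 - rho**2))

def dvine_density(u1, u2, u3, r12, r23, r13_2):
    """D-vine 1-2-3: tree 1 pairs (1,2), (2,3); tree 2 pair (1,3 | 2)."""
    tree1 = pair_density(u1, u2, r12) * pair_density(u2, u3, r23)
    tree2 = pair_density(h(u1, u2, r12), h(u3, u2, r23), r13_2)
    return tree1 * tree2

r12, r23, r13 = 0.6, 0.5, 0.4
r13_2 = (r13 - r12 * r23) / np.sqrt((1 - r12**2) * (1 - r23**2))  # partial correlation

u = np.array([0.2, 0.7, 0.9])
z = stats.norm.ppf(u)
R = np.array([[1.0, r12, r13], [r12, 1.0, r23], [r13, r23, 1.0]])
full = stats.multivariate_normal(mean=np.zeros(3), cov=R).pdf(z) / np.prod(stats.norm.pdf(z))
vine = dvine_density(u[0], u[1], u[2], r12, r23, r13_2)
print(vine, full)
```

Replacing any of the three pair densities with a Clayton or Gumbel density yields a model the Gaussian copula cannot express, which is the point of the vine construction.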
Vine Copula Construction and Selection
Step 1: Pair-copula selection. For each edge in the vine, select from a library of bivariate copulas (Gaussian, t, Clayton, Gumbel, Frank, Joe, BB1, BB7) using AIC/BIC.
Step 2: Structure selection. The number of possible R-vine structures on d variables is (d!/2) · 2^{(d-2)(d-3)/2}, which grows super-exponentially:
- d = 4: 24 structures
- d = 5: 480 structures
- d = 10: ~4.9 × 10¹⁴ structures
Exhaustive search is therefore infeasible. The usual heuristic is sequential: each tree is chosen as a maximum spanning tree whose edge weights are the absolute values of Kendall's tau.
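The size of the search space can be computed from the counting formula (d!/2) · 2^{(d-2)(d-3)/2} for labelled regular vines. This formula is stated here as an assumption of the sketch, and the function name is hypothetical.

```python
from math import factorial

def n_rvine_structures(d):
    """Number of regular vine tree structures on d variables (d >= 2)."""
    if d < 2:
        raise ValueError("a vine needs at least two variables")
    return factorial(d) // 2 * 2 ** ((d - 2) * (d - 3) // 2)

for d in (3, 4, 5, 6, 10):
    print(d, n_rvine_structures(d))
```

Every additional variable multiplies the count by a factor that itself grows exponentially, which is why greedy tree-by-tree selection is used in practice.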
Step 3: Parameter estimation. Given the vine structure, pair-copula parameters are estimated by:
- Canonical maximum likelihood (CML): Transform data to pseudo-observations using empirical margins
- Inference functions for margins (IFM): Two-stage: estimate margins, then copulas
- Maximum likelihood: Joint estimation of margins and copulas
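Step 4: Goodness-of-fit. The Rosenblatt transform maps data to i.i.d. uniform under the model, enabling tests:
$$e_1 = u_1, \qquad e_j = C_{j\mid 1,\dots,j-1}\big(u_j \mid u_1,\dots,u_{j-1}\big), \quad j = 2,\dots,d$$
QQ-plots, Cramér-von Mises tests, and Kolmogorov-Smirnov tests assess fit.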
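For a bivariate Clayton copula the Rosenblatt transform has a closed form, since the conditional CDF is the partial derivative ∂C/∂u₁. The sketch below simulates Clayton data by conditional inversion, applies the transform, and runs a Kolmogorov-Smirnov test on the second component. Function names, θ = 2, and the seed are illustrative.

```python
import numpy as np
from scipy import stats

def clayton_h(u2, u1, theta):
    """Conditional CDF C(u2 | u1) = dC/du1 for the bivariate Clayton copula."""
    return u1 ** (-theta - 1) * (u1 ** -theta + u2 ** -theta - 1) ** (-1 / theta - 1)

def clayton_h_inv(w, u1, theta):
    """Inverse of clayton_h in its first argument (used for simulation)."""
    return ((w ** (-theta / (1 + theta)) - 1) * u1 ** -theta + 1) ** (-1 / theta)

def rosenblatt_clayton(u1, u2, theta):
    """Map (u1, u2) to (e1, e2); i.i.d. uniform if the Clayton model is correct."""
    return u1, clayton_h(u2, u1, theta)

rng = np.random.default_rng(3)
theta = 2.0
w1, w2 = rng.uniform(size=(2, 1000))
u1, u2 = w1, clayton_h_inv(w2, w1, theta)    # Clayton sample via conditional inversion

e1, e2 = rosenblatt_clayton(u1, u2, theta)
ks_stat, ks_pvalue = stats.kstest(e2, "uniform")
print(ks_stat, ks_pvalue)
```

Under a misspecified θ or family, e₂ stops being uniform and independent of e₁, which is what the formal tests detect.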
```python
import numpy as np
from scipy import stats


class CopulaModel:
    """Rank-based fitting, evaluation, and simulation helpers for common copulas."""

    def __init__(self, data):
        self.data = np.asarray(data, dtype=float)  # shape (n, d)
        self.n, self.d = self.data.shape

    def empirical_cdf(self):
        # Pseudo-observations: column-wise ranks scaled by n + 1 to stay inside (0, 1)
        ranks = np.apply_along_axis(stats.rankdata, 0, self.data)
        return ranks / (self.n + 1)

    def gaussian_copula_fit(self):
        # Correlation of the normal scores estimates the copula matrix R
        U = self.empirical_cdf()
        Z = stats.norm.ppf(U)
        return np.corrcoef(Z.T)

    def t_copula_fit(self, df=5.0):
        # df is supplied by the caller, not estimated; R is the correlation of the
        # t-scores (Kendall's tau inversion is more robust for small df)
        U = self.empirical_cdf()
        Z = stats.t.ppf(U, df=df)
        return np.corrcoef(Z.T), df

    def clayton_copula(self, u, v, theta):
        # Clayton CDF, theta > 0
        return (u**(-theta) + v**(-theta) - 1)**(-1/theta)

    def gumbel_copula(self, u, v, theta):
        # Gumbel CDF, theta >= 1
        return np.exp(-((-np.log(u))**theta + (-np.log(v))**theta)**(1/theta))

    def frank_copula(self, u, v, theta):
        # Frank CDF, theta != 0
        num = (np.exp(-theta*u) - 1) * (np.exp(-theta*v) - 1)
        den = np.exp(-theta) - 1
        return -np.log(1 + num / den) / theta

    def kendalls_tau(self, u, v):
        # O(n^2) pair count; tied pairs are dropped from numerator and denominator
        n = len(u)
        concordant = 0
        discordant = 0
        for i in range(n):
            for j in range(i+1, n):
                s = (u[i] - u[j]) * (v[i] - v[j])
                if s > 0:
                    concordant += 1
                elif s < 0:
                    discordant += 1
        total = concordant + discordant
        return (concordant - discordant) / total if total > 0 else float("nan")

    def tail_dependence_coefficient(self, u, v, threshold=0.95):
        # Empirical estimates of P(V > t | U > t) and P(V < 1-t | U < 1-t);
        # u and v must be pseudo-observations in (0, 1)
        n = len(u)
        upper = np.sum((u > threshold) & (v > threshold)) / (n * (1 - threshold))
        lower = np.sum((u < 1-threshold) & (v < 1-threshold)) / (n * (1 - threshold))
        return upper, lower

    def simulate_gaussian(self, R, n_samples):
        # Correlate standard normals with the Cholesky factor, then map to uniforms
        d = R.shape[0]
        L = np.linalg.cholesky(R)
        Z = np.random.standard_normal((n_samples, d))
        X = Z @ L.T
        U = stats.norm.cdf(X)
        return U
```
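The class above simulates only from the Gaussian copula. A t-copula sampler follows the same pattern, with an extra chi-square mixing variable that produces the joint tail events. This is a standalone sketch: `simulate_t_copula` is a hypothetical helper, not a method of the class above.

```python
import numpy as np
from scipy import stats

def simulate_t_copula(R, nu, n_samples, seed=None):
    """Sample uniforms whose dependence is a t-copula with matrix R and nu d.o.f."""
    rng = np.random.default_rng(seed)
    d = R.shape[0]
    L = np.linalg.cholesky(R)
    Z = rng.standard_normal((n_samples, d)) @ L.T        # correlated normals
    W = rng.chisquare(nu, size=(n_samples, 1)) / nu      # one shared mixing draw per row
    X = Z / np.sqrt(W)                                   # multivariate t
    return stats.t.cdf(X, df=nu)                         # uniform margins

R = np.array([[1.0, 0.5], [0.5, 1.0]])
U = simulate_t_copula(R, nu=3, n_samples=50000, seed=5)

# Empirical frequency of both coordinates falling below their 1% quantile
joint_lower = np.mean((U[:, 0] < 0.01) & (U[:, 1] < 0.01))
print(joint_lower)
```

Because the mixing variable W is shared across coordinates, a small W inflates all components at once, which is the mechanism behind the t-copula's tail dependence.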
Applications in Finance
Portfolio Risk Management
Copulas enable realistic modeling of joint tail behavior in portfolios:
Credit portfolio: Model default correlations using a t-copula with:
- Marginals: Bernoulli (default/no default) or continuous (credit scores)
- ν = 4-6: Captures clustering of defaults during crises
- Stress testing: Adjust ν downward for extreme scenarios
Currency risk: Model joint exchange rate movements:
- Marginals: GARCH(1,1) for volatility clustering
- D-vine copula for serial dependence in time series
- C-vine for a dominant currency (e.g., USD)
Insurance: Model catastrophic losses:
- Gumbel copula for upper tail dependence (joint large losses)
- Frank copula for symmetric dependence
- Vine copulas for high-dimensional reinsurance portfolios
CDO pricing: Because it lacks tail dependence, the Gaussian copula tends to understate the likelihood of clustered defaults and hence the spreads of senior tranches. The t-copula with ν ≈ 3-5 better matches market spreads.
Copula Family Comparison
| Copula | Tail Dependence | Kendall's τ Range | Typical Use |
|---|---|---|---|
| Gaussian | λ_U = λ_L = 0 | [-1, 1] | Benchmark |
| t (ν < ∞) | λ_U = λ_L > 0 | [-1, 1] | Financial risk |
| Clayton | λ_L > 0, λ_U = 0 | [0, 1) | Insurance, floods |
| Gumbel | λ_U > 0, λ_L = 0 | [0, 1) | Extreme events |
| Frank | λ_U = λ_L = 0 | [-1, 1] | Symmetric dependence |
| BB1 | Both > 0 | [0, 1) | Flexible tails |
| BB7 | Both > 0 | [0, 1) | Most flexible |
Model selection guidelines:
- Examine scatter plots and QQ-plots for asymmetry and tail behavior
- Compute Kendall's tau, Spearman's rho, and tail dependence estimates
- Fit candidate copulas and compare AIC/BIC
- Validate with Rosenblatt transform and goodness-of-fit tests
- For d > 3, consider vine copulas for flexible high-dimensional dependence
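The "fit candidates and compare AIC" step can be sketched for two one-parameter families. The example below simulates Clayton data (θ = 2), converts it to pseudo-observations, fits Clayton and Gaussian copulas by maximum likelihood, and compares AIC values. The function names, bounds, and sample size are assumptions of the example.

```python
import numpy as np
from scipy import optimize, stats

def clayton_loglik(theta, u, v):
    """Log-likelihood of the bivariate Clayton copula density, theta > 0."""
    s = u ** -theta + v ** -theta - 1
    return np.sum(np.log1p(theta) - (theta + 1) * (np.log(u) + np.log(v))
                  - (2 + 1 / theta) * np.log(s))

def gaussian_loglik(rho, u, v):
    """Log-likelihood of the bivariate Gaussian copula density, |rho| < 1."""
    x, y = stats.norm.ppf(u), stats.norm.ppf(v)
    return np.sum(-0.5 * np.log(1 - rho**2)
                  - (rho**2 * (x**2 + y**2) - 2 * rho * x * y) / (2 * (1 - rho**2)))

def fit_aic(loglik, bounds, u, v):
    """Maximize a one-parameter log-likelihood; return (estimate, AIC)."""
    res = optimize.minimize_scalar(lambda p: -loglik(p, u, v),
                                   bounds=bounds, method="bounded")
    return res.x, 2 * 1 + 2 * res.fun        # AIC = 2k - 2 logL with k = 1

# Synthetic Clayton sample by conditional inversion
rng = np.random.default_rng(4)
theta0 = 2.0
w1, w2 = rng.uniform(size=(2, 1500))
x = w1
y = ((w2 ** (-theta0 / (1 + theta0)) - 1) * x ** -theta0 + 1) ** (-1 / theta0)

# Pseudo-observations from ranks, as in canonical maximum likelihood
n = len(x)
u, v = stats.rankdata(x) / (n + 1), stats.rankdata(y) / (n + 1)

theta_hat, aic_clayton = fit_aic(clayton_loglik, (0.01, 20.0), u, v)
rho_hat, aic_gauss = fit_aic(gaussian_loglik, (-0.99, 0.99), u, v)
print(theta_hat, aic_clayton, rho_hat, aic_gauss)
```

The lower AIC identifies the better candidate; on lower-tail-dependent data such as this sample, the Clayton fit is expected to win.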
Non-Stationary Copulas for Time Series
For time-varying dependence, copula parameters can depend on covariates:
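Time-varying correlation, for example through a hyperbolic tangent link:
$$\rho_t = \tanh\!\big(\beta_0 + \beta^{\top} z_t\big)$$
ensuring ρ_t ∈ (-1, 1) for any covariate z_t.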
Regime-switching copulas:
$$C_t(u_1,\dots,u_d) = \sum_{k=1}^{K} \pi_k(z_t)\, C_k(u_1,\dots,u_d;\theta_k)$$
where π_k(z_t) are state probabilities (e.g., from a Markov switching model) and C_k are different copula families.
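Stochastic copula: the parameter is driven by a latent autoregressive state, for example
$$\theta_t = g(\lambda_t), \qquad \lambda_t = \mu + \phi\,(\lambda_{t-1} - \mu) + \sigma\,\varepsilon_t, \quad \varepsilon_t \sim N(0,1)$$
where the link g maps λ_t into the admissible parameter range. This enables smooth parameter evolution without explicit covariates.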
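Two of these ideas can be sketched in a few lines: a link function that keeps a covariate-driven correlation inside (-1, 1), and a regime mixture of copula CDFs weighted by state probabilities. The tanh link, the two-regime Clayton/Gumbel mixture, and all parameter values are illustrative assumptions rather than a prescribed specification.

```python
import numpy as np

def rho_from_covariates(z, beta0, beta):
    """tanh link: maps the linear predictor beta0 + z @ beta into (-1, 1)."""
    return np.tanh(beta0 + np.asarray(z, dtype=float) @ np.asarray(beta, dtype=float))

def clayton_cdf(u, v, theta=2.0):
    return (u ** -theta + v ** -theta - 1) ** (-1 / theta)

def gumbel_cdf(u, v, theta=2.0):
    return np.exp(-((-np.log(u)) ** theta + (-np.log(v)) ** theta) ** (1 / theta))

def regime_mixture_cdf(u, v, probs, cdfs):
    """C_t(u, v) = sum_k pi_k * C_k(u, v), with state probabilities summing to one."""
    probs = np.asarray(probs, dtype=float)
    if not np.isclose(probs.sum(), 1.0):
        raise ValueError("state probabilities must sum to one")
    return sum(p * C(u, v) for p, C in zip(probs, cdfs))

# Correlation path driven by two hypothetical covariates
z = np.array([[-3.0, 1.0], [0.0, 0.0], [2.0, -1.0], [3.0, 3.0]])
rho_t = rho_from_covariates(z, beta0=0.1, beta=[0.4, -0.2])

# Mixture with 30% weight on a "crisis" Clayton regime
mix = regime_mixture_cdf(0.3, 0.8, probs=[0.3, 0.7], cdfs=[clayton_cdf, gumbel_cdf])
print(rho_t, mix)
```

A convex combination of copulas is again a copula, so the mixture keeps uniform margins while its tail dependence shifts with the state probabilities.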