Introduction

Advanced Statistical Methods

Visualizing Similarity as Spatial Maps

Multidimensional scaling transforms proximity data into a geometric configuration where distances reflect similarities. Both metric and non-metric MDS preserve relational structure in low-dimensional maps.

Marketing research — Visualize how consumers perceive brand similarities in perceptual maps
Psychology — Map cognitive distances between stimuli in multidimensional space
Bioinformatics — Visualize genetic distances between populations in two or three dimensions

MDS turns abstract similarity judgments into intuitive spatial representations.

Multidimensional Scaling (MDS) is a family of techniques that transform a matrix of pairwise distances or dissimilarities into a configuration of points in a low-dimensional space. The goal is to preserve the original distance structure as faithfully as possible, yielding a spatial representation that reveals hidden patterns in proximity data.

MDS is particularly valuable when the underlying variables are latent or when direct measurement is infeasible — for example, when subjects can judge the similarity between stimuli but cannot articulate the dimensions on which they differ.

Distance and Dissimilarity

Mathematical Preliminaries

Let $\boldsymbol{\Delta} = [\delta_{ij}]$ be an $n \times n$ matrix of observed dissimilarities between $n$ objects. MDS seeks points $\mathbf{x}_1, \dots, \mathbf{x}_n \in \mathbb{R}^m$ (with $m \ll n$ ) such that the Euclidean distances $d_{ij} = \|\mathbf{x}_i - \mathbf{x}_j\|$ approximate $\delta_{ij}$ .

DfDissimilarity Matrix Properties

A dissimilarity matrix $\boldsymbol{\Delta}$ is:

Symmetric: $\delta_{ij} = \delta_{ji}$ for all $i, j$
Zero diagonal: $\delta_{ii} = 0$ for all $i$
Non-negative: $\delta_{ij} \geq 0$
Metric (if applicable): satisfies the triangle inequality $\delta_{ik} \leq \delta_{ij} + \delta_{jk}$

Classical MDS (Principal Coordinates Analysis)

Eigendecomposition Method

Classical MDS (also called Principal Coordinates Analysis, Torgerson, 1952) provides an exact solution when dissimilarities are Euclidean distances:

\mathbf{B} = -\frac{1}{2}\mathbf{J}\boldsymbol{\Delta}^{(2)}\mathbf{J}

where $\boldsymbol{\Delta}^{(2)} = [\delta_{ij}^2]$ , $\mathbf{J} = \mathbf{I}_n - \frac{1}{n}\mathbf{1}\mathbf{1}^{\top}$ is the centering matrix, and $\mathbf{B}$ is the doubly centered matrix of squared distances.

ThClassical MDS Solution

If $\boldsymbol{\Delta}$ contains Euclidean distances, then $\mathbf{B}$ is positive semi-definite and equals $\mathbf{X}\mathbf{X}^{\top}$ where $\mathbf{X}$ is the $n \times m$ configuration matrix. The $m$ -dimensional MDS solution is:

\mathbf{X}_m = \mathbf{V}_m \boldsymbol{\Lambda}_m^{1/2}

where $\mathbf{V}_m$ contains the $m$ leading eigenvectors of $\mathbf{B}$ and $\boldsymbol{\Lambda}_m = \text{diag}(\lambda_1, \dots, \lambda_m)$ the corresponding eigenvalues.

The quality of representation is assessed by the proportion of variance explained:

\text{Proportion of variance} = \frac{\sum_{k=1}^{m} \lambda_k}{\sum_{k=1}^{n-1} \lambda_k^+}

where $\lambda_k^+$ are the positive eigenvalues of $\mathbf{B}$ .

Torgerson's Double Centering

Classical MDS can be expressed as a Gram matrix decomposition:

\mathbf{G} = \mathbf{X}\mathbf{X}^{\top} = -\frac{1}{2}\mathbf{J}\boldsymbol{\Delta}^{(2)}\mathbf{J} = \mathbf{V}\boldsymbol{\Lambda}\mathbf{V}^{\top}

The eigendecomposition of $\mathbf{G}$ yields the configuration $\mathbf{X} = \mathbf{V}_m\boldsymbol{\Lambda}_m^{1/2}$ , where each row $\mathbf{x}_i$ is the coordinate vector for object $i$ .

Classical MDS preserves metric distance relationships. When the original distances are not Euclidean (e.g., city-block, path distances), the centered matrix $\mathbf{B}$ may have negative eigenvalues. In such cases, the negative eigenvalues are typically set to zero, and the remaining positive eigenvalues define the best Euclidean approximation.

Metric MDS

Stress Function

When the dissimilarities are not Euclidean, or when a general monotonic relationship is desired, metric MDS minimizes a stress function:

DfMetric Stress (Kruskal's Stress-1)

Kruskal's Stress-1 for $m$ -dimensional MDS is:

\text{Stress}_m = \sqrt{\frac{\sum_{i<j} (d_{ij} - \hat{\delta}_{ij})^2}{\sum_{i<j} d_{ij}^2}}

where $d_{ij} = \|\mathbf{x}_i - \mathbf{x}_j\|$ are the configuration distances and $\hat{\delta}_{ij}$ are the disparities (transformed dissimilarities). In metric MDS, $\hat{\delta}_{ij} = f(\delta_{ij})$ for a known function $f$ (typically the identity).

Common stress measures include:

\text{STRESS-1} = \sqrt{\frac{\sum_{i<j}(\delta_{ij} - d_{ij})^2}{\sum_{i<j}\delta_{ij}^2}}

\text{STRESS-2} = \sqrt{\frac{\sum_{i<j}(\delta_{ij} - d_{ij})^2}{\sum_{i<j}(d_{ij} - \bar{d})^2}}

\text{RSS} = \sum_{i<j} (\delta_{ij} - d_{ij})^2 \quad \text{(Raw Stress)}

Strain (Smallest Space Analysis)

An alternative to stress is the strain function (Lingoes, 1971; Guttman, 1968):

DfStrain

\text{Strain} = \sum_{i,j} \left(b_{ij} - \mathbf{x}_i^{\top}\mathbf{x}_j\right)^2

where $b_{ij}$ are the elements of the centered matrix $\mathbf{B}$ . Minimizing strain is equivalent to classical MDS and can be solved via eigendecomposition.

Non-Metric MDS

Ordinal Stress

When only the rank order of dissimilarities is meaningful (ordinal data), non-metric MDS (Kruskal, 1964) finds a monotonic transformation:

DfNon-Metric MDS

Non-metric MDS minimizes:

\text{Stress}_{nm} = \sqrt{\frac{\sum_{i<j}(d_{ij} - \hat{d}_{ij})^2}{\sum_{i<j} d_{ij}^2}}

subject to the monotonicity constraint:

\delta_{ij} < \delta_{kl} \implies \hat{d}_{ij} \leq \hat{d}_{kl}

where $\hat{d}_{ij}$ are the disparities obtained by isotonic regression of $d_{ij}$ on $\delta_{ij}$ .

The Primary Approach places ties in the dissimilarities at the same distance level. The Secondary Approach places ties at different distances.

Algorithm: SMACOF

The SMACOF (Scaling by Majorizing a Complicated Function) algorithm (de Leeuw, 1977) provides guaranteed convergence to a local minimum:

\mathbf{X}^{(t+1)} = \mathbf{V}^{\dagger}\left[\mathbf{X}^{(t)} - \frac{1}{\tau}\mathbf{G}(\mathbf{X}^{(t)})\mathbf{X}^{(t)}\right]

where $\mathbf{V}^{\dagger}$ is the Moore-Penrose pseudoinverse of the configuration matrix, $\tau$ is the step size, and $\mathbf{G}(\mathbf{X})$ is the Guttman transform.

SMACOF guarantees monotone decrease of stress at each iteration. The convergence rate is linear, and the algorithm typically converges in 50–200 iterations. Multiple random starts are recommended to avoid local minima.

Shepard Diagram

The Shepard diagram is the primary diagnostic tool for MDS. It plots the original dissimilarities $\delta_{ij}$ (x-axis) against the configuration distances $d_{ij}$ (y-axis).

DfShepard Diagram

A Shepard diagram displays the relationship between original dissimilarities and MDS distances:

Metric MDS: Points cluster around a linear (or known function) relationship
Non-Metric MDS: Points cluster around a monotonically increasing function
Stress: Proportional to the vertical scatter of points around the monotonic curve
Degenerate solutions: All distances equal, appearing as a horizontal line

PROXSCAL

PROXSCAL (Commandeur & Heiser, 1993) is a general-purpose MDS algorithm that accommodates:

Metric, non-metric, and polynomial transformations
Individual differences (INDSCAL model)
Weighted Euclidean models
Multiple dissimilarity matrices

\sigma(\mathbf{X}, \mathbf{D}) = \sum_{k=1}^{K} \sum_{i<j} w_{ijk} \left(\delta_{ijk} - d_{ijk}\right)^2

where $K$ is the number of sources (replications), $w_{ijk}$ are weights, and $d_{ijk} = \|\mathbf{x}_{ik} - \mathbf{x}_{jk}\|$ in the weighted Euclidean model.

Interpretation and Dimensional Interpretation

Procrustean Rotation

To facilitate interpretation, the MDS configuration can be rotated to match a target configuration:

\mathbf{X}_{\text{rotated}} = \mathbf{X}\mathbf{T}

where $\mathbf{T}$ is an orthogonal rotation matrix that minimizes $\|\mathbf{X}_{\text{target}} - \mathbf{X}\mathbf{T}\|_F$ .

Property Fitting

External variables can be regressed onto MDS dimensions to label the axes:

\delta_{ij} \approx \beta_0 + \sum_{d=1}^{m} \beta_d |x_{id} - x_{jd}|

where $\beta_d$ are regression coefficients indicating how strongly variable $d$ relates to MDS dimension $d$ .

Python Implementation

import numpy as np
from sklearn.manifold import MDS
from sklearn.metrics import pairwise_distances
import matplotlib.pyplot as plt

np.random.seed(42)

# Generate synthetic dissimilarity data
n = 30
X_true = np.random.randn(n, 3)
D_true = pairwise_distances(X_true, metric='euclidean')

# --- Classical MDS (sklearn) ---
mds_classical = MDS(n_components=2, dissimilarity='euclidean', random_state=42)
X_classical = mds_classical.fit_transform(D_true)
print("Classical MDS stress:", mds_classical.stress_)

# --- Metric MDS ---
mds_metric = MDS(
    n_components=2,
    metric=True,
    dissimilarity='euclidean',
    random_state=42,
    n_init=10,
    max_iter=300,
)
X_metric = mds_metric.fit_transform(D_true)
print("Metric MDS stress:", mds_metric.stress_)

# --- Non-metric MDS ---
mds_nonmetric = MDS(
    n_components=2,
    metric=False,
    dissimilarity='euclidean',
    random_state=42,
    n_init=10,
    max_iter=300,
)
X_nonmetric = mds_nonmetric.fit_transform(D_true)
print("Non-metric MDS stress:", mds_nonmetric.stress_)

# --- Shepard Diagram ---
def shepard_diagram(D_true, X_mds, title="Shepard Diagram"):
    d_mds = pairwise_distances(X_mds)
    triu_idx = np.triu_indices(n, k=1)

    fig, ax = plt.subplots(figsize=(6, 6))
    ax.scatter(D_true[triu_idx], d_mds[triu_idx], alpha=0.3, s=10)

    # Monotonic line (isotonic regression)
    from scipy.stats import rankdata
    from sklearn.isotonic import IsotonicRegression
    ir = IsotonicRegression()
    d_flat = D_true[triu_idx].reshape(-1)
    m_flat = d_mds[triu_idx].reshape(-1)
    ir.fit(d_flat, m_flat)
    x_mono = np.linspace(d_flat.min(), d_flat.max(), 200)
    ax.plot(x_mono, ir.predict(x_mono), 'r-', linewidth=2, label='Monotonic fit')
    ax.plot([0, d_flat.max()], [0, d_flat.max()], 'k--', alpha=0.5, label='Perfect')
    ax.set_xlabel("Original dissimilarities")
    ax.set_ylabel("MDS distances")
    ax.set_title(title)
    ax.legend()
    plt.tight_layout()
    return fig

# --- Classical MDS via eigendecomposition ---
def classical_mds_eigen(D, m=2):
    n = D.shape[0]
    D2 = D ** 2
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ D2 @ J

    eigenvalues, eigenvectors = np.linalg.eigh(B)
    idx = np.argsort(eigenvalues)[::-1][:m]
    Lambda_m = np.diag(np.sqrt(np.maximum(eigenvalues[idx], 0)))
    X = eigenvectors[:, idx] @ Lambda_m
    return X, eigenvalues

X_manual, evals = classical_mds_eigen(D_true, m=2)
proportion = np.sum(evals[evals > 0][:2]) / np.sum(evals[evals > 0])
print(f"Classical MDS variance explained: {proportion:.4f}")

# --- Stress computation ---
def compute_stress(D, X, metric=True):
    n = D.shape[0]
    d_mds = pairwise_distances(X)
    if metric:
        return np.sqrt(np.sum((D - d_mds)**2) / np.sum(D**2))
    else:
        from scipy.stats import rankdata
        D_rank = rankdata(D, method='average')
        d_rank = rankdata(d_mds, method='average')
        return np.sqrt(np.sum((D_rank - d_rank)**2) / np.sum(D_rank**2))

stress_manual = compute_stress(D_true, X_manual)
print(f"Manual classical MDS stress: {stress_manual:.4f}")

# --- Individual Differences (INDSCAL) ---
def indscal(K_matrices, m=2):
    n = K_matrices[0].shape[0]
    K = len(K_matrices)
    B_avg = np.zeros((n, n))

    weights = []
    for Dk in K_matrices:
        Dk2 = Dk ** 2
        J = np.eye(n) - np.ones((n, n)) / n
        B_k = -0.5 * J @ Dk2 @ J
        B_avg += B_k

    B_avg /= K
    eigenvalues, eigenvectors = np.linalg.eigh(B_avg)
    idx = np.argsort(eigenvalues)[::-1][:m]
    X_group = eigenvectors[:, idx] @ np.diag(np.sqrt(np.maximum(eigenvalues[idx], 0)))

    for Dk in K_matrices:
        Dk2 = Dk ** 2
        J = np.eye(n) - np.ones((n, n)) / n
        B_k = -0.5 * J @ Dk2 @ J
        w_k = np.diag(X_group.T @ B_k @ X_group) / np.diag(X_group.T @ X_group)
        weights.append(w_k)

    return X_group, np.array(weights)

# Simulate 3 judges
K_matrices = [D_true + 0.3 * np.random.randn(n, n) for _ in range(3)]
K_matrices = [(k + k.T) / 2 for k in K_matrices]  # symmetrize
X_indscal, w_indscal = indscal(K_matrices, m=2)
print("INDSCAL weights:\n", np.round(w_indscal, 3))

Quality Assessment

Assessing MDS Solution Quality:

Stress value: Stress-1 < 0.025 (excellent), < 0.05 (good), < 0.10 (fair), > 0.15 (poor). Use scree plot of stress vs. dimensionality.
Shepard diagram: Examine the scatter of points around the monotonic curve. Tight clustering indicates good fit.
Variance accounted for (VAF): In classical MDS, the proportion of positive eigenvalues captured by the $m$ -dimensional solution.
R-squared: $\text{RSQ} = 1 - \text{Stress}^2$ , the proportion of variance in disparities explained by distances.
Number of dimensions: Use scree plot (stress vs. $m$ ), elbow criterion, or interpretability criterion.
Local minima: Run MDS with multiple random starts (10–50) and verify convergence to the same stress level.
Stability: Perturb the dissimilarities slightly and check if the configuration remains similar (Procrustes correlation > 0.90).
Degenerate solutions: If all distances are equal or the configuration collapses to a single point, the MDS solution is degenerate — indicating severe misfit.

Applications

Psychology: Mapping perceived similarity between stimuli (e.g., colors, facial expressions, concepts). The spatial configuration reveals the psychological dimensions underlying perception.

Marketing: Product positioning — mapping consumer perceptions of brands in a space where distances reflect perceived dissimilarity. Competitive strategies are informed by the proximity structure.

Ecology: Ordination of species communities along environmental gradients. Non-metric MDS (NMDS) is the standard method for community ecology because it handles non-Euclidean ecological distances (Bray-Curtis, Jaccard).

Bioinformatics: Mapping protein structures, gene expression profiles, or drug response patterns where direct Euclidean distance is inappropriate.

Multidimensional Scaling (MDS)