Spearman Rank Correlation
Unlock Monotonic Relationships That Pearson Misses
When your data is ranked, ordinal, or simply not linear, Pearson's r can give misleading answers. Spearman's ρ works on ranks instead of raw values, which makes it a standard non-parametric measure of association.
Key things Spearman's rank correlation helps you understand:
- Ordinal data — Measure association in Likert scales, rankings, and satisfaction surveys where raw values lack meaning.
- Non-linear trends — Detect monotonic relationships that increase or decrease together but not at a constant rate.
- Outlier resilience — Because it uses ranks, extreme values have far less influence on the result.
When Pearson understates a relationship that is monotonic but curved, Spearman often gives a clearer picture of its strength.
What is Spearman Rank Correlation?
Definition
Spearman's ρ (rho) measures the monotonic relationship between two variables using ranks instead of raw values. It is the non-parametric alternative to Pearson's r.
Spearman Rank Correlation
Spearman's rank correlation coefficient measures the strength and direction of the monotonic association between two ranked variables. It is computed on the ranks of the data rather than the raw values.
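Because ρ depends only on the ordering of the values, any strictly increasing transformation of a variable should leave it unchanged, while Pearson's r will generally move. The sketch below illustrates this on synthetic data; the seed, sample size, and the choice of `np.exp` as the transformation are arbitrary assumptions made for the illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
x = rng.uniform(0, 10, 100)
y = x + rng.normal(0, 1, 100)

# A strictly increasing transform changes the raw values but not their order
y_transformed = np.exp(y)

rho_raw, _ = stats.spearmanr(x, y)
rho_transformed, _ = stats.spearmanr(x, y_transformed)
r_raw, _ = stats.pearsonr(x, y)
r_transformed, _ = stats.pearsonr(x, y_transformed)

print(f"Spearman ρ: raw = {rho_raw:.4f}, transformed = {rho_transformed:.4f}")
print(f"Pearson r:  raw = {r_raw:.4f}, transformed = {r_transformed:.4f}")
```

The two Spearman values should agree, since the ranks are identical before and after the transform.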
Spearman's ρ Formula
$$\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$$
Here,
- $d_i$ = Difference between the ranks of corresponding values
- $n$ = Number of observations
- $\rho$ = Spearman's rank correlation coefficient
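The rank-difference formula, ρ = 1 − 6·Σd² / (n(n² − 1)), can be checked by hand against SciPy. The following is a minimal sketch, assuming a small made-up dataset with no tied values (the numbers are purely illustrative); `rankdata` is used to obtain the ranks.

```python
import numpy as np
from scipy import stats

# Illustrative data with no tied values in either variable
x = np.array([86, 97, 99, 100, 101, 103, 106, 110, 112, 113])
y = np.array([2, 20, 28, 27, 50, 29, 7, 17, 6, 12])

rank_x = stats.rankdata(x)  # rank 1 = smallest value
rank_y = stats.rankdata(y)

d = rank_x - rank_y         # difference between paired ranks
n = len(x)                  # number of observations
rho_manual = 1 - 6 * np.sum(d**2) / (n * (n**2 - 1))

rho_scipy, _ = stats.spearmanr(x, y)
print(f"Manual ρ = {rho_manual:.4f}")
print(f"SciPy ρ  = {rho_scipy:.4f}")
```

With no ties, the two values should match up to floating-point error.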
import numpy as np
from scipy import stats

np.random.seed(42)  # fixed seed so the example is reproducible
# Monotonic but non-linear relationship
x = np.random.uniform(0, 10, 50)
y = np.exp(x / 5) + np.random.normal(0, 2, 50)
r_pearson, _ = stats.pearsonr(x, y)
r_spearman, _ = stats.spearmanr(x, y)
print(f"Pearson r = {r_pearson:.4f}")
print(f"Spearman ρ = {r_spearman:.4f}")
When to Use Spearman vs Pearson
| Feature | Pearson r | Spearman ρ |
|---|---|---|
| Relationship type | Linear | Monotonic |
| Data type | Continuous | Ordinal or continuous |
| Outlier sensitivity | High | Low |
| Normality assumption | Needed for the standard significance test | Not required |
| Rank transformation | No | Yes |
# Example with ordinal data (ranked preferences)
preference_a = [1, 2, 3, 4, 5]
preference_b = [2, 1, 4, 3, 5]
rho, p_value = stats.spearmanr(preference_a, preference_b)
print(f"Spearman ρ = {rho:.4f}, p = {p_value:.4f}")
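The outlier-sensitivity row of the comparison table can be illustrated with a small sketch. It assumes a perfectly linear toy dataset in which one value is replaced by an extreme one that still preserves the ordering; the specific numbers are made up for the illustration.

```python
import numpy as np
from scipy import stats

x = np.arange(1, 11, dtype=float)
y_clean = 2 * x + 1            # perfectly linear relationship

y_outlier = y_clean.copy()
y_outlier[-1] = 500.0          # one extreme value; the ordering is unchanged

r_clean, _ = stats.pearsonr(x, y_clean)
r_outlier, _ = stats.pearsonr(x, y_outlier)
rho_clean, _ = stats.spearmanr(x, y_clean)
rho_outlier, _ = stats.spearmanr(x, y_outlier)

print(f"Pearson r:  clean = {r_clean:.4f}, with outlier = {r_outlier:.4f}")
print(f"Spearman ρ: clean = {rho_clean:.4f}, with outlier = {rho_outlier:.4f}")
```

Here the outlier keeps its rank, so ρ is unaffected. An outlier that changes the ordering would still shift ρ, but its influence is limited to how far its rank moves rather than how extreme its value is.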
Tie Handling
# With tied values, each tied group receives the average of the ranks it spans
x = [1, 2, 2, 3, 4, 5]
y = [2, 3, 3, 4, 5, 6]
rho, p_value = stats.spearmanr(x, y)
print(f"With ties: ρ = {rho:.4f}, p = {p_value:.4f}")
Ties Correction
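When tied ranks exist, each tied value receives the average of the ranks it spans, and Spearman's ρ is computed as the Pearson correlation of those ranks; the rank-difference shortcut formula is no longer exact. SciPy's spearmanr() handles this automatically.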
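To see the effect of ties concretely, the sketch below compares three quantities on a small made-up dataset with one tied pair in `x`: the Pearson correlation of the average ranks, the rank-difference shortcut 1 − 6·Σd² / (n(n² − 1)), and the value returned by `spearmanr`. The data are an assumption chosen only so that the tie actually matters.

```python
import numpy as np
from scipy import stats

x = np.array([1, 2, 2, 3, 4, 5])   # the two 2s are tied
y = np.array([1, 3, 2, 4, 6, 5])

rank_x = stats.rankdata(x)         # default method assigns average ranks to ties
rank_y = stats.rankdata(y)

# Pearson correlation computed on the ranks
rho_on_ranks = np.corrcoef(rank_x, rank_y)[0, 1]

# Rank-difference shortcut, exact only when there are no ties
d = rank_x - rank_y
n = len(x)
rho_shortcut = 1 - 6 * np.sum(d**2) / (n * (n**2 - 1))

rho_scipy, _ = stats.spearmanr(x, y)

print(f"Pearson on ranks = {rho_on_ranks:.5f}")
print(f"Shortcut formula = {rho_shortcut:.5f}")
print(f"SciPy spearmanr  = {rho_scipy:.5f}")
```

The SciPy value should agree with the Pearson-on-ranks value, while the shortcut differs slightly.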
Common Applications
- Ordinal data analysis — Likert scales, rankings, satisfaction surveys
- Non-linear monotonic relationships — exponential growth patterns
- Robust correlation — when outliers are present
- Medical research — pain scales, severity rankings
Spearman Rank Correlation in Machine Learning
| ML Application | Rank Correlation Usage | Why |
|---|---|---|
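| Feature selection | Score features by absolute ρ with the target | Pearson can understate monotonic but non-linear trends |
| Ordinal targets | Measure association with ordered labels | Labels carry order but not interval meaning |
| Robust correlation | Outlier-resistant association measure | Less affected by extreme values than Pearson |
| Ranking models | Compare predicted and true orderings | Evaluates ranking quality in learning to rank |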
import numpy as np
from scipy.stats import spearmanr, pearsonr

np.random.seed(42)
n = 200
# Non-linear but monotonic relationship: exponential growth with small noise
x = np.random.uniform(0, 10, n)
y = np.exp(x / 2) + np.random.randn(n) * 0.3
pearson_r, _ = pearsonr(x, y)
spearman_r, _ = spearmanr(x, y)
print("Non-linear monotonic relationship:")
print(f"Pearson: r = {pearson_r:.3f} (measures linear fit only)")
print(f"Spearman: ρ = {spearman_r:.3f} (captures monotonic trend)")
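The feature-selection row of the table can be sketched as a simple screening step: score each candidate feature by the absolute value of its Spearman correlation with the target and sort. Everything here is an assumption made for illustration, including the feature names `f_mono` and `f_noise`, the synthetic target, and the seed; it is not a complete feature-selection procedure.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)  # arbitrary seed
n = 300

# Hypothetical features: one monotonically related to the target, one pure noise
features = {
    "f_mono": rng.uniform(0, 5, n),
    "f_noise": rng.normal(0, 1, n),
}
target = np.exp(features["f_mono"]) + rng.normal(0, 0.5, n)

# Score each feature by |ρ| with the target
scores = {}
for name, values in features.items():
    rho, _ = spearmanr(values, target)
    scores[name] = abs(rho)

# Highest-scoring features first
ranked = sorted(scores, key=scores.get, reverse=True)
for name in ranked:
    print(f"{name}: |ρ| = {scores[name]:.3f}")
```

A screen like this only detects monotonic dependence on single features; it would miss relationships such as a sine wave or interactions between features.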
"Ranks don't lie — they tell you what direction the data is moving, even when the relationship refuses to be linear."
Summary: Spearman Rank Correlation
- Spearman's ρ measures monotonic association — whether variables tend to move together in the same direction, regardless of linearity
- Computed on ranks — robust to outliers and non-normal distributions
- ρ = +1 means perfect monotonic increase; ρ = -1 means perfect monotonic decrease
- Use Spearman when: data is ordinal, relationship is non-linear but monotonic, or outliers are present
- Use Pearson when: relationship is linear and assumptions are met
- Tied ranks are handled automatically by scipy.stats.spearmanr()