The True Average of Multiplicative Growth
Descriptive Statistics
Geometric Mean: Where Compounding Reveals the Truth
A stock doubles then crashes 50%. The arithmetic mean says you gained 25%. Your portfolio says otherwise.
- Investment returns — CAGR is the geometric mean, not the arithmetic average
- Population growth — multiplicative rates compound, and so should their average
- AM-GM inequality — the geometric mean is always less than or equal to the arithmetic mean
When values multiply rather than add, only the geometric mean tells the real story.
What is the Geometric Mean?
Definition
The geometric mean is the nth root of the product of n positive values. It is the appropriate average for ratios, rates of change, and multiplicative processes.
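Geometric Mean
$$\mathrm{GM} = \left( \prod_{i=1}^{n} x_i \right)^{1/n} = \sqrt[n]{x_1 \cdot x_2 \cdots x_n}$$
Here,
- $\mathrm{GM}$ = Geometric mean
- $x_i$ = The i-th positive value
- $n$ = Number of values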
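As a quick check of the definition, the sketch below computes the nth root of the product directly and compares it with the standard library's `statistics.geometric_mean`. The helper name `geometric_mean_by_definition` and the sample values are illustrative assumptions, not part of the original text.

```python
import math
import statistics


def geometric_mean_by_definition(values):
    """Return the nth root of the product of n positive values."""
    values = list(values)
    if not values:
        raise ValueError("need at least one value")
    return math.prod(values) ** (1 / len(values))


# Illustrative sample: 2 * 4 * 8 = 64, and the cube root of 64 is 4.
sample = [2, 4, 8]
by_definition = geometric_mean_by_definition(sample)
from_stdlib = statistics.geometric_mean(sample)

print(f"By definition:             {by_definition:.6f}")
print(f"statistics.geometric_mean: {from_stdlib:.6f}")
```

The direct product is fine for a handful of values; for long series the log-based form is usually the safer choice.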
Why Geometric Mean for Growth Rates?
Why It Matters
Suppose a stock grows: Year 1: +100%, Year 2: −50%.
- Arithmetic mean: (100% − 50%) / 2 = +25% per year -> sounds great
- Actual result: $100 -> $200 -> $100 -> back where you started (0% per year!)
- Geometric mean: √(2.00 × 0.50) − 1 = √(1.0) − 1 = 0% -> correct!
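The arithmetic in this example can be reproduced in a few lines. This is a minimal sketch using only the standard library; the $100 starting value and the variable names are assumptions chosen for illustration.

```python
import math

start_value = 100.0
growth_factors = [2.00, 0.50]  # +100% then -50%, expressed as multipliers

# Arithmetic mean of the percentage returns (misleading under compounding)
arithmetic_return = sum(f - 1 for f in growth_factors) / len(growth_factors)

# Geometric mean of the multipliers, minus 1, gives the per-year rate
geometric_return = math.prod(growth_factors) ** (1 / len(growth_factors)) - 1

# What the portfolio is actually worth after both years
end_value = start_value * math.prod(growth_factors)

print(f"Arithmetic mean return: {arithmetic_return:.0%}")
print(f"Geometric mean return:  {geometric_return:.0%}")
print(f"End value: ${end_value:.2f}")
```

Only the geometric figure is consistent with the end value: compounding the starting amount at that rate for two years lands exactly on what the portfolio holds.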
import numpy as np
from scipy.stats import gmean
import pandas as pd
import matplotlib.pyplot as plt
# ==========================================
# Example 1: Investment Returns
# ==========================================
annual_returns = [0.25, -0.10, 0.40, -0.20, 0.15] # +25%, -10%, etc.
multipliers = [1 + r for r in annual_returns]
# Geometric mean of multipliers
geo_mean_multiplier = gmean(multipliers)
arith_mean_multiplier = np.mean(multipliers)
geo_cagr = geo_mean_multiplier - 1
arith_mean_return = np.mean(annual_returns)
print("Annual returns:", [f"{r:.0%}" for r in annual_returns])
print(f"Arithmetic mean return: {arith_mean_return:.2%} <- WRONG for compounding")
print(f"Geometric mean (CAGR): {geo_cagr:.2%} <- CORRECT for compounding")
# Verify: compound at geometric mean
initial = 1000
final_actual = initial * np.prod(multipliers)
final_geometric = initial * (1 + geo_cagr)**len(multipliers)
print(f"\nActual final value: ${final_actual:.2f}")
print(f"Projected via CAGR: ${final_geometric:.2f}")
# ==========================================
# Example 2: Population Growth
# ==========================================
years = [2018, 2019, 2020, 2021, 2022, 2023]
population = [50000, 53000, 56180, 59551, 63124, 66911]
growth_rates = [population[i]/population[i-1] for i in range(1, len(population))]
geo_mean_growth = gmean(growth_rates)
print(f"\nAnnual growth rates: {[f'{r:.4f}' for r in growth_rates]}")
print(f"Geometric mean growth rate: {geo_mean_growth:.4f}")
print(f"Expected CAGR: {geo_mean_growth - 1:.2%}")
# Verify
n_years = len(population) - 1
cagr_from_endpoints = (population[-1]/population[0])**(1/n_years)
print(f"CAGR from endpoints: {cagr_from_endpoints - 1:.2%}")
# ==========================================
# Example 3: Geometric vs Arithmetic Mean
# ==========================================
print("\n=== AM-GM Inequality ===")
print("Arithmetic mean is ALWAYS ≥ Geometric mean (for positive values)")
for trial in range(5):
    x = np.random.uniform(1, 100, 10)
    am = np.mean(x)
    gm = gmean(x)
    print(f" AM = {am:.3f}, GM = {gm:.3f}, AM ≥ GM: {am >= gm}")
Log-transform Shortcut
The geometric mean of X equals exp(arithmetic mean of log X):
$$\mathrm{GM} = \exp\left( \frac{1}{n} \sum_{i=1}^{n} \ln x_i \right)$$
# Same result: scipy's gmean also works in log space, which avoids overflow from a huge product
data = np.random.uniform(1, 100, 1000)
gm_direct = gmean(data)
gm_log = np.exp(np.log(data).mean())
print(f"Direct gmean: {gm_direct:.6f}")
print(f"Via log: {gm_log:.6f}")
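One reason the log form matters: multiplying many values directly can overflow floating point long before the geometric mean itself is large. The sketch below illustrates this with the standard library only; the seed, sample size, and variable names are assumptions for illustration.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the run is repeatable
data = [rng.uniform(1, 100) for _ in range(1000)]
n = len(data)

# Naive route: the product of 1000 values in [1, 100] exceeds the float range,
# so it becomes inf and the nth root is inf as well.
naive_product = math.prod(data)
gm_naive = naive_product ** (1 / n)

# Log route: average the logs, then exponentiate. No huge intermediate value.
gm_log = math.exp(math.fsum(math.log(x) for x in data) / n)

print(f"Naive product: {naive_product}")
print(f"Naive geometric mean: {gm_naive}")
print(f"Log-based geometric mean: {gm_log:.6f}")
```

The log route keeps every intermediate value small, which is why it is the usual way to compute a geometric mean over long series.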
When to Use Each Mean
| Situation | Use |
|---|---|
| Average price, temperature, test score | Arithmetic mean |
| Average growth rate, CAGR, inflation | Geometric mean |
| Average rate over equal units of work (e.g. speed over equal distances) | Harmonic mean |
| Ratios, multiplicative processes | Geometric mean |
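To make the table concrete, here is a small sketch that applies each mean to a case it suits, using Python's `statistics` module. The numbers (test scores, growth factors, and speeds over two equal-distance legs) are made up for illustration.

```python
import statistics

# Additive quantity: test scores -> arithmetic mean
scores = [70, 80, 90]
mean_score = statistics.fmean(scores)

# Multiplicative quantity: yearly growth factors -> geometric mean
growth_factors = [1.10, 1.20, 0.90]
mean_growth = statistics.geometric_mean(growth_factors)

# Rate over equal distances: 60 km/h out, 40 km/h back -> harmonic mean
speeds = [60, 40]
mean_speed = statistics.harmonic_mean(speeds)

print(f"Average score: {mean_score:.1f}")
print(f"Average growth factor: {mean_growth:.4f}")
print(f"Average speed: {mean_speed:.1f} km/h")
```

The average speed comes out at 48 km/h rather than 50 because more time is spent on the slower leg, which is the situation the harmonic mean accounts for.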
Key Takeaways
- Use the geometric mean for growth rates and ratios; the arithmetic mean overstates returns in compounding scenarios.
- CAGR (compound annual growth rate) is the geometric mean of the annual multipliers, minus 1. It is the standard in finance.
- The AM-GM inequality guarantees the geometric mean never exceeds the arithmetic mean for positive values.
- The geometric mean only works with positive numbers: a single zero forces the result to zero, and negative values generally leave the nth root undefined.
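Because the positivity requirement is easy to violate (a -100% return gives a multiplier of 0), it can help to validate inputs before averaging. The helper below is a hypothetical sketch, not a library function; the name `checked_geometric_mean` and the choice to raise `ValueError` are assumptions.

```python
import math


def checked_geometric_mean(values):
    """Geometric mean via logs; rejects empty input and non-positive values."""
    values = list(values)
    if not values:
        raise ValueError("need at least one value")
    bad = [v for v in values if v <= 0]
    if bad:
        raise ValueError(f"geometric mean needs positive values, got {bad}")
    return math.exp(math.fsum(math.log(v) for v in values) / len(values))


# Returns must be converted to growth factors (1 + r) before averaging.
returns = [0.25, -0.10, 0.40]
factors = [1 + r for r in returns]
print(f"Mean growth factor: {checked_geometric_mean(factors):.4f}")

# A total loss (-100%) yields a factor of 0 and is rejected.
try:
    checked_geometric_mean([1.25, 0.0])
except ValueError as exc:
    print(f"Rejected: {exc}")
```

Failing loudly is one design choice among several; depending on the application, returning zero for a total loss may be the more meaningful answer.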
Geometric Mean in Machine Learning
| ML Application | Why Geometric Mean | How |
|---|---|---|
| G-measure (Fowlkes-Mallows index) | Balances precision and recall | G = √(precision × recall); F1 itself is the harmonic mean, 2PR / (P + R) |
| Growth rates | Multiplicative processes | Compound growth |
| Log-returns in financial ML models | Averaging log-returns is equivalent to taking a geometric mean | exp(mean(log(1 + r))) - 1 |
| Multiplicative gating | Signals are scaled by products of gate activations | LSTMs, GRUs |
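A note on the precision and recall row: the F1 score is the harmonic mean of precision and recall, 2PR / (P + R), while √(P × R) is their geometric mean, a related but distinct quantity. The sketch below compares the two using hypothetical confusion-matrix counts chosen only for illustration.

```python
import math

# Hypothetical counts for a binary classifier (illustrative only)
true_positives = 40
false_positives = 10
false_negatives = 40

precision = true_positives / (true_positives + false_positives)
recall = true_positives / (true_positives + false_negatives)

harmonic = 2 * precision * recall / (precision + recall)  # F1 score
geometric = math.sqrt(precision * recall)                 # geometric mean
arithmetic = (precision + recall) / 2

print(f"Precision: {precision:.3f}, Recall: {recall:.3f}")
print(f"Harmonic mean (F1): {harmonic:.3f}")
print(f"Geometric mean:     {geometric:.3f}")
print(f"Arithmetic mean:    {arithmetic:.3f}")
```

For any positive pair the ordering harmonic ≤ geometric ≤ arithmetic holds, with equality only when precision equals recall, so the geometric mean approximates F1 well only when the two metrics are close.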
import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
# F1 is the harmonic mean of precision and recall; sqrt(P * R) is their geometric mean.
# The two values are close when precision and recall are similar.
X, y = make_classification(n_samples=500, n_features=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LogisticRegression(random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
geo_mean = np.sqrt(precision * recall)
print(f"Precision: {precision:.3f}")
print(f"Recall: {recall:.3f}")
print(f"F1 Score: {f1:.3f}")
print(f"√(P × R): {geo_mean:.3f}")
print("F1 (harmonic mean) <= geometric mean; they coincide only when precision == recall")
# Compound growth
daily_returns = np.array([1.02, 0.98, 1.05, 1.01, 0.97])
arith_mean = np.mean(daily_returns)
geo_mean_return = np.prod(daily_returns) ** (1/len(daily_returns))
print(f"\nArithmetic mean return: {arith_mean:.4f}")
print(f"Geometric mean return: {geo_mean_return:.4f}")
print(f"Total compound return: {np.prod(daily_returns) - 1:.4f}")
Compounding is multiplicative, not additive — respect the math, or your financial projections will deceive you.