Harmonic Mean
Descriptive Statistics
When Average Rate Requires a Different Kind of Mean
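The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals. It is the appropriate average for rates, speeds, or ratios when every observation covers the same amount of the numerator quantity, such as equal distances for speeds in km/h or equal dollar amounts for price-based ratios.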
- Average speed: When equal distances are driven at different speeds, the harmonic mean gives the true average speed.
- Portfolio analysis: With equal dollar amounts invested, the average price paid and the portfolio P/E are harmonic means, not arithmetic means.
- Rate problems: When each rate applies to the same amount of the numerator quantity (distance, dollars, tokens), the harmonic mean is the correct center.
- Relationship to other means: For positive values, it is always less than or equal to the geometric and arithmetic means.
The harmonic mean is the mean you never learned in school — and the one you need for rates.
What is the Harmonic Mean?
Definition
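The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals. It is the appropriate average for rates, speeds, or ratios when every observation covers the same amount of the numerator quantity (equal distances for km/h, equal dollars for price-based ratios).
Harmonic Mean
$$H = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}} = \left( \frac{1}{n} \sum_{i=1}^{n} \frac{1}{x_i} \right)^{-1}$$
Here,
- $H$ = Harmonic mean
- $x_i$ = The i-th positive value
- $n$ = Number of values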
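The definition can be sketched directly in plain Python. This is a minimal illustration that assumes strictly positive inputs; the helper name `harmonic_mean_manual` is hypothetical and is only checked here against the standard library's `statistics.harmonic_mean`.

```python
from statistics import harmonic_mean


def harmonic_mean_manual(values):
    """Reciprocal of the arithmetic mean of reciprocals (illustrative helper)."""
    values = list(values)
    if not values:
        raise ValueError("need at least one value")
    if any(v <= 0 for v in values):
        # Assumption of this sketch: only strictly positive values are accepted
        raise ValueError("this sketch assumes strictly positive values")
    return len(values) / sum(1 / v for v in values)


data = [60, 40]
manual = harmonic_mean_manual(data)
library = harmonic_mean(data)

print(f"Manual harmonic mean:     {manual:.4f}")
print(f"statistics.harmonic_mean: {library:.4f}")
```

Both lines should agree, since the library function computes the same quantity for positive data.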
The Classic Speed Problem
Why It Matters
You drive 100 km at 60 km/h, then 100 km at 40 km/h. What is your average speed?
- Naive arithmetic mean: (60 + 40)/2 = 50 km/h -> WRONG
- Correct (harmonic mean): 2/(1/60 + 1/40) = 48 km/h
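The harmonic mean appears here because average speed is total distance divided by total time. As a short derivation, write $d$ for the length of each leg and $v_1$, $v_2$ for the two speeds (these symbols are introduced only for this illustration):

$$\bar{v} = \frac{\text{total distance}}{\text{total time}} = \frac{2d}{\dfrac{d}{v_1} + \dfrac{d}{v_2}} = \frac{2}{\dfrac{1}{v_1} + \dfrac{1}{v_2}}$$

The leg length $d$ cancels, so only the speeds matter. With $v_1 = 60$ and $v_2 = 40$:

$$\bar{v} = \frac{2}{\dfrac{1}{60} + \dfrac{1}{40}} = \frac{2}{\dfrac{5}{120}} = 48 \text{ km/h}$$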
```python
import numpy as np
from scipy.stats import hmean

# ==========================================
# Example 1: Average speed
# ==========================================
speeds = [60, 40]  # km/h for equal distances

# Arithmetic mean (wrong!)
arith_speed = np.mean(speeds)

# Harmonic mean (correct for equal distances)
harm_speed = hmean(speeds)

# Verify directly: total distance / total time
time_leg1 = 100 / 60  # hours
time_leg2 = 100 / 40  # hours
total_distance = 200  # km
total_time = time_leg1 + time_leg2
actual_avg_speed = total_distance / total_time

print(f"Speed leg 1: {speeds[0]} km/h, Speed leg 2: {speeds[1]} km/h")
print(f"Arithmetic mean: {arith_speed:.4f} km/h <- WRONG")
print(f"Harmonic mean: {harm_speed:.4f} km/h <- CORRECT")
print(f"Calculated directly: {actual_avg_speed:.4f} km/h ✓")

# ==========================================
# Example 2: P/E Ratios in Finance
# ==========================================
# For equal-dollar investments, the portfolio P/E is the
# harmonic mean of the individual P/E ratios
pe_ratios = [20, 25, 15, 30, 10]
portfolio_pe_harm = hmean(pe_ratios)
portfolio_pe_arith = np.mean(pe_ratios)

print(f"\nP/E Ratios: {pe_ratios}")
print(f"Arithmetic mean P/E: {portfolio_pe_arith:.2f}")
print(f"Harmonic mean P/E: {portfolio_pe_harm:.2f}")
print("(Harmonic mean is correct for equal-dollar portfolio weighting)")

# ==========================================
# Example 3: F1-Score in Machine Learning
# ==========================================
# F1 = harmonic mean of Precision and Recall
precision = 0.80
recall = 0.60
f1_hmean = hmean([precision, recall])
f1_formula = 2 * precision * recall / (precision + recall)
arith_mean = np.mean([precision, recall])

print(f"\nPrecision: {precision:.2f}, Recall: {recall:.2f}")
print(f"Arithmetic mean: {arith_mean:.4f}")
print(f"F1 Score (harmonic mean): {f1_hmean:.4f}")
print(f"F1 via standard formula: {f1_formula:.4f}")
print("F1 score punishes large imbalances between P and R!")
```
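The P/E example can be verified directly in the same way as the speed example. The sketch below assumes an equal position of 1,000 dollars in each stock (an illustrative figure, not from the original) and computes the portfolio P/E as total dollars invested divided by total earnings purchased.

```python
from statistics import fmean, harmonic_mean

pe_ratios = [20, 25, 15, 30, 10]
dollars_per_stock = 1000.0  # assumed equal-dollar position size (illustrative)

# Each position buys (dollars invested / P/E) dollars of annual earnings
earnings_bought = [dollars_per_stock / pe for pe in pe_ratios]

total_invested = dollars_per_stock * len(pe_ratios)
total_earnings = sum(earnings_bought)
portfolio_pe = total_invested / total_earnings

print(f"Portfolio P/E (direct):    {portfolio_pe:.4f}")
print(f"Harmonic mean of P/Es:     {harmonic_mean(pe_ratios):.4f}")
print(f"Arithmetic mean of P/Es:   {fmean(pe_ratios):.4f}")
```

The position size cancels out of the ratio, so any equal dollar amount gives the same portfolio P/E.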
Relationship: AM ≥ GM ≥ HM
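AM-GM-HM Inequality
$$\text{AM} \ge \text{GM} \ge \text{HM}$$
Here,
- $\text{AM} = \frac{1}{n}\sum_{i=1}^{n} x_i$ = Arithmetic mean
- $\text{GM} = \left(\prod_{i=1}^{n} x_i\right)^{1/n}$ = Geometric mean
- $\text{HM} = n \big/ \sum_{i=1}^{n} \frac{1}{x_i}$ = Harmonic mean
The inequality holds for positive values, with equality only when all values are identical.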
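```python
import numpy as np
from scipy.stats import hmean

np.random.seed(42)
for trial in range(5):
    x = np.random.uniform(1, 50, 8)  # 8 positive values per trial
    am = np.mean(x)
    gm = np.exp(np.log(x).mean())  # geometric mean via mean of logs
    hm = hmean(x)
    print(f"AM={am:.3f} ≥ GM={gm:.3f} ≥ HM={hm:.3f}: {am >= gm >= hm}")
```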
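Two special cases of the inequality are easy to check by hand. The sketch below uses only the standard library and illustrative values: identical inputs, where all three means coincide, and a pair of values, where the identity GM² = AM × HM holds because AM × HM = (a + b)/2 × 2ab/(a + b) = ab.

```python
from statistics import fmean, geometric_mean, harmonic_mean

# Equality case: identical values give AM = GM = HM (up to rounding)
same = [7.0, 7.0, 7.0]
am_same = fmean(same)
gm_same = geometric_mean(same)
hm_same = harmonic_mean(same)
print(f"Identical values: AM={am_same:.6f}, GM={gm_same:.6f}, HM={hm_same:.6f}")

# Two-value case: GM squared equals AM times HM
pair = [60.0, 40.0]
am = fmean(pair)
gm = geometric_mean(pair)
hm = harmonic_mean(pair)
print(f"Pair {pair}: AM={am:.4f}, GM={gm:.4f}, HM={hm:.4f}")
print(f"GM^2 = {gm ** 2:.4f}, AM * HM = {am * hm:.4f}")
```

The two-value identity does not extend to three or more values in general; only the ordering AM ≥ GM ≥ HM does.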
When Each Mean Is Appropriate
| Mean | Formula | Use When |
|---|---|---|
| Arithmetic | Σxᵢ/n | Additive quantities (prices, temperatures) |
| Geometric | (∏xᵢ)^(1/n) | Multiplicative quantities (growth rates, ratios) |
| Harmonic | n/Σ(1/xᵢ) | Rates averaged over equal amounts of the numerator quantity (speed over equal distances, price per unit for equal spend) |
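To make the choice between the arithmetic and harmonic rows concrete, the sketch below computes the true average speed of a two-leg trip under two assumed setups: equal distance per leg and equal time per leg. The leg lengths (100 km, 1 hour) are illustrative values.

```python
from statistics import fmean, harmonic_mean

speeds = [60.0, 40.0]  # km/h

# Case 1: equal distances (each leg covers the same number of km)
leg_km = 100.0
hours_per_leg = [leg_km / v for v in speeds]
avg_equal_distance = (leg_km * len(speeds)) / sum(hours_per_leg)

# Case 2: equal times (each leg lasts the same number of hours)
leg_hours = 1.0
km_per_leg = [v * leg_hours for v in speeds]
avg_equal_time = sum(km_per_leg) / (leg_hours * len(speeds))

print(f"Equal distances: true average = {avg_equal_distance:.2f} km/h "
      f"(harmonic mean = {harmonic_mean(speeds):.2f})")
print(f"Equal times:     true average = {avg_equal_time:.2f} km/h "
      f"(arithmetic mean = {fmean(speeds):.2f})")
```

The same pair of speeds calls for a different mean depending on which quantity is held equal across legs.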
Harmonic Mean in Machine Learning
| ML Application | Why Harmonic Mean | How |
|---|---|---|
| F1 Score | Balances precision and recall | F1 = 2 × (P × R) / (P + R) |
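| Token throughput | Rate over equal amounts of work | Tokens per second across runs that each process the same number of tokens |
| Batch processing | Rate per batch | Images per second (fixed batch size) |
| Cross-entropy (contrast) | Not a harmonic mean: averaging -log(p) has a geometric-mean structure | exp(-mean(log p)) is the reciprocal of the geometric mean of the predicted probabilities |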
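One way to see why F1 takes the harmonic form is to start from confusion-matrix counts. Precision and recall share the same numerator (true positives) and differ only in their denominators, so their harmonic mean collapses to 2·TP / (2·TP + FP + FN). The counts below are hypothetical, chosen so that precision is 0.80 and recall is 0.60.

```python
from statistics import harmonic_mean

# Hypothetical confusion-matrix counts (illustrative, not from the original)
tp, fp, fn = 48, 12, 32

precision = tp / (tp + fp)  # 48 / 60 = 0.80
recall = tp / (tp + fn)     # 48 / 80 = 0.60

# F1 written directly in terms of counts
f1_from_counts = 2 * tp / (2 * tp + fp + fn)

# F1 as the harmonic mean of precision and recall
f1_from_hmean = harmonic_mean([precision, recall])

print(f"Precision: {precision:.2f}, Recall: {recall:.2f}")
print(f"F1 from counts:        {f1_from_counts:.4f}")
print(f"F1 from harmonic mean: {f1_from_hmean:.4f}")
```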
```python
import numpy as np
from scipy.stats import hmean

# F1 score IS the harmonic mean of precision and recall
precision, recall = 0.8, 0.6
f1 = 2 * (precision * recall) / (precision + recall)
f1_hmean = hmean([precision, recall])
print(f"Precision: {precision}")
print(f"Recall: {recall}")
print(f"F1 (manual): {f1:.4f}")
print(f"F1 (hmean): {f1_hmean:.4f}")
print("F1 is the harmonic mean of precision and recall!\n")

# Why harmonic mean? P and R share the same numerator (true positives)
# but have different denominators (predicted vs. actual positives).
# Arithmetic mean is higher: (0.8+0.6)/2 = 0.7
# Harmonic mean is conservative: 2(0.8)(0.6)/(0.8+0.6) = 0.6857
print(f"Arithmetic mean of P and R: {(precision+recall)/2:.4f} (overestimates)")
print(f"Harmonic mean of P and R: {f1:.4f} (conservative, pulled toward the lower value)")

# Token throughput example (assumes every batch has the same number of tokens)
model_a_tokens = np.array([120, 150, 130])  # tokens/sec for 3 batches
model_b_tokens = np.array([200, 50, 180])
for name, rates in [("Model A", model_a_tokens), ("Model B", model_b_tokens)]:
    print(f"\n{name}: {rates}")
    print(f"  Arithmetic mean: {np.mean(rates):.1f} tokens/sec")
    print(f"  Harmonic mean: {hmean(rates):.1f} tokens/sec (better for rates)")
```
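The throughput claim can be checked against total work divided by total time. The sketch below reuses the Model B rates from the example and assumes every batch contains 1,000 tokens (an illustrative figure); under that assumption the overall throughput equals the harmonic mean of the per-batch rates.

```python
from statistics import fmean, harmonic_mean

tokens_per_batch = 1000  # assumed identical for every batch (illustrative)
model_b_rates = [200.0, 50.0, 180.0]  # tokens/sec per batch

# Time spent on each batch, then overall throughput = total tokens / total seconds
seconds_per_batch = [tokens_per_batch / r for r in model_b_rates]
total_tokens = tokens_per_batch * len(model_b_rates)
overall_throughput = total_tokens / sum(seconds_per_batch)

print(f"Overall throughput: {overall_throughput:.2f} tokens/sec")
print(f"Harmonic mean:      {harmonic_mean(model_b_rates):.2f} tokens/sec")
print(f"Arithmetic mean:    {fmean(model_b_rates):.2f} tokens/sec")
```

The single slow batch dominates the total time, which is why the harmonic mean sits well below the arithmetic mean here. If batches instead ran for equal durations, the arithmetic mean would be the matching average.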
Key Takeaways
Summary: Harmonic Mean
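- Harmonic mean is for rates where each observation covers an equal amount of the numerator quantity (equal distance for km/h, equal money for price-based ratios)
- F1-score is the harmonic mean of precision and recall in machine learning
- AM ≥ GM ≥ HM always holds for positive values: harmonic mean is smallest, arithmetic is largest
- Never use arithmetic mean for speeds over equal distances, because it overestimates
- For unequal distances, use a distance-weighted harmonic mean; when legs are measured by time instead, a time-weighted arithmetic mean applies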
- Undefined if any value is zero (you can't take 1/0)
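The weighted case mentioned above can be sketched as follows. The helper `weighted_harmonic_mean` is a hypothetical name written for this illustration; it weights each speed by the distance driven at that speed and rejects non-positive values, in line with the zero-value caveat. The trip distances are made-up figures.

```python
def weighted_harmonic_mean(values, weights):
    """Sum of weights divided by the sum of weight/value (illustrative helper)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(v <= 0 for v in values):
        # 1/0 is undefined, and this sketch also excludes negative values
        raise ValueError("all values must be strictly positive")
    return sum(weights) / sum(w / v for v, w in zip(values, weights))


speeds = [60.0, 40.0]      # km/h
distances = [150.0, 50.0]  # km driven at each speed (illustrative)

avg_speed = weighted_harmonic_mean(speeds, distances)

# Direct check: total distance / total time
total_time = sum(d / v for d, v in zip(distances, speeds))
direct = sum(distances) / total_time

print(f"Weighted harmonic mean: {avg_speed:.4f} km/h")
print(f"Calculated directly:    {direct:.4f} km/h")
```

With equal weights the helper reduces to the ordinary harmonic mean.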