When Not All Data Points Deserve Equal Say
Descriptive Statistics
The Weighted Mean: Giving Voice to What Matters
A four-credit course shouldn't count the same as a one-credit elective. The weighted mean fixes the blind spots of simple averages.
- GPA calculation — credit hours determine how much each grade matters
- Portfolio returns — dollar weights reveal true investment performance
- Survey correction — adjusting for who actually responded versus who should have
When observations differ in importance, the arithmetic mean lies — the weighted mean tells the truth.
What is the Weighted Mean?
Definition
The weighted mean assigns different weights to different observations, allowing some values to have more influence on the average than others.
Weighted Mean
$$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$
Here,
- $w_i$ = Weight for the i-th observation
- $x_i$ = The i-th observation
- $n$ = Number of observations
- $\bar{x}_w$ = Weighted mean
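To make the definition concrete, here is a minimal sketch that computes the weighted mean by hand (sum of weight times value, divided by the sum of the weights) and compares it with `np.average`. The values and weights are illustrative and do not come from the article.

```python
import numpy as np

# Illustrative values and weights (assumed for this sketch)
x = np.array([80.0, 90.0, 60.0])
w = np.array([3.0, 1.0, 1.0])

# Weighted mean by definition: sum(w_i * x_i) / sum(w_i)
manual = np.sum(w * x) / np.sum(w)

# np.average applies the same formula when weights are supplied
library = np.average(x, weights=w)

print(f"Manual:     {manual:.2f}")
print(f"np.average: {library:.2f}")
print(f"Plain mean: {x.mean():.2f}")
```

Because the first value carries three times the weight of the others, the weighted result sits closer to 80 than the plain mean does.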
When to Use Weighted Mean
| Situation | Why Weights Matter |
|---|---|
| GPA (grade point average) | Courses have different credit hours |
| Portfolio returns | Assets have different dollar weights |
| Survey analysis | Respondents represent different group sizes |
| Grouped frequency data | Each midpoint represents many observations |
| Price indices (CPI, etc.) | Goods have different consumption weights |
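The grouped frequency row in the table can be sketched as follows: each bin is represented by its midpoint, and the bin's frequency serves as the weight. The bin edges and counts are hypothetical, chosen only to illustrate the idea.

```python
import numpy as np

# Hypothetical grouped data: exam-score bins and the number of students in each
bin_edges = np.array([50, 60, 70, 80, 90, 100])
frequencies = np.array([4, 10, 18, 12, 6])

# Each bin is represented by its midpoint
midpoints = (bin_edges[:-1] + bin_edges[1:]) / 2

# The frequency of a bin acts as the weight of its midpoint
grouped_mean = np.average(midpoints, weights=frequencies)

print(f"Midpoints: {midpoints}")
print(f"Estimated mean from grouped data: {grouped_mean:.2f}")
```

The result is an estimate rather than the exact mean, since it treats every observation in a bin as if it sat at the midpoint.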
Python Implementation
import numpy as np
import pandas as pd
# ========================================
# Example 1: GPA Calculation
# ========================================
courses = pd.DataFrame({
'Course': ['Statistics', 'Linear Algebra', 'Machine Learning', 'Databases', 'Ethics'],
'Grade_Points': [4.0, 3.7, 3.3, 4.0, 3.0],
'Credits': [4, 3, 4, 3, 1]
})
weighted_gpa = np.average(courses['Grade_Points'], weights=courses['Credits'])
simple_gpa = courses['Grade_Points'].mean()
print("Courses:")
print(courses.to_string(index=False))
print(f"\nWeighted GPA: {weighted_gpa:.4f}")
print(f"Unweighted GPA: {simple_gpa:.4f}")
print(f"Difference: {weighted_gpa - simple_gpa:+.4f} (credits matter!)")
# ========================================
# Example 2: Portfolio Return
# ========================================
portfolio = pd.DataFrame({
'Asset': ['Stock A', 'Stock B', 'Bonds', 'Cash'],
'Weight_pct': [40, 35, 20, 5],
'Return_pct': [12.5, -3.2, 4.1, 0.5]
})
portfolio_return = np.average(portfolio['Return_pct'],
weights=portfolio['Weight_pct'])
simple_avg_return = portfolio['Return_pct'].mean()
print("\nPortfolio:")
print(portfolio.to_string(index=False))
print(f"\nWeighted portfolio return: {portfolio_return:.2f}%")
print(f"Simple average return: {simple_avg_return:.2f}%")
# ========================================
# Example 3: Survey Weighting (post-stratification)
# ========================================
survey = pd.DataFrame({
'Group': ['18-34', '35-54', '55+'],
'Survey_pct': [15, 50, 35], # % in survey sample
'Pop_pct': [30, 40, 30], # % in true population
'Support_pct': [75, 55, 40] # % supporting the policy
})
# Unweighted: uses the sample's own age mix, which overrepresents 35-54 and underrepresents 18-34
unweighted = np.average(survey['Support_pct'], weights=survey['Survey_pct'])
# Weighted to population (correct)
weighted = np.average(survey['Support_pct'], weights=survey['Pop_pct'])
print("\nSurvey with nonrepresentative sample:")
print(survey.to_string(index=False))
print(f"\nUnweighted mean: {unweighted:.1f}% support (biased!)")
print(f"Population-weighted mean: {weighted:.1f}% support (correct)")
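In practice, survey weights are usually attached to individual respondents rather than to group summaries. The sketch below assumes a hypothetical respondent-level dataset of 400 people that matches the group percentages in Example 3, and gives each respondent a weight equal to their group's population share divided by its sample share. The column names and counts are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical respondent counts matching the 15% / 50% / 35% sample mix
group_sizes = {'18-34': 60, '35-54': 200, '55+': 140}
supporters = {'18-34': 45, '35-54': 110, '55+': 56}  # 75%, 55%, 40% support
pop_share = {'18-34': 0.30, '35-54': 0.40, '55+': 0.30}

# Build one row per respondent: 1 = supports the policy, 0 = does not
rows = []
for group, n in group_sizes.items():
    k = supporters[group]
    rows += [(group, 1)] * k + [(group, 0)] * (n - k)
respondents = pd.DataFrame(rows, columns=['Group', 'Supports'])

# Post-stratification weight = population share / sample share of the group
sample_share = respondents['Group'].value_counts(normalize=True)
respondents['Weight'] = (
    respondents['Group'].map(pop_share) / respondents['Group'].map(sample_share)
)

raw = respondents['Supports'].mean() * 100
adjusted = np.average(respondents['Supports'], weights=respondents['Weight']) * 100

print(f"Raw respondent mean:    {raw:.2f}% support")
print(f"Post-stratified mean:   {adjusted:.2f}% support")
```

Respondents from the underrepresented 18-34 group receive a weight above 1, and those from the overrepresented 35-54 group receive a weight below 1, which reproduces the population-weighted figure from the group-level calculation.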
Properties of the Weighted Mean
# Property 1: Reduces to arithmetic mean when all weights equal
equal_weights = [1, 1, 1, 1, 1]
data = [10, 20, 30, 40, 50]
print(f"Equal weights -> weighted mean = {np.average(data, weights=equal_weights):.1f}")
print(f"Simple mean = {np.mean(data):.1f}")
# Property 2: A dominant weight pulls the weighted mean toward that observation's value
extreme_weights = [1, 1, 1, 1, 100]
print(f"Extreme weight on last -> weighted mean = {np.average(data, weights=extreme_weights):.1f}")
print(f"Last value: {data[-1]}") # Should be close to 50
# Property 3: Weights need not sum to 1, because np.average divides by their sum.
# Normalizing only makes each weight readable as a share of the total.
weights = [4, 3, 4, 3, 1]
norm_weights = [w/sum(weights) for w in weights]
print(f"Normalized weights: {[f'{w:.3f}' for w in norm_weights]}")
print(f"Sum: {sum(norm_weights):.4f}")
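Three further properties follow from the definition and are easy to check numerically: rescaling all weights by a positive constant changes nothing, non-negative weights keep the result inside the data range, and weights that sum to zero leave the mean undefined. This is a small sketch reusing the data and credit-style weights from above.

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50])
weights = np.array([4, 3, 4, 3, 1])

# Rescaling every weight by the same positive constant leaves the mean unchanged
base = np.average(data, weights=weights)
scaled = np.average(data, weights=weights * 100)
normalized = np.average(data, weights=weights / weights.sum())
print(f"Raw: {base:.4f}, scaled: {scaled:.4f}, normalized: {normalized:.4f}")

# With non-negative weights the result always lies within the data range
within_range = data.min() <= base <= data.max()
print(f"Within [min, max]: {within_range}")

# Weights that sum to zero cannot be normalized, so NumPy raises an error
zero_sum_rejected = False
try:
    np.average(data, weights=np.zeros(len(data)))
except ZeroDivisionError as exc:
    zero_sum_rejected = True
    print(f"Zero-sum weights rejected: {exc}")
```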
Key Takeaways
The weighted mean is the correct tool when observations differ in importance — never treat unequal data points equally.
GPA, portfolio returns, and price indices are all weighted means in disguise — recognizing this prevents costly errors.
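Survey post-stratification weighting corrects for known imbalances between the sample and the population on the weighting variables (here, age group). It reduces bias but cannot fix differences that the weights do not capture.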
Equal weights reduce to the arithmetic mean — the simple average is just a special case of the weighted mean.
Weighted Mean in Machine Learning
| ML Application | Weighting Scheme | Why |
|---|---|---|
| Ensemble models | Accuracy-weighted average | Better models get more vote |
| Combining noisy estimates | Inverse-variance weights, w = 1/σ² | Minimum-variance combination of independent unbiased estimates |
| Attention mechanisms | Learned weights (softmax scores) | Attention output = weighted mean of value vectors |
| Loss aggregation | Sample weights | Rare class gets higher weight |
| Exponential moving average | Exponential decay weights | Recent data weighted more |
import numpy as np
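# Ensemble: weighted average of 3 models' predictions
# Assumed layout: each row is one model, each column is one sample's
# predicted probability of the positive class
model_preds = np.array([
    [0.9, 0.1, 0.2, 0.8], # Model A
    [0.1, 0.8, 0.3, 0.7], # Model B
    [0.2, 0.7, 0.8, 0.3], # Model C
])
model_weights = np.array([0.5, 0.3, 0.2]) # A is best, so it gets the largest weight
weighted_avg = np.average(model_preds, axis=0, weights=model_weights)
print(f"Weighted ensemble prediction: {weighted_avg}")
# Threshold each sample's averaged probability at 0.5.
# (argmax over this array would return the most confident sample, not a class.)
print(f"Predicted classes: {(weighted_avg >= 0.5).astype(int)}")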
# Attention mechanism (simplified)
# Query, Key, Value — attention scores = weights over values
tokens = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]]) # 3 tokens
attention_scores = np.array([0.7, 0.2, 0.1]) # softmax output
output = np.average(tokens, axis=0, weights=attention_scores)
print(f"\nAttention output: {output}")
print("This is exactly a weighted mean of token embeddings!")
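Two rows of the table above, inverse-variance weighting and the exponential moving average, can be sketched in the same way. The measurements, standard deviations, series values, and `alpha` below are all hypothetical. The second half relies on pandas' `ewm(..., adjust=True)`, which computes a weighted mean with weights `(1 - alpha)^k` counted back from the most recent observation.

```python
import numpy as np
import pandas as pd

# --- Inverse-variance weighting ---
# Hypothetical: three independent measurements of the same quantity,
# each with a known standard deviation
estimates = np.array([10.2, 9.8, 10.5])
sigmas = np.array([0.1, 0.2, 0.5])

iv_weights = 1.0 / sigmas**2
combined = np.average(estimates, weights=iv_weights)
# Standard deviation of the combined estimate, assuming independence
combined_sigma = np.sqrt(1.0 / iv_weights.sum())
print(f"Combined estimate: {combined:.3f} (sigma {combined_sigma:.3f})")

# --- Exponential moving average ---
series = pd.Series([3.0, 5.0, 4.0, 8.0, 6.0])
alpha = 0.5

# pandas' adjusted EMA evaluated at the last time step
ema_last = series.ewm(alpha=alpha, adjust=True).mean().iloc[-1]

# The same value as a weighted mean with weights (1 - alpha)^k,
# where k = 0 for the most recent observation
decay_weights = (1 - alpha) ** np.arange(len(series))[::-1]
ema_manual = np.average(series, weights=decay_weights)
print(f"EMA via pandas:        {ema_last:.4f}")
print(f"EMA via weighted mean: {ema_manual:.4f}")
```

The combined estimate ends up more precise than the best single measurement, and the two EMA values agree, which shows that both techniques are weighted means with a particular choice of weights.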
Weighting is not a statistical trick — it's an acknowledgment that in the real world, not all observations are created equal.