🎉 75% of content is free forever — Unlock Premium from $10/mo →
CW
Search courses…
💼 Servicesℹ️ About✉️ ContactView Pricing Plansfrom $10

Weighted Mean — Formula, Applications, and Python

Foundations of StatisticsDescriptive Statistics🟢 Free Lesson

Advertisement

When Not All Data Points Deserve Equal Say

Descriptive Statistics

The Weighted Mean: Giving Voice to What Matters

A four-credit course shouldn't count the same as a one-credit elective. The weighted mean fixes the blind spots of simple averages.

  • GPA calculation — credit hours determine how much each grade matters
  • Portfolio returns — dollar weights reveal true investment performance
  • Survey correction — adjusting for who actually responded versus who should have

When observations differ in importance, the arithmetic mean lies — the weighted mean tells the truth.


What is the Weighted Mean?

Definition

The weighted mean assigns different weights to different observations, allowing some values to have more influence on the average than others.


Weighted Mean

xˉw=i=1nwixii=1nwi\bar{x}_w = \frac{\sum_{i=1}^n w_i x_i}{\sum_{i=1}^n w_i}

Here,

  • wiw_i=Weight for the i-th observation
  • xix_i=The i-th observation
  • nn=Number of observations
  • xˉw\bar{x}_w=Weighted mean

When to Use Weighted Mean

SituationWhy Weights Matter
GPA (grade point average)Courses have different credit hours
Portfolio returnsAssets have different dollar weights
Survey analysisRespondents represent different group sizes
Grouped frequency dataEach midpoint represents many observations
Price indices (CPI, etc.)Goods have different consumption weights

Python Implementation

import numpy as np
import pandas as pd

# ========================================
# Example 1: GPA Calculation
# ========================================
courses = pd.DataFrame({
    'Course': ['Statistics', 'Linear Algebra', 'Machine Learning', 'Databases', 'Ethics'],
    'Grade_Points': [4.0, 3.7, 3.3, 4.0, 3.0],
    'Credits': [4, 3, 4, 3, 1]
})

weighted_gpa = np.average(courses['Grade_Points'], weights=courses['Credits'])
simple_gpa = courses['Grade_Points'].mean()

print("Courses:")
print(courses.to_string(index=False))
print(f"\nWeighted GPA: {weighted_gpa:.4f}")
print(f"Unweighted GPA: {simple_gpa:.4f}")
print(f"Difference: {weighted_gpa - simple_gpa:+.4f} (credits matter!)")

# ========================================
# Example 2: Portfolio Return
# ========================================
portfolio = pd.DataFrame({
    'Asset': ['Stock A', 'Stock B', 'Bonds', 'Cash'],
    'Weight_pct': [40, 35, 20, 5],
    'Return_pct': [12.5, -3.2, 4.1, 0.5]
})

portfolio_return = np.average(portfolio['Return_pct'], 
                               weights=portfolio['Weight_pct'])
simple_avg_return = portfolio['Return_pct'].mean()

print("\nPortfolio:")
print(portfolio.to_string(index=False))
print(f"\nWeighted portfolio return: {portfolio_return:.2f}%")
print(f"Simple average return:     {simple_avg_return:.2f}%")

# ========================================
# Example 3: Survey Weighting (post-stratification)
# ========================================
survey = pd.DataFrame({
    'Group': ['18-34', '35-54', '55+'],
    'Survey_pct': [15, 50, 35],    # % in survey sample
    'Pop_pct': [30, 40, 30],       # % in true population
    'Support_pct': [75, 55, 40]    # % supporting the policy
})

# Unweighted (biased — overrepresents 35-54 group)
unweighted = np.average(survey['Support_pct'], weights=survey['Survey_pct'])
# Weighted to population (correct)
weighted = np.average(survey['Support_pct'], weights=survey['Pop_pct'])

print("\nSurvey with nonrepresentative sample:")
print(survey.to_string(index=False))
print(f"\nUnweighted mean: {unweighted:.1f}% support (biased!)")
print(f"Population-weighted mean: {weighted:.1f}% support (correct)")

Properties of the Weighted Mean

# Property 1: Reduces to arithmetic mean when all weights equal
equal_weights = [1, 1, 1, 1, 1]
data = [10, 20, 30, 40, 50]
print(f"Equal weights -> weighted mean = {np.average(data, weights=equal_weights):.1f}")
print(f"Simple mean = {np.mean(data):.1f}")

# Property 2: Extreme weights force convergence toward that value
extreme_weights = [1, 1, 1, 1, 100]
print(f"Extreme weight on last -> weighted mean = {np.average(data, weights=extreme_weights):.1f}")
print(f"Last value: {data[-1]}")  # Should be close to 50

# Property 3: Normalized weights must sum to 1 for percentages to work
weights = [4, 3, 4, 3, 1]
norm_weights = [w/sum(weights) for w in weights]
print(f"Normalized weights: {[f'{w:.3f}' for w in norm_weights]}")
print(f"Sum: {sum(norm_weights):.4f}")

Key Takeaways

The weighted mean is the correct tool when observations differ in importance — never treat unequal data points equally.

GPA, portfolio returns, and price indices are all weighted means in disguise — recognizing this prevents costly errors.

Survey post-stratification weighting corrects for nonprobability samples and prevents biased conclusions.

Equal weights reduce to the arithmetic mean — the simple average is just a special case of the weighted mean.

Weighted Mean in Machine Learning

ML ApplicationWeighting SchemeWhy
Ensemble modelsAccuracy-weighted averageBetter models get more vote
Inverse-variance weightingw = 1/σ²Minimum variance estimator
Attention mechanismsLearned weightsTransformer attention = weighted mean of tokens
Loss aggregationSample weightsRare class gets higher weight
Exponential moving averageExponential decay weightsRecent data weighted more
import numpy as np
from sklearn.metrics import accuracy_score

# Ensemble: weighted average of 3 models
model_preds = np.array([
    [0.9, 0.1, 0.2, 0.8],  # Model A
    [0.1, 0.8, 0.3, 0.7],  # Model B
    [0.2, 0.7, 0.8, 0.3],  # Model C
])
model_weights = np.array([0.5, 0.3, 0.2])  # A is best

weighted_avg = np.average(model_preds, axis=0, weights=model_weights)
print(f"Weighted ensemble prediction: {weighted_avg}")
print(f"Predicted class: {np.argmax(weighted_avg)}")

# Attention mechanism (simplified)
# Query, Key, Value — attention scores = weights over values
tokens = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])  # 3 tokens
attention_scores = np.array([0.7, 0.2, 0.1])  # softmax output
output = np.average(tokens, axis=0, weights=attention_scores)
print(f"\nAttention output: {output}")
print("This is exactly a weighted mean of token embeddings!")

Weighting is not a statistical trick — it's an acknowledgment that in the real world, not all observations are created equal.

Premium Content

Weighted Mean — Formula, Applications, and Python

Unlock this lesson and 900+ advanced tutorials with a Premium plan.

🎯End-to-end Projects
💼Interview Prep
📜Certificates
🤝Community Access

Already a member? Log in

Need Expert Statistics Help?

Get personalized tutoring, project support, or professional consulting.

Advertisement