The Arithmetic Mean
Descriptive Statistics
The Most Used — and Most Misused — Statistical Measure
The arithmetic mean is the most widely used measure in statistics. Understanding its properties deeply helps you avoid common analytical errors.
- Algebraic properties — Sum of deviations equals zero; minimizes sum of squared deviations
- Population vs sample: the sample mean x̄ is an unbiased estimator of the population mean μ (the n − 1 divisor belongs to the sample variance, not the mean)
- Sensitivity to outliers — One extreme value can pull the mean far from the center
- Trimmed and winsorized alternatives — Robust versions for contaminated data
The mean is powerful, but it is not always right. Know when to use it and when to look elsewhere.
What is the Arithmetic Mean?
Definition and Formula
Definition: Arithmetic Mean
The arithmetic mean of a set of values is the sum of the values divided by the number of values. It is also the value $c$ that minimizes the sum of squared deviations $\sum_{i}(x_i - c)^2$.
For a sample of $n$ observations $x_1, x_2, \dots, x_n$:
Sample Mean
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
Here,
- $\bar{x}$ = Sample mean
- $n$ = Number of observations in the sample
- $x_i$ = The i-th observation
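As a minimal sketch, the sample mean formula translates directly into NumPy. The data values are arbitrary and chosen only for illustration; `np.mean` serves as a cross-check.

```python
import numpy as np

# Arbitrary illustrative sample values (not taken from the text)
x = np.array([4.0, 8.0, 15.0, 16.0, 23.0, 42.0])

n = len(x)            # number of observations
x_bar = x.sum() / n   # sample mean: (1/n) * sum of x_i

print(f"n = {n}, sample mean = {x_bar}")
print(f"Matches np.mean: {np.isclose(x_bar, np.mean(x))}")
```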
For a population of $N$ observations:
Population Mean
$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$$
Here,
- $\mu$ = Population mean
- $N$ = Population size
For a continuous random variable $X$ with probability density function $f(x)$:
Expected Value (Population Mean)
$$\mu = E[X] = \int_{-\infty}^{\infty} x\, f(x)\, dx$$
Here,
- $E[X]$ = Expected value of $X$
- $f(x)$ = Probability density function
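The integral definition can be checked numerically. This sketch assumes an exponential density with rate λ = 2 (an illustrative choice, not from the text), whose mean is known to be 1/λ, and uses SciPy's `quad` for the integral. It also draws a large sample to show the sample mean approaching the population mean.

```python
import numpy as np
from scipy.integrate import quad

# Assumed example density: exponential with rate lam, f(x) = lam * exp(-lam * x) for x >= 0
lam = 2.0

def pdf(x):
    return lam * np.exp(-lam * x)

# E[X] = integral of x * f(x) over the support [0, inf)
expected, abs_err = quad(lambda x: x * pdf(x), 0, np.inf)
print(f"E[X] by numerical integration: {expected:.6f} (closed form 1/lam = {1 / lam})")

# The sample mean of many draws approaches E[X] (law of large numbers)
rng = np.random.default_rng(0)
draws = rng.exponential(scale=1 / lam, size=100_000)
print(f"Sample mean of 100,000 draws:  {draws.mean():.4f}")
```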
Algebraic Properties of the Mean
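Theorem: Fundamental Properties
- Sum of deviations = 0: $\sum_{i=1}^{n}(x_i - \bar{x}) = 0$. The mean is the "balance point" of the distribution.
- Linear transformation: if $y_i = a x_i + b$, then $\bar{y} = a\bar{x} + b$. The mean commutes with affine transformations.
- Minimizes sum of squared deviations: $\bar{x} = \arg\min_{c} \sum_{i=1}^{n}(x_i - c)^2$. The mean is the least squares estimator of location.
- Additivity: $E[X + Y] = E[X] + E[Y]$. This holds for any random variables with finite means; independence is not required.
- Homogeneity of degree 1: $E[cX] = c\,E[X]$ for any constant $c$.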
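A quick numerical check of the sample versions of these properties, using randomly generated data (the distributions and constants are arbitrary assumptions). The pair `x` and `z = x**2` is deliberately dependent, to show that additivity does not need independence.

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(loc=50, scale=10, size=1_000)  # illustrative data
y = rng.exponential(scale=3.0, size=1_000)    # generated independently of x
z = x ** 2                                    # completely dependent on x

# Sum of deviations from the mean is zero, up to floating-point rounding
dev_sum = np.sum(x - x.mean())

# Linear transformation: mean(a*x + b) == a*mean(x) + b
a, b = 1.8, 32.0  # e.g. converting Celsius to Fahrenheit
affine_ok = np.isclose(np.mean(a * x + b), a * x.mean() + b)

# Additivity holds for independent (x, y) and dependent (x, z) pairs alike
additive_indep = np.isclose(np.mean(x + y), x.mean() + y.mean())
additive_dep = np.isclose(np.mean(x + z), x.mean() + z.mean())

# Homogeneity of degree 1: mean(c*x) == c*mean(x)
c = -3.0
homogeneous_ok = np.isclose(np.mean(c * x), c * x.mean())

print(f"Sum of deviations: {dev_sum:.2e}")
print(affine_ok, additive_indep, additive_dep, homogeneous_ok)
```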
Proof that the Mean Minimizes Sum of Squared Deviations
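Theorem: Mean as Least Squares Estimator
Define $S(c) = \sum_{i=1}^{n}(x_i - c)^2$. Taking the derivative and setting it to zero:
$$S'(c) = -2\sum_{i=1}^{n}(x_i - c) = 0 \;\Longrightarrow\; \sum_{i=1}^{n} x_i = nc \;\Longrightarrow\; c = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x}$$
The second derivative, $S''(c) = 2n > 0$, confirms that this critical point is a minimum.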
Weighted Mean
When observations have different importance, frequency, or precision:
Weighted Mean
$$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$
Here,
- $w_i$ = Weight for the i-th observation
- $x_i$ = The i-th observation
- $\bar{x}_w$ = Weighted mean
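The ordinary mean is the special case $w_i = 1$ for all $i$. When the observations are independent measurements of a common mean with known variances $\sigma_i^2$, inverse-variance weighting, $w_i = 1/\sigma_i^2$, gives the minimum-variance unbiased estimator among all weighted averages.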
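A short sketch of inverse-variance weighting. The three measurements and their standard deviations are invented for illustration; `np.average` with its `weights` argument computes the same weighted mean as the formula. The standard error line uses the standard result that the variance of the inverse-variance weighted mean is $1/\sum_i w_i$.

```python
import numpy as np

# Hypothetical example: three independent measurements of the same quantity,
# each with a known standard deviation (values invented for illustration)
measurements = np.array([10.2, 9.8, 10.5])
sigmas = np.array([0.1, 0.2, 0.5])

# Inverse-variance weights w_i = 1 / sigma_i^2
weights = 1.0 / sigmas**2

# Weighted mean: sum(w_i * x_i) / sum(w_i); np.average implements the same formula
manual = np.sum(weights * measurements) / np.sum(weights)
via_numpy = np.average(measurements, weights=weights)

# Standard error of the inverse-variance weighted mean: 1 / sqrt(sum(w_i))
se = 1.0 / np.sqrt(np.sum(weights))

print(f"Unweighted mean: {measurements.mean():.4f}")
print(f"Weighted mean:   {manual:.4f} (np.average: {via_numpy:.4f})")
print(f"Standard error:  {se:.4f}")
```

The most precise measurement (σ = 0.1) dominates the weighted mean, and the resulting standard error is smaller than that of any single measurement.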
Mean for Grouped Data
When raw data is unavailable (only a frequency table):
Mean for Grouped Data
$$\bar{x} \approx \frac{\sum_{i=1}^{k} f_i m_i}{\sum_{i=1}^{k} f_i}$$
Here,
- $f_i$ = Frequency of the i-th class
- $m_i$ = Midpoint of the i-th class interval
- $k$ = Number of classes
This is an approximation: it treats every value in a class as if it sat at the class midpoint, so the exact mean cannot be recovered from grouped data.
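To see how large the grouping error can be, this sketch starts from assumed raw data (gamma-distributed draws, chosen only because they are skewed), reduces it to a frequency table with `np.histogram`, and compares the midpoint-based mean with the exact mean.

```python
import numpy as np

# Assumed raw data: right-skewed gamma draws, used only to show the approximation
rng = np.random.default_rng(7)
raw = rng.gamma(shape=2.0, scale=10.0, size=500)

# Keep only a frequency table with classes of width 10: [0, 10), [10, 20), ...
edges = np.arange(0, raw.max() + 10, 10)
freq, edges = np.histogram(raw, bins=edges)
midpoints = (edges[:-1] + edges[1:]) / 2  # m_i

grouped_mean = np.sum(freq * midpoints) / np.sum(freq)
exact_mean = raw.mean()

print(f"Grouped-data mean: {grouped_mean:.3f}")
print(f"Exact mean:        {exact_mean:.3f}")
# Each value lies at most half a class width (5) from its midpoint,
# so the grouped mean can be off by at most 5 here
```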
Trimmed Mean: A Robust Alternative
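Definition: Trimmed Mean
The $\alpha$-trimmed mean removes the smallest fraction $\alpha$ and the largest fraction $\alpha$ of the observations before computing the mean:
$$\bar{x}_{\alpha} = \frac{1}{n - 2g}\sum_{i=g+1}^{n-g} x_{(i)}, \qquad g = \lfloor \alpha n \rfloor$$
where $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$ are the order statistics. Common choices: $\alpha = 0.1$ or $\alpha = 0.2$.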
The trimmed mean trades a small amount of efficiency (under normality) for greatly improved robustness against outliers.
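A minimal implementation of the $\alpha$-trimmed mean as defined above, using $g = \lfloor \alpha n \rfloor$, compared against SciPy's `scipy.stats.trim_mean` (which also cuts $\lfloor \alpha n \rfloor$ values from each end). The data set with a single gross outlier is made up for illustration.

```python
import numpy as np
from scipy import stats

def trimmed_mean(x, alpha):
    """alpha-trimmed mean: drop g = floor(alpha * n) order statistics from each end."""
    x_sorted = np.sort(np.asarray(x, dtype=float))
    n = len(x_sorted)
    g = int(np.floor(alpha * n))
    return x_sorted[g:n - g].mean()

data = np.array([12, 14, 15, 15, 16, 17, 18, 19, 20, 250])  # one gross outlier

print(f"Mean:             {data.mean():.2f}")
print(f"10% trimmed mean: {trimmed_mean(data, 0.1):.2f}")
print(f"scipy trim_mean:  {stats.trim_mean(data, 0.1):.2f}")
print(f"Median:           {np.median(data):.2f}")
```

Trimming one value from each end moves the estimate from 39.6 back to 16.75, close to the median of 16.5.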
Influence Function and Breakdown Point
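Theorem: Influence Function of the Mean
The influence function of the arithmetic mean (the functional $T$) at a distribution $F$ with mean $\mu$ is:
$$\mathrm{IF}(x; T, F) = x - \mu$$
This is unbounded in $x$: a single extreme observation can shift the mean arbitrarily far, which is why the mean is not robust.
The breakdown point of the mean is $1/n$ for a sample of size $n$, which tends to 0 as $n$ grows: moving a single observation to infinity drives the mean to infinity.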
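The influence function has a finite-sample counterpart, the sensitivity curve: add one observation $x$ to a sample of size $n - 1$ and scale the change in the estimate by $n$. For the mean this equals $x$ minus the original sample mean exactly, so it grows without bound; for the median it levels off. The sample and the contaminating values below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
base = rng.normal(loc=0.0, scale=1.0, size=99)  # illustrative clean sample

def sensitivity(estimator, sample, x):
    """Empirical sensitivity curve: n * (T(sample + [x]) - T(sample)), n = len(sample) + 1."""
    n = len(sample) + 1
    return n * (estimator(np.append(sample, x)) - estimator(sample))

for x in [1, 10, 100, 1000]:
    sc_mean = sensitivity(np.mean, base, x)
    sc_median = sensitivity(np.median, base, x)
    print(f"x = {x:>4}: mean SC = {sc_mean:8.2f}, median SC = {sc_median:6.2f}")

# Breakdown: replacing a single observation with a huge value drags the mean with it
contaminated = base.copy()
contaminated[0] = 1e12
print(f"Mean after one bad value:   {contaminated.mean():.3e}")
print(f"Median after one bad value: {np.median(contaminated):.3f}")
```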
Limitations of the Mean
| Problem | Example | Solution |
|---|---|---|
| Sensitive to outliers | CEO salary distorts avg company salary | Use median |
| Meaningless for nominal data | Mean blood type is nonsense | Use mode |
| Inappropriate for skewed data | Mean income sits above what a typical household earns | Use median |
| May not be a possible value | Mean number of children = 2.3 | Interpret as an average; use the mode for a typical value |
| Hides multimodality | Mean of bimodal = between the modes | Visualize first |
| Unbounded influence | One extreme value shifts mean arbitrarily | Use trimmed mean or winsorized mean |
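The table mentions the winsorized mean as an alternative. Instead of discarding the extreme order statistics, winsorizing replaces each of the $g = \lfloor \alpha n \rfloor$ smallest values with $x_{(g+1)}$ and each of the $g$ largest with $x_{(n-g)}$, then averages. This sketch implements that rule directly and cross-checks it against `scipy.stats.mstats.winsorize`; the data are made up.

```python
import numpy as np
from scipy.stats import mstats

def winsorized_mean(x, alpha):
    """Clip the g = floor(alpha * n) smallest values up to x_(g+1) and the g largest down to x_(n-g), then average."""
    x = np.asarray(x, dtype=float)
    x_sorted = np.sort(x)
    n = len(x_sorted)
    g = int(np.floor(alpha * n))
    return np.clip(x, x_sorted[g], x_sorted[n - g - 1]).mean()

data = np.array([12, 14, 15, 15, 16, 17, 18, 19, 20, 250], dtype=float)  # one gross outlier

print(f"Mean:                {data.mean():.2f}")
print(f"10% winsorized mean: {winsorized_mean(data, 0.1):.2f}")
print(f"scipy winsorize:     {mstats.winsorize(data, limits=(0.1, 0.1)).mean():.2f}")
```

Unlike trimming, winsorizing keeps the sample size at $n$; the outlier 250 is pulled in to 20 rather than dropped.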
The Mean in Machine Learning
The arithmetic mean underlies many core machine learning techniques:
| ML Concept | How Mean is Used | Formula |
|---|---|---|
| Mean Squared Error | The best constant prediction under MSE is the mean of the targets | MSE = (1/n)Σ(yᵢ - ŷᵢ)² |
| Batch Normalization | Stabilizes training by centering and scaling activations with batch statistics | x̂ = (x - μ_batch) / √(σ²_batch + ε) |
| Mean Pooling (NLP) | Averages token embeddings | h = (1/T)Σhₜ |
| StandardScaler | Centers features to mean=0 | x_scaled = (x - μ) / σ |
| Weighted Loss | Inverse-variance weighting | wᵢ = 1/σᵢ² |
```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.preprocessing import StandardScaler

# The mean is the minimizer of the sum of squared errors (SSE)
y_true = np.array([10, 20, 30, 40, 50, 100])  # right-skewed by one large value

# Grid-search the constant c that minimizes SSE = sum((y - c)^2)
candidates = np.linspace(0, 100, 200)
sse = [np.sum((y_true - c) ** 2) for c in candidates]
best = candidates[np.argmin(sse)]
print(f"Value minimizing SSE: {best:.1f}")
print(f"Mean of data: {np.mean(y_true):.1f}")
print("They agree (up to grid resolution): the mean is the least squares estimator.\n")

# StandardScaler centers each feature at its mean and scales it by its standard deviation
X, y = make_regression(n_samples=100, n_features=3, noise=10, random_state=42)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
print(f"Original mean: {X.mean(axis=0).round(3)}")
print(f"Scaled mean: {X_scaled.mean(axis=0).round(10)}")
print(f"Original std: {X.std(axis=0).round(3)}")
print(f"Scaled std: {X_scaled.std(axis=0).round(3)}")
```
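The batch normalization and mean pooling rows of the ML table both come down to means taken over different axes. This plain-NumPy sketch is not any framework's implementation: it omits batch normalization's learnable scale and shift, and the padding mask in the pooling step is an assumption added to show how padded tokens are commonly excluded from the average.

```python
import numpy as np

rng = np.random.default_rng(1)

# Batch-normalization-style centering: statistics over the batch axis (axis 0)
activations = rng.normal(loc=5.0, scale=2.0, size=(32, 4))  # (batch, features)
eps = 1e-5  # small constant for numerical stability
mu_batch = activations.mean(axis=0)
var_batch = activations.var(axis=0)
normalized = (activations - mu_batch) / np.sqrt(var_batch + eps)

# Mean pooling over tokens: average embeddings along the sequence axis,
# ignoring padding positions via an assumed 0/1 mask
embeddings = rng.normal(size=(2, 5, 3))          # (batch, tokens, dim)
mask = np.array([[1, 1, 1, 0, 0],
                 [1, 1, 1, 1, 1]], dtype=float)  # 1 = real token, 0 = padding
summed = (embeddings * mask[:, :, None]).sum(axis=1)
pooled = summed / mask.sum(axis=1, keepdims=True)

print("Per-feature mean after normalization:", normalized.mean(axis=0).round(6))
print("Pooled shape:", pooled.shape)
```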
Key Takeaways
Summary: Arithmetic Mean
- The mean is the balance point — sum of deviations always equals 0
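- The mean is the least squares estimator of location: it minimizes $\sum_{i=1}^{n}(x_i - c)^2$
- Linear transformations pass directly through the mean: if $y_i = a x_i + b$, then $\bar{y} = a\bar{x} + b$
- The weighted mean accounts for unequal importance; inverse-variance weights $w_i = 1/\sigma_i^2$ give the minimum-variance unbiased weighted average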
- Trimmed mean provides robustness without abandoning the mean entirely
- The mean has unbounded influence — one extreme observation can shift it arbitrarily
- The mean is not always the right measure — always check your data's shape before choosing