The Arithmetic Mean

Descriptive Statistics

The Most Used — and Most Misused — Statistical Measure

The arithmetic mean is the most widely used summary statistic. Understanding its properties in depth helps you avoid common analytical errors.

Algebraic properties — Sum of deviations equals zero; minimizes sum of squared deviations
Population vs sample — Why the sample mean divides by n, while the unbiased sample variance divides by n-1
Sensitivity to outliers — One extreme value can pull the mean far from the center
Trimmed and winsorized alternatives — Robust versions for contaminated data

The mean is powerful, but it is not always right. Know when to use it and when to look elsewhere.

What is the Arithmetic Mean?

Definition

The arithmetic mean of a set of values is the sum of the values divided by the number of values. Equivalently, it is the value $c$ that minimizes the sum of squared deviations $\sum_i (x_i - c)^2$ of the data from $c$.

Definition and Formula


For a sample of $n$ observations:

Sample Mean

\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^n x_i

Here,

$\bar{x}$ = sample mean
$n$ = number of observations in the sample
$x_i$ = the $i$-th observation

For a population of $N$ observations:

Population Mean

\mu = \frac{1}{N}\sum_{i=1}^N x_i

Here,

$\mu$ = population mean
$N$ = population size
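The sample and population formulas are the same arithmetic; what differs is whether the values are a sample or the whole population. The sketch below assumes a small hypothetical population of six values. It computes $\mu$ directly from the definition, then checks by simulation that the sample mean $\bar{x}$ is unbiased: averaged over many random samples, it lands close to $\mu$. The function name `arithmetic_mean` is illustrative, not from any library.

```python
import numpy as np

def arithmetic_mean(values):
    """(1/n) * sum(x_i), computed directly from the definition."""
    values = list(values)
    if not values:
        raise ValueError("the mean of an empty collection is undefined")
    return sum(values) / len(values)

# Hypothetical finite population of N = 6 values
population = [2, 4, 4, 5, 7, 8]
mu = arithmetic_mean(population)          # population mean: divide by N

# Draw 100,000 samples of size n = 3 (with replacement) and average each one
rng = np.random.default_rng(1)
samples = rng.choice(population, size=(100_000, 3), replace=True)
sample_means = samples.mean(axis=1)       # one x-bar per sample

print(f"population mean mu       = {mu:.4f}")
print(f"average of sample means  = {sample_means.mean():.4f}")  # close to mu
```

The simulated average differs from $\mu$ only by sampling noise, which is what "unbiased" means here.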

For a continuous random variable $X$ with probability density function $f(x)$:

Expected Value (Population Mean)

\mu = E[X] = \int_{-\infty}^{\infty} x \, f(x) \, dx

Here,

$E[X]$ = expected value of $X$
$f(x)$ = probability density function
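When $f(x)$ is known, the integral for $E[X]$ can be approximated numerically. The sketch below assumes an exponential density with rate $\lambda = 0.5$, whose exact mean is $1/\lambda = 2$. It uses a plain midpoint rule rather than any particular library integrator, and it truncates the infinite upper limit at 60, where the neglected tail is negligible. The helper name `expected_value_midpoint` is hypothetical.

```python
import numpy as np

def expected_value_midpoint(pdf, lower, upper, n_steps=200_000):
    """Approximate E[X] = integral of x * f(x) over [lower, upper] with the midpoint rule."""
    h = (upper - lower) / n_steps
    xs = lower + (np.arange(n_steps) + 0.5) * h   # midpoint of each sub-interval
    return np.sum(xs * pdf(xs)) * h

LAM = 0.5  # assumed rate of an exponential distribution; exact mean is 1 / LAM = 2

def exp_pdf(x):
    """f(x) = LAM * exp(-LAM * x), only evaluated on x >= 0 here."""
    return LAM * np.exp(-LAM * x)

# The support is [0, inf); the tail beyond 60 contributes (60 + 1/LAM) * e^(-30), about 6e-12
approx = expected_value_midpoint(exp_pdf, 0.0, 60.0)
print(f"numerical E[X] = {approx:.6f}, exact 1/lambda = {1 / LAM:.6f}")
```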

Algebraic Properties of the Mean

Theorem: Fundamental Properties

Sum of deviations = 0: $\sum_{i=1}^n (x_i - \bar{x}) = 0$
- The mean is the "balance point" of the distribution.
Linear transformation: $\overline{(aX + b)} = a\bar{X} + b$
- The mean commutes with affine transformations.
Minimizes sum of squared deviations: $\bar{x} = \arg\min_c \sum(x_i - c)^2$
- The mean is the least squares estimator of location.
Additivity: $E[X + Y] = E[X] + E[Y]$
- This holds for any random variables with finite means; independence is not required.
Homogeneity of degree 1: $E[aX] = aE[X]$
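These properties can be checked numerically on sample data. The sketch below uses an arbitrary simulated right-skewed sample and assumed constants $a = -1.5$, $b = 10$. To show that additivity does not require independence, it pairs $x$ with $z = x^2$, which is completely determined by $x$.

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.exponential(scale=3.0, size=1_000)   # arbitrary right-skewed sample
a, b = -1.5, 10.0
x_bar = x.mean()

# Balance point: deviations from the mean sum to zero (up to rounding error)
dev_sum = np.sum(x - x_bar)
print(f"sum of deviations: {dev_sum:.2e}")

# Affine transformation passes through the mean: mean(a*x + b) = a*mean(x) + b
print(np.isclose(np.mean(a * x + b), a * x_bar + b))

# Additivity without independence: z is a deterministic function of x
z = x ** 2
print(np.isclose(np.mean(x + z), x_bar + z.mean()))
```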

Proof that the Mean Minimizes Sum of Squared Deviations

Theorem: Mean as Least Squares Estimator

Define $f(c) = \sum_{i=1}^n (x_i - c)^2$. Taking the derivative and setting it to zero:

f'(c) = -2\sum_{i=1}^n (x_i - c) = 0 \implies \sum_{i=1}^n x_i = nc \implies c = \bar{x}

The second derivative $f''(c) = 2n > 0$ confirms this is a minimum. $\square$
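The same result follows without calculus. Expand around $\bar{x}$ and use the sum-of-deviations property ($\sum_{i=1}^n (x_i - \bar{x}) = 0$) to eliminate the cross term:

```latex
\sum_{i=1}^n (x_i - c)^2
  = \sum_{i=1}^n \big[(x_i - \bar{x}) + (\bar{x} - c)\big]^2
  = \sum_{i=1}^n (x_i - \bar{x})^2 + 2(\bar{x} - c)\sum_{i=1}^n (x_i - \bar{x}) + n(\bar{x} - c)^2
  = \sum_{i=1}^n (x_i - \bar{x})^2 + n(\bar{x} - c)^2
```

Because $n(\bar{x} - c)^2 \ge 0$, with equality only at $c = \bar{x}$, the mean is the unique minimizer. The minimum value is $\sum_{i=1}^n (x_i - \bar{x})^2$.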

Weighted Mean

When observations have different importance, frequency, or precision:

Weighted Mean

\bar{x}_w = \frac{\sum_{i=1}^n w_i x_i}{\sum_{i=1}^n w_i}

Here,

$w_i$ = weight for the $i$-th observation
$x_i$ = the $i$-th observation
$\bar{x}_w$ = weighted mean

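The ordinary mean is the special case of equal weights (for example $w_i = 1$ or $w_i = 1/n$ for all $i$). Suppose the observations are independent measurements of the same quantity with known variances $\sigma_i^2$. Then inverse-variance weighting, $w_i = 1/\sigma_i^2$, gives the minimum-variance unbiased estimator among all weighted means of the observations.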
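A small numerical illustration of inverse-variance weighting follows. It assumes three hypothetical, independent measurements of one quantity with known standard errors. For independent errors, the weighted mean has variance $1/\sum_i w_i$, which the sketch compares against the ordinary mean's standard error. `np.average` with its `weights` argument serves as a cross-check.

```python
import numpy as np

# Hypothetical: three independent measurements of one quantity, with known standard errors
x = np.array([10.2, 9.8, 10.5])
sigma = np.array([0.1, 0.2, 0.5])

w = 1.0 / sigma**2                              # inverse-variance weights
x_w = np.sum(w * x) / np.sum(w)                 # weighted mean, straight from the formula
se_w = np.sqrt(1.0 / np.sum(w))                 # its standard error when errors are independent
se_plain = np.sqrt(np.sum(sigma**2)) / x.size   # standard error of the ordinary mean

print(f"weighted mean = {x_w:.4f} (SE {se_w:.4f})")
print(f"ordinary mean = {x.mean():.4f} (SE {se_plain:.4f})")
print(np.isclose(x_w, np.average(x, weights=w)))  # NumPy's weighted average agrees
```

The precise first measurement dominates the weighted mean, and the weighted estimate has the smaller standard error.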

Mean for Grouped Data

When raw data is unavailable (only a frequency table):

Mean for Grouped Data

\bar{x} = \frac{\sum_{i=1}^k f_i m_i}{\sum_{i=1}^k f_i}

Here,

$f_i$ = frequency of the $i$-th class
$m_i$ = midpoint of the $i$-th class interval
$k$ = number of classes

This is an approximation: it treats every value in a class as if it sat at the class midpoint, so the exact mean generally cannot be recovered from grouped data.
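The grouped-data formula is easy to compute from a frequency table. The sketch below uses a hypothetical table with classes $[0,10), [10,20), [20,30), [30,40)$. It also shows how far the true mean could be from the midpoint estimate. Placing every value at its class's lower edge, or letting every value approach its upper edge, bounds the exact mean of any raw data consistent with the table.

```python
import numpy as np

# Hypothetical frequency table with classes [0,10), [10,20), [20,30), [30,40)
edges = np.array([0.0, 10.0, 20.0, 30.0, 40.0])
freq = np.array([5, 12, 8, 3])                     # f_i

midpoints = (edges[:-1] + edges[1:]) / 2           # m_i: 5, 15, 25, 35
grouped_mean = np.sum(freq * midpoints) / np.sum(freq)

# Any raw data consistent with the table has an exact mean between these bounds
lowest = np.sum(freq * edges[:-1]) / np.sum(freq)  # every value at its class's lower edge
highest = np.sum(freq * edges[1:]) / np.sum(freq)  # every value approaching its upper edge

print(f"grouped mean {grouped_mean:.3f}; exact mean lies in [{lowest:.3f}, {highest:.3f})")
```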

Trimmed Mean: A Robust Alternative

Definition: Trimmed Mean

The $\alpha$-trimmed mean removes the smallest $\alpha$ fraction and the largest $\alpha$ fraction of observations before computing the mean:

\bar{x}_{\text{trim}} = \frac{1}{n - 2\lfloor \alpha n \rfloor} \sum_{i=\lfloor \alpha n \rfloor + 1}^{n - \lfloor \alpha n \rfloor} x_{(i)}

where $x_{(i)}$ are the order statistics. Common choices: $\alpha = 0.05$ or $\alpha = 0.10$.

The trimmed mean trades a small amount of efficiency (under normality) for greatly improved robustness against outliers.
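The trimmed-mean formula can be implemented directly with $g = \lfloor \alpha n \rfloor$. The sketch below also includes the winsorized mean, which is named elsewhere in this article. Instead of dropping the $g$ most extreme values at each end, it replaces them with the nearest retained order statistic ($x_{(g+1)}$ at the bottom, $x_{(n-g)}$ at the top) and then averages all $n$ values. The helper names are hypothetical, and the sample is made up, containing one gross outlier.

```python
import math
import numpy as np

def _trim_count(n, alpha):
    """g = floor(alpha * n): observations cut (or replaced) at each end."""
    if not 0 <= alpha < 0.5:
        raise ValueError("alpha must be in [0, 0.5)")
    return math.floor(alpha * n)

def trimmed_mean(x, alpha):
    xs = np.sort(np.asarray(x, dtype=float))   # order statistics x_(1) <= ... <= x_(n)
    g = _trim_count(xs.size, alpha)
    return xs[g:xs.size - g].mean()            # keep x_(g+1), ..., x_(n-g)

def winsorized_mean(x, alpha):
    xs = np.sort(np.asarray(x, dtype=float))
    n = xs.size
    g = _trim_count(n, alpha)
    w = xs.copy()
    w[:g] = xs[g]                  # replace the g smallest with x_(g+1)
    w[n - g:] = xs[n - g - 1]      # replace the g largest with x_(n-g)
    return w.mean()

data = [1, 3, 5, 6, 8, 9, 10, 12, 15, 200]   # hypothetical sample with one gross outlier
print(np.mean(data))                 # 26.9
print(trimmed_mean(data, 0.10))      # 8.5
print(winsorized_mean(data, 0.10))   # 8.6
```

SciPy's `scipy.stats.trim_mean` should agree with `trimmed_mean` here, since it also cuts `int(proportiontocut * n)` values from each end.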

Influence Function and Breakdown Point

Theorem: Influence Function of the Mean

The influence function of the arithmetic mean is:

\text{IF}(x; \bar{x}, F) = x - \mu

This is unbounded — a single extreme observation can shift the mean arbitrarily. This is why the mean is not robust.

The finite-sample breakdown point of the mean is $1/n$, which tends to 0 as $n$ grows: moving a single observation toward infinity drags the mean to infinity.
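A finite-sample counterpart of the influence function is the sensitivity curve, $SC_n(x) = n\,[T(x_1,\dots,x_{n-1},x) - T(x_1,\dots,x_{n-1})]$. For the mean it simplifies algebraically to $x - \bar{x}_{n-1}$, a straight line in $x$ that mirrors $\text{IF}(x) = x - \mu$. The sketch below assumes a simulated standard-normal base sample of 99 points, and its helper `sensitivity_curve` is hypothetical. It contrasts the mean with the median, whose curve levels off once $x$ exceeds the rest of the data.

```python
import numpy as np

def sensitivity_curve(estimator, base, x):
    """SC_n(x) = n * [T(base + [x]) - T(base)], where n = len(base) + 1."""
    base = np.asarray(base, dtype=float)
    n = base.size + 1
    return n * (estimator(np.append(base, x)) - estimator(base))

rng = np.random.default_rng(0)
base = rng.normal(loc=0.0, scale=1.0, size=99)   # simulated base sample

for x in [0.0, 10.0, 1_000.0, 1_000_000.0]:
    sc_mean = sensitivity_curve(np.mean, base, x)
    sc_median = sensitivity_curve(np.median, base, x)
    print(f"x = {x:>12,.1f}   SC(mean) = {sc_mean:>14,.2f}   SC(median) = {sc_median:8.2f}")
```

The mean's column grows without limit as $x$ grows, while the median's column stops changing once $x$ is the largest value.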

Limitations of the Mean

Problem	Example	Solution
Sensitive to outliers	One CEO salary inflates the average company salary	Use the median
Meaningless for nominal data	"Mean blood type" has no meaning	Use the mode
Misleading for skewed data	Right-skewed incomes pull the mean above the median	Use the median
May not be a possible value	Mean of 2.3 children per family	Report it as an average rate, not a typical case
Hides multimodality	The mean of a bimodal distribution can fall between the modes, where few values lie	Plot the distribution first
Unbounded influence	One extreme value can shift the mean arbitrarily	Use a trimmed or winsorized mean
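Two rows of the table can be reproduced with made-up numbers. The salary list below is hypothetical: seven staff salaries plus one CEO salary. The bimodal list is hypothetical as well, with two tight clusters and no observation near their midpoint.

```python
import numpy as np

# Hypothetical salaries (dollars): seven staff plus one CEO
salaries = np.array([48_000, 52_000, 55_000, 58_000, 61_000, 65_000, 70_000, 2_000_000])
print(f"mean   = {salaries.mean():,.0f}")      # 301,125: pulled far upward by one value
print(f"median = {np.median(salaries):,.0f}")  # 59,500: stays with the typical employee

# Hypothetical bimodal data: two clusters, nothing near the middle
bimodal = np.array([1, 1, 2, 2, 9, 9, 10, 10])
m = bimodal.mean()
gap = np.min(np.abs(bimodal - m))              # distance from the mean to the nearest observation
print(f"bimodal mean = {m}, nearest observation is {gap} away")
```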

The Mean in Machine Learning

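The arithmetic mean appears throughout machine learning:

ML Concept	How the Mean Is Used	Formula
Mean Squared Error	The constant prediction that minimizes MSE is the mean of the targets (in general, MSE targets the conditional mean)	MSE = (1/n)Σ(yᵢ - ŷᵢ)²
Batch Normalization	Centers and scales each activation with batch statistics, which tends to stabilize training	x̂ = (x - μ_batch) / √(σ²_batch + ε)
Mean Pooling (NLP)	Averages token embeddings into a single vector	h = (1/T)Σhₜ
StandardScaler	Centers each feature to mean 0 and scales it to unit variance	x_scaled = (x - μ) / σ
Weighted Loss	Weighted mean of per-sample losses, e.g. with inverse-variance weights	wᵢ = 1/σᵢ²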

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import make_regression

# The mean is the minimizer of the sum of squared errors (SSE)
y_true = np.array([10, 20, 30, 40, 50, 100])  # right-skewed by one large value

# Grid-search the constant c that minimizes SSE(c) = sum((y_true - c)**2)
candidates = np.linspace(0, 100, 100_001)  # grid spacing of 0.001
sse = ((y_true[None, :] - candidates[:, None]) ** 2).sum(axis=1)
best = candidates[np.argmin(sse)]
print(f"Value minimizing SSE: {best:.3f}")
print(f"Mean of data:         {np.mean(y_true):.3f}")
print("They agree to within the grid spacing: the mean is the least squares estimator.\n")

# StandardScaler uses mean
X, y = make_regression(n_samples=100, n_features=3, noise=10, random_state=42)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
print(f"Original mean: {X.mean(axis=0).round(3)}")
print(f"Scaled mean:   {X_scaled.mean(axis=0).round(10)}")
print(f"Original std:  {X.std(axis=0).round(3)}")
print(f"Scaled std:    {X_scaled.std(axis=0).round(3)}")
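Two other rows of the table, batch normalization and mean pooling, can be sketched in plain NumPy. The function names below are hypothetical, and this is not the PyTorch or Keras implementation: those also keep running statistics for inference. Training-mode batch normalization divides by the biased batch variance plus a small assumed `eps`, then applies a learnable scale `gamma` and shift `beta`. For padded sequences, mean pooling should average only real tokens, so the sketch assumes a 0/1 `mask` that marks padding with 0.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    """Training-mode batch norm for x of shape (batch, features)."""
    mu = x.mean(axis=0)                     # per-feature batch mean
    var = x.var(axis=0)                     # per-feature batch variance (divides by batch size)
    x_hat = (x - mu) / np.sqrt(var + eps)   # centered and scaled activations
    return gamma * x_hat + beta             # learnable scale and shift

def masked_mean_pool(h, mask):
    """Average token embeddings h of shape (batch, T, d), skipping padded positions (mask == 0)."""
    m = mask[..., np.newaxis].astype(h.dtype)   # (batch, T, 1)
    counts = np.maximum(m.sum(axis=1), 1.0)     # real tokens per sequence; avoids divide-by-zero
    return (h * m).sum(axis=1) / counts         # (batch, d)

rng = np.random.default_rng(3)
acts = rng.normal(loc=5.0, scale=2.0, size=(64, 4))
out = batch_norm_train(acts, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0).round(6), out.std(axis=0).round(3))   # about 0 and about 1 per feature

h = np.array([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]],   # last token of row 0 is padding
              [[5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]])
mask = np.array([[1, 1, 0],
                 [1, 1, 1]])
pooled = masked_mean_pool(h, mask)
print(pooled)   # rows: [2, 3] and [7, 8]
```

Without the mask, the padding value 100 would leak into the first sequence's average, which is the same outlier sensitivity discussed earlier.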

Key Takeaways

Summary: Arithmetic Mean

The mean is the balance point — sum of deviations always equals 0
The mean is the least squares estimator of location — minimizes $\sum(x_i - c)^2$
Linear transformations flow directly through the mean: $\overline{aX+b} = a\bar{X} + b$
Weighted mean accounts for unequal importance — $w_i = 1/\sigma_i^2$ gives minimum variance
Trimmed mean provides robustness without abandoning the mean entirely
The mean has unbounded influence — one extreme observation can shift it arbitrarily
The mean is not always the right measure — always check your data's shape before choosing
