Skewness — Measuring Asymmetry of Distributions

When Distributions Lean to One Side

Skewness reveals the hidden asymmetry in your data — and tells you whether the mean is trustworthy or misleading.

Understanding skewness helps you:

  • Interpret the mean — know when mean > median signals a long right tail
  • Choose the right model — decide between symmetric and skewed distributions
  • Fix data — apply log or square-root transforms to restore symmetry
  • Spot real-world patterns — income, house prices, and reaction times are almost always skewed

A symmetric distribution hides nothing. A skewed one whispers where the outliers hide.


What is Skewness?

Definition

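Skewness quantifies the asymmetry of a distribution. Positive skew means the right tail is longer; negative skew means the left tail is longer. A symmetric distribution has zero skewness, although a skewness of zero does not by itself guarantee symmetry.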

Skewness (Fisher-Pearson coefficient)

$$\text{skewness} = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^3}{s^3}$$

Here,

  • $x_i$ = the i-th observation
  • $\bar{x}$ = sample mean
  • $s$ = standard deviation of the sample (computed with divisor $n$, which matches the default of scipy.stats.skew)
  • $n$ = number of observations

Positive -> longer right tail. Negative -> longer left tail. Near zero -> roughly symmetric.
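To make the formula concrete, here is a minimal sketch that computes it by hand and compares the result with scipy.stats.skew. The helper name manual_skewness and the five-value sample are made up for illustration, and the sketch assumes the standard deviation uses divisor n, which is SciPy's default (bias=True).

```python
import numpy as np
from scipy import stats

def manual_skewness(x):
    """Third central moment divided by the cubed standard deviation (divisor n)."""
    x = np.asarray(x, dtype=float)
    deviations = x - x.mean()
    m3 = np.mean(deviations ** 3)          # (1/n) * sum((x_i - mean)^3)
    s = np.sqrt(np.mean(deviations ** 2))  # standard deviation with divisor n
    return m3 / s ** 3

# Illustrative sample: one large value stretches the right tail
sample = np.array([1.0, 2.0, 2.0, 3.0, 10.0])

print(f"manual: {manual_skewness(sample):+.4f}")
print(f"scipy : {stats.skew(sample):+.4f}")
```

The two numbers should agree. Passing bias=False to stats.skew applies a small-sample correction, so that variant would not match this hand calculation.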

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

np.random.seed(42)
right_skew = np.random.lognormal(0, 0.8, 2000)   # income-like
symmetric  = np.random.normal(0, 1, 2000)
left_skew  = -np.random.lognormal(0, 0.8, 2000)

for name, data in [("Right-Skewed", right_skew),
                    ("Symmetric", symmetric),
                    ("Left-Skewed", left_skew)]:
    sk = stats.skew(data)
    print(f"{name:<15}: skew={sk:+.4f}, mean={np.mean(data):.3f}, median={np.median(data):.3f}")

Mean vs Median Under Skewness

Right-Skewed:   Mode < Median < Mean
Symmetric:      Mode ≈ Median ≈ Mean
Left-Skewed:    Mean < Median < Mode

fig, axes = plt.subplots(1, 3, figsize=(15, 4))
datasets = [("Right-Skewed", right_skew, '#f8d7da'),
            ("Symmetric",    symmetric,   '#d4edda'),
            ("Left-Skewed",  left_skew,   '#d1ecf1')]

for ax, (name, data, color) in zip(axes, datasets):
    ax.hist(data, bins=50, density=True, color=color, edgecolor='gray', alpha=0.7)
    ax.axvline(np.mean(data), color='red', lw=2, ls='--', label=f'Mean={np.mean(data):.2f}')
    ax.axvline(np.median(data), color='blue', lw=2, ls='-', label=f'Median={np.median(data):.2f}')
    ax.set_title(f'{name}\nskewness={stats.skew(data):.3f}')
    ax.legend(fontsize=8)
plt.tight_layout()
plt.savefig('skewness.png', dpi=150)
plt.show()

Interpretation Guide

Absolute Skewness   | Interpretation
less than 0.5       | Approximately symmetric
0.5–1.0             | Moderately skewed
greater than 1.0    | Highly skewed (consider a transformation)
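The thresholds in the table can be wrapped in a small helper. This is only a sketch: interpret_skewness is a hypothetical name, the cut-offs are the rule-of-thumb values from the table rather than a formal test, and the boundary value 1.0 is assigned to "moderately skewed" here by assumption.

```python
def interpret_skewness(value):
    """Map a skewness value to the rule-of-thumb labels from the table above."""
    magnitude = abs(value)  # direction does not matter for the label
    if magnitude < 0.5:
        return "approximately symmetric"
    if magnitude <= 1.0:
        return "moderately skewed"
    return "highly skewed"

for value in (0.1, -0.7, 2.3):
    print(f"{value:+.1f} -> {interpret_skewness(value)}")
```

The sign of the value still tells you which tail is longer; the helper only reports how strong the asymmetry is.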

Fixing Skewness with Transformations

skewed = np.random.lognormal(0, 1, 500)
print(f"Original skewness: {stats.skew(skewed):.4f}")

# Log transform (works for positive right-skewed data)
log_transformed = np.log(skewed)
print(f"Log-transformed skewness: {stats.skew(log_transformed):.4f}")

# Square root (moderate right skew)
sqrt_transformed = np.sqrt(skewed)
print(f"Sqrt-transformed skewness: {stats.skew(sqrt_transformed):.4f}")
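
The log transform above requires strictly positive values. When a right-skewed feature contains zeros (counts, for example), np.log1p, which computes log(1 + x), is one common workaround. The sketch below uses a made-up count-like feature; the variable names and the generated data are assumptions for illustration only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical count-like feature: non-negative, right-skewed, with some exact zeros
raw = np.floor(rng.lognormal(1, 1, 1000))

# np.log(raw) would give -inf at the zeros; log1p computes log(1 + x) instead
log1p_transformed = np.log1p(raw)

print(f"Zeros in data: {(raw == 0).sum()}")
print(f"Original skewness: {stats.skew(raw):.4f}")
print(f"log1p-transformed skewness: {stats.skew(log1p_transformed):.4f}")
```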

Skewness in Machine Learning

ML Application         | Skewness Usage                         | Why
Feature transformation | Log/Box-Cox transform skewed features  | Many models fit better on roughly symmetric features
Loss function design   | Skewed targets → asymmetric loss       | Weight errors in one direction more heavily than the other
Data augmentation      | Know which direction to augment        | Balance training data
Model selection        | Skewed data → robust models            | Tree-based models such as Random Forest are less sensitive to skew than linear models

import numpy as np
from scipy.stats import skew, boxcox
from sklearn.preprocessing import PowerTransformer

np.random.seed(42)

# Skewed feature → log transform
skewed_data = np.random.lognormal(3, 1, 1000)
print(f"Before transform: skewness = {skew(skewed_data):.3f}")

log_data = np.log(skewed_data)
print(f"After log transform: skewness = {skew(log_data):.3f}")

# Box-Cox transformation (automatic)
bc_data, lam = boxcox(skewed_data)
print(f"After Box-Cox (λ={lam:.2f}): skewness = {skew(bc_data):.3f}")

# PowerTransformer (sklearn)
pt = PowerTransformer(method='yeo-johnson')
pt_data = pt.fit_transform(skewed_data.reshape(-1,1)).flatten()
print(f"After Yeo-Johnson: skewness = {skew(pt_data):.3f}")

Key Takeaways

Positive skew = right tail — mean > median (tail pulls mean rightward)

|skew| greater than 1: strongly skewed — use non-parametric methods or transform

A log transformation often reduces right skewness in positive-valued data such as income, prices, and reaction times

Always visualize — skewness alone doesn't tell you the full story
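
As a small illustration of that last point, the sketch below builds two samples whose skewness is close to zero but whose shapes are very different: one bell-shaped, one with two separate peaks. The data are synthetic and the variable names are made up for the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Two-peaked sample: a cluster around +3 and its mirror image around -3
half = rng.normal(3, 0.5, 1000)
bimodal = np.concatenate([half, -half])

# Ordinary bell-shaped sample centered on 0
bell = rng.normal(0, 1, 2000)

for name, data in [("Bimodal", bimodal), ("Bell", bell)]:
    near_center = np.mean(np.abs(data) < 1)  # share of points within 1 of the center
    print(f"{name:<8}: skew={stats.skew(data):+.4f}, share near center={near_center:.2f}")
```

Both samples report a skewness near zero, yet almost none of the bimodal points sit near the center, which is something a histogram shows immediately and the skewness value does not.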

"Skewness is the data's way of telling you the mean is not the whole story."
