
Mode — Most Frequent Value, Multimodality, Limitations

Finding the Most Common Value in Any Dataset

Descriptive Statistics

The Mode: Where Data Clusters Most

Not every dataset tells its story through averages. Sometimes the most meaningful number is simply the one that shows up the most.

  • Categorical data — the only measure of center that works for colors, brands, and categories
  • Multimodality detection — revealing hidden subpopulations that averages mask entirely
  • Real-world pattern recognition — from shoe sizes to survey responses, the mode finds the crowd

The mode doesn't care about magnitude — it cares about frequency, and sometimes that's exactly what you need.


What is the Mode?

Definition

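The mode is the value that appears most frequently in a dataset. Unlike the mean and median, it can be computed for nominal (categorical) data, which makes it the only measure of central tendency available for that data type. A dataset can have a single mode, several modes (when values tie for the highest count), or no meaningful mode when every value appears exactly once.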


Calculating the Mode

import numpy as np
import pandas as pd
from scipy import stats
import matplotlib.pyplot as plt

# Discrete data: shoe sizes
shoe_sizes = [8, 9, 8, 10, 8, 9, 10, 9, 8, 11, 9, 8, 10, 9, 8]
mode_result = stats.mode(shoe_sizes, keepdims=True)
print(f"Shoe sizes: {sorted(shoe_sizes)}")
print(f"Mode = {mode_result.mode[0]} (appears {mode_result.count[0]} times)")

# Using pandas
series = pd.Series(shoe_sizes)
print(f"Pandas mode: {series.mode().tolist()}")

# Frequency table
freq_table = series.value_counts().sort_index()
print(freq_table)
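
Ties deserve a closer look: when two or more values share the highest count, `scipy.stats.mode` returns only one of them, while `Series.mode()` returns all of them. The following is a minimal sketch of tie handling using only the standard library (`statistics.multimode` requires Python 3.8+). The `ratings` list is made-up illustration data, not part of the lesson's dataset.

```python
from collections import Counter
from statistics import multimode

# Hypothetical ratings with a tie: 3 and 5 both appear three times
ratings = [3, 5, 3, 4, 5, 1, 3, 5]

counts = Counter(ratings)
top_count = max(counts.values())

# All values that share the highest frequency
modes = sorted(value for value, n in counts.items() if n == top_count)

print(f"Counts: {dict(counts)}")
print(f"Modes: {modes} (each appears {top_count} times)")

# statistics.multimode returns the same values, in order of first appearance
print(f"multimode: {multimode(ratings)}")
```

Reporting every tied value avoids silently hiding a second mode, which matters once distributions with more than one peak come into play.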

Bimodal and Multimodal Distributions

A distribution can have:

  • Unimodal: one peak
  • Bimodal: two peaks (often indicating two subpopulations)
  • Multimodal: multiple peaks

np.random.seed(42)

fig, axes = plt.subplots(1, 3, figsize=(14, 4))

# Unimodal
data_uni = np.random.normal(50, 10, 1000)
axes[0].hist(data_uni, bins=30, color='steelblue', edgecolor='black', alpha=0.7)
axes[0].set_title('Unimodal Distribution')
axes[0].axvline(np.mean(data_uni), color='red', linestyle='--', label='Mean ≈ Mode ≈ Median')
axes[0].legend(fontsize=8)

# Bimodal (two populations)
data_bi = np.concatenate([np.random.normal(35, 7, 500),
                           np.random.normal(70, 7, 500)])
axes[1].hist(data_bi, bins=40, color='coral', edgecolor='black', alpha=0.7)
axes[1].set_title('Bimodal Distribution\n(Two groups mixed!)')
mean_bi = np.mean(data_bi)
axes[1].axvline(mean_bi, color='red', linestyle='--', label=f'Mean={mean_bi:.0f} (misleading!)')
axes[1].legend(fontsize=8)

# Mode for categorical data
categories = ['Python', 'R', 'SQL', 'Python', 'Python', 'R', 'SQL', 'Python', 'R', 'Python']
cat_counts = pd.Series(categories).value_counts()
cat_counts.plot(kind='bar', ax=axes[2], color=['steelblue','coral','green'], edgecolor='black')
axes[2].set_title('Mode for Categorical Data\n(Mode = Python)')
axes[2].tick_params(rotation=0)

plt.tight_layout()
plt.savefig('mode_examples.png', dpi=150)
plt.show()
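
Reading peaks off a histogram by eye works, but the count can also be estimated programmatically. The sketch below is one possible approach, not the only one: it smooths the data with a Gaussian KDE and counts clearly separated peaks with `scipy.signal.find_peaks`. The seed, the grid size, and the 10% prominence threshold are assumptions chosen for illustration, and the result is sensitive to the KDE bandwidth.

```python
import numpy as np
from scipy.signal import find_peaks
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)  # assumed seed, for reproducibility only

# Two mixed groups, mirroring the bimodal example above
sample = np.concatenate([rng.normal(35, 7, 500), rng.normal(70, 7, 500)])

# Evaluate a smooth density estimate on a regular grid
grid = np.linspace(sample.min(), sample.max(), 500)
density = gaussian_kde(sample)(grid)

# Keep only peaks that stand out clearly; the 10% threshold is an assumption
peak_idx, _ = find_peaks(density, prominence=0.1 * density.max())
mode_locations = grid[peak_idx]

print(f"Number of modes found: {len(mode_locations)}")
print(f"Approximate mode locations: {np.round(mode_locations, 1)}")
print(f"Mean: {sample.mean():.1f} (lies between the modes)")
```

The prominence filter is there to ignore small bumps caused by sampling noise. A lower threshold finds more peaks and a higher one finds fewer, so the count should be treated as a diagnostic rather than a definitive answer.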

Mode for Continuous Data

Continuous data rarely repeats exactly. Use a histogram or kernel density estimate (KDE) to find the modal class or the density peak.

from scipy import stats as scipy_stats

continuous = np.random.gamma(3, 2, 1000)

# KDE to find the mode
kde = scipy_stats.gaussian_kde(continuous)
x = np.linspace(continuous.min(), continuous.max(), 1000)
mode_estimate = x[np.argmax(kde(x))]

print(f"Continuous data mode (KDE peak): {mode_estimate:.3f}")
print(f"Theoretical mode of Gamma(shape=3, scale=2): {(3-1)*2:.3f}")  # (shape - 1) * scale, valid for shape >= 1

plt.figure(figsize=(8, 4))
plt.hist(continuous, bins=40, density=True, color='lightblue', edgecolor='gray', alpha=0.7)
plt.plot(x, kde(x), 'r-', linewidth=2, label='KDE')
plt.axvline(mode_estimate, color='green', linewidth=2, linestyle='--',
            label=f'KDE Mode ≈ {mode_estimate:.2f}')
plt.legend()
plt.title('Mode Estimation for Continuous Data')
plt.show()
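
The histogram route mentioned above can be sketched just as briefly. This example finds the modal class, meaning the bin holding the most observations, with `np.histogram`. The `values` array and the unit-width bin edges are made-up illustration choices.

```python
import numpy as np

# Hypothetical measurements; no value repeats exactly
values = np.array([1.1, 1.9, 2.2, 2.4, 2.7, 2.9, 3.5, 4.8])

# Count observations in unit-width bins [1, 2), [2, 3), [3, 4), [4, 5]
counts, edges = np.histogram(values, bins=[1, 2, 3, 4, 5])

top = np.argmax(counts)  # index of the fullest bin (first one on ties)
modal_class = (edges[top], edges[top + 1])
midpoint = (edges[top] + edges[top + 1]) / 2

print(f"Counts per bin: {counts.tolist()}")
print(f"Modal class: [{modal_class[0]}, {modal_class[1]}) with {counts[top]} values")
print(f"Modal class midpoint: {midpoint}")
```

The modal class depends on the bin width and bin edges, so it is worth checking that the answer is stable across a few reasonable binnings before reporting it.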

When to Use the Mode

Data Type | Mode Appropriate?
Nominal (color, country, brand) | ✅ Only valid measure of center
Ordinal (satisfaction scale) | ✅ Valid
Interval/Ratio (symmetric) | Sometimes (mode ≈ mean ≈ median)
Interval/Ratio (skewed) | Less useful
Bimodal data | ✅ Essential to report both modes

Mode in Machine Learning

[Figure: Mode is critical for categorical ML: tokenization, class imbalance, and baseline models. Panels: Most Frequent (mode imputation), NLP Tokens (most common words), Classification (majority class baseline), Bimodal Detect (hidden subgroups).]

In ML, the mode is essential for categorical data:

ML Application | How Mode is Used | Why It Matters
Mode imputation | Fill missing categoricals with the most frequent value | Simple and fast, but inflates the share of the most frequent category
Majority class baseline | Always predict the mode of the training labels | A naive reference point that any useful model should beat
NLP tokenization | Stop words are typically among the most frequent tokens | Removing them often improves signal
Bimodal detection | Two modes can indicate two subpopulations | Consider splitting the data or training separate models
Feature selection | High-cardinality features (many distinct values, often no dominant mode) | May need encoding strategies

import numpy as np
from collections import Counter
import pandas as pd

# Mode imputation for categorical features
data = pd.DataFrame({
    'color': ['red', 'blue', 'red', None, 'blue', 'red', 'green', 'red', None],
    'size': [10, 20, 15, 20, 25, 20, 15, 20, 20]
})

print("Original data:")
print(data)

# Mode imputation
mode_color = data['color'].mode()[0]
print(f"\nMode of 'color': {mode_color}")
data['color_filled'] = data['color'].fillna(mode_color)
print(data[['color', 'color_filled']])

# Majority class baseline in classification
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target,
                                                       test_size=0.3, random_state=42)

mode_class = Counter(y_train).most_common(1)[0]
print(f"\nMost frequent class: {mode_class[0]} (appears {mode_class[1]} times)")

y_baseline = np.full(len(y_test), mode_class[0])
baseline_acc = accuracy_score(y_test, y_baseline)
print(f"Baseline accuracy (always predict mode): {baseline_acc:.3f}")
print("Any real model must beat this!")
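
scikit-learn also ships ready-made versions of both ideas: `SimpleImputer(strategy="most_frequent")` for mode imputation and `DummyClassifier(strategy="most_frequent")` for the majority class baseline. The sketch below uses small made-up arrays (`colors`, `labels_train`, `labels_test`) rather than the data above, and marks missing entries with `np.nan`, which is the imputer's default missing-value marker.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.impute import SimpleImputer

# Hypothetical categorical column with missing entries (np.nan)
colors = np.array([["red"], ["blue"], ["red"], [np.nan],
                   ["blue"], ["red"], ["green"], ["red"], [np.nan]], dtype=object)

# Mode imputation: replace missing entries with the most frequent category
imputer = SimpleImputer(strategy="most_frequent")
colors_filled = imputer.fit_transform(colors)
print(f"Imputed value: {imputer.statistics_[0]}")
print(colors_filled.ravel().tolist())

# Majority class baseline: features are ignored, so zeros are enough here
labels_train = np.array([0, 1, 1, 1, 2])
labels_test = np.array([1, 1, 0, 2])
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(np.zeros((len(labels_train), 1)), labels_train)

predictions = baseline.predict(np.zeros((len(labels_test), 1)))
accuracy = baseline.score(np.zeros((len(labels_test), 1)), labels_test)
print(f"Predictions: {predictions.tolist()}, accuracy: {accuracy:.2f}")
```

Using the library classes keeps the imputation and the baseline inside a scikit-learn pipeline, so the mode is learned from the training split only and then reused on new data.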

Key Takeaways

The mode is the only valid measure of central tendency for nominal (categorical) data — mean and median simply don't apply.

Bimodal distributions often signal two hidden subpopulations. When two modes exist, report both rather than just one.

For continuous data, use KDE (kernel density estimation) to find the mode as the density peak.

The mean of a bimodal distribution can fall between the two modes, in a region where few or no observations actually lie.

The mode reminds us that the "center" of data isn't always about balancing extremes — sometimes it's about finding where the crowd actually stands.
