Finding the Most Common Value in Any Dataset

Descriptive Statistics

The Mode: Where Data Clusters Most

Not every dataset tells its story through averages. Sometimes the most meaningful number is simply the one that shows up the most.

Categorical data — the only measure of center that works for colors, brands, and categories
Multimodality detection — revealing hidden subpopulations that averages mask entirely
Real-world pattern recognition — from shoe sizes to survey responses, the mode finds the crowd

The mode doesn't care about magnitude — it cares about frequency, and sometimes that's exactly what you need.

What is the Mode?

Definition

The mode is the value that appears most frequently in a dataset. Unlike the mean and median, it is the only measure of central tendency that works for nominal (categorical) data.

Calculating the Mode

import numpy as np
import pandas as pd
from scipy import stats
import matplotlib.pyplot as plt

# Discrete data: shoe sizes
shoe_sizes = [8, 9, 8, 10, 8, 9, 10, 9, 8, 11, 9, 8, 10, 9, 8]
# keepdims=True (SciPy 1.9+) returns length-1 arrays, so the [0] indexing below works
mode_result = stats.mode(shoe_sizes, keepdims=True)
print(f"Shoe sizes: {sorted(shoe_sizes)}")
print(f"Mode = {mode_result.mode[0]} (appears {mode_result.count[0]} times)")

# Using pandas
series = pd.Series(shoe_sizes)
print(f"Pandas mode: {series.mode().tolist()}")

# Frequency table
freq_table = series.value_counts().sort_index()
print(freq_table)
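
When two or more values tie for the highest count, a single "mode" hides part of the picture. The sketch below is one way to return every tied value using only the standard library; the helper name `all_modes` and the sample lists are illustrative and not part of the original example.

```python
from collections import Counter
from statistics import multimode


def all_modes(values):
    """Return every value tied for the highest count, in first-seen order."""
    counts = Counter(values)
    if not counts:
        return []  # an empty dataset has no mode
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]


# Hypothetical sample data with ties
ratings = [3, 5, 3, 4, 5, 1]                        # 3 and 5 both appear twice
languages = ["Python", "R", "SQL", "Python", "R"]   # nominal data works too

tied_numeric = all_modes(ratings)
tied_nominal = all_modes(languages)

print("Numeric modes:", tied_numeric)
print("Nominal modes:", tied_nominal)
print("statistics.multimode:", multimode(ratings))
```

The standard library's `statistics.multimode` does the same job; the hand-written version is shown only to make the counting step explicit.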

Bimodal and Multimodal Distributions

A distribution can be:

Unimodal: one peak
Bimodal: two peaks (often indicating two subpopulations)
Multimodal: multiple peaks

np.random.seed(42)

fig, axes = plt.subplots(1, 3, figsize=(14, 4))

# Unimodal
data_uni = np.random.normal(50, 10, 1000)
axes[0].hist(data_uni, bins=30, color='steelblue', edgecolor='black', alpha=0.7)
axes[0].set_title('Unimodal Distribution')
axes[0].axvline(np.mean(data_uni), color='red', linestyle='--', label='Mean ≈ Mode ≈ Median')
axes[0].legend(fontsize=8)

# Bimodal (two populations)
data_bi = np.concatenate([np.random.normal(35, 7, 500),
                           np.random.normal(70, 7, 500)])
axes[1].hist(data_bi, bins=40, color='coral', edgecolor='black', alpha=0.7)
axes[1].set_title('Bimodal Distribution\n(Two groups mixed!)')
mean_bi = np.mean(data_bi)
axes[1].axvline(mean_bi, color='red', linestyle='--', label=f'Mean={mean_bi:.0f} (misleading!)')
axes[1].legend(fontsize=8)

# Mode for categorical data
categories = ['Python', 'R', 'SQL', 'Python', 'Python', 'R', 'SQL', 'Python', 'R', 'Python']
cat_counts = pd.Series(categories).value_counts()
cat_counts.plot(kind='bar', ax=axes[2], color=['steelblue','coral','green'], edgecolor='black')
axes[2].set_title('Mode for Categorical Data\n(Mode = Python)')
axes[2].tick_params(rotation=0)

plt.tight_layout()
plt.savefig('mode_examples.png', dpi=150)
plt.show()
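
Peaks can also be located numerically instead of by eye. The following is a minimal sketch, assuming a Gaussian KDE with SciPy's default bandwidth and an arbitrary prominence threshold of 10% of the tallest peak; both choices are assumptions and would need tuning on real data. It also checks where the mean lands relative to the two estimated modes.

```python
import numpy as np
from scipy.signal import find_peaks
from scipy.stats import gaussian_kde

# Synthetic two-group sample, same shape as the bimodal panel above
rng = np.random.default_rng(0)
sample = np.concatenate([rng.normal(35, 7, 500), rng.normal(70, 7, 500)])

# Smooth density estimate evaluated on a regular grid
kde = gaussian_kde(sample)
grid = np.linspace(sample.min(), sample.max(), 512)
density = kde(grid)

# Keep only peaks that stand out by at least 10% of the maximum density
# (assumed threshold, chosen to ignore small bumps in the tails)
peak_idx, _ = find_peaks(density, prominence=0.1 * density.max())
modes = grid[peak_idx]
sample_mean = sample.mean()

print("Estimated modes:", np.round(modes, 1))
print("Sample mean:", round(float(sample_mean), 1))
```

A count of peaks greater than one is a hint, not proof, of mixed subpopulations: the result depends on the KDE bandwidth and on the prominence threshold.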

Mode for Continuous Data


Continuous data rarely repeats exactly. Use a histogram or kernel density estimate (KDE) to find the modal class or the density peak.

from scipy import stats as scipy_stats

continuous = np.random.gamma(3, 2, 1000)

# KDE to find the mode
kde = scipy_stats.gaussian_kde(continuous)
x = np.linspace(continuous.min(), continuous.max(), 1000)
mode_estimate = x[np.argmax(kde(x))]

print(f"Continuous data mode (KDE peak): {mode_estimate:.3f}")
print(f"Theoretical mode of Gamma(shape=3, scale=2): {(3-1)*2:.3f}")  # (α-1)β, valid for shape α ≥ 1

plt.figure(figsize=(8, 4))
plt.hist(continuous, bins=40, density=True, color='lightblue', edgecolor='gray', alpha=0.7)
plt.plot(x, kde(x), 'r-', linewidth=2, label='KDE')
plt.axvline(mode_estimate, color='green', linewidth=2, linestyle='--',
            label=f'KDE Mode ≈ {mode_estimate:.2f}')
plt.legend()
plt.title('Mode Estimation for Continuous Data')
plt.show()
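
The histogram route mentioned above (the modal class) can be sketched the same way. This is a minimal example assuming 20 equal-width bins, which is an arbitrary choice; the modal class shifts with the number and placement of bins.

```python
import numpy as np

# Synthetic right-skewed sample (same distribution family as above)
rng = np.random.default_rng(1)
values = rng.gamma(shape=3, scale=2, size=1000)

# Bin the data and pick the bin with the highest count
counts, edges = np.histogram(values, bins=20)
k = int(np.argmax(counts))
modal_low, modal_high = edges[k], edges[k + 1]
modal_mid = (modal_low + modal_high) / 2

print(f"Modal class: [{modal_low:.2f}, {modal_high:.2f}) with {counts[k]} observations")
print(f"Midpoint used as mode estimate: {modal_mid:.2f}")
```

The midpoint is a coarse estimate whose resolution is limited by the bin width, which is why the KDE peak is usually preferred when a single number is needed.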

When to Use the Mode

Data Type	Mode Appropriate?
Nominal (color, country, brand)	✅ Only valid measure of center
Ordinal (satisfaction scale)	✅ Valid
Interval/Ratio (symmetric)	Sometimes (mode ≈ mean ≈ median)
Interval/Ratio (skewed)	Less useful
Bimodal data	✅ Essential to report both modes

Mode in Machine Learning

In ML, the mode is essential for categorical data:

ML Application	How Mode is Used	Why It Matters
Mode imputation	Fill missing categoricals with most frequent	Simple and fast, but inflates the share of the most frequent category
Majority class baseline	Always predict the mode = naive baseline classifier	Any useful model should beat this
NLP tokenization	Stop words are often among the most frequent tokens	Removing them can improve signal
Bimodal detection	Two modes may indicate two subpopulations	Consider segmenting the data or training separate models
Feature selection	High-cardinality features (many distinct values, no dominant mode)	May need encoding strategies

import numpy as np
from collections import Counter
import pandas as pd

# Mode imputation for categorical features
data = pd.DataFrame({
    'color': ['red', 'blue', 'red', None, 'blue', 'red', 'green', 'red', None],
    'size': [10, 20, 15, 20, 25, 20, 15, 20, 20]
})

print("Original data:")
print(data)

# Mode imputation
mode_color = data['color'].mode()[0]
print(f"\nMode of 'color': {mode_color}")
data['color_filled'] = data['color'].fillna(mode_color)
print(data[['color', 'color_filled']])

# Majority class baseline in classification
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target,
                                                       test_size=0.3, random_state=42)

# most_common(1) returns (label, count); if classes tie, the one seen first wins
mode_class = Counter(y_train).most_common(1)[0]
print(f"\nMost frequent class: {mode_class[0]} (appears {mode_class[1]} times)")

y_baseline = np.full(len(y_test), mode_class[0])
baseline_acc = accuracy_score(y_test, y_baseline)
print(f"Baseline accuracy (always predict mode): {baseline_acc:.3f}")
print("Any real model must beat this!")
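
The same baseline is available in scikit-learn as `DummyClassifier` with `strategy="most_frequent"`. The sketch below repeats the split used above and assumes scikit-learn is installed; it is an alternative to the manual version, not a replacement for it.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)

# The dummy model ignores the features and always predicts the
# most frequent label seen during fit
dummy = DummyClassifier(strategy="most_frequent")
dummy.fit(X_train, y_train)

dummy_pred = dummy.predict(X_test)
dummy_acc = dummy.score(X_test, y_test)
predicted_labels = set(dummy_pred.tolist())

print(f"Labels predicted by the baseline: {predicted_labels}")
print(f"Baseline accuracy: {dummy_acc:.3f}")
```

Because iris has three equally sized classes, this baseline lands near one third. On imbalanced data the same baseline can score deceptively high, which is exactly why it is worth computing before trusting a model's accuracy.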

Key Takeaways

The mode is the only valid measure of central tendency for nominal (categorical) data — mean and median simply don't apply.

Bimodal distributions often signal two hidden subpopulations, so report both modes rather than just one.

For continuous data, use KDE (kernel density estimation) to find the mode as the density peak.

The mean of a bimodal distribution can fall between the two modes, in a region where few or no observations actually lie.

The mode reminds us that the "center" of data isn't always about balancing extremes — sometimes it's about finding where the crowd actually stands.
