Stem-and-Leaf Plots

Data Visualization

Preserve Your Data While Seeing the Shape

A stem-and-leaf plot organizes numerical data while retaining the actual values, unlike a histogram, which groups values into bins and discards the individual data points. It gives you the best of both worlds: the shape of the distribution plus the raw data.

Retain original values — Every data point stays visible in the display
Small dataset champion — Ideal when n is less than 100 and individual values matter
Back-to-back comparison — Compare two distributions side by side effortlessly
Read percentiles directly — Count to the middle value to find the median instantly

When your dataset is small enough that every value matters, the stem-and-leaf plot is your best friend.

What is a Stem-and-Leaf Plot?

Definition

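A stem-and-leaf plot (or stemplot) is a data display that writes each value's leading digits (the stem) once in a column and its final digit (the leaf) beside it. The shape of the distribution becomes visible, yet every original value can still be read back, which a histogram does not allow.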

Best for: small datasets (n less than 100), situations where you want to preserve the original values, and two-group comparisons.

Construction

Rule: Split each data value into a stem (the leading digit or digits) and a leaf (the last digit). List the stems in order in a column and write each value's leaf beside its stem.
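The split rule is plain integer arithmetic. The sketch below is an illustration rather than code from this article: the helper name `split_value` is hypothetical, and it assumes non-negative values and a single-digit leaf in the place given by `leaf_unit`. It expresses the value as a whole number of leaf units and lets `divmod` peel off the last digit; rounding to 6 places first guards against floating-point surprises such as `0.3 // 0.1` evaluating to `2.0`.

```python
def split_value(value, leaf_unit=1):
    """Split a non-negative number into (stem, leaf).

    Hypothetical helper for illustration. Digits finer than leaf_unit are
    truncated, which is the usual stemplot convention.
    """
    if value < 0:
        raise ValueError("this sketch assumes non-negative values")
    # Whole number of leaf units; round(..., 6) absorbs float error before truncating
    units = int(round(value / leaf_unit, 6))
    return divmod(units, 10)  # (stem, leaf)


print(split_value(72))                  # (7, 2)
print(split_value(0.3, leaf_unit=0.1))  # (0, 3)
print(split_value(1234, leaf_unit=10))  # (12, 3): the trailing 4 is truncated
```

The `leaf_unit` parameter mirrors the one in the `stem_and_leaf` function in this section, so the same value can be displayed at different resolutions.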

Example (14 exam scores): 72, 68, 85, 91, 73, 77, 88, 65, 82, 79, 84, 93, 71, 66

Exam Scores — Stem-and-Leaf Plot

Stem	Leaf	Values
6	5 6 8	(65, 66, 68)
7	1 2 3 7 9	(71, 72, 73, 77, 79)
8	2 4 5 8	(82, 84, 85, 88)
9	1 3	(91, 93)

Key: 7|2 = 72
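Reading the median off the display means counting leaves from the top until you reach the middle position; when n is even, you average the two middle values. A minimal sketch of that counting, assuming the plot is stored as a dict mapping each stem to its sorted leaves (both the name `median_by_counting` and this representation are hypothetical):

```python
def median_by_counting(stems, leaf_unit=1):
    """Find the median by walking a stemplot in order, the way you would by eye.

    stems: dict mapping stem -> list of leaves in ascending order (hypothetical layout).
    """
    n = sum(len(leaves) for leaves in stems.values())
    # 1-based middle position(s): a single position when n is odd, two when n is even
    targets = {(n + 1) // 2, n // 2 + 1}
    found = []
    position = 0
    for stem in sorted(stems):
        for leaf in stems[stem]:
            position += 1
            if position in targets:
                found.append((stem * 10 + leaf) * leaf_unit)
    return sum(found) / len(found)


# The exam-score plot from the table above
exam = {6: [5, 6, 8], 7: [1, 2, 3, 7, 9], 8: [2, 4, 5, 8], 9: [1, 3]}
print(median_by_counting(exam))  # 78.0 (the 7th and 8th values are 77 and 79)
```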

from collections import defaultdict

import numpy as np


def stem_and_leaf(data, leaf_unit=1):
    """
    Print a stem-and-leaf display for a list of non-negative numbers.
    leaf_unit: the place value of the leaf (1 for ones, 10 for tens, 0.1 for tenths, etc.)
    """
    stems = defaultdict(list)
    for val in sorted(data):
        # Express the value in whole leaf units (rounding to 6 places absorbs float
        # error such as 0.3 / 0.1 = 2.9999...), then split off the last digit as the leaf
        stem, leaf = divmod(int(round(val / leaf_unit, 6)), 10)
        stems[stem].append(leaf)  # data is sorted, so leaves arrive in order

    print(f"Stem-and-Leaf Plot (leaf unit = {leaf_unit})")
    print(f"{'Stem':>5} | Leaves")
    print("-" * 20)
    # Print every stem from smallest to largest, even empty ones, so gaps stay visible
    for stem in range(min(stems), max(stems) + 1):
        leaves = ' '.join(str(l) for l in stems.get(stem, []))
        print(f"{stem:>5} | {leaves}")
    print(f"\nKey: stem|leaf = stem x {leaf_unit * 10} + leaf x {leaf_unit}")


scores = [72, 68, 85, 91, 73, 77, 88, 65, 82, 79, 84, 93, 71, 66]
stem_and_leaf(scores)

# Check the median you read off the display: n = 14, so it is the
# average of the 7th and 8th values (77 and 79)
print(f"\nMedian = {np.median(scores)}")
print(f"Min = {min(scores)}, Max = {max(scores)}")

Back-to-Back Stem-and-Leaf

Definition

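Compare two groups side by side using one shared column of stems. Group A's leaves go to the left of the stems, written in reverse so they increase moving away from the stem; Group B's leaves go to the right.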

from collections import defaultdict

import numpy as np


def split_into_stems(data, leaf_unit=1):
    """Group non-negative values into {stem: [leaves]}, leaves in ascending order."""
    stems = defaultdict(list)
    for val in sorted(data):
        stem, leaf = divmod(int(round(val / leaf_unit, 6)), 10)
        stems[stem].append(leaf)
    return stems


def back_to_back_stem(data_a, label_a, data_b, label_b, leaf_unit=1):
    """Back-to-back stem-and-leaf plot: Group A's leaves on the left, Group B's on the right."""
    stems_a = split_into_stems(data_a, leaf_unit)
    stems_b = split_into_stems(data_b, leaf_unit)

    # One shared stem column spanning both groups, including empty stems
    all_stems = set(stems_a) | set(stems_b)
    rows = []
    for stem in range(min(all_stems), max(all_stems) + 1):
        # Left leaves are reversed so they increase moving away from the stem
        left = ' '.join(str(l) for l in reversed(stems_a.get(stem, [])))
        right = ' '.join(str(l) for l in stems_b.get(stem, []))
        rows.append((left, stem, right))

    # Size the left column to its longest entry so the stem column stays aligned
    width = max([len(label_a)] + [len(left) for left, _, _ in rows])
    print(f"Back-to-Back Stem-and-Leaf: {label_a} vs {label_b}")
    print(f"{label_a:>{width}}  |Stem| {label_b}")
    print("-" * (width + 25))
    for left, stem, right in rows:
        print(f"{left:>{width}}  | {stem:2d} | {right}")
    print(f"\nKey: leaves | stem | leaves, each value = stem x {leaf_unit * 10} + leaf x {leaf_unit}")


np.random.seed(0)
class_a = sorted(np.random.normal(74, 8, 20).clip(50, 100).round(0).astype(int))
class_b = sorted(np.random.normal(81, 7, 20).clip(50, 100).round(0).astype(int))

back_to_back_stem(class_a, "Class A", class_b, "Class B")

print(f"\nClass A: median={np.median(class_a):.1f}, mean={np.mean(class_a):.1f}")
print(f"Class B: median={np.median(class_b):.1f}, mean={np.mean(class_b):.1f}")

Stem-and-Leaf vs Histogram

Feature	Stem-and-Leaf	Histogram
Retains raw values	✅ Yes	❌ No
Works for large n	❌ Unwieldy	✅ Yes
Side-by-side comparison	✅ Back-to-back	✅ Overlaid
Shape visible	✅ Yes	✅ Yes
Median/quartiles readable	✅ Yes	❌ Not directly
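One way to see the relationship summarized in this table: a stemplot turned on its side is a histogram whose bins are one stem wide. The sketch below reuses the 14 exam scores from the construction example, counts leaves per stem, and compares those counts with `numpy.histogram` using bin edges 60, 70, 80, 90, 100. The counts agree, but only the stemplot still records which values fell into each bin.

```python
from collections import Counter

import numpy as np

scores = [72, 68, 85, 91, 73, 77, 88, 65, 82, 79, 84, 93, 71, 66]

# Number of leaves on each stem (stem = tens digit)
leaf_counts = Counter(s // 10 for s in scores)
stemplot_counts = [leaf_counts[stem] for stem in range(6, 10)]

# Histogram with one bin per stem: [60, 70), [70, 80), [80, 90), [90, 100]
hist_counts, edges = np.histogram(scores, bins=[60, 70, 80, 90, 100])

print(stemplot_counts)       # [3, 5, 4, 2]
print(hist_counts.tolist())  # [3, 5, 4, 2]
# The histogram keeps only the counts: 72 and 73 are both just "one of 5 in [70, 80)"
```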

Stem-and-Leaf in Machine Learning

While histograms are more common in ML, stem-and-leaf plots have their place:

ML Context	Why Stem-and-Leaf Works
Small validation set (n < 50)	See every prediction value
A/B test results	Compare two groups precisely
Model comparison	Side-by-side error metrics
Hyperparameter search	Compare small result sets

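from collections import defaultdict

import numpy as np

# Comparing two models' prediction errors on a small test set
np.random.seed(42)
n_test = 25

model_a_errors = np.random.normal(0, 3, n_test).round(1)
model_b_errors = np.random.normal(0.5, 4, n_test).round(1)


def simple_stem(data, label):
    """Quick stemplot for small signed data with one decimal place.

    Stem = signed whole-number part, leaf = tenths digit (leaf unit 0.1).
    Values between -1 and 0 get their own "-0" stem so their sign is not lost.
    """
    stems = defaultdict(list)
    for val in data:
        whole, leaf = divmod(int(round(abs(val) * 10)), 10)  # e.g. -2.3 -> (2, 3)
        stems[(bool(val < 0), whole)].append(leaf)

    def position(key):
        # Map stems onto a number line: ..., -1 -> -2, -0 -> -1, 0 -> 0, 1 -> 1, ...
        negative, whole = key
        return -whole - 1 if negative else whole

    print(f"\n{label}:")
    lo = min(position(k) for k in stems)
    hi = max(position(k) for k in stems)
    for p in range(lo, hi + 1):  # include empty stems so gaps are visible
        key = (True, -p - 1) if p < 0 else (False, p)
        negative, whole = key
        # Descending leaves on negative stems so values increase left to right
        leaves = ' '.join(str(l) for l in sorted(stems.get(key, []), reverse=negative))
        stem_label = f"-{whole}" if negative else str(whole)
        print(f"  {stem_label:>3} | {leaves}")
    print("  Key: -2 | 3 = -2.3")


simple_stem(model_a_errors, "Model A errors")
simple_stem(model_b_errors, "Model B errors")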

print(f"\nModel A: mean={model_a_errors.mean():.2f}, std={model_a_errors.std():.2f}")
print(f"Model B: mean={model_b_errors.mean():.2f}, std={model_b_errors.std():.2f}")

Key Takeaways

Summary: Stem-and-Leaf Plots

Stem-and-leaf plots preserve original data values — unlike histograms
They're best for small datasets (n less than 100) where individual values matter
Back-to-back stemplots are excellent for comparing two distributions
You can read the median directly by counting to the middle value (for an even n, average the two middle values)
Shape is visible — you can see skewness, gaps, and outliers immediately
For large datasets, use histograms or KDE plots instead
