Visualizing Categories: Bar Charts vs Pie Charts
Data Visualization
Choose the Right Chart Every Time
Bar charts and pie charts are the workhorses of categorical data visualization. Used correctly, they communicate insights instantly. Used incorrectly, they mislead. Understanding when to use each is fundamental to clear data storytelling.
Key things this concept helps with:
- Comparing quantities — When you need to see which categories are larger or smaller
- Showing composition — When you want to display how parts make up a whole
- Avoiding misrepresentation — When you need to create honest, readable visualizations
The right chart choice can make the difference between clarity and confusion.
What is Categorical Data Visualization?
Definition
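Categorical data visualization means drawing values that belong to discrete, named groups (product lines, companies, classes) so that readers can compare the groups or see how they make up a whole. Bar charts and pie charts are the two most common tools for this job.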
Bar Charts
Definition
A bar chart represents each category with a bar whose height (or length, for horizontal bars) is proportional to the category's value, typically a frequency, a proportion, or a total.
Best for:
- Comparing quantities across categories
- Showing change over discrete time periods
- Ranking items
Key Properties:
- Bars have equal width with gaps between them
- Y-axis typically starts at zero
- Categories can be sorted for easier comparison
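To see why the y-axis should start at zero, it can help to compute how much taller one bar is drawn than another for different axis baselines. The sketch below is only an illustration: the helper `apparent_ratio` and the two values are hypothetical and not part of any plotting library.

```python
def apparent_ratio(a, b, baseline=0.0):
    """Ratio of the drawn bar heights for values a and b
    when the value axis starts at `baseline` instead of 0."""
    if min(a, b) <= baseline:
        raise ValueError("baseline must be below both values")
    return (a - baseline) / (b - baseline)

# Toy values chosen only for illustration
a, b = 52.0, 50.0

true_ratio = apparent_ratio(a, b)                      # axis starts at 0
truncated_ratio = apparent_ratio(a, b, baseline=48.0)  # axis starts at 48

print(f"axis from 0:  bar A is drawn {true_ratio:.2f}x as tall as bar B")
print(f"axis from 48: bar A is drawn {truncated_ratio:.2f}x as tall as bar B")
```

With the axis starting at 0, a 4% difference is drawn as a 4% difference. With the axis starting at 48, the same two values are drawn with one bar twice as tall as the other.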
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Sales data by product category
sales = pd.DataFrame({
    'Category': ['Electronics', 'Clothing', 'Food', 'Furniture', 'Sports'],
    'Revenue_M': [4.2, 2.8, 3.1, 1.9, 2.3],
    'Units_K': [85, 320, 450, 42, 180]
})

fig, axes = plt.subplots(1, 3, figsize=(15, 5))

# 1. Simple vertical bar chart
bars = axes[0].bar(sales['Category'], sales['Revenue_M'],
                   color=['steelblue', 'coral', 'mediumseagreen', 'orchid', 'orange'],
                   edgecolor='black', alpha=0.8)
axes[0].set_title('Revenue by Category')
axes[0].set_ylabel('Revenue ($M)')
axes[0].tick_params(axis='x', rotation=30)

# Add value labels just above each bar
for bar in bars:
    axes[0].text(bar.get_x() + bar.get_width() / 2, bar.get_height() + 0.05,
                 f'${bar.get_height():.1f}M', ha='center', fontsize=9)

# 2. Horizontal bar chart (better for long labels)
# Sorting ascending puts the largest bar at the top of a barh plot
sales_sorted = sales.sort_values('Revenue_M', ascending=True)
axes[1].barh(sales_sorted['Category'], sales_sorted['Revenue_M'],
             color='steelblue', edgecolor='black', alpha=0.8)
axes[1].set_title('Revenue by Category\n(sorted for easier comparison)')
axes[1].set_xlabel('Revenue ($M)')

# 3. Grouped bar chart: two metrics side by side, each on its own y-axis
x = np.arange(len(sales))
w = 0.35  # bar width
axes[2].bar(x - w / 2, sales['Revenue_M'], w, label='Revenue ($M)',
            color='steelblue', alpha=0.8)
axes[2].set_ylabel('Revenue ($M)')
axes2b = axes[2].twinx()  # second y-axis sharing the same x-axis
axes2b.bar(x + w / 2, sales['Units_K'] / 100, w, label='Units (100K)',
           color='coral', alpha=0.8)
axes2b.set_ylabel('Units (100K)')
axes[2].set_title('Grouped: Revenue vs Volume')
axes[2].set_xticks(x)
axes[2].set_xticklabels(sales['Category'], rotation=30)
axes[2].legend(loc='upper left')
axes2b.legend(loc='upper right')

plt.tight_layout()
plt.savefig('bar_charts.png', dpi=150)
plt.show()
Pie Charts
Definition
A pie chart shows the proportional composition of a whole. Each slice represents a category's proportion of the total.
Best for:
- Part-to-whole relationships
- A small number of categories (≤ 5)
- When relative proportions are the main message
Worst for:
- Comparing values (bars are far better)
- Many categories (becomes unreadable)
- When precision matters
Mathematical Properties:
Pie Chart Slice Angle
$$\theta_i = \frac{f_i}{\sum_{j} f_j} \times 360^\circ$$
Here,
- $f_i$ = Frequency or value of category i
- $\sum_{j} f_j$ = Total of all categories
- $360^\circ$ = Total degrees in a circle
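The slice-angle rule (a category's share of the total, scaled to the 360 degrees of a circle) can be checked by hand. The sketch below applies it to the market-share figures used in this section's pie chart example. The helper name `slice_angles` is made up for illustration.

```python
def slice_angles(values):
    """Return each category's pie-slice angle in degrees:
    angle = value / total * 360."""
    total = sum(values.values())
    if total <= 0:
        raise ValueError("total must be positive")
    return {name: value * 360 / total for name, value in values.items()}

# Market-share figures from this section's pie chart example
shares = {'Alpha': 35, 'Beta': 28, 'Gamma': 18, 'Delta': 12, 'Others': 7}
angles = slice_angles(shares)

for name, angle in angles.items():
    print(f"{name:7s} {angle:.1f} degrees")
# Alpha   126.0 degrees
# Beta    100.8 degrees
# Gamma   64.8 degrees
# Delta   43.2 degrees
# Others  25.2 degrees
```

The angles always sum to 360 degrees, which is why a pie chart can only show parts of a single whole.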
market_share = pd.DataFrame({
    'Company': ['Alpha', 'Beta', 'Gamma', 'Delta', 'Others'],
    'Share': [35, 28, 18, 12, 7]
})

fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Standard pie chart
colors = ['#2196F3', '#F44336', '#4CAF50', '#FF9800', '#9E9E9E']
wedges, texts, autotexts = axes[0].pie(
    market_share['Share'],
    labels=market_share['Company'],
    autopct='%1.1f%%',
    colors=colors,
    startangle=90,
    pctdistance=0.85
)
axes[0].set_title('Market Share (Pie Chart)')

# Donut chart (modern alternative)
wedges2, _, _ = axes[1].pie(
    market_share['Share'],
    labels=None,
    autopct='%1.1f%%',
    colors=colors,
    startangle=90,
    pctdistance=0.85,
    wedgeprops=dict(width=0.5)  # creates donut hole
)
axes[1].set_title('Market Share (Donut Chart)')
axes[1].legend(market_share['Company'], title='Company',
               loc='center left', bbox_to_anchor=(0.85, 0, 0.5, 1))

plt.tight_layout()
plt.savefig('pie_charts.png', dpi=150)
plt.show()
Bar Chart vs Pie Chart — Decision Guide
How many categories?
+-- ≤ 5 categories AND you want to show part-of-whole
|   +-- Pie chart (or donut) <- acceptable
|
+-- Otherwise -> Bar chart (almost always better)
    +-- Comparing values -> Vertical or horizontal bar
    +-- Many categories -> Horizontal bar (labels fit)
    +-- Continuous trend over time -> Line chart (not bar)
    +-- Multiple series -> Grouped or stacked bar
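The decision guide above can also be written as a small function. This is a rough sketch of one possible encoding, not a definitive rule set. The function name, its parameters, and in particular the `many_threshold` default are assumptions made for illustration, because the guide does not say how many categories count as "many".

```python
def choose_chart(n_categories, part_to_whole=False, over_time=False,
                 multiple_series=False, many_threshold=8):
    """Suggest a chart type following the decision guide.

    many_threshold is an arbitrary assumption for what counts
    as 'many categories'; adjust it to your own labels and layout.
    """
    if n_categories < 1:
        raise ValueError("need at least one category")
    # First branch: few categories and a part-of-whole message
    if part_to_whole and n_categories <= 5:
        return "pie or donut chart"
    # Otherwise: a bar chart variant, or a line chart for trends over time
    if over_time:
        return "line chart"
    if multiple_series:
        return "grouped or stacked bar chart"
    if n_categories >= many_threshold:
        return "horizontal bar chart"
    return "vertical or horizontal bar chart"

print(choose_chart(4, part_to_whole=True))    # pie or donut chart
print(choose_chart(12, part_to_whole=True))   # horizontal bar chart
print(choose_chart(6, multiple_series=True))  # grouped or stacked bar chart
```

Note that a part-of-whole request with more than five categories falls through to a bar chart, matching the guide's "almost always better" default.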
Common Mistakes
| Mistake | Why It's Wrong | Fix |
|---|---|---|
| 3D bar/pie charts | Perspective distorts areas and makes comparison unreliable | Use flat 2D |
| Not starting y-axis at 0 (bar chart) | Makes small differences look huge | Start at 0 |
| Too many pie slices | Impossible to compare | Use bar chart or combine small slices into "Other" |
| No labels/legend | Reader can't interpret | Always label |
| Pie chart with more than 5 categories | Slice angles become too similar to tell apart | Use bar chart |
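The "combine small slices" fix from the table can be sketched in a few lines. The helper `merge_small_slices`, its `max_slices` default of 5, and the traffic figures are all hypothetical and only meant to illustrate the idea.

```python
def merge_small_slices(shares, max_slices=5, other_label="Other"):
    """Keep the largest (max_slices - 1) categories and
    sum everything else into a single `other_label` slice."""
    if len(shares) <= max_slices:
        return dict(shares)
    ranked = sorted(shares.items(), key=lambda kv: kv[1], reverse=True)
    merged = dict(ranked[:max_slices - 1])
    merged[other_label] = sum(value for _, value in ranked[max_slices - 1:])
    return merged

# Toy data: seven traffic sources, too many for a readable pie chart
traffic = {"Search": 40, "Direct": 25, "Social": 15, "Email": 8,
           "Referral": 6, "Ads": 4, "Partners": 2}

merged = merge_small_slices(traffic)
print(merged)
# {'Search': 40, 'Direct': 25, 'Social': 15, 'Email': 8, 'Other': 12}
```

The total is unchanged, so the merged data still describes the same whole. If the detail inside "Other" matters to the reader, a sorted bar chart of all categories is the better choice.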
Bar & Pie Charts in Machine Learning
In ML, bar charts are everywhere:
| ML Application | Chart Type | What to Show |
|---|---|---|
| Class imbalance | Bar chart | Frequency of each class |
| Feature importance | Horizontal bar | Top features by importance |
| Model comparison | Grouped bar | Accuracy, F1, AUC across models |
| Confusion matrix | Heatmap/bar | TP, TN, FP, FN counts |
| Hyperparameter tuning | Bar chart | Performance across settings |
from collections import Counter

import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Generate a synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           random_state=42)
feature_names = [f'Feature_{i}' for i in range(10)]

# Train model
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Bar chart: feature importance, sorted so the most important feature is on top
importance = model.feature_importances_
sorted_idx = np.argsort(importance)
plt.figure(figsize=(8, 5))
plt.barh(np.array(feature_names)[sorted_idx], importance[sorted_idx], color='steelblue')
plt.xlabel('Importance')
plt.title('Feature Importance (Random Forest)')
plt.tight_layout()
plt.show()

# Bar chart: class distribution (reveals class imbalance at a glance)
class_counts = Counter(y)
print("Class distribution:", class_counts)
classes = sorted(class_counts)
plt.figure(figsize=(5, 4))
plt.bar([str(c) for c in classes], [class_counts[c] for c in classes], color='coral')
plt.xlabel('Class')
plt.ylabel('Number of samples')
plt.title('Class Distribution')
plt.tight_layout()
plt.show()
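For the confusion-matrix row in the table above, the four counts can be tallied directly and then drawn as bars. The sketch below uses only the standard library. The helper `confusion_counts` and the two label lists are toy examples invented for illustration, not output from the model trained above.

```python
from collections import Counter

def confusion_counts(y_true, y_pred, positive=1):
    """Tally TP, TN, FP and FN for a binary classifier."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["TP" if truth == positive else "FP"] += 1
        else:
            counts["FN" if truth == positive else "TN"] += 1
    # Fixed key order so the bars always appear in the same sequence
    return {key: counts[key] for key in ("TP", "TN", "FP", "FN")}

# Toy labels, made up for this example
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 0, 1]

counts = confusion_counts(y_true, y_pred)
print(counts)
# {'TP': 4, 'TN': 4, 'FP': 1, 'FN': 1}
```

The result can be passed to matplotlib in the same way as the other examples in this section, for instance `plt.bar(list(counts), list(counts.values()))`.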
Key Takeaways
- Bar charts are almost always better than pie charts for comparisons.
- Pie charts only work for 2–5 categories where part-to-whole is the message.
- Sort bar charts from longest to shortest for easier comparison.
- Never use 3D charts: they distort perception and add no information.
When in doubt, choose a bar chart. It's the safer, clearer option for almost every categorical comparison.