Table Partitioning
Divide massive tables into smaller, manageable partitions for faster queries and easier maintenance.
- Range Partitioning β split data by date ranges or numeric intervals
- List Partitioning β assign rows to partitions based on discrete values
- Hash Partitioning β distribute data evenly across partitions using a hash function
- Partition Pruning β let the engine skip irrelevant partitions at query time Scale to billions of rows without rewriting application queries.
What Is Table Partitioning?
DfTable Partitioning
A database design technique that splits one logical table into multiple physical storage units called partitions. Each partition holds a subset of the data and can be managed independently. The database engine transparently routes queries to the correct partitions through a process called partition pruning.
-- Range partitioning by year (PostgreSQL)
CREATE TABLE sales (
id BIGSERIAL,
sale_date DATE NOT NULL,
customer_id INT NOT NULL,
amount DECIMAL(10,2),
region VARCHAR(50)
) PARTITION BY RANGE (sale_date);
CREATE TABLE sales_2022 PARTITION OF sales
FOR VALUES FROM ('2022-01-01') TO ('2023-01-01');
CREATE TABLE sales_2023 PARTITION OF sales
FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
CREATE TABLE sales_2024 PARTITION OF sales
FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
Partition pruning happens automatically when your WHERE clause filters on the partition key. A query with WHERE sale_date = '2024-06-15' will only scan the sales_2024 partition.
Types of Partitioning
| Strategy | How It Works | Best For |
|---|---|---|
| Range | Splits by continuous values (dates, numbers) | Time-series data, log tables |
| List | Assigns rows to partitions by explicit value lists | Multi-tenant systems, regional data |
| Hash | Distributes rows by hashing a column | Even distribution, no natural range key |
| Composite | Combines two or more strategies | Large tables needing both range and list |
Range Partitioning β Deep Dive
DfRange Partitioning
Maps each row to a partition based on whether the partition key falls within a defined range. Each partition has lower and upper bounds. Rows outside all defined ranges are rejected or routed to a default partition.
-- Monthly range partitioning (PostgreSQL)
CREATE TABLE events (
event_id BIGSERIAL,
event_time TIMESTAMP NOT NULL,
user_id INT NOT NULL,
event_type VARCHAR(50),
payload JSONB
) PARTITION BY RANGE (event_time);
-- Create partitions for each month
CREATE TABLE events_2024_01 PARTITION OF events
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE events_2024_02 PARTITION OF events
FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
CREATE TABLE events_2024_03 PARTITION OF events
FOR VALUES FROM ('2024-03-01') TO ('2024-04-01');
-- ... repeat for each month
-- Default partition catches anything that doesn't match
CREATE TABLE events_default PARTITION OF events DEFAULT;
-- MySQL range partitioning syntax
CREATE TABLE metrics (
id BIGINT AUTO_INCREMENT,
recorded_at DATETIME NOT NULL,
metric_name VARCHAR(100),
metric_value DOUBLE,
PRIMARY KEY (id, recorded_at)
)
PARTITION BY RANGE (YEAR(recorded_at)) (
PARTITION p2022 VALUES LESS THAN (2023),
PARTITION p2023 VALUES LESS THAN (2024),
PARTITION p2024 VALUES LESS THAN (2025),
PARTITION pmax VALUES LESS THAN MAXVALUE
);
In MySQL, the partition key must be part of the primary key. If you don't include it, you'll get: InnoDB does not support partitioning in place of the PRIMARY KEY.
List Partitioning
DfList Partitioning
Assigns rows to partitions based on a discrete set of values. Each partition is defined by an explicit list of allowed values for the partition key.
-- List partitioning by region
CREATE TABLE customer_data (
id BIGSERIAL,
customer_name VARCHAR(100),
region VARCHAR(20) NOT NULL,
revenue DECIMAL(12,2)
) PARTITION BY LIST (region);
CREATE TABLE customer_na PARTITION OF customer_data
FOR VALUES IN ('US', 'CA', 'MX');
CREATE TABLE customer_eu PARTITION OF customer_data
FOR VALUES IN ('UK', 'DE', 'FR', 'ES', 'IT');
CREATE TABLE customer_apac PARTITION OF customer_data
FOR VALUES IN ('JP', 'AU', 'KR', 'IN', 'SG');
-- Multi-column list partitioning (PostgreSQL)
CREATE TABLE orders (
id BIGSERIAL,
order_date DATE NOT NULL,
status VARCHAR(20) NOT NULL,
warehouse VARCHAR(10) NOT NULL,
total DECIMAL(10,2)
) PARTITION BY LIST (status);
CREATE TABLE orders_pending PARTITION OF orders
FOR VALUES IN ('PENDING', 'PROCESSING');
CREATE TABLE orders_shipped PARTITION OF orders
FOR VALUES IN ('SHIPPED', 'DELIVERED');
CREATE TABLE orders_cancelled PARTITION OF orders
FOR VALUES IN ('CANCELLED', 'REFUNDED');
Hash Partitioning
DfHash Partitioning
Distributes rows across a fixed number of partitions by applying a hash function to the partition key. This ensures roughly even distribution regardless of the data's natural distribution.
-- Hash partitioning into 8 partitions (PostgreSQL)
CREATE TABLE sessions (
session_id UUID NOT NULL,
user_id INT NOT NULL,
started_at TIMESTAMP,
last_active TIMESTAMP,
ip_address INET
) PARTITION BY HASH (session_id);
CREATE TABLE sessions_0 PARTITION OF sessions
FOR VALUES WITH (MODULUS 8, REMAINDER 0);
CREATE TABLE sessions_1 PARTITION OF sessions
FOR VALUES WITH (MODULUS 8, REMAINDER 1);
CREATE TABLE sessions_2 PARTITION OF sessions
FOR VALUES WITH (MODULUS 8, REMAINDER 2);
CREATE TABLE sessions_3 PARTITION OF sessions
FOR VALUES WITH (MODULUS 8, REMAINDER 3);
CREATE TABLE sessions_4 PARTITION OF sessions
FOR VALUES WITH (MODULUS 8, REMAINDER 4);
CREATE TABLE sessions_5 PARTITION OF sessions
FOR VALUES WITH (MODULUS 8, REMAINDER 5);
CREATE TABLE sessions_6 PARTITION OF sessions
FOR VALUES WITH (MODULUS 8, REMAINDER 6);
CREATE TABLE sessions_7 PARTITION OF sessions
FOR VALUES WITH (MODULUS 8, REMAINDER 7);
Partition Pruning
DfPartition Pruning
The optimizer's ability to eliminate partitions that cannot contain matching rows, based on conditions in the WHERE clause. This reduces I/O by scanning only relevant partitions.
-- This query prunes to only sales_2024
EXPLAIN ANALYZE
SELECT region, SUM(amount) AS total_sales
FROM sales
WHERE sale_date BETWEEN '2024-01-01' AND '2024-12-31'
GROUP BY region;
-- This query prunes across two partitions
EXPLAIN ANALYZE
SELECT *
FROM sales
WHERE sale_date >= '2023-06-01' AND sale_date < '2024-06-01';
-- This query CANNOT prune (function on partition key)
EXPLAIN ANALYZE
SELECT * FROM sales
WHERE EXTRACT(YEAR FROM sale_date) = 2024; -- scans all partitions!
For best pruning performance, always filter directly on the partition column. Avoid wrapping it in functions like EXTRACT(), DATE_TRUNC(), or CAST() in the WHERE clause.
Managing Partitions
-- Add a new partition for the next month
CREATE TABLE sales_2025_01 PARTITION OF sales
FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
-- Detach an old partition (PostgreSQL 12+)
ALTER TABLE sales DETACH PARTITION sales_2022;
-- Drop a detached partition
DROP TABLE sales_2022;
-- Attach an existing table as a partition
ALTER TABLE sales ATTACH PARTITION sales_2022
FOR VALUES FROM ('2022-01-01') TO ('2023-01-01');
-- Check partition sizes
SELECT
schemaname,
tablename,
pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) AS size
FROM pg_tables
WHERE tablename LIKE 'sales_%'
ORDER BY pg_total_relation_size(schemaname || '.' || tablename) DESC;
-- MySQL: Add and drop partitions
ALTER TABLE metrics ADD PARTITION (
PARTITION p2025 VALUES LESS THAN (2026)
);
ALTER TABLE metrics DROP PARTITION p2022;
-- Reorganize a partition (MySQL)
ALTER TABLE metrics REORGANIZE PARTITION p2024 INTO (
PARTITION p2024_h1 VALUES LESS THAN (2024),
PARTITION p2024_h2 VALUES LESS THAN (2025)
);
Partitioning vs Indexing
| Feature | Partitioning | Indexing |
|---|---|---|
| Purpose | Split table into physical pieces | Create sorted lookup structures |
| Granularity | Row groups (thousands to millions) | Individual rows |
| Pruning | Eliminates entire partitions | Seeks into B-tree or hash index |
| Maintenance | Drop/detach partitions for archival | Rebuild indexes periodically |
| Best For | Large time-series, multi-tenant | Frequent lookups on specific columns |
| Combined | Partition + index each partition | Best of both worlds |
You can (and should) create indexes within each partition. Most databases automatically create partition-local indexes when you define them on the parent table.
When to Partition
| Scenario | Partition? | Why |
|---|---|---|
| Table over 100GB | Yes | Pruning eliminates massive I/O |
| Time-series with archival | Yes | Drop old partitions instantly |
| Multi-tenant with data isolation | Yes | Per-tenant partition management |
| Table under 10GB | No | Overhead exceeds benefits |
| Frequent cross-partition joins | Caution | May hurt performance |
| Small lookup tables | No | Indexes are sufficient |
Common Pitfalls
-- WRONG: Using a function on the partition key (prevents pruning)
SELECT * FROM sales WHERE YEAR(sale_date) = 2024;
-- RIGHT: Filter on the raw partition key
SELECT * FROM sales
WHERE sale_date >= '2024-01-01' AND sale_date < '2025-01-01';
-- WRONG: Forgetting the default partition
-- Rows outside defined ranges will be rejected
INSERT INTO sales (sale_date, customer_id, amount, region)
VALUES ('2026-01-15', 1, 99.99, 'US');
-- ERROR: no partition of relation "sales" found for row
-- RIGHT: Always include a default partition
CREATE TABLE sales_default PARTITION OF sales DEFAULT;
Real-World: Automated Partition Management
-- Procedure to auto-create next month's partition (PostgreSQL)
CREATE OR REPLACE PROCEDURE create_next_partition(
p_table_name TEXT,
p_key_column TEXT
)
LANGUAGE plpgsql
AS $$
DECLARE
v_next_start DATE := DATE_TRUNC('month', CURRENT_DATE + INTERVAL '1 month');
v_next_end DATE := v_next_start + INTERVAL '1 month';
v_part_name TEXT := p_table_name || '_' || TO_CHAR(v_next_start, 'YYYY_MM');
BEGIN
EXECUTE format(
'CREATE TABLE IF NOT EXISTS %I PARTITION OF %I FOR VALUES FROM (%L) TO (%L)',
v_part_name, p_table_name, v_next_start, v_next_end
);
RAISE NOTICE 'Created partition: %', v_part_name;
END;
$$;
-- Call monthly via pg_cron
SELECT cron.schedule('create-sales-partition', '0 0 1 * *',
$$CALL create_next_partition('sales', 'sale_date')$$
);
Summary
| Concept | Key Takeaway |
|---|---|
| Range | Best for time-series; partitions by value intervals |
| List | Best for discrete categories; partitions by explicit values |
| Hash | Best for even distribution; no natural range key needed |
| Pruning | Filter on partition key to skip irrelevant partitions |
| Management | Add, detach, and drop partitions independently |
| Indexing | Always index within partitions for best performance |
Key Takeaways
- Partition large tables by date or category to enable partition pruning and faster queries
- Use range partitioning for time-series data and list partitioning for multi-tenant systems
- Always filter directly on the partition key β functions prevent pruning
- Create indexes on each partition for the best combination of pruning and seeking
- Automate partition creation and archival for production systems