SQL Performance

Query Performance

Speed up your slow queries with execution plans, indexing strategies, and proven optimization techniques.

Execution Plans — Understand exactly how the database processes each query
Indexing Strategies — Create the right indexes for your workload
Query Rewriting — Transform slow queries into fast ones with simple techniques Performance is not an afterthought — it's a design decision.

Why Query Performance Matters

DfQuery Performance

Query performance measures how quickly a database retrieves or modifies data. It's influenced by schema design, indexing, query structure, data volume, and hardware resources. Poor performance leads to slow applications, frustrated users, and increased infrastructure costs.

A query that takes 100ms with 1,000 rows might take 10+ seconds with 1,000,000 rows if not properly optimized. Performance tuning is about making queries scale.

Performance Factor	Impact	Difficulty to Fix
Missing indexes	Very High	Low
Poor query structure	High	Medium
Bad schema design	High	High
Insufficient hardware	Medium	Low (just upgrade)
Lock contention	High	High
Outdated statistics	Medium	Low

The Performance Tuning Process

-- Step 1: Identify slow queries
-- PostgreSQL: Enable slow query logging
ALTER SYSTEM SET log_min_duration_statement = 1000;  -- Log queries > 1 second
SELECT pg_reload_conf();

-- Step 2: Measure baseline performance
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT c.name, COUNT(o.order_id) AS order_count
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id
WHERE o.order_date >= '2024-01-01'
GROUP BY c.id, c.name
ORDER BY order_count DESC
LIMIT 10;

-- Step 3: Create appropriate index
CREATE INDEX idx_orders_customer_date ON orders(customer_id, order_date);

-- Step 4: Re-measure and compare
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT c.name, COUNT(o.order_id) AS order_count
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id
WHERE o.order_date >= '2024-01-01'
GROUP BY c.id, c.name
ORDER BY order_count DESC
LIMIT 10;

Always measure before and after optimization. Without baseline numbers, you can't prove your changes actually improved performance.

Indexing Strategies

How Indexes Work

DfIndex

An index is a data structure that provides fast lookup of rows matching specific column values, similar to a book's index. Without an index, the database must scan every row (full table scan) to find matching records.

-- Without index: Full table scan (checks every row)
-- With index: Index scan (follows B-tree structure)

-- B-tree index (default, most common)
CREATE INDEX idx_customers_email ON customers(email);
SELECT * FROM customers WHERE email = 'alice@example.com';
-- Speed: O(log n) instead of O(n)

-- Composite index (multiple columns)
CREATE INDEX idx_orders_status_date ON orders(status, order_date);
SELECT * FROM orders WHERE status = 'shipped' AND order_date > '2024-01-01';
-- Uses both columns in the index

-- Partial index (index only subset of rows)
CREATE INDEX idx_orders_pending ON orders(order_date)
WHERE status = 'pending';
-- Much smaller index, faster for pending order queries

-- Covering index (includes all needed columns)
CREATE INDEX idx_orders_covering ON orders(customer_id, order_date)
INCLUDE (total, status);
-- Query answered entirely from index, no table access needed

Index Types Comparison

Index Type	Best For	PostgreSQL Syntax
B-tree	Equality, range queries	`CREATE INDEX ... ON table(col)`
Hash	Equality only (fastest)	`CREATE INDEX ... ON table USING hash(col)`
GIN	Full-text search, arrays	`CREATE INDEX ... ON table USING gin(col)`
GiST	Geometric,全文搜索	`CREATE INDEX ... ON table USING gist(col)`
BRIN	Large, naturally ordered tables	`CREATE INDEX ... ON table USING brin(col)`
Partial	Filtered queries	`CREATE INDEX ... ON table(col) WHERE condition`

When NOT to Create Indexes

Indexes are not free. Every index slows down INSERT, UPDATE, and DELETE operations because the index must be maintained. Don't index columns that are rarely queried.

Skip Indexing On	Reason
Small tables (< 10K rows)	Full scan is fast enough
Low-cardinality columns (status, boolean)	Index not selective enough
Columns rarely used in WHERE/JOIN	Maintenance cost outweighs benefit
Frequently updated columns	Index maintenance slows writes
Wide text columns without prefix	Index size becomes unwieldy

Common Query Anti-Patterns

Anti-Pattern 1: SELECT *

-- BAD: Fetches all columns, even those not needed
SELECT * FROM customers
INNER JOIN orders ON customers.id = orders.customer_id
WHERE orders.total > 100;

-- GOOD: Only fetch what you need
SELECT
    c.name,
    c.email,
    o.order_date,
    o.total
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id
WHERE o.total > 100;

Anti-Pattern 2: Functions on Indexed Columns

-- BAD: Function prevents index usage (full table scan)
SELECT * FROM orders
WHERE YEAR(order_date) = 2024;

-- GOOD: Rewrite to allow index usage
SELECT * FROM orders
WHERE order_date >= '2024-01-01'
  AND order_date < '2025-01-01';

-- BAD: Lowercase comparison prevents index usage
SELECT * FROM customers
WHERE LOWER(email) = 'alice@example.com';

-- GOOD: Create functional index
CREATE INDEX idx_customers_email_lower ON customers(LOWER(email));
SELECT * FROM customers
WHERE LOWER(email) = 'alice@example.com';

Anti-Pattern 3: NOT IN with Subquery

-- BAD: NOT IN with subquery (slow, NULL-handling issues)
SELECT * FROM products
WHERE product_id NOT IN (SELECT product_id FROM order_items);

-- GOOD: Use NOT EXISTS (better performance)
SELECT * FROM products p
WHERE NOT EXISTS (
    SELECT 1 FROM order_items oi WHERE oi.product_id = p.product_id
);

-- GOOD: Use LEFT JOIN with NULL check
SELECT p.*
FROM products p
LEFT JOIN order_items oi ON p.product_id = oi.product_id
WHERE oi.product_id IS NULL;

Anti-Pattern 4: Implicit JOIN

-- BAD: Implicit join (old syntax, hard to read)
SELECT c.name, o.total
FROM customers c, orders o
WHERE c.id = o.customer_id
AND o.total > 100;

-- GOOD: Explicit JOIN (clear intent)
SELECT c.name, o.total
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id
WHERE o.total > 100;

JOIN Optimization

DfJOIN Performance

JOIN performance depends on the join algorithm used: Nested Loop (small tables), Hash Join (medium tables), or Merge Join (pre-sorted data). The database optimizer chooses the algorithm, but you can influence it through indexes and statistics.

-- Ensure JOIN columns are indexed
CREATE INDEX idx_orders_customer ON orders(customer_id);
CREATE INDEX idx_order_items_order ON order_items(order_id);

-- Use appropriate JOIN type
-- BAD: CROSS JOIN then filter (produces all combinations first)
SELECT c.name, p.name
FROM customers c, products p, orders o
WHERE c.id = o.customer_id AND o.product_id = p.product_id;

-- GOOD: Explicit INNER JOIN
SELECT c.name, p.name
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id
INNER JOIN products p ON o.product_id = p.product_id;

-- Filter early to reduce JOIN dataset
SELECT c.name, o.total
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id
WHERE o.order_date >= '2024-01-01'  -- Filter before or after?
-- Filter in subquery for complex scenarios
SELECT c.name, agg.total_spent
FROM customers c
INNER JOIN (
    SELECT customer_id, SUM(total) AS total_spent
    FROM orders
    WHERE order_date >= '2024-01-01'
    GROUP BY customer_id
) agg ON c.id = agg.customer_id;

For large JOINs, consider materialized views or temporary tables to pre-compute intermediate results. This trades storage for speed.

Subquery vs JOIN Performance

Approach	When It's Faster	When It's Slower
JOIN	Need data from both tables	Duplicate rows from 1:N joins
EXISTS	Just checking existence	When full data needed
IN	Small subquery result set	Large subquery result set
LATERAL JOIN	Per-row subquery computation	Correlated subqueries
CTE	Readability, complex logic	Simple queries (CTE overhead)

-- Subquery vs JOIN: Find customers who placed orders in 2024

-- Method 1: Subquery with IN (good for small result sets)
SELECT * FROM customers
WHERE customer_id IN (
    SELECT customer_id FROM orders
    WHERE order_date >= '2024-01-01'
);

-- Method 2: JOIN (good when you need order data too)
SELECT DISTINCT c.*
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_date >= '2024-01-01';

-- Method 3: EXISTS (best for existence check)
SELECT * FROM customers c
WHERE EXISTS (
    SELECT 1 FROM orders o
    WHERE o.customer_id = c.customer_id
    AND o.order_date >= '2024-01-01'
);

Pagination Performance

-- BAD: OFFSET-based pagination (slow for large offsets)
SELECT * FROM products
ORDER BY product_id
LIMIT 20 OFFSET 10000;  -- Database reads 10020 rows, discards 10000

-- GOOD: Keyset pagination (constant time)
SELECT * FROM products
WHERE product_id > 10000  -- Last seen ID from previous page
ORDER BY product_id
LIMIT 20;

-- BAD: Using LIMIT without ORDER BY (non-deterministic)
SELECT * FROM orders LIMIT 10;

-- GOOD: Deterministic ordering
SELECT * FROM orders
ORDER BY order_id
LIMIT 10;

OFFSET-based pagination becomes exponentially slower as the offset increases. For tables with millions of rows, always use keyset pagination.

Performance Benchmarks

-- Benchmarking with pgbench-style queries
-- Test 1: Simple primary key lookup (should be < 1ms)
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM customers WHERE customer_id = 12345;

-- Test 2: Indexed range query (should be < 10ms)
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders
WHERE order_date BETWEEN '2024-01-01' AND '2024-12-31';

-- Test 3: Aggregation with GROUP BY (should be < 100ms)
EXPLAIN (ANALYZE, BUFFERS)
SELECT
    DATE_TRUNC('month', order_date) AS month,
    COUNT(*) AS order_count,
    SUM(total) AS revenue
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY month;

-- Test 4: Complex JOIN (should be < 500ms)
EXPLAIN (ANALYZE, BUFFERS)
SELECT
    c.name,
    c.email,
    COUNT(o.order_id) AS total_orders,
    SUM(o.total) AS lifetime_value
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
WHERE c.created_at >= '2023-01-01'
GROUP BY c.customer_id, c.name, c.email
HAVING COUNT(o.order_id) > 5
ORDER BY lifetime_value DESC
LIMIT 20;

Query Type	Target Time	Optimization Priority
PK lookup	< 1ms	Index exists?
Indexed range	< 10ms	Composite index?
Aggregation	< 100ms	GROUP BY indexed?
Complex JOIN	< 500ms	Join indexes?
Full table scan	Avoid	Add index

Set performance budgets for your critical queries. If a query exceeds its budget, it becomes a bug that must be fixed before deployment.

Key Takeaways

Measure before optimizing — Use EXPLAIN ANALYZE to get actual execution times
Index strategically — Index WHERE, JOIN, and ORDER BY columns, not everything
**Avoid SELECT *** — Fetch only the columns you need
Rewrite anti-patterns — Functions on columns, NOT IN, OFFSET pagination
Test with realistic data — 100 rows and 10M rows behave very differently

Performance Quick Reference

Problem	Symptom	Solution
Missing index	Sequential scan on large table	CREATE INDEX on filtered/joined columns
Wrong index type	Index not used	Check column usage patterns
Lock timeout	Queries blocking each other	Reduce transaction duration
Memory pressure	Disk-based sorts/hashes	Increase work_mem, add indexes
Statistics stale	Bad query plans	ANALYZE table
N+1 queries	Many small queries in loop	Use JOIN or batch loading

Query Performance

Query Performance

Why Query Performance Matters

DfQuery Performance

The Performance Tuning Process

Indexing Strategies

How Indexes Work

DfIndex

Index Types Comparison

When NOT to Create Indexes

Common Query Anti-Patterns

Anti-Pattern 1: SELECT *

Anti-Pattern 2: Functions on Indexed Columns

Anti-Pattern 3: NOT IN with Subquery

Anti-Pattern 4: Implicit JOIN

JOIN Optimization

DfJOIN Performance

Subquery vs JOIN Performance

Pagination Performance

Performance Benchmarks

Key Takeaways

Performance Quick Reference

Premium Content

Need Expert SQL Help?