πŸŽ‰ 75% of content is free forever β€” Unlock Premium from $10/mo β†’
CW
Search courses…
πŸ’Ό Servicesℹ️ Aboutβœ‰οΈ ContactView Pricing Plansfrom $10

Query Performance

SQL PerformanceOptimization🟒 Free Lesson

Advertisement

SQL Performance

Query Performance

Speed up your slow queries with execution plans, indexing strategies, and proven optimization techniques.

  • Execution Plans β€” Understand exactly how the database processes each query
  • Indexing Strategies β€” Create the right indexes for your workload
  • Query Rewriting β€” Transform slow queries into fast ones with simple techniques Performance is not an afterthought β€” it's a design decision.

Why Query Performance Matters

DfQuery Performance

Query performance measures how quickly a database retrieves or modifies data. It's influenced by schema design, indexing, query structure, data volume, and hardware resources. Poor performance leads to slow applications, frustrated users, and increased infrastructure costs.

Query Optimization Flow1. IdentifySlow query logEXPLAIN ANALYZE2. AnalyzeExecution planSeq scan? Sort?3. OptimizeAdd indexRewrite query4. MeasureBefore vs AfterProve improvement5. ShipDeployCommon Anti-Patterns to FixSELECT *Func on IndexNOT IN SubqImplicit JOIN

A query that takes 100ms with 1,000 rows might take 10+ seconds with 1,000,000 rows if not properly optimized. Performance tuning is about making queries scale.

Performance FactorImpactDifficulty to Fix
Missing indexesVery HighLow
Poor query structureHighMedium
Bad schema designHighHigh
Insufficient hardwareMediumLow (just upgrade)
Lock contentionHighHigh
Outdated statisticsMediumLow

The Performance Tuning Process

-- Step 1: Identify slow queries
-- PostgreSQL: Enable slow query logging
ALTER SYSTEM SET log_min_duration_statement = 1000;  -- Log queries > 1 second
SELECT pg_reload_conf();

-- Step 2: Measure baseline performance
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT c.name, COUNT(o.order_id) AS order_count
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id
WHERE o.order_date >= '2024-01-01'
GROUP BY c.id, c.name
ORDER BY order_count DESC
LIMIT 10;

-- Step 3: Create appropriate index
CREATE INDEX idx_orders_customer_date ON orders(customer_id, order_date);

-- Step 4: Re-measure and compare
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT c.name, COUNT(o.order_id) AS order_count
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id
WHERE o.order_date >= '2024-01-01'
GROUP BY c.id, c.name
ORDER BY order_count DESC
LIMIT 10;
Always measure before and after optimization. Without baseline numbers, you can't prove your changes actually improved performance.

Indexing Strategies

How Indexes Work

DfIndex

An index is a data structure that provides fast lookup of rows matching specific column values, similar to a book's index. Without an index, the database must scan every row (full table scan) to find matching records.

-- Without index: Full table scan (checks every row)
-- With index: Index scan (follows B-tree structure)

-- B-tree index (default, most common)
CREATE INDEX idx_customers_email ON customers(email);
SELECT * FROM customers WHERE email = 'alice@example.com';
-- Speed: O(log n) instead of O(n)

-- Composite index (multiple columns)
CREATE INDEX idx_orders_status_date ON orders(status, order_date);
SELECT * FROM orders WHERE status = 'shipped' AND order_date > '2024-01-01';
-- Uses both columns in the index

-- Partial index (index only subset of rows)
CREATE INDEX idx_orders_pending ON orders(order_date)
WHERE status = 'pending';
-- Much smaller index, faster for pending order queries

-- Covering index (includes all needed columns)
CREATE INDEX idx_orders_covering ON orders(customer_id, order_date)
INCLUDE (total, status);
-- Query answered entirely from index, no table access needed

Index Types Comparison

Index TypeBest ForPostgreSQL Syntax
B-treeEquality, range queriesCREATE INDEX ... ON table(col)
HashEquality only (fastest)CREATE INDEX ... ON table USING hash(col)
GINFull-text search, arraysCREATE INDEX ... ON table USING gin(col)
GiSTGeometric,ε…¨ζ–‡ζœη΄’CREATE INDEX ... ON table USING gist(col)
BRINLarge, naturally ordered tablesCREATE INDEX ... ON table USING brin(col)
PartialFiltered queriesCREATE INDEX ... ON table(col) WHERE condition

When NOT to Create Indexes

Indexes are not free. Every index slows down INSERT, UPDATE, and DELETE operations because the index must be maintained. Don't index columns that are rarely queried.
Skip Indexing OnReason
Small tables (< 10K rows)Full scan is fast enough
Low-cardinality columns (status, boolean)Index not selective enough
Columns rarely used in WHERE/JOINMaintenance cost outweighs benefit
Frequently updated columnsIndex maintenance slows writes
Wide text columns without prefixIndex size becomes unwieldy

Common Query Anti-Patterns

Anti-Pattern 1: SELECT *

-- BAD: Fetches all columns, even those not needed
SELECT * FROM customers
INNER JOIN orders ON customers.id = orders.customer_id
WHERE orders.total > 100;

-- GOOD: Only fetch what you need
SELECT
    c.name,
    c.email,
    o.order_date,
    o.total
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id
WHERE o.total > 100;

Anti-Pattern 2: Functions on Indexed Columns

-- BAD: Function prevents index usage (full table scan)
SELECT * FROM orders
WHERE YEAR(order_date) = 2024;

-- GOOD: Rewrite to allow index usage
SELECT * FROM orders
WHERE order_date >= '2024-01-01'
  AND order_date < '2025-01-01';

-- BAD: Lowercase comparison prevents index usage
SELECT * FROM customers
WHERE LOWER(email) = 'alice@example.com';

-- GOOD: Create functional index
CREATE INDEX idx_customers_email_lower ON customers(LOWER(email));
SELECT * FROM customers
WHERE LOWER(email) = 'alice@example.com';

Anti-Pattern 3: NOT IN with Subquery

-- BAD: NOT IN with subquery (slow, NULL-handling issues)
SELECT * FROM products
WHERE product_id NOT IN (SELECT product_id FROM order_items);

-- GOOD: Use NOT EXISTS (better performance)
SELECT * FROM products p
WHERE NOT EXISTS (
    SELECT 1 FROM order_items oi WHERE oi.product_id = p.product_id
);

-- GOOD: Use LEFT JOIN with NULL check
SELECT p.*
FROM products p
LEFT JOIN order_items oi ON p.product_id = oi.product_id
WHERE oi.product_id IS NULL;

Anti-Pattern 4: Implicit JOIN

-- BAD: Implicit join (old syntax, hard to read)
SELECT c.name, o.total
FROM customers c, orders o
WHERE c.id = o.customer_id
AND o.total > 100;

-- GOOD: Explicit JOIN (clear intent)
SELECT c.name, o.total
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id
WHERE o.total > 100;

JOIN Optimization

DfJOIN Performance

JOIN performance depends on the join algorithm used: Nested Loop (small tables), Hash Join (medium tables), or Merge Join (pre-sorted data). The database optimizer chooses the algorithm, but you can influence it through indexes and statistics.

-- Ensure JOIN columns are indexed
CREATE INDEX idx_orders_customer ON orders(customer_id);
CREATE INDEX idx_order_items_order ON order_items(order_id);

-- Use appropriate JOIN type
-- BAD: CROSS JOIN then filter (produces all combinations first)
SELECT c.name, p.name
FROM customers c, products p, orders o
WHERE c.id = o.customer_id AND o.product_id = p.product_id;

-- GOOD: Explicit INNER JOIN
SELECT c.name, p.name
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id
INNER JOIN products p ON o.product_id = p.product_id;

-- Filter early to reduce JOIN dataset
SELECT c.name, o.total
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id
WHERE o.order_date >= '2024-01-01'  -- Filter before or after?
-- Filter in subquery for complex scenarios
SELECT c.name, agg.total_spent
FROM customers c
INNER JOIN (
    SELECT customer_id, SUM(total) AS total_spent
    FROM orders
    WHERE order_date >= '2024-01-01'
    GROUP BY customer_id
) agg ON c.id = agg.customer_id;
For large JOINs, consider materialized views or temporary tables to pre-compute intermediate results. This trades storage for speed.

Subquery vs JOIN Performance

ApproachWhen It's FasterWhen It's Slower
JOINNeed data from both tablesDuplicate rows from 1:N joins
EXISTSJust checking existenceWhen full data needed
INSmall subquery result setLarge subquery result set
LATERAL JOINPer-row subquery computationCorrelated subqueries
CTEReadability, complex logicSimple queries (CTE overhead)
-- Subquery vs JOIN: Find customers who placed orders in 2024

-- Method 1: Subquery with IN (good for small result sets)
SELECT * FROM customers
WHERE customer_id IN (
    SELECT customer_id FROM orders
    WHERE order_date >= '2024-01-01'
);

-- Method 2: JOIN (good when you need order data too)
SELECT DISTINCT c.*
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_date >= '2024-01-01';

-- Method 3: EXISTS (best for existence check)
SELECT * FROM customers c
WHERE EXISTS (
    SELECT 1 FROM orders o
    WHERE o.customer_id = c.customer_id
    AND o.order_date >= '2024-01-01'
);

Pagination Performance

-- BAD: OFFSET-based pagination (slow for large offsets)
SELECT * FROM products
ORDER BY product_id
LIMIT 20 OFFSET 10000;  -- Database reads 10020 rows, discards 10000

-- GOOD: Keyset pagination (constant time)
SELECT * FROM products
WHERE product_id > 10000  -- Last seen ID from previous page
ORDER BY product_id
LIMIT 20;

-- BAD: Using LIMIT without ORDER BY (non-deterministic)
SELECT * FROM orders LIMIT 10;

-- GOOD: Deterministic ordering
SELECT * FROM orders
ORDER BY order_id
LIMIT 10;
OFFSET-based pagination becomes exponentially slower as the offset increases. For tables with millions of rows, always use keyset pagination.

Performance Benchmarks

-- Benchmarking with pgbench-style queries
-- Test 1: Simple primary key lookup (should be < 1ms)
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM customers WHERE customer_id = 12345;

-- Test 2: Indexed range query (should be < 10ms)
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders
WHERE order_date BETWEEN '2024-01-01' AND '2024-12-31';

-- Test 3: Aggregation with GROUP BY (should be < 100ms)
EXPLAIN (ANALYZE, BUFFERS)
SELECT
    DATE_TRUNC('month', order_date) AS month,
    COUNT(*) AS order_count,
    SUM(total) AS revenue
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY month;

-- Test 4: Complex JOIN (should be < 500ms)
EXPLAIN (ANALYZE, BUFFERS)
SELECT
    c.name,
    c.email,
    COUNT(o.order_id) AS total_orders,
    SUM(o.total) AS lifetime_value
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
WHERE c.created_at >= '2023-01-01'
GROUP BY c.customer_id, c.name, c.email
HAVING COUNT(o.order_id) > 5
ORDER BY lifetime_value DESC
LIMIT 20;
Query TypeTarget TimeOptimization Priority
PK lookup< 1msIndex exists?
Indexed range< 10msComposite index?
Aggregation< 100msGROUP BY indexed?
Complex JOIN< 500msJoin indexes?
Full table scanAvoidAdd index
Set performance budgets for your critical queries. If a query exceeds its budget, it becomes a bug that must be fixed before deployment.

Key Takeaways

  1. Measure before optimizing β€” Use EXPLAIN ANALYZE to get actual execution times
  2. Index strategically β€” Index WHERE, JOIN, and ORDER BY columns, not everything
  3. **Avoid SELECT *** β€” Fetch only the columns you need
  4. Rewrite anti-patterns β€” Functions on columns, NOT IN, OFFSET pagination
  5. Test with realistic data β€” 100 rows and 10M rows behave very differently

Performance Quick Reference

ProblemSymptomSolution
Missing indexSequential scan on large tableCREATE INDEX on filtered/joined columns
Wrong index typeIndex not usedCheck column usage patterns
Lock timeoutQueries blocking each otherReduce transaction duration
Memory pressureDisk-based sorts/hashesIncrease work_mem, add indexes
Statistics staleBad query plansANALYZE table
N+1 queriesMany small queries in loopUse JOIN or batch loading
⭐

Premium Content

Query Performance

Unlock this lesson and 900+ advanced tutorials with a Premium plan.

🎯End-to-end Projects
πŸ’ΌInterview Prep
πŸ“œCertificates
🀝Community Access

Already a member? Log in

Need Expert SQL Help?

Get personalized tutoring, project support, or professional consulting.

Advertisement