CTE Deep Dive
Master advanced Common Table Expression patterns for recursive queries, hierarchical data traversal, and complex data transformations.
- Recursive CTEs β traverse parent-child hierarchies and graph-like structures
- Deduplication β keep only the latest record per group using window functions inside CTEs
- Pivoting β reshape rows into columns with conditional aggregation in CTEs
- Multiple CTEs β chain or branch CTEs to build readable, modular query pipelines Write queries that are both powerful and maintainable.
What Is a CTE?
DfCommon Table Expression (CTE)
A temporary named result set defined with the WITH clause that exists only for the duration of a single query. CTEs improve readability, enable recursion, and allow self-referencing queries that would otherwise require temporary tables or subqueries.
-- Basic CTE syntax
WITH region_sales AS (
SELECT
region,
SUM(amount) AS total_sales,
COUNT(*) AS order_count
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY region
)
SELECT
region,
total_sales,
order_count,
ROUND(total_sales / order_count, 2) AS avg_order_value
FROM region_sales
WHERE total_sales > 50000
ORDER BY total_sales DESC;
CTEs are evaluated every time they are referenced. If a CTE is used multiple times in the same query, consider whether a temporary table might be more efficient for repeated access.
Recursive CTEs
DfRecursive CTE
A CTE that references itself using UNION ALL to combine a base (anchor) query with a recursive member. Each iteration processes one level of recursion until no more rows are produced.
-- Organizational hierarchy traversal
WITH RECURSIVE org_chart AS (
-- Anchor: top-level managers
SELECT
id,
name,
manager_id,
1 AS level,
CAST(name AS VARCHAR(500)) AS path
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive: direct reports
SELECT
e.id,
e.name,
e.manager_id,
oc.level + 1,
CAST(oc.path || ' > ' || e.name AS VARCHAR(500))
FROM employees e
INNER JOIN org_chart oc ON e.manager_id = oc.id
)
SELECT
id,
name,
level,
path,
REPEAT(' ', level - 1) || name AS indented_name
FROM org_chart
ORDER BY path;
Recursive CTE β Category Tree
-- Product category hierarchy
WITH RECURSIVE category_tree AS (
SELECT
id,
name,
parent_id,
0 AS depth,
name AS full_path
FROM categories
WHERE parent_id IS NULL
UNION ALL
SELECT
c.id,
c.name,
c.parent_id,
ct.depth + 1,
ct.full_path || ' > ' || c.name
FROM categories c
INNER JOIN category_tree ct ON c.parent_id = ct.id
)
SELECT
depth,
REPEAT(' ', depth) || name AS display_name,
full_path
FROM category_tree
ORDER BY full_path;
Always set a recursion limit (e.g., MAXRECURSION in SQL Server or max_recursive_depth in PostgreSQL) to prevent infinite loops caused by circular references in your data.
Deduplication with CTEs
DfDeduplication
The process of removing duplicate rows from a result set, typically by ranking rows within a partition and keeping only the top-ranked entry.
-- Keep only the most recent order per customer
WITH ranked_orders AS (
SELECT
customer_id,
order_id,
order_date,
total,
ROW_NUMBER() OVER (
PARTITION BY customer_id
ORDER BY order_date DESC, order_id DESC
) AS rn
FROM orders
)
SELECT
ro.customer_id,
c.first_name,
c.last_name,
ro.order_id,
ro.order_date,
ro.total
FROM ranked_orders ro
INNER JOIN customers c ON c.id = ro.customer_id
WHERE ro.rn = 1
ORDER BY ro.order_date DESC;
-- Remove duplicate email addresses, keeping the earliest signup
WITH deduped AS (
SELECT
id,
email,
name,
created_at,
ROW_NUMBER() OVER (
PARTITION BY email
ORDER BY created_at ASC
) AS rn
FROM users
)
DELETE FROM users
WHERE id IN (
SELECT id FROM deduped WHERE rn > 1
);
Use ROW_NUMBER() when you want exactly one row per group. Use RANK() or DENSE_RANK() if ties are meaningful and you may want multiple rows.
Pivoting with CTEs
DfPivot
A transformation that converts rows into columns. SQL databases vary in syntax β PostgreSQL uses PIVOT via extensions or SUM(CASE ...), while SQL Server has a native PIVOT operator.
-- Monthly sales pivot using CTE + CASE
WITH monthly_data AS (
SELECT
product_id,
EXTRACT(MONTH FROM order_date) AS month,
SUM(quantity) AS total_qty
FROM order_items
WHERE EXTRACT(YEAR FROM order_date) = 2024
GROUP BY product_id, EXTRACT(MONTH FROM order_date)
)
SELECT
p.product_name,
SUM(CASE WHEN md.month = 1 THEN md.total_qty ELSE 0 END) AS jan,
SUM(CASE WHEN md.month = 2 THEN md.total_qty ELSE 0 END) AS feb,
SUM(CASE WHEN md.month = 3 THEN md.total_qty ELSE 0 END) AS mar,
SUM(CASE WHEN md.month = 4 THEN md.total_qty ELSE 0 END) AS apr,
SUM(CASE WHEN md.month = 5 THEN md.total_qty ELSE 0 END) AS may,
SUM(CASE WHEN md.month = 6 THEN md.total_qty ELSE 0 END) AS jun,
SUM(md.total_qty) AS ytd_total
FROM monthly_data md
INNER JOIN products p ON p.id = md.product_id
GROUP BY p.product_name
ORDER BY ytd_total DESC;
Multiple CTEs β Building Modular Queries
-- Three-stage pipeline: extract, transform, aggregate
WITH raw_orders AS (
-- Extract: pull raw data
SELECT
o.id AS order_id,
o.customer_id,
o.order_date,
o.total,
c.segment
FROM orders o
INNER JOIN customers c ON c.id = o.customer_id
WHERE o.order_date >= '2024-01-01'
),
cleaned_orders AS (
-- Transform: filter and enrich
SELECT
order_id,
customer_id,
order_date,
CASE
WHEN total < 0 THEN 0
ELSE total
END AS total,
segment,
CASE
WHEN segment = 'Enterprise' THEN 'High Value'
WHEN segment = 'Business' THEN 'Medium Value'
ELSE 'Standard'
END AS tier
FROM raw_orders
WHERE total IS NOT NULL
)
-- Aggregate: final metrics
SELECT
tier,
COUNT(DISTINCT customer_id) AS unique_customers,
COUNT(*) AS total_orders,
SUM(total) AS revenue,
ROUND(AVG(total), 2) AS avg_order_value,
ROUND(SUM(total) / COUNT(DISTINCT customer_id), 2) AS revenue_per_customer
FROM cleaned_orders
GROUP BY tier
ORDER BY revenue DESC;
CTE Comparison Table
| CTE Pattern | Use Case | Key Benefit |
|---|---|---|
| Simple CTE | Isolate subqueries for clarity | Named result set, no temp table needed |
| Recursive CTE | Hierarchies, graphs, sequences | Self-referencing without procedural code |
| Deduplication CTE | Keep latest/best row per group | Clean data for downstream queries |
| Pivoting CTE | Reshape rows to columns | Cross-tab reports in pure SQL |
| Chained CTEs | Multi-stage pipelines | Readable ETL-like transformations |
| CTE + Window Func | Running totals, rankings | Combine named sets with analytics |
Performance Considerations
| Consideration | Detail |
|---|---|
| Materialization | Some engines materialize CTEs (SQL Server pre-2022), others inline them (PostgreSQL). Check your engine. |
| Multiple References | A CTE referenced twice may be computed twice. Consider temporary tables for repeated use. |
| Indexing | CTEs don't have indexes. Ensure underlying tables are properly indexed. |
| Recursion Depth | Set explicit limits to prevent stack overflows on cyclic data. |
| CTE Inlining (SQL Server) | SQL Server 2022+ can inline simple CTEs automatically for better performance. |
PostgreSQL inlines CTEs by default unless you add MATERIALIZED. This means a CTE referenced twice may still scan the base table twice. Use WITH cte AS MATERIALIZED (...) when you need the result set cached.
Common Pitfalls
-- WRONG: Recursive CTE without termination condition
WITH RECURSIVE bad_cte AS (
SELECT 1 AS n
UNION ALL
SELECT n + 1 FROM bad_cte -- infinite loop!
)
SELECT * FROM bad_cte; -- ERROR or hang
-- RIGHT: Add a WHERE clause to stop recursion
WITH RECURSIVE good_cte AS (
SELECT 1 AS n
UNION ALL
SELECT n + 1 FROM good_cte WHERE n < 100
)
SELECT * FROM good_cte;
-- WRONG: Using CTE when a window function is simpler
WITH ranked AS (
SELECT
id,
department,
salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
FROM employees
)
SELECT * FROM ranked WHERE rn = 1;
-- BETTER: Directly use window function (CTE adds no value here)
SELECT *
FROM (
SELECT
id,
department,
salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
FROM employees
) sub
WHERE rn = 1;
Recursive CTEs cannot use GROUP BY, HAVING, LIMIT, or aggregate functions in the recursive member in most databases. Use a wrapping query to apply these after the recursion completes.
Real-World Example: Bill of Materials
-- Calculate total cost of a product with sub-components
WITH RECURSIVE bom AS (
-- Anchor: top-level product
SELECT
component_id,
component_name,
quantity,
unit_cost,
quantity * unit_cost AS line_cost,
1 AS level
FROM bill_of_materials
WHERE parent_product_id = 'PROD-001'
UNION ALL
-- Recursive: sub-components
SELECT
b.component_id,
b.component_name,
b.quantity,
b.unit_cost,
b.quantity * b.unit_cost * bom.quantity AS line_cost,
bom.level + 1
FROM bill_of_materials b
INNER JOIN bom ON b.parent_product_id = bom.component_id
)
SELECT
component_id,
component_name,
SUM(line_cost) AS total_cost,
MAX(level) AS max_depth
FROM bom
GROUP BY component_id, component_name
ORDER BY total_cost DESC;
Summary
| Concept | Key Takeaway |
|---|---|
| CTE Basics | Named temporary result set that improves readability |
| Recursion | Traverse hierarchies with anchor + recursive members |
| Deduplication | Use ROW_NUMBER() to keep one row per partition |
| Pivoting | Reshape data with CASE + aggregation inside CTEs |
| Chaining | Build multi-stage pipelines by referencing prior CTEs |
| Performance | Understand your engine's CTE materialization behavior |
Key Takeaways
- CTEs make complex queries readable by breaking them into named, logical steps
- Recursive CTEs enable hierarchy traversal without procedural loops
- Always set recursion limits to prevent infinite loops from circular data
- Use ROW_NUMBER() inside CTEs for reliable deduplication and ranking
- Consider temporary tables when a CTE is referenced multiple times for performance