SQL Expert

CTE Deep Dive

Master advanced Common Table Expression patterns for recursive queries, hierarchical data traversal, and complex data transformations.

Recursive CTEs — traverse parent-child hierarchies and graph-like structures
Deduplication — keep only the latest record per group using window functions inside CTEs
Pivoting — reshape rows into columns with conditional aggregation in CTEs
Multiple CTEs — chain or branch CTEs to build readable, modular query pipelines Write queries that are both powerful and maintainable.

What Is a CTE?

DfCommon Table Expression (CTE)

A temporary named result set defined with the WITH clause that exists only for the duration of a single query. CTEs improve readability, enable recursion, and allow self-referencing queries that would otherwise require temporary tables or subqueries.

-- Basic CTE syntax
WITH region_sales AS (
    SELECT
        region,
        SUM(amount) AS total_sales,
        COUNT(*) AS order_count
    FROM orders
    WHERE order_date >= '2024-01-01'
    GROUP BY region
)
SELECT
    region,
    total_sales,
    order_count,
    ROUND(total_sales / order_count, 2) AS avg_order_value
FROM region_sales
WHERE total_sales > 50000
ORDER BY total_sales DESC;

CTEs are evaluated every time they are referenced. If a CTE is used multiple times in the same query, consider whether a temporary table might be more efficient for repeated access.

Recursive CTEs

DfRecursive CTE

A CTE that references itself using UNION ALL to combine a base (anchor) query with a recursive member. Each iteration processes one level of recursion until no more rows are produced.

-- Organizational hierarchy traversal
WITH RECURSIVE org_chart AS (
    -- Anchor: top-level managers
    SELECT
        id,
        name,
        manager_id,
        1 AS level,
        CAST(name AS VARCHAR(500)) AS path
    FROM employees
    WHERE manager_id IS NULL

    UNION ALL

    -- Recursive: direct reports
    SELECT
        e.id,
        e.name,
        e.manager_id,
        oc.level + 1,
        CAST(oc.path || ' > ' || e.name AS VARCHAR(500))
    FROM employees e
    INNER JOIN org_chart oc ON e.manager_id = oc.id
)
SELECT
    id,
    name,
    level,
    path,
    REPEAT('  ', level - 1) || name AS indented_name
FROM org_chart
ORDER BY path;

Recursive CTE — Category Tree

-- Product category hierarchy
WITH RECURSIVE category_tree AS (
    SELECT
        id,
        name,
        parent_id,
        0 AS depth,
        name AS full_path
    FROM categories
    WHERE parent_id IS NULL

    UNION ALL

    SELECT
        c.id,
        c.name,
        c.parent_id,
        ct.depth + 1,
        ct.full_path || ' > ' || c.name
    FROM categories c
    INNER JOIN category_tree ct ON c.parent_id = ct.id
)
SELECT
    depth,
    REPEAT('  ', depth) || name AS display_name,
    full_path
FROM category_tree
ORDER BY full_path;

Always set a recursion limit (e.g., MAXRECURSION in SQL Server or max_recursive_depth in PostgreSQL) to prevent infinite loops caused by circular references in your data.

Deduplication with CTEs

DfDeduplication

The process of removing duplicate rows from a result set, typically by ranking rows within a partition and keeping only the top-ranked entry.

-- Keep only the most recent order per customer
WITH ranked_orders AS (
    SELECT
        customer_id,
        order_id,
        order_date,
        total,
        ROW_NUMBER() OVER (
            PARTITION BY customer_id
            ORDER BY order_date DESC, order_id DESC
        ) AS rn
    FROM orders
)
SELECT
    ro.customer_id,
    c.first_name,
    c.last_name,
    ro.order_id,
    ro.order_date,
    ro.total
FROM ranked_orders ro
INNER JOIN customers c ON c.id = ro.customer_id
WHERE ro.rn = 1
ORDER BY ro.order_date DESC;

-- Remove duplicate email addresses, keeping the earliest signup
WITH deduped AS (
    SELECT
        id,
        email,
        name,
        created_at,
        ROW_NUMBER() OVER (
            PARTITION BY email
            ORDER BY created_at ASC
        ) AS rn
    FROM users
)
DELETE FROM users
WHERE id IN (
    SELECT id FROM deduped WHERE rn > 1
);

Use ROW_NUMBER() when you want exactly one row per group. Use RANK() or DENSE_RANK() if ties are meaningful and you may want multiple rows.

Pivoting with CTEs

DfPivot

A transformation that converts rows into columns. SQL databases vary in syntax — PostgreSQL uses PIVOT via extensions or SUM(CASE ...), while SQL Server has a native PIVOT operator.

-- Monthly sales pivot using CTE + CASE
WITH monthly_data AS (
    SELECT
        product_id,
        EXTRACT(MONTH FROM order_date) AS month,
        SUM(quantity) AS total_qty
    FROM order_items
    WHERE EXTRACT(YEAR FROM order_date) = 2024
    GROUP BY product_id, EXTRACT(MONTH FROM order_date)
)
SELECT
    p.product_name,
    SUM(CASE WHEN md.month = 1 THEN md.total_qty ELSE 0 END) AS jan,
    SUM(CASE WHEN md.month = 2 THEN md.total_qty ELSE 0 END) AS feb,
    SUM(CASE WHEN md.month = 3 THEN md.total_qty ELSE 0 END) AS mar,
    SUM(CASE WHEN md.month = 4 THEN md.total_qty ELSE 0 END) AS apr,
    SUM(CASE WHEN md.month = 5 THEN md.total_qty ELSE 0 END) AS may,
    SUM(CASE WHEN md.month = 6 THEN md.total_qty ELSE 0 END) AS jun,
    SUM(md.total_qty) AS ytd_total
FROM monthly_data md
INNER JOIN products p ON p.id = md.product_id
GROUP BY p.product_name
ORDER BY ytd_total DESC;

Multiple CTEs — Building Modular Queries

-- Three-stage pipeline: extract, transform, aggregate
WITH raw_orders AS (
    -- Extract: pull raw data
    SELECT
        o.id AS order_id,
        o.customer_id,
        o.order_date,
        o.total,
        c.segment
    FROM orders o
    INNER JOIN customers c ON c.id = o.customer_id
    WHERE o.order_date >= '2024-01-01'
),
cleaned_orders AS (
    -- Transform: filter and enrich
    SELECT
        order_id,
        customer_id,
        order_date,
        CASE
            WHEN total < 0 THEN 0
            ELSE total
        END AS total,
        segment,
        CASE
            WHEN segment = 'Enterprise' THEN 'High Value'
            WHEN segment = 'Business' THEN 'Medium Value'
            ELSE 'Standard'
        END AS tier
    FROM raw_orders
    WHERE total IS NOT NULL
)
-- Aggregate: final metrics
SELECT
    tier,
    COUNT(DISTINCT customer_id) AS unique_customers,
    COUNT(*) AS total_orders,
    SUM(total) AS revenue,
    ROUND(AVG(total), 2) AS avg_order_value,
    ROUND(SUM(total) / COUNT(DISTINCT customer_id), 2) AS revenue_per_customer
FROM cleaned_orders
GROUP BY tier
ORDER BY revenue DESC;

CTE Comparison Table

CTE Pattern	Use Case	Key Benefit
Simple CTE	Isolate subqueries for clarity	Named result set, no temp table needed
Recursive CTE	Hierarchies, graphs, sequences	Self-referencing without procedural code
Deduplication CTE	Keep latest/best row per group	Clean data for downstream queries
Pivoting CTE	Reshape rows to columns	Cross-tab reports in pure SQL
Chained CTEs	Multi-stage pipelines	Readable ETL-like transformations
CTE + Window Func	Running totals, rankings	Combine named sets with analytics

Performance Considerations

Consideration	Detail
Materialization	Some engines materialize CTEs (SQL Server pre-2022), others inline them (PostgreSQL). Check your engine.
Multiple References	A CTE referenced twice may be computed twice. Consider temporary tables for repeated use.
Indexing	CTEs don't have indexes. Ensure underlying tables are properly indexed.
Recursion Depth	Set explicit limits to prevent stack overflows on cyclic data.
CTE Inlining (SQL Server)	SQL Server 2022+ can inline simple CTEs automatically for better performance.

PostgreSQL inlines CTEs by default unless you add MATERIALIZED. This means a CTE referenced twice may still scan the base table twice. Use WITH cte AS MATERIALIZED (...) when you need the result set cached.

Common Pitfalls

-- WRONG: Recursive CTE without termination condition
WITH RECURSIVE bad_cte AS (
    SELECT 1 AS n
    UNION ALL
    SELECT n + 1 FROM bad_cte  -- infinite loop!
)
SELECT * FROM bad_cte;  -- ERROR or hang

-- RIGHT: Add a WHERE clause to stop recursion
WITH RECURSIVE good_cte AS (
    SELECT 1 AS n
    UNION ALL
    SELECT n + 1 FROM good_cte WHERE n < 100
)
SELECT * FROM good_cte;

-- WRONG: Using CTE when a window function is simpler
WITH ranked AS (
    SELECT
        id,
        department,
        salary,
        ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
    FROM employees
)
SELECT * FROM ranked WHERE rn = 1;

-- BETTER: Directly use window function (CTE adds no value here)
SELECT *
FROM (
    SELECT
        id,
        department,
        salary,
        ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
    FROM employees
) sub
WHERE rn = 1;

Recursive CTEs cannot use GROUP BY, HAVING, LIMIT, or aggregate functions in the recursive member in most databases. Use a wrapping query to apply these after the recursion completes.

Real-World Example: Bill of Materials

-- Calculate total cost of a product with sub-components
WITH RECURSIVE bom AS (
    -- Anchor: top-level product
    SELECT
        component_id,
        component_name,
        quantity,
        unit_cost,
        quantity * unit_cost AS line_cost,
        1 AS level
    FROM bill_of_materials
    WHERE parent_product_id = 'PROD-001'

    UNION ALL

    -- Recursive: sub-components
    SELECT
        b.component_id,
        b.component_name,
        b.quantity,
        b.unit_cost,
        b.quantity * b.unit_cost * bom.quantity AS line_cost,
        bom.level + 1
    FROM bill_of_materials b
    INNER JOIN bom ON b.parent_product_id = bom.component_id
)
SELECT
    component_id,
    component_name,
    SUM(line_cost) AS total_cost,
    MAX(level) AS max_depth
FROM bom
GROUP BY component_id, component_name
ORDER BY total_cost DESC;

Summary

Concept	Key Takeaway
CTE Basics	Named temporary result set that improves readability
Recursion	Traverse hierarchies with anchor + recursive members
Deduplication	Use ROW_NUMBER() to keep one row per partition
Pivoting	Reshape data with CASE + aggregation inside CTEs
Chaining	Build multi-stage pipelines by referencing prior CTEs
Performance	Understand your engine's CTE materialization behavior

Key Takeaways

CTEs make complex queries readable by breaking them into named, logical steps
Recursive CTEs enable hierarchy traversal without procedural loops
Always set recursion limits to prevent infinite loops from circular data
Use ROW_NUMBER() inside CTEs for reliable deduplication and ranking
Consider temporary tables when a CTE is referenced multiple times for performance

CTE Deep Dive

CTE Deep Dive

What Is a CTE?

DfCommon Table Expression (CTE)

Recursive CTEs

DfRecursive CTE

Recursive CTE — Category Tree

Deduplication with CTEs

DfDeduplication

Pivoting with CTEs

DfPivot

Multiple CTEs — Building Modular Queries

CTE Comparison Table

Performance Considerations

Common Pitfalls

Real-World Example: Bill of Materials

Summary

Key Takeaways

Premium Content

Need Expert SQL Help?