Advanced SQL

Correlated Subqueries

Write subqueries that reference the outer query for row-by-row processing and dynamic filtering.

Row-by-Row Evaluation — Subquery re-executes for every outer row
Dynamic Filtering — Results depend on the current row's values Correlated subqueries connect the inner and outer query for context-aware results.

What Makes a Subquery Correlated?

DfCorrelated Subquery

A subquery that references one or more columns from the outer query. Unlike regular subqueries (which execute once), correlated subqueries re-execute for each row of the outer query, making the inner result dependent on the current outer row.

-- Regular subquery: executes ONCE, result is static
SELECT *
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

-- Correlated subquery: executes for EACH ROW, result changes
SELECT *
FROM employees e1
WHERE salary > (
    SELECT AVG(salary)
    FROM employees e2
    WHERE e2.department = e1.department  -- references outer query
);

The key difference is the e1.department reference inside the subquery — this binds the inner query to each row of the outer query.

Basic Correlated Subquery Syntax

DfExecution Order

Correlated subqueries execute in a specific order: the database scans the outer query, then evaluates the inner subquery for each outer row. This is fundamentally different from uncorrelated subqueries, which run once before the outer query begins.

-- For each employee, find employees in the same department
SELECT
    e1.first_name,
    e1.last_name,
    e1.department
FROM employees e1
WHERE EXISTS (
    SELECT 1
    FROM employees e2
    WHERE e2.department = e1.department
      AND e2.employee_id != e1.employee_id
);

-- For each product, find the most recent order
SELECT
    p.product_name,
    p.category,
    (
        SELECT MAX(order_date)
        FROM order_items oi
        JOIN orders o ON oi.order_id = o.order_id
        WHERE oi.product_id = p.product_id
    ) AS last_ordered
FROM products p;

Correlated subqueries execute once per outer row. With 10,000 employees, a correlated subquery runs 10,000 times. For large datasets, consider window functions or JOINs as alternatives.

Correlated Subqueries in WHERE

DfRow-by-Row Filtering

The subquery in the WHERE clause evaluates a condition that depends on the current outer row. Only rows where the subquery returns TRUE are included in the result.

-- Find employees who earn more than their department average
SELECT
    first_name,
    last_name,
    department,
    salary
FROM employees e1
WHERE salary > (
    SELECT AVG(salary)
    FROM employees e2
    WHERE e2.department = e1.department
);

-- Find orders where the amount exceeds the customer's average
SELECT
    o.order_id,
    o.customer_id,
    o.amount,
    o.order_date
FROM orders o
WHERE o.amount > (
    SELECT AVG(amount)
    FROM orders o2
    WHERE o2.customer_id = o.customer_id
);

-- Find products that have never been ordered
SELECT
    p.product_name,
    p.category
FROM products p
WHERE NOT EXISTS (
    SELECT 1
    FROM order_items oi
    WHERE oi.product_id = p.product_id
);

Filtering Patterns Comparison

Pattern	Use Case	Example
`> (SELECT AVG...)`	Above average per group	Employees above dept avg
`NOT EXISTS`	Find missing relationships	Products never ordered
`IN (SELECT...)`	Match against a set	Orders in specific cities
`= (SELECT MAX...)`	Find extremes per group	Latest order per customer

Correlated Subqueries in SELECT

DfComputed Columns

A correlated subquery in the SELECT clause computes a value for each row based on related data. It acts like a calculated field that queries the database for each row.

-- Show each employee's salary alongside their department average
SELECT
    first_name,
    last_name,
    salary,
    department,
    (
        SELECT ROUND(AVG(salary), 2)
        FROM employees e2
        WHERE e2.department = e1.department
    ) AS dept_avg_salary,
    salary - (
        SELECT AVG(salary)
        FROM employees e2
        WHERE e2.department = e1.department
    ) AS diff_from_avg
FROM employees e1;

-- Show each customer's order count and total spending
SELECT
    c.customer_name,
    c.city,
    (SELECT COUNT(*) FROM orders o WHERE o.customer_id = c.customer_id) AS order_count,
    (SELECT COALESCE(SUM(amount), 0) FROM orders o WHERE o.customer_id = c.customer_id) AS total_spent
FROM customers c
ORDER BY total_spent DESC;

Subqueries in SELECT run once per row. For better performance with large datasets, replace with window functions: SUM(amount) OVER (PARTITION BY customer_id).

Correlated Subqueries in UPDATE

DfUPDATE with Subquery

Use a correlated subquery in an UPDATE statement to set values based on related data from other tables. The subquery provides context-specific values for each row being updated.

-- Update each employee's bonus based on department performance
UPDATE employees e1
SET bonus = (
    SELECT CASE
        WHEN e1.salary > AVG(e2.salary) * 1.2 THEN 5000
        WHEN e1.salary > AVG(e2.salary) THEN 3000
        ELSE 1000
    END
    FROM employees e2
    WHERE e2.department = e1.department
);

-- Update customer tier based on their order history
UPDATE customers c
SET tier = (
    SELECT CASE
        WHEN COUNT(*) >= 50 THEN 'Platinum'
        WHEN COUNT(*) >= 20 THEN 'Gold'
        WHEN COUNT(*) >= 5 THEN 'Silver'
        ELSE 'Bronze'
    END
    FROM orders o
    WHERE o.customer_id = c.customer_id
);

Correlated Subqueries in DELETE

-- Delete employees who earn less than their department average
DELETE FROM employees e1
WHERE salary < (
    SELECT AVG(salary)
    FROM employees e2
    WHERE e2.department = e1.department
);

-- Delete inactive customers (no orders in the last year)
DELETE FROM customers c
WHERE NOT EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.customer_id = c.customer_id
      AND o.order_date >= CURRENT_DATE - INTERVAL '1' YEAR
);

Always run a SELECT before DELETE to verify which rows will be affected. Correlated DELETEs can remove more data than intended if the subquery logic is incorrect.

EXISTS vs NOT EXISTS Patterns

DfEXISTS

Returns TRUE if the correlated subquery finds at least one matching row. Does not return data — only checks for existence. Efficient because it stops scanning after the first match.

-- Find departments that have at least one employee earning over 100k
SELECT
    d.dept_name
FROM departments d
WHERE EXISTS (
    SELECT 1
    FROM employees e
    WHERE e.department = d.dept_name
      AND e.salary > 100000
);

-- Find customers who have never left a review
SELECT
    c.customer_name,
    c.email
FROM customers c
WHERE NOT EXISTS (
    SELECT 1
    FROM reviews r
    WHERE r.customer_id = c.customer_id
);

-- Find products that appear in every order (universal quantification)
-- "Find products where there is no order that does NOT contain this product"
SELECT
    p.product_name
FROM products p
WHERE NOT EXISTS (
    SELECT 1
    FROM orders o
    WHERE NOT EXISTS (
        SELECT 1
        FROM order_items oi
        WHERE oi.order_id = o.order_id
          AND oi.product_id = p.product_id
    )
);

Double-nested NOT EXISTS implements universal quantification ("for all"). It translates to "find items where no counterexample exists."

Performance Considerations

Correlated Subquery vs JOIN vs Window Function

-- Correlated subquery: runs per row
SELECT
    first_name,
    salary,
    (SELECT AVG(salary) FROM employees e2 WHERE e2.department = e1.department) AS dept_avg
FROM employees e1;

-- JOIN alternative: often faster
SELECT
    e.first_name,
    e.salary,
    d.dept_avg
FROM employees e
JOIN (
    SELECT department, AVG(salary) AS dept_avg
    FROM employees
    GROUP BY department
) d ON e.department = d.department;

-- Window function: usually fastest
SELECT
    first_name,
    salary,
    AVG(salary) OVER (PARTITION BY department) AS dept_avg
FROM employees;

Performance Comparison Table

Method	Execution	Best For	Drawback
Correlated Subquery	Once per row	Simple existence checks	Slow on large data
JOIN with Derived Table	Once (pre-aggregated)	Complex aggregations	May duplicate rows
Window Function	Once (analytical)	Running totals, ranks	Not supported in all DBs
EXISTS/NOT EXISTS	Stops at first match	Existence checks	Limited to TRUE/FALSE

Most query optimizers convert simple correlated subqueries into joins automatically. Write the most readable version and let the optimizer handle execution strategy.

Practical Examples

Rank Within Group

-- Assign rank to employees within each department by salary
SELECT
    first_name,
    last_name,
    department,
    salary,
    (SELECT COUNT(*)
     FROM employees e2
     WHERE e2.department = e1.department
       AND e2.salary >= e1.salary
    ) AS salary_rank
FROM employees e1
ORDER BY department, salary_rank;

First and Last Per Group

-- Find each customer's first and most recent order
SELECT
    c.customer_name,
    (SELECT order_date FROM orders o
     WHERE o.customer_id = c.customer_id
     ORDER BY order_date ASC LIMIT 1) AS first_order,
    (SELECT order_date FROM orders o
     WHERE o.customer_id = c.customer_id
     ORDER BY order_date DESC LIMIT 1) AS last_order
FROM customers c;

Conditional Aggregation

-- Show each employee with a flag if they are the top earner in their dept
SELECT
    first_name,
    last_name,
    department,
    salary,
    CASE
        WHEN salary = (
            SELECT MAX(salary)
            FROM employees e2
            WHERE e2.department = e1.department
        ) THEN 'Top Earner'
        ELSE 'Standard'
    END AS status
FROM employees e1;

Summary

Key Takeaways

Correlated subqueries reference outer query columns and re-execute for each outer row.
Use EXISTS for existence checks — it stops at the first match and handles NULLs safely.
NOT EXISTS is the safest way to find "missing" rows — avoid NOT IN with nullable columns.
Correlated subqueries in SELECT act as computed columns — replace with window functions for performance.
For UPDATE and DELETE, correlated subqueries provide context-aware modifications per row.
Window functions and JOINs usually outperform correlated subqueries on large datasets.

Correlated Subqueries

Correlated Subqueries

What Makes a Subquery Correlated?

DfCorrelated Subquery

Basic Correlated Subquery Syntax

DfExecution Order

Correlated Subqueries in WHERE

DfRow-by-Row Filtering

Filtering Patterns Comparison

Correlated Subqueries in SELECT

DfComputed Columns

Correlated Subqueries in UPDATE

DfUPDATE with Subquery

Correlated Subqueries in DELETE

EXISTS vs NOT EXISTS Patterns

DfEXISTS

Performance Considerations

Correlated Subquery vs JOIN vs Window Function

Performance Comparison Table

Practical Examples

Rank Within Group

First and Last Per Group

Conditional Aggregation

Summary

Key Takeaways

Premium Content

Need Expert SQL Help?