Correlated Subqueries
Write subqueries that reference the outer query for row-by-row processing and dynamic filtering.
- Row-by-Row Evaluation β Subquery re-executes for every outer row
- Dynamic Filtering β Results depend on the current row's values Correlated subqueries connect the inner and outer query for context-aware results.
What Makes a Subquery Correlated?
DfCorrelated Subquery
A subquery that references one or more columns from the outer query. Unlike regular subqueries (which execute once), correlated subqueries re-execute for each row of the outer query, making the inner result dependent on the current outer row.
-- Regular subquery: executes ONCE, result is static
SELECT *
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
-- Correlated subquery: executes for EACH ROW, result changes
SELECT *
FROM employees e1
WHERE salary > (
SELECT AVG(salary)
FROM employees e2
WHERE e2.department = e1.department -- references outer query
);
The key difference is the e1.department reference inside the subquery β this binds the inner query to each row of the outer query.
Basic Correlated Subquery Syntax
DfExecution Order
Correlated subqueries execute in a specific order: the database scans the outer query, then evaluates the inner subquery for each outer row. This is fundamentally different from uncorrelated subqueries, which run once before the outer query begins.
-- For each employee, find employees in the same department
SELECT
e1.first_name,
e1.last_name,
e1.department
FROM employees e1
WHERE EXISTS (
SELECT 1
FROM employees e2
WHERE e2.department = e1.department
AND e2.employee_id != e1.employee_id
);
-- For each product, find the most recent order
SELECT
p.product_name,
p.category,
(
SELECT MAX(order_date)
FROM order_items oi
JOIN orders o ON oi.order_id = o.order_id
WHERE oi.product_id = p.product_id
) AS last_ordered
FROM products p;
Correlated Subqueries in WHERE
DfRow-by-Row Filtering
The subquery in the WHERE clause evaluates a condition that depends on the current outer row. Only rows where the subquery returns TRUE are included in the result.
-- Find employees who earn more than their department average
SELECT
first_name,
last_name,
department,
salary
FROM employees e1
WHERE salary > (
SELECT AVG(salary)
FROM employees e2
WHERE e2.department = e1.department
);
-- Find orders where the amount exceeds the customer's average
SELECT
o.order_id,
o.customer_id,
o.amount,
o.order_date
FROM orders o
WHERE o.amount > (
SELECT AVG(amount)
FROM orders o2
WHERE o2.customer_id = o.customer_id
);
-- Find products that have never been ordered
SELECT
p.product_name,
p.category
FROM products p
WHERE NOT EXISTS (
SELECT 1
FROM order_items oi
WHERE oi.product_id = p.product_id
);
Filtering Patterns Comparison
| Pattern | Use Case | Example |
|---|---|---|
> (SELECT AVG...) | Above average per group | Employees above dept avg |
NOT EXISTS | Find missing relationships | Products never ordered |
IN (SELECT...) | Match against a set | Orders in specific cities |
= (SELECT MAX...) | Find extremes per group | Latest order per customer |
Correlated Subqueries in SELECT
DfComputed Columns
A correlated subquery in the SELECT clause computes a value for each row based on related data. It acts like a calculated field that queries the database for each row.
-- Show each employee's salary alongside their department average
SELECT
first_name,
last_name,
salary,
department,
(
SELECT ROUND(AVG(salary), 2)
FROM employees e2
WHERE e2.department = e1.department
) AS dept_avg_salary,
salary - (
SELECT AVG(salary)
FROM employees e2
WHERE e2.department = e1.department
) AS diff_from_avg
FROM employees e1;
-- Show each customer's order count and total spending
SELECT
c.customer_name,
c.city,
(SELECT COUNT(*) FROM orders o WHERE o.customer_id = c.customer_id) AS order_count,
(SELECT COALESCE(SUM(amount), 0) FROM orders o WHERE o.customer_id = c.customer_id) AS total_spent
FROM customers c
ORDER BY total_spent DESC;
SUM(amount) OVER (PARTITION BY customer_id).Correlated Subqueries in UPDATE
DfUPDATE with Subquery
Use a correlated subquery in an UPDATE statement to set values based on related data from other tables. The subquery provides context-specific values for each row being updated.
-- Update each employee's bonus based on department performance
UPDATE employees e1
SET bonus = (
SELECT CASE
WHEN e1.salary > AVG(e2.salary) * 1.2 THEN 5000
WHEN e1.salary > AVG(e2.salary) THEN 3000
ELSE 1000
END
FROM employees e2
WHERE e2.department = e1.department
);
-- Update customer tier based on their order history
UPDATE customers c
SET tier = (
SELECT CASE
WHEN COUNT(*) >= 50 THEN 'Platinum'
WHEN COUNT(*) >= 20 THEN 'Gold'
WHEN COUNT(*) >= 5 THEN 'Silver'
ELSE 'Bronze'
END
FROM orders o
WHERE o.customer_id = c.customer_id
);
Correlated Subqueries in DELETE
-- Delete employees who earn less than their department average
DELETE FROM employees e1
WHERE salary < (
SELECT AVG(salary)
FROM employees e2
WHERE e2.department = e1.department
);
-- Delete inactive customers (no orders in the last year)
DELETE FROM customers c
WHERE NOT EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.customer_id
AND o.order_date >= CURRENT_DATE - INTERVAL '1' YEAR
);
EXISTS vs NOT EXISTS Patterns
DfEXISTS
Returns TRUE if the correlated subquery finds at least one matching row. Does not return data β only checks for existence. Efficient because it stops scanning after the first match.
-- Find departments that have at least one employee earning over 100k
SELECT
d.dept_name
FROM departments d
WHERE EXISTS (
SELECT 1
FROM employees e
WHERE e.department = d.dept_name
AND e.salary > 100000
);
-- Find customers who have never left a review
SELECT
c.customer_name,
c.email
FROM customers c
WHERE NOT EXISTS (
SELECT 1
FROM reviews r
WHERE r.customer_id = c.customer_id
);
-- Find products that appear in every order (universal quantification)
-- "Find products where there is no order that does NOT contain this product"
SELECT
p.product_name
FROM products p
WHERE NOT EXISTS (
SELECT 1
FROM orders o
WHERE NOT EXISTS (
SELECT 1
FROM order_items oi
WHERE oi.order_id = o.order_id
AND oi.product_id = p.product_id
)
);
NOT EXISTS implements universal quantification ("for all"). It translates to "find items where no counterexample exists."Performance Considerations
Correlated Subquery vs JOIN vs Window Function
-- Correlated subquery: runs per row
SELECT
first_name,
salary,
(SELECT AVG(salary) FROM employees e2 WHERE e2.department = e1.department) AS dept_avg
FROM employees e1;
-- JOIN alternative: often faster
SELECT
e.first_name,
e.salary,
d.dept_avg
FROM employees e
JOIN (
SELECT department, AVG(salary) AS dept_avg
FROM employees
GROUP BY department
) d ON e.department = d.department;
-- Window function: usually fastest
SELECT
first_name,
salary,
AVG(salary) OVER (PARTITION BY department) AS dept_avg
FROM employees;
Performance Comparison Table
| Method | Execution | Best For | Drawback |
|---|---|---|---|
| Correlated Subquery | Once per row | Simple existence checks | Slow on large data |
| JOIN with Derived Table | Once (pre-aggregated) | Complex aggregations | May duplicate rows |
| Window Function | Once (analytical) | Running totals, ranks | Not supported in all DBs |
| EXISTS/NOT EXISTS | Stops at first match | Existence checks | Limited to TRUE/FALSE |
Practical Examples
Rank Within Group
-- Assign rank to employees within each department by salary
SELECT
first_name,
last_name,
department,
salary,
(SELECT COUNT(*)
FROM employees e2
WHERE e2.department = e1.department
AND e2.salary >= e1.salary
) AS salary_rank
FROM employees e1
ORDER BY department, salary_rank;
First and Last Per Group
-- Find each customer's first and most recent order
SELECT
c.customer_name,
(SELECT order_date FROM orders o
WHERE o.customer_id = c.customer_id
ORDER BY order_date ASC LIMIT 1) AS first_order,
(SELECT order_date FROM orders o
WHERE o.customer_id = c.customer_id
ORDER BY order_date DESC LIMIT 1) AS last_order
FROM customers c;
Conditional Aggregation
-- Show each employee with a flag if they are the top earner in their dept
SELECT
first_name,
last_name,
department,
salary,
CASE
WHEN salary = (
SELECT MAX(salary)
FROM employees e2
WHERE e2.department = e1.department
) THEN 'Top Earner'
ELSE 'Standard'
END AS status
FROM employees e1;
Summary
Key Takeaways
- Correlated subqueries reference outer query columns and re-execute for each outer row.
- Use
EXISTSfor existence checks β it stops at the first match and handles NULLs safely. NOT EXISTSis the safest way to find "missing" rows β avoidNOT INwith nullable columns.- Correlated subqueries in SELECT act as computed columns β replace with window functions for performance.
- For UPDATE and DELETE, correlated subqueries provide context-aware modifications per row.
- Window functions and JOINs usually outperform correlated subqueries on large datasets.