Subqueries
Nest one query inside another to perform complex data retrieval with scalar, row, table, and correlated subqueries.
- Scalar Subqueries β Return a single value for use in comparisons
- Table Subqueries β Generate derived tables for filtering and joining Think of subqueries as asking a question about the answer to another question.
What Is a Subquery?
DfSubquery
A subquery is a SELECT statement embedded inside another SQL statement, enclosed in parentheses. The outer query uses the subquery's result as a value, set of values, or temporary table.
-- Basic subquery structure
SELECT column1, column2
FROM table1
WHERE column3 IN (
SELECT column FROM table2 WHERE condition
);
Subqueries can appear in four locations:
- WHERE β Filter rows based on subquery results
- FROM β Create a derived table (subquery factoring)
- SELECT β Return a scalar value per row
- HAVING β Filter groups based on aggregated subquery results
Scalar Subqueries
DfScalar Subquery
Returns exactly one row and one column. Used anywhere a single value is expected β in SELECT, WHERE, or HAVING clauses.
-- Find employees who earn more than the company average
SELECT
first_name,
last_name,
salary
FROM employees
WHERE salary > (
SELECT AVG(salary) FROM employees
);
-- Show each employee's salary alongside the company average
SELECT
first_name,
last_name,
salary,
(SELECT AVG(salary) FROM employees) AS company_avg,
salary - (SELECT AVG(salary) FROM employees) AS diff_from_avg
FROM employees;
LIMIT 1 or aggregation to guarantee a single result.Row Subqueries
DfRow Subquery
Returns a single row with multiple columns. Used with comparisons like =, >, or < to match against multiple fields simultaneously.
-- Find employees who share the same department and salary as employee ID 101
SELECT
first_name,
last_name,
department,
salary
FROM employees
WHERE (department, salary) = (
SELECT department, salary
FROM employees
WHERE employee_id = 101
);
IN Subqueries
DfIN Subquery
Tests whether a value exists in the result set returned by a subquery. Returns TRUE if the value matches any row in the subquery.
-- Find all orders from customers in New York
SELECT
order_id,
order_date,
amount
FROM orders
WHERE customer_id IN (
SELECT customer_id
FROM customers
WHERE city = 'New York'
);
-- Find products that have never been ordered
SELECT
product_name,
category
FROM products
WHERE product_id NOT IN (
SELECT DISTINCT product_id
FROM order_items
WHERE product_id IS NOT NULL
);
NOT IN, ensure the subquery contains no NULL values. A single NULL in the result set causes NOT IN to return zero rows. Filter NULLs explicitly or use NOT EXISTS instead.EXISTS Subqueries
DfEXISTS
Tests whether a subquery returns any rows. Returns TRUE if at least one row exists. Does not return data β only checks for existence.
-- Find customers who have placed at least one order
SELECT
customer_id,
customer_name
FROM customers c
WHERE EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.customer_id
);
-- Find customers with no orders using NOT EXISTS
SELECT
customer_id,
customer_name
FROM customers c
WHERE NOT EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.customer_id
);
IN vs EXISTS Comparison
| Scenario | Use IN | Use EXISTS |
|---|---|---|
| Subquery returns small list | β Best | Works |
| Subquery returns large dataset | Works | β Best |
| Need to check for NULLs | β οΈ Risky | β Safe |
| Outer table is small | β Best | Works |
| Subquery is complex/JOINed | Works | β Best |
Derived Table Subqueries
DfDerived Table
A subquery in the FROM clause that produces a temporary result set. Must be aliased and can be joined like any regular table.
-- Use a derived table to compute department statistics first
SELECT
d.dept_name,
d_stats.avg_salary,
d_stats.headcount
FROM departments d
JOIN (
SELECT
department_id,
AVG(salary) AS avg_salary,
COUNT(*) AS headcount
FROM employees
GROUP BY department_id
) d_stats ON d.dept_id = d_stats.department_id;
-- Filter on aggregated results using a derived table
SELECT *
FROM (
SELECT
customer_id,
COUNT(*) AS order_count,
SUM(amount) AS total_spent
FROM orders
GROUP BY customer_id
) customer_totals
WHERE total_spent > 1000
ORDER BY total_spent DESC;
Subqueries in SELECT
-- Correlated scalar subquery in SELECT clause
SELECT
e.first_name,
e.last_name,
e.salary,
e.department,
e.salary - (
SELECT AVG(salary)
FROM employees e2
WHERE e2.department = e.department
) AS diff_from_dept_avg
FROM employees e;
-- Count related rows with a scalar subquery
SELECT
c.customer_id,
c.customer_name,
(SELECT COUNT(*) FROM orders o WHERE o.customer_id = c.customer_id) AS order_count,
(SELECT SUM(amount) FROM orders o WHERE o.customer_id = c.customer_id) AS total_spent
FROM customers c;
Subqueries in HAVING
-- Find departments with above-average headcount
SELECT
department,
COUNT(*) AS headcount
FROM employees
GROUP BY department
HAVING COUNT(*) > (
SELECT AVG(dept_count)
FROM (
SELECT COUNT(*) AS dept_count
FROM employees
GROUP BY department
) dept_sizes
);
-- Find product categories where average order value exceeds the overall average
SELECT
category,
ROUND(AVG(amount), 2) AS avg_order_value
FROM order_items oi
JOIN products p ON oi.product_id = p.product_id
GROUP BY category
HAVING AVG(amount) > (
SELECT AVG(amount) FROM order_items
);
Subquery vs JOIN Performance
-- Using subquery (may be slower on large tables)
SELECT *
FROM employees
WHERE department_id IN (
SELECT department_id FROM departments WHERE location = 'NYC'
);
-- Using JOIN (often optimized better by the query planner)
SELECT e.*
FROM employees e
JOIN departments d ON e.department_id = d.department_id
WHERE d.location = 'NYC';
When to Prefer Subqueries Over JOINs
| Situation | Subquery | JOIN |
|---|---|---|
| Checking existence only | β EXISTS | Works |
| Avoiding duplicate rows | β Automatic | May duplicate |
| Readability for beginners | β Clear intent | Can be complex |
| Performance on large data | Depends | β Usually faster |
| Filtering on aggregated results | β Cleaner | Needs HAVING |
Practical Examples
Top N Per Group
-- Find the top 3 earners in each department
SELECT
first_name,
last_name,
department,
salary
FROM employees e1
WHERE 3 > (
SELECT COUNT(*)
FROM employees e2
WHERE e2.department = e1.department
AND e2.salary > e1.salary
)
ORDER BY department, salary DESC;
Self-Referencing Subquery
-- Find employees who earn more than their manager
SELECT
e.first_name AS employee,
e.salary AS emp_salary,
m.first_name AS manager,
m.salary AS mgr_salary
FROM employees e
JOIN employees m ON e.manager_id = m.employee_id
WHERE e.salary > (
SELECT AVG(salary)
FROM employees
WHERE department = e.department
);
Summary
Key Takeaways
- Scalar subqueries return one value β use them in SELECT and WHERE for comparisons.
INtests membership;EXISTStests existence β prefer EXISTS with large or NULL-prone datasets.- Derived tables in FROM create temporary result sets that can be joined like regular tables.
- Subqueries in SELECT execute per row β use window functions for better performance on large data.
NOT INfails silently if the subquery returns any NULL values β filter or useNOT EXISTS.- Modern optimizers often convert subqueries to joins automatically β write for clarity first.