Advanced SQL

Subqueries

Nest one query inside another to perform complex data retrieval with scalar, row, table, and correlated subqueries.

Scalar Subqueries — Return a single value for use in comparisons
Table Subqueries — Generate derived tables for filtering and joining Think of subqueries as asking a question about the answer to another question.

What Is a Subquery?

DfSubquery

A subquery is a SELECT statement embedded inside another SQL statement, enclosed in parentheses. The outer query uses the subquery's result as a value, set of values, or temporary table.

-- Basic subquery structure
SELECT column1, column2
FROM table1
WHERE column3 IN (
    SELECT column FROM table2 WHERE condition
);

Subqueries can appear in four locations:

WHERE — Filter rows based on subquery results
FROM — Create a derived table (subquery factoring)
SELECT — Return a scalar value per row
HAVING — Filter groups based on aggregated subquery results

Scalar Subqueries

DfScalar Subquery

Returns exactly one row and one column. Used anywhere a single value is expected — in SELECT, WHERE, or HAVING clauses.

-- Find employees who earn more than the company average
SELECT
    first_name,
    last_name,
    salary
FROM employees
WHERE salary > (
    SELECT AVG(salary) FROM employees
);

-- Show each employee's salary alongside the company average
SELECT
    first_name,
    last_name,
    salary,
    (SELECT AVG(salary) FROM employees) AS company_avg,
    salary - (SELECT AVG(salary) FROM employees) AS diff_from_avg
FROM employees;

Scalar subqueries must return exactly one row. If the subquery returns multiple rows, the database throws an error. Use LIMIT 1 or aggregation to guarantee a single result.

Row Subqueries

DfRow Subquery

Returns a single row with multiple columns. Used with comparisons like =, >, or < to match against multiple fields simultaneously.

-- Find employees who share the same department and salary as employee ID 101
SELECT
    first_name,
    last_name,
    department,
    salary
FROM employees
WHERE (department, salary) = (
    SELECT department, salary
    FROM employees
    WHERE employee_id = 101
);

IN Subqueries

DfIN Subquery

Tests whether a value exists in the result set returned by a subquery. Returns TRUE if the value matches any row in the subquery.

-- Find all orders from customers in New York
SELECT
    order_id,
    order_date,
    amount
FROM orders
WHERE customer_id IN (
    SELECT customer_id
    FROM customers
    WHERE city = 'New York'
);

-- Find products that have never been ordered
SELECT
    product_name,
    category
FROM products
WHERE product_id NOT IN (
    SELECT DISTINCT product_id
    FROM order_items
    WHERE product_id IS NOT NULL
);

When using NOT IN, ensure the subquery contains no NULL values. A single NULL in the result set causes NOT IN to return zero rows. Filter NULLs explicitly or use NOT EXISTS instead.

EXISTS Subqueries

DfEXISTS

Tests whether a subquery returns any rows. Returns TRUE if at least one row exists. Does not return data — only checks for existence.

-- Find customers who have placed at least one order
SELECT
    customer_id,
    customer_name
FROM customers c
WHERE EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.customer_id = c.customer_id
);

-- Find customers with no orders using NOT EXISTS
SELECT
    customer_id,
    customer_name
FROM customers c
WHERE NOT EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.customer_id = c.customer_id
);

IN vs EXISTS Comparison

Scenario	Use IN	Use EXISTS
Subquery returns small list	✅ Best	Works
Subquery returns large dataset	Works	✅ Best
Need to check for NULLs	⚠️ Risky	✅ Safe
Outer table is small	✅ Best	Works
Subquery is complex/JOINed	Works	✅ Best

Derived Table Subqueries

DfDerived Table

A subquery in the FROM clause that produces a temporary result set. Must be aliased and can be joined like any regular table.

-- Use a derived table to compute department statistics first
SELECT
    d.dept_name,
    d_stats.avg_salary,
    d_stats.headcount
FROM departments d
JOIN (
    SELECT
        department_id,
        AVG(salary) AS avg_salary,
        COUNT(*) AS headcount
    FROM employees
    GROUP BY department_id
) d_stats ON d.dept_id = d_stats.department_id;

-- Filter on aggregated results using a derived table
SELECT *
FROM (
    SELECT
        customer_id,
        COUNT(*) AS order_count,
        SUM(amount) AS total_spent
    FROM orders
    GROUP BY customer_id
) customer_totals
WHERE total_spent > 1000
ORDER BY total_spent DESC;

Subqueries in SELECT

-- Correlated scalar subquery in SELECT clause
SELECT
    e.first_name,
    e.last_name,
    e.salary,
    e.department,
    e.salary - (
        SELECT AVG(salary)
        FROM employees e2
        WHERE e2.department = e.department
    ) AS diff_from_dept_avg
FROM employees e;

-- Count related rows with a scalar subquery
SELECT
    c.customer_id,
    c.customer_name,
    (SELECT COUNT(*) FROM orders o WHERE o.customer_id = c.customer_id) AS order_count,
    (SELECT SUM(amount) FROM orders o WHERE o.customer_id = c.customer_id) AS total_spent
FROM customers c;

Subqueries in SELECT execute once per row. For large datasets, consider using window functions or JOINs instead for better performance.

Subqueries in HAVING

-- Find departments with above-average headcount
SELECT
    department,
    COUNT(*) AS headcount
FROM employees
GROUP BY department
HAVING COUNT(*) > (
    SELECT AVG(dept_count)
    FROM (
        SELECT COUNT(*) AS dept_count
        FROM employees
        GROUP BY department
    ) dept_sizes
);

-- Find product categories where average order value exceeds the overall average
SELECT
    category,
    ROUND(AVG(amount), 2) AS avg_order_value
FROM order_items oi
JOIN products p ON oi.product_id = p.product_id
GROUP BY category
HAVING AVG(amount) > (
    SELECT AVG(amount) FROM order_items
);

Subquery vs JOIN Performance

-- Using subquery (may be slower on large tables)
SELECT *
FROM employees
WHERE department_id IN (
    SELECT department_id FROM departments WHERE location = 'NYC'
);

-- Using JOIN (often optimized better by the query planner)
SELECT e.*
FROM employees e
JOIN departments d ON e.department_id = d.department_id
WHERE d.location = 'NYC';

When to Prefer Subqueries Over JOINs

Situation	Subquery	JOIN
Checking existence only	✅ EXISTS	Works
Avoiding duplicate rows	✅ Automatic	May duplicate
Readability for beginners	✅ Clear intent	Can be complex
Performance on large data	Depends	✅ Usually faster
Filtering on aggregated results	✅ Cleaner	Needs HAVING

Most modern query optimizers automatically rewrite IN subqueries as semi-joins. Write the version that is clearest; let the optimizer handle performance.

Practical Examples

Top N Per Group

-- Find the top 3 earners in each department
SELECT
    first_name,
    last_name,
    department,
    salary
FROM employees e1
WHERE 3 > (
    SELECT COUNT(*)
    FROM employees e2
    WHERE e2.department = e1.department
      AND e2.salary > e1.salary
)
ORDER BY department, salary DESC;

Self-Referencing Subquery

-- Find employees who earn more than their manager
SELECT
    e.first_name AS employee,
    e.salary AS emp_salary,
    m.first_name AS manager,
    m.salary AS mgr_salary
FROM employees e
JOIN employees m ON e.manager_id = m.employee_id
WHERE e.salary > (
    SELECT AVG(salary)
    FROM employees
    WHERE department = e.department
);

Summary

Key Takeaways

Scalar subqueries return one value — use them in SELECT and WHERE for comparisons.
IN tests membership; EXISTS tests existence — prefer EXISTS with large or NULL-prone datasets.
Derived tables in FROM create temporary result sets that can be joined like regular tables.
Subqueries in SELECT execute per row — use window functions for better performance on large data.
NOT IN fails silently if the subquery returns any NULL values — filter or use NOT EXISTS.
Modern optimizers often convert subqueries to joins automatically — write for clarity first.

Subqueries

Subqueries

What Is a Subquery?

DfSubquery

Scalar Subqueries

DfScalar Subquery

Row Subqueries

DfRow Subquery

IN Subqueries

DfIN Subquery

EXISTS Subqueries

DfEXISTS

IN vs EXISTS Comparison

Derived Table Subqueries

DfDerived Table

Subqueries in SELECT

Subqueries in HAVING

Subquery vs JOIN Performance

When to Prefer Subqueries Over JOINs

Practical Examples

Top N Per Group

Self-Referencing Subquery

Summary

Key Takeaways

Premium Content

Need Expert SQL Help?