π CTEs & Recursive Queries
Amazon & Microsoft Interview Deep Dive
π Interview Question
βΉοΈπ΄ Amazon/Microsoft Interview Question
"Given an employees table with employee_id, name, and manager_id, write a query to find the management hierarchy β the full reporting chain from each employee up to the CEO. Also find all subordinates of a specific manager and calculate the total number of direct and indirect reports for each manager."
Companies: Amazon, Microsoft | Difficulty: Hard | Time: 50 minutes
π Setup: Organizational Data
CREATE TABLE employees (
employee_id SERIAL PRIMARY KEY,
employee_name VARCHAR(100) NOT NULL,
manager_id INT REFERENCES employees(employee_id),
department VARCHAR(50),
position VARCHAR(50),
salary DECIMAL(10, 2),
hire_date DATE
);
-- Insert hierarchical data (CEO at top)
INSERT INTO employees (employee_id, employee_name, manager_id, department, position, salary, hire_date) VALUES
(1, 'Sarah Chen (CEO)', NULL, 'Executive', 'CEO', 300000, '2015-01-01'),
(2, 'Michael Park', 1, 'Engineering', 'VP Engineering', 250000, '2016-03-15'),
(3, 'Jennifer Wu', 1, 'Marketing', 'VP Marketing', 220000, '2016-06-01'),
(4, 'David Kim', 1, 'Sales', 'VP Sales', 230000, '2017-01-10'),
(5, 'Emily Rodriguez', 2, 'Engineering', 'Director', 180000, '2017-04-20'),
(6, 'James Thompson', 2, 'Engineering', 'Director', 175000, '2017-07-15'),
(7, 'Lisa Wang', 3, 'Marketing', 'Director', 165000, '2018-01-05'),
(8, 'Robert Chen', 4, 'Sales', 'Director', 170000, '2018-03-10'),
(9, 'Amanda Foster', 5, 'Engineering', 'Senior Engineer', 140000, '2018-06-01'),
(10, 'Chris Martinez', 5, 'Engineering', 'Senior Engineer', 138000, '2018-08-15'),
(11, 'Nicole Adams', 6, 'Engineering', 'Engineer', 120000, '2019-01-10'),
(12, 'Kevin Lee', 6, 'Engineering', 'Engineer', 118000, '2019-04-20'),
(13, 'Rachel Green', 7, 'Marketing', 'Specialist', 95000, '2019-06-01'),
(14, 'Tom Wilson', 8, 'Sales', 'Account Executive', 85000, '2019-09-10'),
(15, 'Maria Garcia', 9, 'Engineering', 'Junior Engineer', 95000, '2020-01-15');
π Part 1: Basic CTEs (Non-Recursive)
Why CTEs Over Subqueries?
π‘π‘ CTE vs Subquery
CTEs provide:
- Readability: Named, reusable result sets
- Modularity: Break complex queries into logical steps
- Recursion: Enable recursive hierarchical queries
- Reference: Multiple references to same CTE without re-execution (in most databases)
-- CTE: Calculate department statistics
WITH department_stats AS (
SELECT
department,
COUNT(*) AS employee_count,
AVG(salary) AS avg_salary,
SUM(salary) AS total_salary
FROM employees
WHERE manager_id IS NOT NULL -- Exclude CEO
GROUP BY department
),
salary_bands AS (
SELECT
employee_id,
employee_name,
department,
salary,
CASE
WHEN salary >= 200000 THEN 'Executive'
WHEN salary >= 150000 THEN 'Senior'
WHEN salary >= 100000 THEN 'Mid-Level'
ELSE 'Junior'
END AS salary_band
FROM employees
)
SELECT
sb.employee_name,
sb.department,
sb.salary,
sb.salary_band,
ds.employee_count AS dept_size,
ROUND(sb.salary * 100.0 / ds.total_salary, 2) AS pct_of_dept_total
FROM salary_bands sb
JOIN department_stats ds ON sb.department = ds.department
ORDER BY sb.department, sb.salary DESC;
Multiple CTEs for Complex Analysis
WITH
-- Step 1: Find direct reports count
direct_reports AS (
SELECT
manager_id,
COUNT(*) AS direct_report_count
FROM employees
WHERE manager_id IS NOT NULL
GROUP BY manager_id
),
-- Step 2: Find total team size (direct + indirect)
team_sizes AS (
SELECT
manager_id,
SUM(direct_report_count) AS total_reports
FROM direct_reports
GROUP BY manager_id
),
-- Step 3: Calculate management metrics
manager_metrics AS (
SELECT
e.employee_id,
e.employee_name,
e.department,
e.salary,
COALESCE(dr.direct_report_count, 0) AS direct_reports,
COALESCE(ts.total_reports, 0) AS total_reports,
e.salary / NULLIF(COALESCE(ts.total_reports, 0), 0) AS cost_per_report
FROM employees e
LEFT JOIN direct_reports dr ON e.employee_id = dr.manager_id
LEFT JOIN team_sizes ts ON e.employee_id = ts.manager_id
)
SELECT
employee_name,
department,
salary,
direct_reports,
total_reports,
ROUND(COALESCE(cost_per_report, 0), 2) AS cost_per_report,
CASE
WHEN total_reports > 10 THEN 'Large Team'
WHEN total_reports > 5 THEN 'Medium Team'
WHEN total_reports > 0 THEN 'Small Team'
ELSE 'Individual Contributor'
END AS team_size_category
FROM manager_metrics
ORDER BY total_reports DESC;
π³ Part 2: Recursive CTEs
βΉοΈπ Recursive CTE Structure
Recursive CTEs have two parts:
- Anchor member: Initial query (base case)
- Recursive member: References the CTE itself (recursive step)
- Termination: Implicit (when no more rows) or explicit (LIMIT/depth check)
Basic Recursive Hierarchy
-- Find complete management chain for each employee
WITH RECURSIVE management_chain AS (
-- Anchor: Start with employees (non-CEOs)
SELECT
employee_id,
employee_name,
manager_id,
department,
1 AS level,
employee_name::TEXT AS chain,
ARRAY[employee_id] AS path
FROM employees
WHERE manager_id IS NOT NULL
UNION ALL
-- Recursive: Join back to find managers
SELECT
e.employee_id,
e.employee_name,
e.manager_id,
e.department,
mc.level + 1,
mc.chain || ' β ' || e.employee_name,
mc.path || e.employee_id
FROM employees e
INNER JOIN management_chain mc ON e.employee_id = mc.manager_id
WHERE mc.level < 20 -- Safety: prevent infinite loops
)
SELECT
employee_id,
employee_name,
department,
level AS hierarchy_depth,
chain AS reporting_chain,
path AS employee_path
FROM management_chain
WHERE manager_id IS NULL -- Only full chains to CEO
ORDER BY employee_id;
Finding All Subordinates of a Manager
-- Find all subordinates (direct and indirect) of Employee ID 2
WITH RECURSIVE subordinates AS (
-- Anchor: Direct reports
SELECT
employee_id,
employee_name,
manager_id,
department,
position,
1 AS depth
FROM employees
WHERE manager_id = 2 -- Michael Park's direct reports
UNION ALL
-- Recursive: Reports of reports
SELECT
e.employee_id,
e.employee_name,
e.manager_id,
e.department,
e.position,
s.depth + 1
FROM employees e
INNER JOIN subordinates s ON e.manager_id = s.employee_id
)
SELECT
employee_id,
employee_name,
department,
position,
depth AS reporting_depth
FROM subordinates
ORDER BY depth, employee_name;
Counting Total Reports Per Manager
-- Calculate total number of direct and indirect reports for each manager
WITH RECURSIVE all_reports AS (
-- Anchor: All employees with managers
SELECT
employee_id,
manager_id,
employee_id AS original_manager,
1 AS distance
FROM employees
WHERE manager_id IS NOT NULL
UNION ALL
-- Recursive: Find who reports to the manager's reports
SELECT
ar.employee_id,
e.manager_id,
ar.original_manager,
ar.distance + 1
FROM all_reports ar
INNER JOIN employees e ON ar.manager_id = e.employee_id
WHERE ar.distance < 20
),
report_counts AS (
SELECT
original_manager AS manager_id,
COUNT(DISTINCT employee_id) AS total_indirect_reports
FROM all_reports
GROUP BY original_manager
)
SELECT
e.employee_id,
e.employee_name,
e.department,
e.position,
COALESCE(rc.total_indirect_reports, 0) AS indirect_reports,
(SELECT COUNT(*) FROM employees WHERE manager_id = e.employee_id) AS direct_reports,
COALESCE(rc.total_indirect_reports, 0) +
(SELECT COUNT(*) FROM employees WHERE manager_id = e.employee_id) AS total_reports
FROM employees e
LEFT JOIN report_counts rc ON e.employee_id = rc.manager_id
WHERE (SELECT COUNT(*) FROM employees WHERE manager_id = e.employee_id) > 0
ORDER BY total_reports DESC;
πΊοΈ Part 3: Graph Traversal with Recursive CTEs
β οΈβ οΈ Graph Cycle Detection
When traversing graphs (not trees), you must detect cycles to prevent infinite recursion. Use an array or set to track visited nodes.
Shortest Path in a Graph
-- Create a social network graph
CREATE TABLE friendships (
user_id INT,
friend_id INT,
since_date DATE
);
INSERT INTO friendships VALUES
(1, 2, '2020-01-01'), (1, 3, '2020-02-15'),
(2, 4, '2020-03-01'), (3, 5, '2020-04-10'),
(4, 6, '2020-05-20'), (5, 6, '2020-06-01'),
(6, 7, '2020-07-15'), (7, 8, '2020-08-01'),
(8, 1, '2020-09-10'); -- Creates a cycle!
-- Find shortest path from user 1 to user 8
WITH RECURSIVE shortest_path AS (
-- Anchor: Direct connections
SELECT
user_id,
friend_id,
ARRAY[user_id, friend_id] AS path,
1 AS distance
FROM friendships
WHERE user_id = 1
UNION
-- Recursive: Explore friends of friends
SELECT
sp.user_id,
f.friend_id,
sp.path || f.friend_id,
sp.distance + 1
FROM shortest_path sp
INNER JOIN friendships f ON sp.friend_id = f.user_id
WHERE f.friend_id != ALL(sp.path) -- Cycle detection
AND sp.distance < 10 -- Max depth
)
SELECT
user_id AS start_user,
friend_id AS end_user,
path AS shortest_path,
distance
FROM shortest_path
WHERE friend_id = 8
ORDER BY distance
LIMIT 1;
π Part 4: Hierarchical Aggregation
Organizational Cost Analysis
-- Calculate total cost of each management subtree
WITH RECURSIVE org_tree AS (
-- Anchor: Individual employees
SELECT
employee_id,
employee_name,
manager_id,
department,
salary,
employee_id AS subtree_root,
salary AS subtree_cost,
1 AS subtree_size,
employee_name::TEXT AS subtree_members
FROM employees
UNION ALL
-- Recursive: Aggregate upward
SELECT
ot.employee_id,
ot.employee_name,
ot.manager_id,
ot.department,
ot.salary,
ot.subtree_root,
ot.subtree_cost + e.salary,
ot.subtree_size + 1,
ot.subtree_members || ', ' || e.employee_name
FROM org_tree ot
INNER JOIN employees e ON ot.manager_id = e.employee_id
WHERE ot.subtree_size < 100 -- Safety limit
)
SELECT DISTINCT
employee_id,
employee_name,
department,
subtree_size - 1 AS total_reports,
subtree_cost - salary AS total_team_cost,
ROUND(subtree_cost * 100.0 / (SELECT SUM(salary) FROM employees), 2) AS pct_of_total_cost
FROM org_tree
WHERE subtree_size > 1
ORDER BY total_team_cost DESC;
π Part 5: Multiple Recursive Members
π‘π‘ Multiple Anchor Members
You can have multiple anchor members united by UNION ALL. This is useful for finding paths from multiple starting points or handling different node types.
-- Find all paths between two specific employees (bidirectional)
WITH RECURSIVE all_paths AS (
-- Forward direction
SELECT
user_id AS start_node,
friend_id AS end_node,
ARRAY[user_id, friend_id] AS path,
1 AS path_length
FROM friendships
WHERE user_id = 1
UNION
-- Backward direction
SELECT
friend_id AS start_node,
user_id AS end_node,
ARRAY[friend_id, user_id] AS path,
1 AS path_length
FROM friendships
WHERE friend_id = 1
UNION ALL
-- Extend paths
SELECT
ap.start_node,
CASE
WHEN f.user_id = ap.end_node THEN f.friend_id
ELSE f.user_id
END,
ap.path || CASE
WHEN f.user_id = ap.end_node THEN f.friend_id
ELSE f.user_id
END,
ap.path_length + 1
FROM all_paths ap
INNER JOIN friendships f ON f.user_id = ap.end_node OR f.friend_id = ap.end_node
WHERE NOT (
CASE
WHEN f.user_id = ap.end_node THEN f.friend_id
ELSE f.user_id
END = ANY(ap.path)
)
AND ap.path_length < 6
)
SELECT
start_node,
end_node,
path,
path_length
FROM all_paths
WHERE end_node = 8
ORDER BY path_length
LIMIT 5;
ποΈ Part 6: Advanced Patterns
Materialized Path Pattern
-- Store materialized path for fast ancestor/descendant queries
ALTER TABLE employees ADD COLUMN materialized_path TEXT;
-- Populate materialized path using recursive CTE
WITH RECURSIVE path_builder AS (
SELECT
employee_id,
employee_name,
manager_id,
'/' || employee_id || '/' AS materialized_path,
1 AS depth
FROM employees
WHERE manager_id IS NULL
UNION ALL
SELECT
e.employee_id,
e.employee_name,
e.manager_id,
pb.materialized_path || e.employee_id || '/',
pb.depth + 1
FROM employees e
INNER JOIN path_builder pb ON e.manager_id = pb.employee_id
)
UPDATE employees
SET materialized_path = pb.materialized_path,
depth = pb.depth
FROM path_builder pb
WHERE employees.employee_id = pb.employee_id;
-- Fast ancestor query using materialized path
SELECT
e.employee_id,
e.employee_name,
e.materialized_path
FROM employees e
WHERE '/1/2/5/' LIKE e.materialized_path || '%'
ORDER BY e.materialized_path;
-- Fast descendant query
SELECT
e.employee_id,
e.employee_name,
e.materialized_path
FROM employees e
WHERE e.materialized_path LIKE '/1/2/%'
AND e.employee_id != 2
ORDER BY e.materialized_path;
Recursive Date Series Generation
-- Generate a date series for daily reporting
WITH RECURSIVE date_series AS (
SELECT
DATE '2024-01-01' AS report_date,
1 AS day_number
UNION ALL
SELECT
report_date + INTERVAL '1 day',
day_number + 1
FROM date_series
WHERE report_date < DATE '2024-12-31'
)
SELECT
report_date,
day_number,
EXTRACT(DOW FROM report_date) AS day_of_week,
CASE EXTRACT(DOW FROM report_date)
WHEN 0 THEN 'Sunday'
WHEN 6 THEN 'Saturday'
ELSE 'Weekday'
END AS day_type
FROM date_series
WHERE EXTRACT(DOW FROM report_date) NOT IN (0, 6) -- Weekdays only
ORDER BY report_date;
β±οΈ Complexity Analysis
| Operation | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Simple CTE | O(n) | O(n) | Result set materialization |
| Recursive CTE (Tree) | O(n) | O(depth) | Stack depth = tree height |
| Recursive CTE (Graph) | O(V + E) | O(V) | Vertices + Edges |
| Path Finding | O(V!) worst | O(V) | Exponential without optimization |
| Materialized Path | O(n) query | O(path_length) | Fast lookups after O(n) build |
π― Quiz Section
π Best Practices for Interviews
π‘β CTE & Recursion Best Practices
1. Always Set Recursion Limits:
-- BAD: Potential infinite loop
WITH RECURSIVE cte AS (
SELECT * FROM graph
UNION ALL
SELECT g.* FROM graph g JOIN cte ON ...
)
-- GOOD: Safety limit
WITH RECURSIVE cte AS (
SELECT *, 1 AS depth FROM graph
UNION ALL
SELECT g.*, c.depth + 1
FROM graph g JOIN cte ON ...
WHERE c.depth < 100
)
2. Use UNION for Cycle Detection:
-- For graphs, use UNION to automatically deduplicate visited nodes
-- For trees, UNION ALL is safe and faster
-- Manual cycle detection with path tracking:
WHERE NOT (node_id = ANY(visited_path))
3. Break Complex Problems into CTEs:
-- Step-by-step approach
WITH
step1 AS (...), -- Get raw data
step2 AS (...), -- Filter/transform
step3 AS (...), -- Aggregate
SELECT * FROM step3;
4. Consider Alternative Patterns:
- Adjacency List: Simple but slow for deep hierarchies
- Materialized Path: Fast reads, complex updates
- Nested Sets: Fast ancestors/descendants, expensive inserts
- Closure Table: Balanced read/write performance
π Common Follow-Up Questions
- "How would you handle circular references?" β Track visited nodes in an array
- "What's the maximum recursion depth?" β Database-dependent (PostgreSQL: 300, SQL Server: 100)
- "Can recursive CTEs be used in views?" β Yes, but performance varies by database
- "How do you optimize recursive CTEs?" β Add indexes on join columns, limit depth
β οΈβ οΈ Performance Warning
Recursive CTEs can be expensive for deep hierarchies. Consider materializing paths or using adjacency list with indexes for frequently queried hierarchies. Always explain the trade-offs to interviewers.