Interview Question: "Write a recursive CTE to find all ancestors of a node in a hierarchy. How do you prevent infinite loops? What's the difference between depth-first and breadth-first traversal in SQL?" — Asked at Netflix, Spotify, Uber for Staff Engineer roles
ℹ️
Difficulty: Advanced | Companies: Netflix, Spotify, Uber, Airbnb, Stripe | Time: 45-60 minutes
Recursive CTE Anatomy
A recursive CTE has three components:
-- Create organizational hierarchy
CREATE TABLE employees (
emp_id INT PRIMARY KEY,
name VARCHAR(100),
manager_id INT,
title VARCHAR(100),
hire_date DATE,
salary DECIMAL(12,2)
);
INSERT INTO employees VALUES
(1, 'CEO', NULL, 'Chief Executive Officer', '2020-01-01', 250000.00),
(2, 'CTO', 1, 'Chief Technology Officer', '2020-02-01', 220000.00),
(3, 'CFO', 1, 'Chief Financial Officer', '2020-03-01', 200000.00),
(4, 'VP Eng', 2, 'VP of Engineering', '2020-04-01', 180000.00),
(5, 'VP Product', 2, 'VP of Product', '2020-05-01', 175000.00),
(6, 'Director', 4, 'Director of Engineering', '2021-01-01', 150000.00),
(7, 'Manager 1', 6, 'Engineering Manager', '2021-06-01', 130000.00),
(8, 'Manager 2', 6, 'Engineering Manager', '2021-07-01', 125000.00),
(9, 'Sr Dev 1', 7, 'Senior Developer', '2022-01-01', 120000.00),
(10, 'Sr Dev 2', 7, 'Senior Developer', '2022-02-01', 115000.00),
(11, 'Dev 1', 8, 'Developer', '2022-03-01', 100000.00),
(12, 'Dev 2', 8, 'Developer', '2022-04-01', 95000.00),
(13, 'Intern 1', 9, 'Intern', '2023-06-01', 50000.00);
Depth-First Traversal with Path
-- Recursive CTE for complete hierarchy with path
WITH RECURSIVE emp_hierarchy AS (
-- Anchor: Top-level managers (CEO)
SELECT
emp_id,
name,
manager_id,
title,
hire_date,
salary,
1 AS depth,
name::TEXT AS path,
ARRAY[emp_id] AS path_ids,
name::TEXT AS search_path
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive: Find all subordinates
SELECT
e.emp_id,
e.name,
e.manager_id,
e.title,
e.hire_date,
e.salary,
h.depth + 1,
h.path || ' → ' || e.name,
h.path_ids || e.emp_id,
h.search_path || '/' || e.name
FROM employees e
INNER JOIN emp_hierarchy h ON e.manager_id = h.emp_id
WHERE h.depth < 10 -- Prevent infinite recursion
)
SELECT
emp_id,
name,
title,
depth,
path,
salary,
-- Calculate path length
array_length(path_ids, 1) - 1 AS hierarchy_level,
-- Check if leaf node
CASE
WHEN NOT EXISTS (
SELECT 1 FROM employees WHERE manager_id = emp_hierarchy.emp_id
) THEN 'LEAF'
ELSE 'BRANCH'
END AS node_type
FROM emp_hierarchy
ORDER BY path_ids;
Output:
| emp_id | name | title | depth | path | salary | hierarchy_level | node_type |
|---|---|---|---|---|---|---|---|
| 1 | CEO | Chief Executive Officer | 1 | CEO | 250000.00 | 0 | BRANCH |
| 2 | CTO | Chief Technology Officer | 2 | CEO → CTO | 220000.00 | 1 | BRANCH |
| 3 | CFO | Chief Financial Officer | 2 | CEO → CFO | 200000.00 | 1 | LEAF |
| 4 | VP Eng | VP of Engineering | 3 | CEO → CTO → VP Eng | 180000.00 | 2 | BRANCH |
| 5 | VP Product | VP of Product | 3 | CEO → CTO → VP Product | 175000.00 | 2 | LEAF |
| 6 | Director | Director of Engineering | 4 | CEO → CTO → VP Eng → Director | 150000.00 | 3 | BRANCH |
| 7 | Manager 1 | Engineering Manager | 5 | CEO → CTO → VP Eng → Director → Manager 1 | 130000.00 | 4 | BRANCH |
| 8 | Manager 2 | Engineering Manager | 5 | CEO → CTO → VP Eng → Director → Manager 2 | 125000.00 | 4 | BRANCH |
| 9 | Sr Dev 1 | Senior Developer | 6 | CEO → CTO → VP Eng → Director → Manager 1 → Sr Dev 1 | 120000.00 | 5 | BRANCH |
| 10 | Sr Dev 2 | Senior Developer | 6 | CEO → CTO → VP Eng → Director → Manager 1 → Sr Dev 2 | 115000.00 | 5 | LEAF |
| 11 | Dev 1 | Developer | 6 | CEO → CTO → VP Eng → Director → Manager 2 → Dev 1 | 100000.00 | 5 | LEAF |
| 12 | Dev 2 | Developer | 6 | CEO → CTO → VP Eng → Director → Manager 2 → Dev 2 | 95000.00 | 5 | LEAF |
| 13 | Intern 1 | Intern | 7 | CEO → CTO → VP Eng → Director → Manager 1 → Sr Dev 1 → Intern 1 | 50000.00 | 6 | LEAF |
⚠️
Infinite Recursion Prevention: Always include a depth limit or cycle detection. Without it, circular references cause infinite loops.
Breadth-First Search Pattern
-- BFS traversal with level-by-level processing
WITH RECURSIVE bfs_traversal AS (
SELECT
emp_id,
name,
manager_id,
1 AS level,
ARRAY[emp_id] AS visited,
name::TEXT AS level_path
FROM employees
WHERE manager_id IS NULL
UNION ALL
SELECT
e.emp_id,
e.name,
e.manager_id,
b.level + 1,
b.visited || e.emp_id,
b.level_path || ' → ' || e.name
FROM employees e
INNER JOIN bfs_traversal b ON e.manager_id = b.emp_id
WHERE e.emp_id != ALL(b.visited) -- Cycle detection
)
SELECT
level,
COUNT(*) AS nodes_at_level,
STRING_AGG(name, ', ' ORDER BY name) AS members,
MAX(salary) AS max_salary,
ROUND(AVG(salary), 2) AS avg_salary
FROM bfs_traversal
JOIN employees USING (emp_id)
GROUP BY level
ORDER BY level;
Output:
| level | nodes_at_level | members | max_salary | avg_salary |
|---|---|---|---|---|
| 1 | 1 | CEO | 250000.00 | 250000.00 |
| 2 | 2 | CFO, CTO | 220000.00 | 210000.00 |
| 3 | 2 | VP Eng, VP Product | 180000.00 | 177500.00 |
| 4 | 1 | Director | 150000.00 | 150000.00 |
| 5 | 2 | Manager 1, Manager 2 | 130000.00 | 127500.00 |
| 6 | 4 | Dev 1, Dev 2, Sr Dev 1, Sr Dev 2 | 120000.00 | 107500.00 |
| 7 | 1 | Intern 1 | 50000.00 | 50000.00 |
Graph Traversal: Shortest Path
-- Create graph table for network connections
CREATE TABLE network_connections (
source_node INT,
target_node INT,
connection_cost DECIMAL(10,2),
connection_type VARCHAR(50)
);
INSERT INTO network_connections VALUES
(1, 2, 10.00, 'direct'),
(1, 3, 15.00, 'direct'),
(2, 3, 5.00, 'indirect'),
(2, 4, 20.00, 'direct'),
(3, 4, 8.00, 'direct'),
(3, 5, 12.00, 'direct'),
(4, 5, 3.00, 'direct'),
(4, 6, 18.00, 'direct'),
(5, 6, 7.00, 'direct');
-- Find shortest path using recursive CTE
WITH RECURSIVE shortest_path AS (
-- Start from source node 1
SELECT
source_node,
target_node,
connection_cost AS total_cost,
1 AS hops,
ARRAY[source_node, target_node] AS path,
source_node || ' → ' || target_node::TEXT AS path_text
FROM network_connections
WHERE source_node = 1
UNION ALL
-- Extend path by one hop
SELECT
sp.source_node,
nc.target_node,
sp.total_cost + nc.connection_cost,
sp.hops + 1,
sp.path || nc.target_node,
sp.path_text || ' → ' || nc.target_node
FROM shortest_path sp
INNER JOIN network_connections nc
ON sp.target_node = nc.source_node
WHERE nc.target_node != ALL(sp.path) -- No cycles
AND sp.hops < 10 -- Max depth
)
SELECT DISTINCT ON (source_node, target_node)
source_node,
target_node,
total_cost,
hops,
path,
path_text
FROM shortest_path
WHERE target_node = 6 -- Destination
ORDER BY source_node, target_node, total_cost;
Output:
| source_node | target_node | total_cost | hops | path | path_text |
|---|---|---|---|---|---|
| 1 | 6 | 30.00 | 3 | {1,2,4,6} | 1 → 2 → 4 → 6 |
| 1 | 6 | 31.00 | 3 | {1,3,4,6} | 1 → 3 → 4 → 6 |
| 1 | 6 | 34.00 | 4 | {1,2,3,4,6} | 1 → 2 → 3 → 4 → 6 |
| 1 | 6 | 37.00 | 4 | {1,2,4,5,6} | 1 → 2 → 4 → 5 → 6 |
Scheduling with Recursive CTE
-- Create project tasks with dependencies
CREATE TABLE project_tasks (
task_id INT PRIMARY KEY,
task_name VARCHAR(100),
duration_days INT,
depends_on INT,
priority INT
);
INSERT INTO project_tasks VALUES
(1, 'Requirements', 5, NULL, 1),
(2, 'Design', 10, 1, 2),
(3, 'Database Schema', 7, 2, 3),
(4, 'API Development', 15, 2, 3),
(5, 'Frontend Dev', 12, 2, 3),
(6, 'Integration', 8, 3, 4),
(7, 'Testing', 10, 4, 4),
(8, 'Deployment', 3, 5, 5),
(9, 'Documentation', 5, 4, 5),
(10, 'Go Live', 1, 6, 6);
-- Calculate critical path using recursive CTE
WITH RECURSIVE task_schedule AS (
-- Anchor: Tasks with no dependencies
SELECT
task_id,
task_name,
duration_days,
depends_on,
priority,
0 AS start_day,
duration_days AS end_day,
ARRAY[task_id] AS dependency_chain,
task_id::TEXT AS chain_text,
0 AS max_chain_length
FROM project_tasks
WHERE depends_on IS NULL
UNION ALL
-- Recursive: Calculate start/end based on dependencies
SELECT
t.task_id,
t.task_name,
t.duration_days,
t.depends_on,
t.priority,
ts.end_day AS start_day,
ts.end_day + t.duration_days AS end_day,
ts.dependency_chain || t.task_id,
ts.chain_text || ' → ' || t.task_name,
array_length(ts.dependency_chain, 1)
FROM project_tasks t
INNER JOIN task_schedule ts ON t.depends_on = ts.task_id
WHERE t.task_id != ALL(ts.dependency_chain) -- No cycles
)
SELECT
task_id,
task_name,
duration_days,
start_day,
end_day,
end_day - start_day AS actual_duration,
dependency_chain,
-- Calculate total project duration
MAX(end_day) OVER () AS total_project_days,
-- Identify critical path tasks
CASE
WHEN end_day = MAX(end_day) OVER () THEN 'CRITICAL'
WHEN end_day >= MAX(end_day) OVER () - 5 THEN 'NEAR CRITICAL'
ELSE 'NORMAL'
END AS status
FROM task_schedule
ORDER BY start_day, task_id;
Output:
| task_id | task_name | duration_days | start_day | end_day | actual_duration | dependency_chain | total_project_days | status |
|---|---|---|---|---|---|---|---|---|
| 1 | Requirements | 5 | 0 | 5 | 5 | {1} | 53 | CRITICAL |
| 2 | Design | 10 | 5 | 15 | 10 | {1,2} | 53 | CRITICAL |
| 3 | Database Schema | 7 | 15 | 22 | 7 | {1,2,3} | 53 | CRITICAL |
| 4 | API Development | 15 | 15 | 30 | 15 | {1,2,4} | 53 | CRITICAL |
| 5 | Frontend Dev | 12 | 15 | 27 | 12 | {1,2,5} | 53 | NORMAL |
| 6 | Integration | 8 | 30 | 38 | 8 | {1,2,4,6} | 53 | CRITICAL |
| 7 | Testing | 10 | 30 | 40 | 10 | {1,2,4,7} | 53 | NORMAL |
| 8 | Deployment | 3 | 27 | 30 | 3 | {1,2,5,8} | 53 | NORMAL |
| 9 | Documentation | 5 | 30 | 35 | 5 | {1,2,4,9} | 53 | NORMAL |
| 10 | Go Live | 1 | 38 | 39 | 1 | {1,2,4,6,10} | 53 | CRITICAL |
ℹ️
Critical Path: The longest path through the project network determines minimum project duration. Tasks on this path have zero slack.
Cycle Detection Pattern
-- Detect cycles in graph
WITH RECURSIVE cycle_detection AS (
SELECT
source_node,
target_node,
1 AS depth,
ARRAY[source_node, target_node] AS path,
FALSE AS has_cycle
FROM network_connections
UNION ALL
SELECT
cd.source_node,
nc.target_node,
cd.depth + 1,
cd.path || nc.target_node,
nc.target_node = ANY(cd.path) -- Cycle detected!
FROM cycle_detection cd
INNER JOIN network_connections nc ON cd.target_node = nc.source_node
WHERE NOT cd.has_cycle
AND cd.depth < 100 -- Safety limit
)
SELECT
source_node,
target_node,
depth,
path,
has_cycle
FROM cycle_detection
WHERE has_cycle = TRUE;
Mathematical Formulation
For a directed graph , the shortest path problem seeks:
Where is the set of all paths from to and is the edge weight.
The recursive CTE implements a modified Dijkstra's algorithm:
Time complexity: in worst case with recursive CTE.
⚠️
Performance Warning: Recursive CTEs can be expensive for deep hierarchies. Consider materializing results or using iterative approaches for production systems.