π― SQL Window Functions
Google & Meta Interview Deep Dive
π Interview Question
βΉοΈπ΄ Google/Meta Interview Question
"Given a table of employee salaries with department information, write a query to rank employees within each department by salary. Then find the top 3 earners per department. Additionally, calculate each employee's salary as a percentage of their department's total and the difference from the previous and next employee's salary using LAG and LEAD."
Companies: Google, Meta | Difficulty: Hard | Time: 45 minutes
π Setup: Sample Data
Let's create a realistic dataset to work with:
-- Create tables
CREATE TABLE employees (
employee_id SERIAL PRIMARY KEY,
employee_name VARCHAR(100),
department VARCHAR(50),
salary DECIMAL(10, 2),
hire_date DATE
);
-- Insert sample data
INSERT INTO employees (employee_name, department, salary, hire_date) VALUES
('Alice Johnson', 'Engineering', 120000.00, '2020-01-15'),
('Bob Smith', 'Engineering', 95000.00, '2019-06-20'),
('Charlie Brown', 'Engineering', 135000.00, '2018-03-10'),
('Diana Ross', 'Engineering', 88000.00, '2021-09-01'),
('Eve Davis', 'Engineering', 110000.00, '2020-07-14'),
('Frank Wilson', 'Marketing', 78000.00, '2019-11-30'),
('Grace Lee', 'Marketing', 92000.00, '2020-04-22'),
('Hank Miller', 'Marketing', 65000.00, '2021-01-15'),
('Ivy Chen', 'Marketing', 105000.00, '2018-08-05'),
('Jack Taylor', 'Marketing', 72000.00, '2020-12-01'),
('Karen White', 'Data Science', 130000.00, '2019-02-14'),
('Leo Martin', 'Data Science', 115000.00, '2020-05-30'),
('Mia Garcia', 'Data Science', 140000.00, '2018-11-20'),
('Noah Clark', 'Data Science', 98000.00, '2021-03-10'),
('Olivia Lewis', 'Data Science', 108000.00, '2020-08-25');
π’ Part 1: ROW_NUMBER, RANK, and DENSE_RANK
The Core Problem
β οΈβ οΈ Common Interview Pitfall
Interviewers often ask: "What's the difference between RANK and DENSE_RANK?" The key is handling ties:
ROW_NUMBER()β No ties, always unique sequential numbersRANK()β Ties get same number, gaps in sequence (1, 1, 3)DENSE_RANK()β Ties get same number, no gaps (1, 1, 2)
-- ROW_NUMBER: Assigns unique sequential numbers
SELECT
employee_name,
department,
salary,
ROW_NUMBER() OVER (
PARTITION BY department
ORDER BY salary DESC
) AS row_num,
RANK() OVER (
PARTITION BY department
ORDER BY salary DESC
) AS rank_num,
DENSE_RANK() OVER (
PARTITION BY department
ORDER BY salary DESC
) AS dense_rank_num
FROM employees
ORDER BY department, salary DESC;
Output:
| employee_name | department | salary | row_num | rank_num | dense_rank_num |
|---|---|---|---|---|---|
| Charlie Brown | Engineering | 135000 | 1 | 1 | 1 |
| Alice Johnson | Engineering | 120000 | 2 | 2 | 2 |
| Eve Davis | Engineering | 110000 | 3 | 3 | 3 |
| Bob Smith | Engineering | 95000 | 4 | 4 | 4 |
| Diana Ross | Engineering | 88000 | 5 | 5 | 5 |
| Ivy Chen | Marketing | 105000 | 1 | 1 | 1 |
| Grace Lee | Marketing | 92000 | 2 | 2 | 2 |
| Frank Wilson | Marketing | 78000 | 3 | 3 | 3 |
| Jack Taylor | Marketing | 72000 | 4 | 4 | 4 |
| Hank Miller | Marketing | 65000 | 5 | 5 | 5 |
| Mia Garcia | Data Science | 140000 | 1 | 1 | 1 |
| Karen White | Data Science | 130000 | 2 | 2 | 2 |
| Leo Martin | Data Science | 115000 | 3 | 3 | 3 |
| Olivia Lewis | Data Science | 108000 | 4 | 4 | 4 |
| Noah Clark | Data Science | 98000 | 5 | 5 | 5 |
Handling Ties: When Rankings Are Equal
-- Adding duplicate salaries to demonstrate tie handling
INSERT INTO employees (employee_name, department, salary, hire_date) VALUES
('Zara Adams', 'Engineering', 120000.00, '2021-05-10'), -- Same as Alice
('Yuki Tanaka', 'Engineering', 120000.00, '2020-03-15'); -- Same as Alice
-- See the difference in rankings
SELECT
employee_name,
department,
salary,
ROW_NUMBER() OVER (
PARTITION BY department ORDER BY salary DESC
) AS row_number,
RANK() OVER (
PARTITION BY department ORDER BY salary DESC
) AS rank_val,
DENSE_RANK() OVER (
PARTITION BY department ORDER BY salary DESC
) AS dense_rank_val
FROM employees
WHERE department = 'Engineering'
ORDER BY salary DESC;
Key insight for interviews: ROW_NUMBER() will arbitrarily assign different numbers to ties (deterministic only if you add a secondary ORDER BY column), while RANK() creates gaps and DENSE_RANK() does not.
π Part 2: Finding Top N Per Group
π‘π‘ Pro Tip for Interviews
The most efficient way to find top N per group uses a subquery with DENSE_RANK or ROW_NUMBER in the WHERE clause. Always consider edge cases: what if there are fewer than N rows in a group?
-- Top 3 highest earners per department
WITH ranked_employees AS (
SELECT
employee_id,
employee_name,
department,
salary,
DENSE_RANK() OVER (
PARTITION BY department
ORDER BY salary DESC
) AS salary_rank
FROM employees
)
SELECT
employee_id,
employee_name,
department,
salary,
salary_rank
FROM ranked_employees
WHERE salary_rank <= 3
ORDER BY department, salary_rank;
Alternative: Using ROW_NUMBER for strictly unique results:
-- If you want exactly 3 rows per department (even with ties)
SELECT
employee_id,
employee_name,
department,
salary
FROM (
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY department
ORDER BY salary DESC, employee_id -- Break ties deterministically
) AS rn
FROM employees
) sub
WHERE rn <= 3
ORDER BY department, rn;
π Part 3: LAG and LEAD Functions
βΉοΈπ LAG and LEAD Explained
LAG(column, offset, default)β Access previous row's valueLEAD(column, offset, default)β Access next row's value- Third parameter provides a default when no previous/next row exists
-- Calculate salary difference from previous and next employee
-- Also compute running department total percentage
SELECT
employee_name,
department,
salary,
LAG(employee_name, 1, 'N/A') OVER (
PARTITION BY department ORDER BY salary DESC
) AS prev_higher_earner,
LAG(salary, 1, 0) OVER (
PARTITION BY department ORDER BY salary DESC
) AS prev_salary,
salary - LAG(salary, 1, 0) OVER (
PARTITION BY department ORDER BY salary DESC
) AS diff_from_prev,
LEAD(employee_name, 1, 'N/A') OVER (
PARTITION BY department ORDER BY salary DESC
) AS next_lower_earner,
LEAD(salary, 1, 0) OVER (
PARTITION BY department ORDER BY salary DESC
) AS next_salary,
ROUND(
salary * 100.0 / SUM(salary) OVER (PARTITION BY department),
2
) AS pct_of_dept_total
FROM employees
WHERE department = 'Engineering'
ORDER BY salary DESC;
Output:
| employee_name | salary | prev_higher_earner | prev_salary | diff_from_prev | next_lower_earner | next_salary | pct_of_dept_total |
|---|---|---|---|---|---|---|---|
| Charlie Brown | 135000 | N/A | 0 | 135000 | Alice Johnson | 120000 | 23.32% |
| Alice Johnson | 120000 | Charlie Brown | 135000 | -15000 | Eve Davis | 110000 | 20.75% |
| Eve Davis | 110000 | Alice Johnson | 120000 | -10000 | Bob Smith | 95000 | 19.03% |
| Bob Smith | 95000 | Eve Davis | 110000 | -15000 | Diana Ross | 88000 | 16.43% |
| Diana Ross | 88000 | Bob Smith | 95000 | -7000 | N/A | 0 | 15.21% |
π Part 4: NTILE for Data Distribution
-- Divide employees into quartiles based on salary
SELECT
employee_name,
department,
salary,
NTILE(4) OVER (
PARTITION BY department
ORDER BY salary DESC
) AS salary_quartile,
CASE
WHEN NTILE(4) OVER (PARTITION BY department ORDER BY salary DESC) = 1
THEN 'Top 25%'
WHEN NTILE(4) OVER (PARTITION BY department ORDER BY salary DESC) = 2
THEN 'Second 25%'
WHEN NTILE(4) OVER (PARTITION BY department ORDER BY salary DESC) = 3
THEN 'Third 25%'
ELSE 'Bottom 25%'
END AS quartile_label
FROM employees
ORDER BY department, salary DESC;
β οΈβ οΈ NTILE Edge Case
NTILE distributes rows as evenly as possible. When the number of rows isn't evenly divisible by the number of buckets, the first few buckets get the extra rows. For example, 5 rows into 4 buckets: bucket 1 has 2 rows, buckets 2-4 have 1 each.
π Part 5: Running Totals and Cumulative Calculations
-- Running total and cumulative average within each department
SELECT
employee_name,
department,
salary,
hire_date,
SUM(salary) OVER (
PARTITION BY department
ORDER BY hire_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total,
AVG(salary) OVER (
PARTITION BY department
ORDER BY hire_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_avg,
COUNT(*) OVER (
PARTITION BY department
ORDER BY hire_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_count,
FIRST_VALUE(employee_name) OVER (
PARTITION BY department
ORDER BY hire_date
) AS first_hired,
LAST_VALUE(employee_name) OVER (
PARTITION BY department
ORDER BY hire_date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS last_hired
FROM employees
ORDER BY department, hire_date;
π― Part 6: Advanced Real-World Scenarios
Scenario: Salary Band Analysis
-- Calculate moving average salary (3-month window)
-- and identify salary outliers within department
WITH salary_stats AS (
SELECT
employee_name,
department,
salary,
AVG(salary) OVER (PARTITION BY department) AS dept_avg,
STDDEV(salary) OVER (PARTITION BY department) AS dept_stddev,
MIN(salary) OVER (PARTITION BY department) AS dept_min,
MAX(salary) OVER (PARTITION BY department) AS dept_max
FROM employees
)
SELECT
employee_name,
department,
salary,
ROUND(dept_avg, 2) AS dept_avg,
ROUND(salary - dept_avg, 2) AS deviation_from_avg,
CASE
WHEN salary > dept_avg + 2 * dept_stddev THEN 'High Outlier'
WHEN salary < dept_avg - 2 * dept_stddev THEN 'Low Outlier'
WHEN salary > dept_avg + dept_stddev THEN 'Above Average'
WHEN salary < dept_avg - dept_stddev THEN 'Below Average'
ELSE 'Average'
END AS salary_category,
ROUND((salary - dept_min) * 100.0 / NULLIF(dept_max - dept_min, 0), 2) AS percentile_position
FROM salary_stats
ORDER BY department, salary DESC;
Scenario: Year-over-Year Growth
-- Calculate salary growth for employees with multiple records
-- (Simulated with hire dates representing salary changes)
WITH salary_changes AS (
SELECT
employee_name,
salary,
hire_date,
LAG(salary) OVER (
PARTITION BY employee_name ORDER BY hire_date
) AS prev_salary,
LAG(hire_date) OVER (
PARTITION BY employee_name ORDER BY hire_date
) AS prev_date
FROM employees
)
SELECT
employee_name,
salary AS current_salary,
prev_salary,
salary - prev_salary AS absolute_change,
ROUND(
(salary - prev_salary) * 100.0 / NULLIF(prev_salary, 0),
2
) AS pct_change,
EXTRACT(DAY FROM hire_date - prev_date) AS days_between_changes
FROM salary_changes
WHERE prev_salary IS NOT NULL
ORDER BY pct_change DESC;
π§ͺ Part 7: Window Frames Deep Dive
π‘π‘ Window Frame Syntax
Understanding frame clauses is crucial for advanced window function usage:
ROWS BETWEEN n PRECEDING AND m FOLLOWINGβ Physical rowsRANGE BETWEEN n PRECEDING AND m FOLLOWINGβ Logical value rangeGROUPS BETWEEN n PRECEDING AND m FOLLOWINGβ Groups of ties (PostgreSQL 11+)
-- Different window frame behaviors
SELECT
employee_name,
department,
salary,
-- Current row only
SUM(salary) OVER (
PARTITION BY department ORDER BY salary DESC
ROWS BETWEEN CURRENT ROW AND CURRENT ROW
) AS current_only,
-- 1 row before and after
SUM(salary) OVER (
PARTITION BY department ORDER BY salary DESC
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
) AS moving_3,
-- All rows up to current
SUM(salary) OVER (
PARTITION BY department ORDER BY salary DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS cumulative,
-- All rows in partition
SUM(salary) OVER (
PARTITION BY department
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS partition_total
FROM employees
WHERE department = 'Engineering'
ORDER BY salary DESC;
β±οΈ Complexity Analysis
| Operation | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Simple Window Function | O(n log n) | O(n) | Sort required for ORDER BY |
| Partition + Rank | O(n log n) | O(n) | Per-partition sorting |
| LAG/LEAD | O(n) | O(1) | Sequential access only |
| NTILE | O(n log n) | O(n) | Requires full sort |
| Running Total | O(n log n) | O(n) | Cumulative computation |
π― Quiz Section
π Best Practices for Interviews
π‘β Best Practices
1. Always Specify Window Frame:
-- BAD: Implicit frame (might not be what you want)
SUM(salary) OVER (ORDER BY hire_date)
-- GOOD: Explicit frame
SUM(salary) OVER (ORDER BY hire_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
2. Break Ties Deterministically:
-- BAD: Non-deterministic ordering
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary)
-- GOOD: Deterministic with secondary sort
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC, employee_id)
3. Use CTEs for Readability:
-- BAD: Nested subqueries
SELECT * FROM (
SELECT *, RANK() OVER (...) AS rn FROM (
SELECT * FROM employees
) sub
) sub2 WHERE rn = 1
-- GOOD: CTE with clear steps
WITH ranked AS (
SELECT *, RANK() OVER (...) AS rn FROM employees
)
SELECT * FROM ranked WHERE rn = 1
4. Performance Consideration:
- Window functions with ORDER BY require sorting: O(n log n)
- PARTITION BY without ORDER BY is cheaper
- Avoid multiple window functions with different ORDER BY on same partition
π Common Follow-Up Questions
- "How would you handle NULLs in LAG/LEAD?" β Use COALESCE with the default parameter
- "Can you use window functions in WHERE?" β No, use a CTE or subquery first
- "What's the difference between ROW_NUMBER and a sequential ID?" β ROW_NUMBER resets per partition; IDs don't
- "How do you optimize queries with multiple window functions?" β Combine partitions with same ORDER BY when possible
β οΈβ οΈ Critical Interview Note
Always clarify with the interviewer: Do they want ties broken? Should the ranking include or exclude certain rows? What should happen at partition boundaries? These questions show attention to detail and real-world thinking.