🎯 SQL Window Functions

Google & Meta Interview Deep Dive

🏢 Google🏢 Meta⚡ Difficulty: Hard⏱️ 45 min

📋 Interview Question

ℹ️🔴 Google/Meta Interview Question

"Given a table of employee salaries with department information, write a query to rank employees within each department by salary. Then find the top 3 earners per department. Additionally, calculate each employee's salary as a percentage of their department's total and the difference from the previous and next employee's salary using LAG and LEAD."

Companies: Google, Meta | Difficulty: Hard | Time: 45 minutes

📊 Setup: Sample Data

Let's create a realistic dataset to work with:

-- Create tables
CREATE TABLE employees (
    employee_id SERIAL PRIMARY KEY,
    employee_name VARCHAR(100),
    department VARCHAR(50),
    salary DECIMAL(10, 2),
    hire_date DATE
);

-- Insert sample data
INSERT INTO employees (employee_name, department, salary, hire_date) VALUES
('Alice Johnson', 'Engineering', 120000.00, '2020-01-15'),
('Bob Smith', 'Engineering', 95000.00, '2019-06-20'),
('Charlie Brown', 'Engineering', 135000.00, '2018-03-10'),
('Diana Ross', 'Engineering', 88000.00, '2021-09-01'),
('Eve Davis', 'Engineering', 110000.00, '2020-07-14'),
('Frank Wilson', 'Marketing', 78000.00, '2019-11-30'),
('Grace Lee', 'Marketing', 92000.00, '2020-04-22'),
('Hank Miller', 'Marketing', 65000.00, '2021-01-15'),
('Ivy Chen', 'Marketing', 105000.00, '2018-08-05'),
('Jack Taylor', 'Marketing', 72000.00, '2020-12-01'),
('Karen White', 'Data Science', 130000.00, '2019-02-14'),
('Leo Martin', 'Data Science', 115000.00, '2020-05-30'),
('Mia Garcia', 'Data Science', 140000.00, '2018-11-20'),
('Noah Clark', 'Data Science', 98000.00, '2021-03-10'),
('Olivia Lewis', 'Data Science', 108000.00, '2020-08-25');

🔢 Part 1: ROW_NUMBER, RANK, and DENSE_RANK

The Core Problem

⚠️⚠️ Common Interview Pitfall

Interviewers often ask: "What's the difference between RANK and DENSE_RANK?" The key is handling ties:

ROW_NUMBER() — No ties, always unique sequential numbers
RANK() — Ties get same number, gaps in sequence (1, 1, 3)
DENSE_RANK() — Ties get same number, no gaps (1, 1, 2)

-- ROW_NUMBER: Assigns unique sequential numbers
SELECT
    employee_name,
    department,
    salary,
    ROW_NUMBER() OVER (
        PARTITION BY department
        ORDER BY salary DESC
    ) AS row_num,
    RANK() OVER (
        PARTITION BY department
        ORDER BY salary DESC
    ) AS rank_num,
    DENSE_RANK() OVER (
        PARTITION BY department
        ORDER BY salary DESC
    ) AS dense_rank_num
FROM employees
ORDER BY department, salary DESC;

Output:

employee_name	department	salary	row_num	rank_num	dense_rank_num
Charlie Brown	Engineering	135000	1	1	1
Alice Johnson	Engineering	120000	2	2	2
Eve Davis	Engineering	110000	3	3	3
Bob Smith	Engineering	95000	4	4	4
Diana Ross	Engineering	88000	5	5	5
Ivy Chen	Marketing	105000	1	1	1
Grace Lee	Marketing	92000	2	2	2
Frank Wilson	Marketing	78000	3	3	3
Jack Taylor	Marketing	72000	4	4	4
Hank Miller	Marketing	65000	5	5	5
Mia Garcia	Data Science	140000	1	1	1
Karen White	Data Science	130000	2	2	2
Leo Martin	Data Science	115000	3	3	3
Olivia Lewis	Data Science	108000	4	4	4
Noah Clark	Data Science	98000	5	5	5

Handling Ties: When Rankings Are Equal

-- Adding duplicate salaries to demonstrate tie handling
INSERT INTO employees (employee_name, department, salary, hire_date) VALUES
('Zara Adams', 'Engineering', 120000.00, '2021-05-10'),  -- Same as Alice
('Yuki Tanaka', 'Engineering', 120000.00, '2020-03-15');  -- Same as Alice

-- See the difference in rankings
SELECT
    employee_name,
    department,
    salary,
    ROW_NUMBER() OVER (
        PARTITION BY department ORDER BY salary DESC
    ) AS row_number,
    RANK() OVER (
        PARTITION BY department ORDER BY salary DESC
    ) AS rank_val,
    DENSE_RANK() OVER (
        PARTITION BY department ORDER BY salary DESC
    ) AS dense_rank_val
FROM employees
WHERE department = 'Engineering'
ORDER BY salary DESC;

Key insight for interviews: ROW_NUMBER() will arbitrarily assign different numbers to ties (deterministic only if you add a secondary ORDER BY column), while RANK() creates gaps and DENSE_RANK() does not.

📈 Part 2: Finding Top N Per Group

💡💡 Pro Tip for Interviews

The most efficient way to find top N per group uses a subquery with DENSE_RANK or ROW_NUMBER in the WHERE clause. Always consider edge cases: what if there are fewer than N rows in a group?

-- Top 3 highest earners per department
WITH ranked_employees AS (
    SELECT
        employee_id,
        employee_name,
        department,
        salary,
        DENSE_RANK() OVER (
            PARTITION BY department
            ORDER BY salary DESC
        ) AS salary_rank
    FROM employees
)
SELECT
    employee_id,
    employee_name,
    department,
    salary,
    salary_rank
FROM ranked_employees
WHERE salary_rank <= 3
ORDER BY department, salary_rank;

Alternative: Using ROW_NUMBER for strictly unique results:

-- If you want exactly 3 rows per department (even with ties)
SELECT
    employee_id,
    employee_name,
    department,
    salary
FROM (
    SELECT
        *,
        ROW_NUMBER() OVER (
            PARTITION BY department
            ORDER BY salary DESC, employee_id  -- Break ties deterministically
        ) AS rn
    FROM employees
) sub
WHERE rn <= 3
ORDER BY department, rn;

📊 Part 3: LAG and LEAD Functions

ℹ️🔍 LAG and LEAD Explained

LAG(column, offset, default) — Access previous row's value
LEAD(column, offset, default) — Access next row's value
Third parameter provides a default when no previous/next row exists

-- Calculate salary difference from previous and next employee
-- Also compute running department total percentage
SELECT
    employee_name,
    department,
    salary,
    LAG(employee_name, 1, 'N/A') OVER (
        PARTITION BY department ORDER BY salary DESC
    ) AS prev_higher_earner,
    LAG(salary, 1, 0) OVER (
        PARTITION BY department ORDER BY salary DESC
    ) AS prev_salary,
    salary - LAG(salary, 1, 0) OVER (
        PARTITION BY department ORDER BY salary DESC
    ) AS diff_from_prev,
    LEAD(employee_name, 1, 'N/A') OVER (
        PARTITION BY department ORDER BY salary DESC
    ) AS next_lower_earner,
    LEAD(salary, 1, 0) OVER (
        PARTITION BY department ORDER BY salary DESC
    ) AS next_salary,
    ROUND(
        salary * 100.0 / SUM(salary) OVER (PARTITION BY department),
        2
    ) AS pct_of_dept_total
FROM employees
WHERE department = 'Engineering'
ORDER BY salary DESC;

Output:

employee_name	salary	prev_higher_earner	prev_salary	diff_from_prev	next_lower_earner	next_salary	pct_of_dept_total
Charlie Brown	135000	N/A	0	135000	Alice Johnson	120000	23.32%
Alice Johnson	120000	Charlie Brown	135000	-15000	Eve Davis	110000	20.75%
Eve Davis	110000	Alice Johnson	120000	-10000	Bob Smith	95000	19.03%
Bob Smith	95000	Eve Davis	110000	-15000	Diana Ross	88000	16.43%
Diana Ross	88000	Bob Smith	95000	-7000	N/A	0	15.21%

🏃 Part 4: NTILE for Data Distribution

-- Divide employees into quartiles based on salary
SELECT
    employee_name,
    department,
    salary,
    NTILE(4) OVER (
        PARTITION BY department
        ORDER BY salary DESC
    ) AS salary_quartile,
    CASE
        WHEN NTILE(4) OVER (PARTITION BY department ORDER BY salary DESC) = 1
            THEN 'Top 25%'
        WHEN NTILE(4) OVER (PARTITION BY department ORDER BY salary DESC) = 2
            THEN 'Second 25%'
        WHEN NTILE(4) OVER (PARTITION BY department ORDER BY salary DESC) = 3
            THEN 'Third 25%'
        ELSE 'Bottom 25%'
    END AS quartile_label
FROM employees
ORDER BY department, salary DESC;

⚠️⚠️ NTILE Edge Case

NTILE distributes rows as evenly as possible. When the number of rows isn't evenly divisible by the number of buckets, the first few buckets get the extra rows. For example, 5 rows into 4 buckets: bucket 1 has 2 rows, buckets 2-4 have 1 each.

🔗 Part 5: Running Totals and Cumulative Calculations

-- Running total and cumulative average within each department
SELECT
    employee_name,
    department,
    salary,
    hire_date,
    SUM(salary) OVER (
        PARTITION BY department
        ORDER BY hire_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_total,
    AVG(salary) OVER (
        PARTITION BY department
        ORDER BY hire_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_avg,
    COUNT(*) OVER (
        PARTITION BY department
        ORDER BY hire_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_count,
    FIRST_VALUE(employee_name) OVER (
        PARTITION BY department
        ORDER BY hire_date
    ) AS first_hired,
    LAST_VALUE(employee_name) OVER (
        PARTITION BY department
        ORDER BY hire_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS last_hired
FROM employees
ORDER BY department, hire_date;

🎯 Part 6: Advanced Real-World Scenarios

Scenario: Salary Band Analysis

-- Calculate moving average salary (3-month window)
-- and identify salary outliers within department
WITH salary_stats AS (
    SELECT
        employee_name,
        department,
        salary,
        AVG(salary) OVER (PARTITION BY department) AS dept_avg,
        STDDEV(salary) OVER (PARTITION BY department) AS dept_stddev,
        MIN(salary) OVER (PARTITION BY department) AS dept_min,
        MAX(salary) OVER (PARTITION BY department) AS dept_max
    FROM employees
)
SELECT
    employee_name,
    department,
    salary,
    ROUND(dept_avg, 2) AS dept_avg,
    ROUND(salary - dept_avg, 2) AS deviation_from_avg,
    CASE
        WHEN salary > dept_avg + 2 * dept_stddev THEN 'High Outlier'
        WHEN salary < dept_avg - 2 * dept_stddev THEN 'Low Outlier'
        WHEN salary > dept_avg + dept_stddev THEN 'Above Average'
        WHEN salary < dept_avg - dept_stddev THEN 'Below Average'
        ELSE 'Average'
    END AS salary_category,
    ROUND((salary - dept_min) * 100.0 / NULLIF(dept_max - dept_min, 0), 2) AS percentile_position
FROM salary_stats
ORDER BY department, salary DESC;

Scenario: Year-over-Year Growth

-- Calculate salary growth for employees with multiple records
-- (Simulated with hire dates representing salary changes)
WITH salary_changes AS (
    SELECT
        employee_name,
        salary,
        hire_date,
        LAG(salary) OVER (
            PARTITION BY employee_name ORDER BY hire_date
        ) AS prev_salary,
        LAG(hire_date) OVER (
            PARTITION BY employee_name ORDER BY hire_date
        ) AS prev_date
    FROM employees
)
SELECT
    employee_name,
    salary AS current_salary,
    prev_salary,
    salary - prev_salary AS absolute_change,
    ROUND(
        (salary - prev_salary) * 100.0 / NULLIF(prev_salary, 0),
        2
    ) AS pct_change,
    EXTRACT(DAY FROM hire_date - prev_date) AS days_between_changes
FROM salary_changes
WHERE prev_salary IS NOT NULL
ORDER BY pct_change DESC;

🧪 Part 7: Window Frames Deep Dive

💡💡 Window Frame Syntax

Understanding frame clauses is crucial for advanced window function usage:

ROWS BETWEEN n PRECEDING AND m FOLLOWING — Physical rows
RANGE BETWEEN n PRECEDING AND m FOLLOWING — Logical value range
GROUPS BETWEEN n PRECEDING AND m FOLLOWING — Groups of ties (PostgreSQL 11+)

-- Different window frame behaviors
SELECT
    employee_name,
    department,
    salary,
    -- Current row only
    SUM(salary) OVER (
        PARTITION BY department ORDER BY salary DESC
        ROWS BETWEEN CURRENT ROW AND CURRENT ROW
    ) AS current_only,
    -- 1 row before and after
    SUM(salary) OVER (
        PARTITION BY department ORDER BY salary DESC
        ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
    ) AS moving_3,
    -- All rows up to current
    SUM(salary) OVER (
        PARTITION BY department ORDER BY salary DESC
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS cumulative,
    -- All rows in partition
    SUM(salary) OVER (
        PARTITION BY department
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS partition_total
FROM employees
WHERE department = 'Engineering'
ORDER BY salary DESC;

⏱️ Complexity Analysis

Operation	Time Complexity	Space Complexity	Notes
Simple Window Function	O(n log n)	O(n)	Sort required for ORDER BY
Partition + Rank	O(n log n)	O(n)	Per-partition sorting
LAG/LEAD	O(n)	O(1)	Sequential access only
NTILE	O(n log n)	O(n)	Requires full sort
Running Total	O(n log n)	O(n)	Cumulative computation

🎯 Quiz Section

🏆 Best Practices for Interviews

💡✅ Best Practices

1. Always Specify Window Frame:

-- BAD: Implicit frame (might not be what you want)
SUM(salary) OVER (ORDER BY hire_date)

-- GOOD: Explicit frame
SUM(salary) OVER (ORDER BY hire_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)

2. Break Ties Deterministically:

-- BAD: Non-deterministic ordering
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary)

-- GOOD: Deterministic with secondary sort
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC, employee_id)

3. Use CTEs for Readability:

-- BAD: Nested subqueries
SELECT * FROM (
    SELECT *, RANK() OVER (...) AS rn FROM (
        SELECT * FROM employees
    ) sub
) sub2 WHERE rn = 1

-- GOOD: CTE with clear steps
WITH ranked AS (
    SELECT *, RANK() OVER (...) AS rn FROM employees
)
SELECT * FROM ranked WHERE rn = 1

4. Performance Consideration:

Window functions with ORDER BY require sorting: O(n log n)
PARTITION BY without ORDER BY is cheaper
Avoid multiple window functions with different ORDER BY on same partition

🔗 Common Follow-Up Questions

"How would you handle NULLs in LAG/LEAD?" — Use COALESCE with the default parameter
"Can you use window functions in WHERE?" — No, use a CTE or subquery first
"What's the difference between ROW_NUMBER and a sequential ID?" — ROW_NUMBER resets per partition; IDs don't
"How do you optimize queries with multiple window functions?" — Combine partitions with same ORDER BY when possible

⚠️⚠️ Critical Interview Note

Always clarify with the interviewer: Do they want ties broken? Should the ranking include or exclude certain rows? What should happen at partition boundaries? These questions show attention to detail and real-world thinking.

Topic: SQL Window Functions Mastery for FAANG Interviews