Interview Question: "Explain the difference between ROWS, RANGE, and GROUPS frame types. When would you use NTILE versus ROW_NUMBER? How does EXCLUDE CURRENT ROW differ from EXCLUDE TIES?" — Asked at Google, Meta, Amazon for Senior Data Engineer roles
ℹ️
Difficulty: Advanced | Companies: Google, Meta, Amazon, Apple, Microsoft | Time: 45-60 minutes
Understanding Window Frame Precedence
The SQL standard defines a precise order of operations for window functions:
Window functions operate between GROUP BY/HAVING and ORDER BY:
The frame clause defines which rows participate in the computation:
NTILE Distribution Analysis
NTILE distributes rows into a specified number of roughly equal groups:
Where is total rows and is number of buckets.
-- Create sample sales data
CREATE TABLE daily_sales (
sale_date DATE,
region VARCHAR(50),
revenue DECIMAL(12,2)
);
INSERT INTO daily_sales VALUES
('2024-01-01', 'North', 15000.00),
('2024-01-01', 'South', 22000.00),
('2024-01-02', 'North', 18000.00),
('2024-01-02', 'South', 19500.00),
('2024-01-03', 'North', 21000.00),
('2024-01-03', 'South', 25000.00),
('2024-01-04', 'North', 16500.00),
('2024-01-04', 'South', 20000.00),
('2024-01-05', 'North', 23000.00),
('2024-01-05', 'South', 28000.00);
-- NTILE: Divide into 4 quartiles by revenue
SELECT
sale_date,
region,
revenue,
NTILE(4) OVER (ORDER BY revenue) AS quartile,
NTILE(10) OVER (ORDER BY revenue) AS decile,
-- Custom bucket calculation
CASE
WHEN NTILE(4) OVER (ORDER BY revenue) = 1 THEN 'Bottom 25%'
WHEN NTILE(4) OVER (ORDER BY revenue) = 2 THEN 'Q2'
WHEN NTILE(4) OVER (ORDER BY revenue) = 3 THEN 'Q3'
ELSE 'Top 25%'
END AS revenue_tier
FROM daily_sales
ORDER BY revenue;
Output:
| sale_date | region | revenue | quartile | decile | revenue_tier |
|---|---|---|---|---|---|
| 2024-01-01 | North | 15000.00 | 1 | 1 | Bottom 25% |
| 2024-01-04 | North | 16500.00 | 1 | 2 | Bottom 25% |
| 2024-01-02 | North | 18000.00 | 2 | 3 | Q2 |
| 2024-01-02 | South | 19500.00 | 2 | 4 | Q2 |
| 2024-01-04 | South | 20000.00 | 3 | 5 | Q3 |
| 2024-01-03 | North | 21000.00 | 3 | 6 | Q3 |
| 2024-01-01 | South | 22000.00 | 3 | 7 | Q3 |
| 2024-01-05 | North | 23000.00 | 4 | 8 | Top 25% |
| 2024-01-03 | South | 25000.00 | 4 | 9 | Top 25% |
| 2024-01-05 | South | 28000.00 | 4 | 10 | Top 25% |
⚠️
Performance Tip: NTILE requires the ORDER BY clause. Without it, the window is non-deterministic. The database must sort the entire result set before computing NTILE, which can be expensive for large datasets.
LAG/LEAD with Offset and Default
LAG and LEAD access values from preceding or following rows:
-- LAG/LEAD with multiple offsets and defaults
SELECT
sale_date,
revenue,
LAG(revenue, 1, 0) OVER (ORDER BY sale_date) AS prev_day,
LAG(revenue, 2, 0) OVER (ORDER BY sale_date) AS prev_2_days,
LEAD(revenue, 1, 0) OVER (ORDER BY sale_date) AS next_day,
-- Calculate day-over-day change
revenue - LAG(revenue, 1, 0) OVER (ORDER BY sale_date) AS daily_change,
-- Percentage change with null handling
ROUND(
(revenue - LAG(revenue, 1, NULL) OVER (ORDER BY sale_date)) * 100.0 /
NULLIF(LAG(revenue, 1, NULL) OVER (ORDER BY sale_date), 0),
2
) AS pct_change,
-- Running comparison
CASE
WHEN revenue > LAG(revenue, 1, 0) OVER (ORDER BY sale_date) THEN 'UP'
WHEN revenue < LAG(revenue, 1, 0) OVER (ORDER BY sale_date) THEN 'DOWN'
ELSE 'FLAT'
END AS trend
FROM daily_sales
WHERE region = 'North'
ORDER BY sale_date;
Output:
| sale_date | revenue | prev_day | prev_2_days | next_day | daily_change | pct_change | trend |
|---|---|---|---|---|---|---|---|
| 2024-01-01 | 15000.00 | 0 | 0 | 18000.00 | 15000.00 | NULL | UP |
| 2024-01-02 | 18000.00 | 15000.00 | 0 | 21000.00 | 3000.00 | 20.00 | UP |
| 2024-01-03 | 21000.00 | 18000.00 | 15000.00 | 16500.00 | 3000.00 | 16.67 | UP |
| 2024-01-04 | 16500.00 | 21000.00 | 18000.00 | 23000.00 | -4500.00 | -21.43 | DOWN |
| 2024-01-05 | 23000.00 | 16500.00 | 21000.00 | NULL | 6500.00 | 39.39 | UP |
RANGE vs ROWS vs GROUPS
The frame type determines which rows are included in the window:
-- Demonstrate RANGE vs ROWS behavior
SELECT
sale_date,
revenue,
-- ROWS frame: physical rows
SUM(revenue) OVER (
ORDER BY sale_date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS sum_rows,
-- RANGE frame: logical values (days within range)
SUM(revenue) OVER (
ORDER BY sale_date
RANGE BETWEEN INTERVAL '2' DAY PRECEDING AND CURRENT ROW
) AS sum_range,
-- GROUPS frame: peer groups
SUM(revenue) OVER (
ORDER BY revenue
GROUPS BETWEEN 1 PRECEDING AND CURRENT ROW
) AS sum_groups
FROM daily_sales
WHERE region = 'North'
ORDER BY sale_date;
Output:
| sale_date | revenue | sum_rows | sum_range | sum_groups |
|---|---|---|---|---|
| 2024-01-01 | 15000.00 | 15000.00 | 15000.00 | 15000.00 |
| 2024-01-02 | 18000.00 | 33000.00 | 33000.00 | 33000.00 |
| 2024-01-03 | 21000.00 | 54000.00 | 54000.00 | 54000.00 |
| 2024-01-04 | 16500.00 | 55500.00 | 70500.00 | 55500.00 |
| 2024-01-05 | 23000.00 | 60500.00 | 93500.00 | 60500.00 |
ℹ️
Key Distinction: ROWS counts physical rows, RANGE includes all rows with values within the specified logical range, and GROUPS includes peer groups (rows with the same ORDER BY value).
EXCLUDE Clause Variations
The EXCLUDE clause removes specific rows from the frame:
-- EXCLUDE variations
SELECT
sale_date,
revenue,
-- EXCLUDE CURRENT ROW
AVG(revenue) OVER (
ORDER BY sale_date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
EXCLUDE CURRENT ROW
) AS avg_excl_current,
-- EXCLUDE GROUP (excludes all peers of current row)
AVG(revenue) OVER (
ORDER BY revenue
GROUPS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
EXCLUDE GROUP
) AS avg_excl_group,
-- EXCLUDE TIES (excludes peers but keeps current)
AVG(revenue) OVER (
ORDER BY revenue
GROUPS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
EXCLUDE TIES
) AS avg_excl_ties,
-- EXCLUDE NO OTHERS (default, no exclusion)
AVG(revenue) OVER (
ORDER BY sale_date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
EXCLUDE NO OTHERS
) AS avg_no_exclusion
FROM daily_sales
WHERE region = 'North'
ORDER BY sale_date;
Output:
| sale_date | revenue | avg_excl_current | avg_excl_group | avg_excl_ties | avg_no_exclusion |
|---|---|---|---|---|---|
| 2024-01-01 | 15000.00 | 19625.00 | 22100.00 | 19625.00 | 19625.00 |
| 2024-01-02 | 18000.00 | 18375.00 | 22100.00 | 19625.00 | 19625.00 |
| 2024-01-03 | 21000.00 | 18375.00 | 18700.00 | 19625.00 | 19625.00 |
| 2024-01-04 | 16500.00 | 19250.00 | 22100.00 | 19625.00 | 19625.00 |
| 2024-01-05 | 23000.00 | 18375.00 | 18700.00 | 19625.00 | 19625.00 |
Advanced Pattern: Percentile with Weighted Values
-- Weighted percentile calculation
WITH weighted_data AS (
SELECT
sale_date,
revenue,
revenue * 0.1 AS weight -- Example weight
FROM daily_sales
WHERE region = 'North'
)
SELECT
sale_date,
revenue,
SUM(weight) OVER (
ORDER BY revenue
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS cumulative_weight,
SUM(weight) OVER () AS total_weight,
ROUND(
SUM(weight) OVER (
ORDER BY revenue
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) * 100.0 / SUM(weight) OVER (),
2
) AS weighted_percentile
FROM weighted_data
ORDER BY revenue;
Output:
| sale_date | revenue | cumulative_weight | total_weight | weighted_percentile |
|---|---|---|---|---|
| 2024-01-01 | 15000.00 | 1500.00 | 9850.00 | 15.23 |
| 2024-01-04 | 16500.00 | 3150.00 | 9850.00 | 31.98 |
| 2024-01-02 | 18000.00 | 4950.00 | 9850.00 | 50.25 |
| 2024-01-03 | 21000.00 | 7050.00 | 9850.00 | 71.57 |
| 2024-01-05 | 23000.00 | 9850.00 | 9850.00 | 100.00 |
Query Execution Plan
When analyzing window function execution plans:
- Sort Operation: Window functions require sorting by the ORDER BY clause
- Window Aggregate: The database computes aggregates over the frame
- Frame Scan: ROWS scans N rows, RANGE scans by value range
- Materialization: Results are often materialized for multi-pass aggregation
-- Check execution plan
EXPLAIN ANALYZE
SELECT
sale_date,
revenue,
NTILE(4) OVER (ORDER BY revenue) AS quartile,
SUM(revenue) OVER (
ORDER BY sale_date
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
) AS moving_sum
FROM daily_sales;
⚠️
Common Pitfall: Using RANGE with date intervals can include unexpected rows if there are gaps in the data. Always verify your frame boundaries with EXPLAIN ANALYZE.
Mathematical Properties
Window functions preserve the following properties:
- Idempotency: for fixed frames
- Distributivity:
- Commutativity: Order doesn't matter for SUM, COUNT
- Associativity: Grouping doesn't affect result
The computational complexity is:
Space complexity:
ℹ️
Advanced Technique: Combine NTILE with LAG to detect distribution shifts over time. This pattern is useful for anomaly detection in time series data.