🎉 75% of content is free forever — Unlock Premium from $10/mo →
CW
Search courses…
💼 Servicesℹ️ About✉️ ContactView Pricing Plansfrom $10

Window Aggregates: SUM OVER, Running Totals, Moving Averages

Advanced SQLWindow Aggregates⭐ Premium

Advertisement

Interview Question: "Explain the difference between running total and moving average. How do you calculate year-over-year growth using window functions? What are the performance implications?" — Asked at Tableau, Looker, Databricks for Analytics roles

ℹ️

Difficulty: Advanced | Companies: Tableau, Looker, Databricks, Snowflake, Google BigQuery | Time: 45-60 minutes

Running Totals

Running Totali=j=1ixj\text{Running Total}_i = \sum_{j=1}^{i} x_j
-- Create sales data
CREATE TABLE daily_sales (
    sale_date DATE PRIMARY KEY,
    revenue DECIMAL(12,2),
    units_sold INT
);

INSERT INTO daily_sales VALUES
('2024-01-01', 15000.00, 150),
('2024-01-02', 18000.00, 180),
('2024-01-03', 16500.00, 165),
('2024-01-04', 21000.00, 210),
('2024-01-05', 19500.00, 195),
('2024-01-08', 22000.00, 220),
('2024-01-09', 24000.00, 240),
('2024-01-10', 20500.00, 205),
('2024-01-11', 25500.00, 255),
('2024-01-12', 27000.00, 270);

-- Running total
SELECT 
    sale_date,
    revenue,
    SUM(revenue) OVER (ORDER BY sale_date) AS running_total,
    SUM(units_sold) OVER (ORDER BY sale_date) AS running_units,
    -- Running average
    AVG(revenue) OVER (ORDER BY sale_date) AS running_avg
FROM daily_sales
ORDER BY sale_date;

Output:

sale_daterevenuerunning_totalrunning_unitsrunning_avg
2024-01-0115000.0015000.0015015000.00
2024-01-0218000.0033000.0033016500.00
2024-01-0316500.0049500.0049516500.00
2024-01-0421000.0070500.0070517625.00
2024-01-0519500.0090000.0090018000.00
2024-01-0822000.00112000.00112018666.67
2024-01-0924000.00136000.00136019428.57
2024-01-1020500.00156500.00156519562.50
2024-01-1125500.00182000.00182020222.22
2024-01-1227000.00209000.00209020900.00

Moving Averages

SMAk=1ki=0k1xni\text{SMA}_k = \frac{1}{k} \sum_{i=0}^{k-1} x_{n-i}
EMAn=αxn+(1α)EMAn1\text{EMA}_n = \alpha \cdot x_n + (1-\alpha) \cdot \text{EMA}_{n-1}
-- Simple Moving Average (SMA)
SELECT 
    sale_date,
    revenue,
    -- 3-day SMA
    AVG(revenue) OVER (
        ORDER BY sale_date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS sma_3day,
    -- 5-day SMA
    AVG(revenue) OVER (
        ORDER BY sale_date
        ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
    ) AS sma_5day,
    -- Weighted Moving Average
    SUM(revenue * weight) OVER (
        ORDER BY sale_date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) / SUM(weight) OVER (
        ORDER BY sale_date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS wma_3day
FROM daily_sales
ORDER BY sale_date;

-- Exponential Moving Average (EMA)
WITH ema_calc AS (
    SELECT 
        sale_date,
        revenue,
        0.3 AS alpha,  -- Smoothing factor
        LAG(revenue) OVER (ORDER BY sale_date) AS prev_revenue
    FROM daily_sales
)
SELECT 
    sale_date,
    revenue,
    CASE 
        WHEN prev_revenue IS NULL THEN revenue
        ELSE alpha * revenue + (1 - alpha) * prev_revenue
    END AS ema
FROM ema_calc
ORDER BY sale_date;

Cumulative Distribution

CDF(x)=rank(x)n\text{CDF}(x) = \frac{\text{rank}(x)}{n}
-- Cumulative distribution
SELECT 
    sale_date,
    revenue,
    CUME_DIST() OVER (ORDER BY revenue) AS cumulative_dist,
    PERCENT_RANK() OVER (ORDER BY revenue) AS percentile_rank,
    NTILE(4) OVER (ORDER BY revenue) AS quartile,
    NTILE(10) OVER (ORDER BY revenue) AS decile
FROM daily_sales
ORDER BY sale_date;

Output:

sale_daterevenuecumulative_distpercentile_rankquartiledecile
2024-01-0115000.000.100.0011
2024-01-0218000.000.200.1112
2024-01-0316500.000.300.2223
2024-01-0421000.000.400.3324
2024-01-0519500.000.500.4435
2024-01-0822000.000.600.5536
2024-01-0924000.000.700.6647
2024-01-1020500.000.800.7748
2024-01-1125500.000.900.8849
2024-01-1227000.001.001.00410

Year-over-Year Growth

-- Year-over-year growth
SELECT 
    sale_date,
    revenue,
    LAG(revenue, 1) OVER (ORDER BY sale_date) AS prev_day,
    LAG(revenue, 7) OVER (ORDER BY sale_date) AS prev_week,
    LAG(revenue, 365) OVER (ORDER BY sale_date) AS prev_year,
    -- Day-over-day growth
    ROUND(
        (revenue - LAG(revenue, 1) OVER (ORDER BY sale_date)) * 100.0 /
        NULLIF(LAG(revenue, 1) OVER (ORDER BY sale_date), 0), 2
    ) AS dod_growth,
    -- Week-over-week growth
    ROUND(
        (revenue - LAG(revenue, 7) OVER (ORDER BY sale_date)) * 100.0 /
        NULLIF(LAG(revenue, 7) OVER (ORDER BY sale_date), 0), 2
    ) AS wow_growth
FROM daily_sales
ORDER BY sale_date;

Advanced Window Aggregates

-- Standard deviation and variance
SELECT 
    sale_date,
    revenue,
    AVG(revenue) OVER (ORDER BY sale_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_avg,
    STDDEV(revenue) OVER (ORDER BY sale_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_stddev,
    VARIANCE(revenue) OVER (ORDER BY sale_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_variance,
    -- Coefficient of variation
    ROUND(
        STDDEV(revenue) OVER (ORDER BY sale_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) * 100.0 /
        NULLIF(AVG(revenue) OVER (ORDER BY sale_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), 0),
        2
    ) AS cv_percent
FROM daily_sales
ORDER BY sale_date;

-- First and last values in window
SELECT 
    sale_date,
    revenue,
    FIRST_VALUE(revenue) OVER (ORDER BY sale_date) AS first_value,
    LAST_VALUE(revenue) OVER (
        ORDER BY sale_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS last_value,
    NTH_VALUE(revenue, 3) OVER (ORDER BY sale_date) AS third_value
FROM daily_sales
ORDER BY sale_date;

Performance Considerations

OperationTime ComplexitySpace Complexity
Running TotalO(n)O(n)O(1)O(1)
Moving AverageO(nk)O(n \cdot k)O(k)O(k)
Cumulative DistO(nlogn)O(n \log n)O(n)O(n)
YoY GrowthO(n)O(n)O(1)O(1)
-- Optimize window functions
EXPLAIN (ANALYZE, BUFFERS)
SELECT 
    sale_date,
    SUM(revenue) OVER (ORDER BY sale_date) AS running_total
FROM daily_sales;

-- Check for sort operations
EXPLAIN (ANALYZE, BUFFERS)
SELECT 
    sale_date,
    AVG(revenue) OVER (ORDER BY sale_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS sma
FROM daily_sales;

ℹ️

Pro Tip: Window functions are computed after WHERE but before ORDER BY. Use CTEs to break complex calculations into readable steps.

Mathematical Properties

For running total RnR_n:

Rn=Rn1+xnR_n = R_{n-1} + x_n

For exponential moving average:

EMAn=αxn+(1α)EMAn1\text{EMA}_n = \alpha x_n + (1-\alpha) \text{EMA}_{n-1}

The smoothing factor α\alpha controls responsiveness:

α=2N+1where N=period\alpha = \frac{2}{N+1} \quad \text{where } N = \text{period}

Cumulative Distribution Functions

-- CDF and percentiles
SELECT 
    sale_date,
    revenue,
    CUME_DIST() OVER (ORDER BY revenue) AS cume_dist,
    PERCENT_RANK() OVER (ORDER BY revenue) AS percent_rank,
    NTILE(4) OVER (ORDER BY revenue) AS quartile,
    NTILE(10) OVER (ORDER BY revenue) AS decile,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY revenue) OVER () AS median
FROM daily_sales
ORDER BY sale_date;

Output:

sale_daterevenuecume_distpercent_rankquartiledecilemedian
2024-01-0115000.000.100.001120250.00
2024-01-0218000.000.200.111220250.00
2024-01-0316500.000.300.222320250.00
2024-01-0421000.000.400.332420250.00
2024-01-0519500.000.500.443520250.00
2024-01-0822000.000.600.553620250.00
2024-01-0924000.000.700.664720250.00
2024-01-1020500.000.800.774820250.00
2024-01-1125500.000.900.884920250.00
2024-01-1227000.001.001.0041020250.00

Moving Percentiles

-- 7-day moving percentile
SELECT 
    sale_date,
    revenue,
    PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY revenue) OVER (
        ORDER BY sale_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS moving_q1,
    PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY revenue) OVER (
        ORDER BY sale_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS moving_median,
    PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY revenue) OVER (
        ORDER BY sale_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS moving_q3,
    PERCENTILE_CONT(0.90) WITHIN GROUP (ORDER BY revenue) OVER (
        ORDER BY sale_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS moving_p90
FROM daily_sales
ORDER BY sale_date;

Z-Score and Outlier Detection

-- Calculate z-scores using window functions
WITH stats AS (
    SELECT 
        sale_date,
        revenue,
        AVG(revenue) OVER () AS mean_revenue,
        STDDEV(revenue) OVER () AS stddev_revenue
    FROM daily_sales
)
SELECT 
    sale_date,
    revenue,
    ROUND(mean_revenue, 2) AS mean,
    ROUND(stddev_revenue, 2) AS stddev,
    ROUND((revenue - mean_revenue) / NULLIF(stddev_revenue, 0), 4) AS z_score,
    CASE 
        WHEN ABS((revenue - mean_revenue) / NULLIF(stddev_revenue, 0)) > 2 THEN 'OUTLIER'
        WHEN ABS((revenue - mean_revenue) / NULLIF(stddev_revenue, 0)) > 1.5 THEN 'UNUSUAL'
        ELSE 'NORMAL'
    END AS classification
FROM stats
ORDER BY sale_date;

Output:

sale_daterevenuemeanstddevz_scoreclassification
2024-01-0115000.0020900.004178.51-1.4117NORMAL
2024-01-0218000.0020900.004178.51-0.6940NORMAL
2024-01-0316500.0020900.004178.51-1.0529NORMAL
2024-01-0421000.0020900.004178.510.0239NORMAL
2024-01-0519500.0020900.004178.51-0.3350NORMAL
2024-01-0822000.0020900.004178.510.2633NORMAL
2024-01-0924000.0020900.004178.510.7419NORMAL
2024-01-1020500.0020900.004178.51-0.0957NORMAL
2024-01-1125500.0020900.004178.511.1009NORMAL
2024-01-1227000.0020900.004178.511.4599NORMAL

Performance Analysis

-- Compare window function performance
EXPLAIN (ANALYZE, BUFFERS)
SELECT 
    sale_date,
    SUM(revenue) OVER (ORDER BY sale_date) AS running_total,
    AVG(revenue) OVER (ORDER BY sale_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS sma_3,
    LAG(revenue, 1) OVER (ORDER BY sale_date) AS prev_value
FROM daily_sales;

-- Check sort operations
EXPLAIN (ANALYZE, BUFFERS)
SELECT 
    sale_date,
    revenue,
    RANK() OVER (ORDER BY revenue DESC) AS rank
FROM daily_sales;

Window Function Optimization Tips

  1. Partition wisely: Use PARTITION BY to reduce window size
  2. Frame bounds: Use precise ROWS bounds instead of RANGE
  3. Index support: Ensure ORDER BY columns are indexed
  4. Avoid repeated sorts: Combine window functions with same ORDER BY
  5. Materialize CTEs: Use MATERIALIZED for complex window calculations

ℹ️

Performance Tip: Window functions are computed after WHERE but before ORDER BY. Use CTEs to break complex calculations into readable steps.

Mathematical Properties

For running total RnR_n:

Rn=Rn1+xnR_n = R_{n-1} + x_n

For exponential moving average:

EMAn=αxn+(1α)EMAn1\text{EMA}_n = \alpha x_n + (1-\alpha) \text{EMA}_{n-1}

The smoothing factor α\alpha controls responsiveness:

α=2N+1where N=period\alpha = \frac{2}{N+1} \quad \text{where } N = \text{period}

Time complexity for window operations:

T(n)=O(nlogn)(due to sorting)T(n) = O(n \log n) \quad \text{(due to sorting)}

Space complexity:

S(n)=O(n)(for materialized results)S(n) = O(n) \quad \text{(for materialized results)}

⚠️

Edge Cases: Window functions return NULL for rows where the frame is empty. Handle with COALESCE or CASE statements.

Advertisement