Window functions: ROW_NUMBER, LAG, LEAD, SUM OVER

Window functions are the feature you most regret not learning earlier. Every time I introduce them to a mid-career developer, I can see the exact moment they realize half their past code could have been five times shorter. They’re that impactful.

A window function computes a value for each row based on a “window” of related rows — without collapsing the result into aggregates the way GROUP BY does. Running totals, rankings, previous-row comparisons, moving averages, period-over-period calculations — all of these become one-line expressions instead of self-joins or correlated subqueries.

Today’s lesson is the single highest-leverage SQL lesson in the whole course. Read it twice.

The shape: aggregate function + OVER clause

A window function is any aggregate or ranking function followed by an OVER (...) clause:

SELECT OrderId,
       CountryCode,
       Total,
       SUM(Total) OVER (PARTITION BY CountryCode) AS country_revenue,
       ROW_NUMBER() OVER (PARTITION BY CountryCode ORDER BY Total DESC) AS rank_in_country
FROM Sales.Orders;

SUM(Total) OVER (PARTITION BY CountryCode) means: “for each row, compute the SUM of Total for rows with the same CountryCode as this row.” Each row keeps its own data AND gets the country-level sum. No collapse. No join.

The OVER clause has three optional pieces:

PARTITION BY <cols> — split rows into groups (independent windows).
ORDER BY <cols> — sort within each partition (matters for running totals and ROW_NUMBER).
ROWS BETWEEN ... AND ... — the frame, controls which rows of the partition the function sees.

Every combination of these three gives you a different window.

Ranking functions

`ROW_NUMBER()`

Assigns 1, 2, 3… to rows ordered within a partition.

-- Rank customers by LTV within their country
SELECT
    c.CustomerId,
    c.Name,
    c.Country,
    SUM(o.Total) AS ltv,
    ROW_NUMBER() OVER (
        PARTITION BY c.Country
        ORDER BY SUM(o.Total) DESC
    ) AS country_rank
FROM Sales.Customer c
JOIN Sales.Orders   o ON o.CustomerId = c.CustomerId
GROUP BY c.CustomerId, c.Name, c.Country;

Top 3 customers per country: wrap the above in a CTE and filter WHERE country_rank <= 3. This is the “top N per group” pattern we previewed in lesson 8.

`RANK()` and `DENSE_RANK()`

Same as ROW_NUMBER but handle ties differently:

RANK() — ties get the same rank, next row skips. 1, 2, 2, 4.
DENSE_RANK() — ties get the same rank, next row doesn’t skip. 1, 2, 2, 3.
ROW_NUMBER() — ties get arbitrary different ranks. 1, 2, 3, 4.

Use RANK when “two customers tied for 2nd place, next is 4th” is the semantics you want. ROW_NUMBER when you need unique, dense rank numbers for pagination or deduplication.

`NTILE(n)`

Splits the partition into n approximately-equal buckets:

-- Divide customers into 4 quartiles by LTV
SELECT CustomerId, ltv,
       NTILE(4) OVER (ORDER BY ltv DESC) AS ltv_quartile
FROM customer_ltv;

Quartile 1 is the top 25% by LTV, quartile 4 is the bottom 25%. Marketing uses this constantly for segmentation.

Offset functions: LAG and LEAD

LAG(col, n, default) returns the column from n rows earlier in the partition. LEAD is the mirror — from n rows later.

-- For each month, show revenue and the month-over-month change
WITH monthly AS (
    SELECT DATEFROMPARTS(YEAR(OrderDate), MONTH(OrderDate), 1) AS month_start,
           SUM(Total) AS revenue
    FROM Sales.Orders
    GROUP BY DATEFROMPARTS(YEAR(OrderDate), MONTH(OrderDate), 1)
)
SELECT month_start,
       revenue,
       LAG(revenue, 1) OVER (ORDER BY month_start)      AS prev_month,
       revenue - LAG(revenue, 1) OVER (ORDER BY month_start) AS delta,
       (revenue - LAG(revenue, 1) OVER (ORDER BY month_start))
           * 100.0 / NULLIF(LAG(revenue, 1) OVER (ORDER BY month_start), 0) AS mom_pct
FROM monthly
ORDER BY month_start;

Month-over-month growth. Before window functions this required joining the aggregated table against itself on an offset month. With LAG, one pass.

Practical pattern: “year-over-year” is LAG(revenue, 12) on monthly data. “Previous-quarter delta” is LAG(revenue, 1) on quarterly data. Same shape, different offsets.

Running totals and moving averages

Aggregate functions with ORDER BY in the OVER clause become cumulative by default:

-- Running total of revenue by day
SELECT CAST(OrderDate AS DATE) AS day,
       SUM(Total) OVER (ORDER BY CAST(OrderDate AS DATE)
                        ROWS UNBOUNDED PRECEDING) AS revenue_running
FROM Sales.Orders
ORDER BY day;

The ROWS UNBOUNDED PRECEDING tells SQL Server: “for each row, sum from the first row of the partition up to the current row.” This is the default when you use ORDER BY without an explicit frame, but I write it explicitly because it makes the intent obvious.

Moving average over the last 7 days:

SELECT day,
       daily_revenue,
       AVG(daily_revenue) OVER (
           ORDER BY day
           ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
       ) AS moving_avg_7d
FROM daily_revenue;

“From 6 rows before to the current row” = a 7-day window. Your dashboard’s “7-day moving average” metric is literally this one line.

Frame specifications

The frame controls which rows the function sees. Four common patterns:

-- Running total (default with ORDER BY)
ROWS UNBOUNDED PRECEDING

-- Fixed-size trailing window
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW

-- Centered window
ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING

-- The entire partition (ignore order)
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

You can also use RANGE BETWEEN instead of ROWS BETWEEN — RANGE is value-based (e.g., “all rows within 3 days of this one”) while ROWS is row-count-based. Subtle and slightly advanced; ROWS is what you want 90% of the time.

Running total with reset: the “cumulative by category” pattern

-- Running total of revenue per customer, ordered by their order history
SELECT o.CustomerId,
       o.OrderDate,
       o.Total,
       SUM(o.Total) OVER (
           PARTITION BY o.CustomerId
           ORDER BY o.OrderDate, o.OrderId
           ROWS UNBOUNDED PRECEDING
       ) AS ltv_at_this_point
FROM Sales.Orders AS o;

Each customer’s running LTV after each order, resetting per customer because of PARTITION BY. Useful for cohort analysis, retention curves, and any “at the time of order N, this customer’s total spend was X” question.

FIRST_VALUE and LAST_VALUE

Returns the first or last value in the frame. Commonly used to carry a per-partition attribute alongside the row:

-- Each order, plus the customer's first-ever order date
SELECT o.CustomerId,
       o.OrderId,
       o.OrderDate,
       FIRST_VALUE(o.OrderDate) OVER (
           PARTITION BY o.CustomerId
           ORDER BY o.OrderDate, o.OrderId
       ) AS first_order_date,
       DATEDIFF(DAY,
           FIRST_VALUE(o.OrderDate) OVER (
               PARTITION BY o.CustomerId
               ORDER BY o.OrderDate, o.OrderId
           ),
           o.OrderDate
       ) AS days_since_first_order
FROM Sales.Orders AS o;

Every row now knows how many days it was placed after the customer’s first order. Perfect for customer-lifecycle reports.

Gotcha with LAST_VALUE: the default frame for most window functions is “first row to current row,” so LAST_VALUE without an explicit frame returns the current row, not the last of the partition. Always use the explicit frame:

LAST_VALUE(OrderDate) OVER (
    PARTITION BY CustomerId
    ORDER BY OrderDate
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)

De-duplication with ROW_NUMBER

One of the best applications of ROW_NUMBER: dropping duplicate rows while keeping a specific one.

Say Runehold’s customer import accidentally created multiple rows per email. Keep the earliest one:

WITH ranked AS (
    SELECT CustomerId, Email, Name, CreatedAt,
           ROW_NUMBER() OVER (
               PARTITION BY Email
               ORDER BY CreatedAt ASC, CustomerId ASC
           ) AS rn
    FROM Sales.Customer
)
DELETE FROM ranked WHERE rn > 1;

Astonishingly useful. Also a place where the deterministic tiebreaker matters: always include the primary key in the ORDER BY so the “keep this one” decision is reproducible.

Window functions vs GROUP BY

Common beginner confusion:

-- GROUP BY: collapses rows, one row out per country
SELECT CountryCode, SUM(Total) AS revenue
FROM Sales.Orders
GROUP BY CountryCode;

-- Window: keeps every row, each row shows its country's total
SELECT OrderId, CountryCode, Total,
       SUM(Total) OVER (PARTITION BY CountryCode) AS country_revenue
FROM Sales.Orders;

Use GROUP BY when you want the aggregated result as rows. Use windows when you want each row to know about its group’s aggregate.

Combining with CTEs for readability

Complex window logic gets unreadable fast. CTEs help:

WITH order_stats AS (
    SELECT o.OrderId,
           o.CustomerId,
           o.Total,
           o.OrderDate,
           ROW_NUMBER() OVER (PARTITION BY o.CustomerId
                              ORDER BY o.OrderDate, o.OrderId) AS order_seq,
           SUM(o.Total) OVER (PARTITION BY o.CustomerId
                              ORDER BY o.OrderDate, o.OrderId
                              ROWS UNBOUNDED PRECEDING) AS cumulative_ltv,
           LAG(o.OrderDate, 1) OVER (PARTITION BY o.CustomerId
                                      ORDER BY o.OrderDate, o.OrderId) AS prev_order_date
    FROM Sales.Orders AS o
)
SELECT CustomerId,
       OrderId,
       order_seq,
       OrderDate,
       cumulative_ltv,
       DATEDIFF(DAY, prev_order_date, OrderDate) AS days_since_prev_order
FROM order_stats
ORDER BY CustomerId, order_seq;

Each row: “This is Maria’s 3rd order, her running LTV is €245, and this order was placed 28 days after her previous one.” That’s a cohort-analysis report written in ~15 lines. Before window functions, the same report was 50+ lines of self-joins and people cried.

Run this on your own machine

USE Runehold;
GO

-- Rank customers by LTV within their country
WITH customer_ltv AS (
    SELECT c.CustomerId, c.Name, c.Country,
           COALESCE(SUM(o.Total), 0) AS ltv
    FROM Sales.Customer AS c
    LEFT JOIN Sales.Orders AS o ON o.CustomerId = c.CustomerId
    GROUP BY c.CustomerId, c.Name, c.Country
)
SELECT Name, Country, ltv,
       ROW_NUMBER() OVER (PARTITION BY Country ORDER BY ltv DESC) AS country_rank,
       NTILE(4)     OVER (ORDER BY ltv DESC)                    AS global_quartile
FROM customer_ltv
ORDER BY Country, country_rank;

-- Day-over-day revenue delta
WITH daily AS (
    SELECT CAST(OrderDate AS DATE) AS day, SUM(Total) AS revenue
    FROM Sales.Orders
    GROUP BY CAST(OrderDate AS DATE)
)
SELECT day,
       revenue,
       LAG(revenue) OVER (ORDER BY day) AS prev_day,
       revenue - LAG(revenue) OVER (ORDER BY day) AS delta_eur
FROM daily
ORDER BY day;

-- Running total per customer
SELECT o.CustomerId,
       o.OrderDate,
       o.Total,
       SUM(o.Total) OVER (
           PARTITION BY o.CustomerId
           ORDER BY o.OrderDate, o.OrderId
           ROWS UNBOUNDED PRECEDING
       ) AS ltv_at_order_time
FROM Sales.Orders AS o
ORDER BY o.CustomerId, o.OrderDate;

-- 3-order moving average for each customer
SELECT o.CustomerId,
       o.OrderDate,
       o.Total,
       AVG(o.Total) OVER (
           PARTITION BY o.CustomerId
           ORDER BY o.OrderDate, o.OrderId
           ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
       ) AS three_order_avg
FROM Sales.Orders AS o
ORDER BY o.CustomerId, o.OrderDate;

Run each, read each row, and look at how the window columns change. Once the shape is natural, half your future queries will be 30% shorter and 10× more expressive.

Next lesson: UNION vs UNION ALL. Short one. One trap. Then in lesson 14 we get to string functions — STRING_AGG, STRING_SPLIT, and the death of CLR workarounds.