SQL Window Functions Reference

Free reference guide: SQL Window Functions Reference

15 results

About SQL Window Functions Reference

The SQL Window Functions Reference is a searchable quick-reference covering all major SQL window functions used for advanced analytics. It includes ranking functions (ROW_NUMBER, RANK, DENSE_RANK, NTILE), aggregate window functions (SUM OVER, AVG OVER, COUNT OVER, MAX/MIN OVER), value access functions (LAG, LEAD, FIRST_VALUE, LAST_VALUE, NTH_VALUE), and distribution functions (PERCENT_RANK, CUME_DIST). Each entry provides practical SQL examples with OVER clauses, PARTITION BY grouping, and ORDER BY sorting.

Window functions operate on a set of rows related to the current row without collapsing the result set like GROUP BY does. This makes them essential for calculating running totals, moving averages, ranking within groups, comparing values with previous or next rows, and computing percentile distributions. The reference demonstrates real-world patterns such as 7-day moving averages on stock prices, cumulative sales totals, department salary rankings, and row-to-row price comparisons.

Understanding the difference between frame clauses like ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW versus ROWS BETWEEN 6 PRECEDING AND CURRENT ROW is critical for correct window function behavior. This reference includes examples that illustrate how frame specifications affect LAST_VALUE (which requires UNBOUNDED FOLLOWING to return the actual last value) and how moving window frames create rolling aggregations. All window functions covered work across PostgreSQL, MySQL 8+, SQL Server, Oracle, and SQLite 3.25+.

Key Features

Ranking functions: ROW_NUMBER for sequential numbering, RANK with gaps, DENSE_RANK without gaps, NTILE for equal-sized groups
Aggregate window functions: running totals with SUM OVER, 7-day moving averages with AVG OVER, partition counts with COUNT OVER
Value access functions: LAG for previous row, LEAD for next row, FIRST_VALUE and LAST_VALUE for window extremes
NTH_VALUE for accessing any specific position within a window partition
Distribution functions: PERCENT_RANK for relative ranking (0-1) and CUME_DIST for cumulative distribution
PARTITION BY examples for grouping calculations by department, category, or any column
Frame clause demonstrations with ROWS BETWEEN, UNBOUNDED PRECEDING, and CURRENT ROW
Cross-database compatible syntax that works on PostgreSQL, MySQL 8+, SQL Server, and Oracle

Frequently Asked Questions

What is the difference between RANK and DENSE_RANK?

Both assign the same rank to rows with equal values, but they differ in how they handle the next rank. RANK skips numbers after ties: if two rows tie at rank 2, the next rank is 4. DENSE_RANK does not skip: the next rank after a tie at 2 is 3. Use RANK when you need the position reflecting the number of rows above (useful for percentile calculations), and DENSE_RANK when you need consecutive ranking numbers.

How does ROW_NUMBER handle ties compared to RANK?

ROW_NUMBER always assigns unique sequential numbers (1, 2, 3, 4...) even when values are identical. The order of tied rows is non-deterministic unless you add a tiebreaker column to ORDER BY. RANK assigns the same number to tied rows. Use ROW_NUMBER when you need exactly one row per position (e.g., deduplication with a CTE), and RANK when equal values should share the same position.

How do I calculate a running total with SUM OVER?

Use SUM(amount) OVER (ORDER BY date) to compute a cumulative sum ordered by date. The default frame is ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which sums from the first row to the current row. To partition by category, add PARTITION BY: SUM(amount) OVER (PARTITION BY category ORDER BY date). This resets the running total for each category.

What is the difference between LAG and LEAD?

LAG(column, n) accesses the value from n rows before the current row (default n=1). LEAD(column, n) accesses the value from n rows after. Both accept a third argument for the default value when the offset goes beyond the window boundary. Common use cases include calculating day-over-day price changes (price - LAG(price, 1) OVER (ORDER BY date)) and comparing current metrics with the next period.

Why does LAST_VALUE often return the current row value?

By default, the window frame is ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so LAST_VALUE returns the current row (the last row in the default frame). To get the actual last value in the partition, you must specify ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING. This is a common pitfall. FIRST_VALUE does not have this issue because the first row is always included in the default frame.

How does NTILE divide rows into groups?

NTILE(n) distributes rows into n approximately equal groups and assigns a group number from 1 to n. If rows do not divide evenly, the earlier groups get one extra row. For example, NTILE(4) on 10 rows creates groups of 3, 3, 2, 2 rows. This is useful for creating quartiles, deciles, or any percentile-based grouping for statistical analysis.

What is the ROWS BETWEEN frame clause and how does it work?

The frame clause defines which rows within the partition are included in the window calculation. UNBOUNDED PRECEDING means the first row of the partition. CURRENT ROW is the current row. n PRECEDING/FOLLOWING means n rows before/after. For example, ROWS BETWEEN 6 PRECEDING AND CURRENT ROW creates a 7-row sliding window for moving averages. RANGE BETWEEN is similar but groups rows with equal ORDER BY values together.

What is the difference between PERCENT_RANK and CUME_DIST?

PERCENT_RANK calculates (rank - 1) / (total_rows - 1), giving 0 for the first row and 1 for the last. CUME_DIST calculates the fraction of rows with values less than or equal to the current row: count(values <= current) / total_rows. CUME_DIST is always > 0 and the last row is always 1. Use PERCENT_RANK for relative positioning and CUME_DIST for the proportion of the dataset at or below a value.