You've learned that window functions exist. You've run a ROW_NUMBER() or two. But there's a whole tier of power hiding just one step further — functions that let you compare rows against their neighbors, assign competition-style rankings, and slice datasets into meaningful percentile buckets. No self-joins. No correlated subqueries. No tears.
This article goes deep on five advanced window functions: LEAD, LAG, RANK, DENSE_RANK, and NTILE. Each one solves a specific class of problem you'll hit constantly in analytics, reporting, and data engineering. We'll use realistic scenarios — e-commerce orders, employee salaries, and product sales — so you can see exactly where these functions shine.
Prerequisite: This article assumes you already know what the
OVER()clause andPARTITION BYare. If not, check out the intro to SQL window functions first.
The LEAD and LAG Functions: Looking Forward and Backward
LAG retrieves a value from a previous row. LEAD retrieves a value from a future row. Both are essential for time-series comparisons where you need to know "how did this value change from last period?"
Syntax:
LAG(column, offset, default_value) OVER (PARTITION BY ... ORDER BY ...)
LEAD(column, offset, default_value) OVER (PARTITION BY ... ORDER BY ...)
-
offset— how many rows back (LAG) or forward (LEAD) to look. Defaults to 1. -
default_value— what to return when there's no row at that offset (instead of NULL).
Example 1: Month-over-Month Revenue Change
You have a monthly_revenue table tracking each month's sales:
CREATE TABLE monthly_revenue (
month DATE,
product_id INT,
revenue NUMERIC(12,2)
);
To calculate the change from the previous month for each product:
SELECT
month,
product_id,
revenue,
LAG(revenue, 1, 0) OVER (
PARTITION BY product_id
ORDER BY month
) AS prev_month_revenue,
revenue - LAG(revenue, 1, 0) OVER (
PARTITION BY product_id
ORDER BY month
) AS revenue_change
FROM monthly_revenue
ORDER BY product_id, month;
Result (sample rows):
| month | product_id | revenue | prev_month_revenue | revenue_change |
|---|---|---|---|---|
| 2026-01-01 | 101 | 12000.00 | 0.00 | 12000.00 |
| 2026-02-01 | 101 | 15400.00 | 12000.00 | 3400.00 |
| 2026-03-01 | 101 | 13800.00 | 15400.00 | -1600.00 |
Notice the default_value of 0 prevents the first row from showing NULL. That's a small touch that makes downstream reporting much cleaner.
Example 2: Detecting Customer Churn Risk with LEAD
You want to flag customers who placed an order but then didn't order again within 90 days:
SELECT
customer_id,
order_date,
LEAD(order_date) OVER (
PARTITION BY customer_id
ORDER BY order_date
) AS next_order_date,
CASE
WHEN LEAD(order_date) OVER (
PARTITION BY customer_id ORDER BY order_date
) IS NULL
OR LEAD(order_date) OVER (
PARTITION BY customer_id ORDER BY order_date
) - order_date > 90
THEN 'At Risk'
ELSE 'Active'
END AS churn_status
FROM orders;
This is the kind of query that would previously require a self-join on orders o1 JOIN orders o2 ON o1.customer_id = o2.customer_id AND o2.order_date > o1.order_date — messy, slow, and hard to read.
RANK vs DENSE_RANK vs ROW_NUMBER: Choosing the Right Ranking Function
These three functions all assign row numbers, but they handle ties very differently. Picking the wrong one is a silent bug that poisons your reports.
Here's the core difference on a single dataset:
SELECT
employee_name,
department,
salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rnk,
DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dense_rnk
FROM employees;
Result (Engineering department):
| employee_name | salary | row_num | rnk | dense_rnk |
|---|---|---|---|---|
| Alice | 120000 | 1 | 1 | 1 |
| Bob | 105000 | 2 | 2 | 2 |
| Carol | 105000 | 3 | 2 | 2 |
| Dan | 98000 | 4 | 4 | 3 |
-
ROW_NUMBERalways gives unique numbers — ties are broken arbitrarily. Use this for pagination. -
RANKskips numbers after ties — Bob and Carol are both rank 2, so Dan is rank 4 (not 3). Use this for competition-style rankings ("top 3 scores"). -
DENSE_RANKnever skips — Bob and Carol are both rank 2, and Dan is rank 3. Use this when gaps in ranking would be confusing or misleading.
Example 3: Top-Earning Employee Per Department
A very common analytics requirement — "show me the top earner in each department":
WITH ranked_employees AS (
SELECT
employee_name,
department,
salary,
DENSE_RANK() OVER (
PARTITION BY department
ORDER BY salary DESC
) AS salary_rank
FROM employees
)
SELECT employee_name, department, salary
FROM ranked_employees
WHERE salary_rank = 1;
Why DENSE_RANK instead of RANK here? Because if two people tie for top salary, you want both to appear. RANK = 1 and DENSE_RANK = 1 both return tied rows, but with DENSE_RANK you avoid the counterintuitive gap at rank 2.
NTILE: Slicing Data into Buckets
NTILE(n) divides ordered rows into n roughly equal groups and assigns each row a bucket number (1 through n). This is the window function for percentile analysis, quartiles, and cohort segmentation.
Syntax:
NTILE(n) OVER (PARTITION BY ... ORDER BY ...)
Example 4: Segmenting Customers by Purchase Value
Imagine you want to label customers as Bronze, Silver, Gold, or Platinum based on their total annual spend:
WITH customer_spend AS (
SELECT
customer_id,
SUM(order_total) AS annual_spend
FROM orders
WHERE EXTRACT(YEAR FROM order_date) = 2025
GROUP BY customer_id
),
customer_tiers AS (
SELECT
customer_id,
annual_spend,
NTILE(4) OVER (ORDER BY annual_spend ASC) AS spend_quartile
FROM customer_spend
)
SELECT
customer_id,
annual_spend,
CASE spend_quartile
WHEN 1 THEN 'Bronze'
WHEN 2 THEN 'Silver'
WHEN 3 THEN 'Gold'
WHEN 4 THEN 'Platinum'
END AS tier
FROM customer_tiers
ORDER BY annual_spend DESC;
Result (sample):
| customer_id | annual_spend | tier |
|---|---|---|
| C-4821 | 48320.00 | Platinum |
| C-3019 | 31005.00 | Gold |
| C-7733 | 12400.00 | Silver |
| C-2210 | 1890.00 | Bronze |
One important caveat: if the total number of rows isn't evenly divisible by n, the higher-numbered buckets get one fewer row. This is mathematically correct but can surprise you if you expect perfectly even groups.
Example 5: Identifying Performance Outliers with NTILE
In a data pipeline, you want to flag the slowest 25% of queries for investigation:
SELECT
query_id,
execution_ms,
NTILE(4) OVER (ORDER BY execution_ms DESC) AS slowness_quartile
FROM query_logs
WHERE logged_at >= NOW() - INTERVAL '7 days';
Filter WHERE slowness_quartile = 1 and you have your slowest 25% — no hardcoded thresholds, no percentile math in application code.
Common Mistakes and Gotchas
1. Missing ORDER BY inside OVER()
LEAD, LAG, RANK, and DENSE_RANK are meaningless without an ORDER BY inside OVER(). The database will either error out or return unpredictable results. Always specify it explicitly.
2. Confusing the outer ORDER BY with the window ORDER BY
The ORDER BY inside OVER(...) determines the window ordering — which row is "previous" or "next." The ORDER BY at the end of the query determines the display order. These are independent. Your results might look sorted even without an outer ORDER BY, but never rely on that.
3. RANK vs DENSE_RANK for filtering
Using WHERE rank = 2 after RANK() might return zero rows if rank 2 was skipped due to a tie. DENSE_RANK avoids this. When filtering by rank in a CTE, think carefully about which behavior you want.
4. NULLs in LAG/LEAD
The first row for each partition has no previous row, so LAG returns NULL by default. Always provide a sensible default value (the third argument) if downstream calculations can't handle NULL.
5. NTILE and uneven groups
NTILE(3) on 10 rows gives groups of sizes 4, 3, 3 — not 3.33 each. Document this behavior if you're presenting "even segments" to stakeholders.
Key Takeaways
All five of these functions share the same OVER() structure you already know — what changes is what they compute. In summary:
- LAG / LEAD — access values from neighboring rows, ideal for time-series deltas and gap detection
- RANK — competition-style ranking with gaps after ties, best for "top N" filtering
- DENSE_RANK — ranking without gaps after ties, best when missing rank numbers would be confusing
- ROW_NUMBER — always unique, best for pagination and deduplication
- NTILE(n) — bucket rows into n groups, best for percentiles and cohort segmentation
The real superpower here is combining them. Use LAG to calculate MoM change, then NTILE to segment products into growth quartiles, then DENSE_RANK to rank within each quartile — all in a single query using CTEs.
What's Next?
Try rewriting a query you currently use that relies on a self-join for row comparison. Chances are LAG or LEAD collapses it into something half the length. Share what you built in the comments — I'd love to see real-world use cases you've solved with these functions!
Top comments (0)