Vivek Kumar

Posted on May 11

Beyond Basics: Mastering SQL's LEAD, LAG, RANK, DENSE_RANK, and NTILE

#analytics #database #postgres #sql

You've learned that window functions exist. You've run a ROW_NUMBER() or two. But there's a whole tier of power hiding just one step further — functions that let you compare rows against their neighbors, assign competition-style rankings, and slice datasets into meaningful percentile buckets. No self-joins. No correlated subqueries. No tears.

This article goes deep on five advanced window functions: LEAD, LAG, RANK, DENSE_RANK, and NTILE. Each one solves a specific class of problem you'll hit constantly in analytics, reporting, and data engineering. We'll use realistic scenarios — e-commerce orders, employee salaries, and product sales — so you can see exactly where these functions shine.

Prerequisite: This article assumes you already know what the OVER() clause and PARTITION BY are. If not, check out the intro to SQL window functions first.

The LEAD and LAG Functions: Looking Forward and Backward

LAG retrieves a value from a previous row. LEAD retrieves a value from a future row. Both are essential for time-series comparisons where you need to know "how did this value change from last period?"

Syntax:

LAG(column, offset, default_value) OVER (PARTITION BY ... ORDER BY ...)
LEAD(column, offset, default_value) OVER (PARTITION BY ... ORDER BY ...)

offset — how many rows back (LAG) or forward (LEAD) to look. Defaults to 1.
default_value — what to return when there's no row at that offset (instead of NULL).

Example 1: Month-over-Month Revenue Change

You have a monthly_revenue table tracking each month's sales:

CREATE TABLE monthly_revenue (
  month       DATE,
  product_id  INT,
  revenue     NUMERIC(12,2)
);

To calculate the change from the previous month for each product:

SELECT
  month,
  product_id,
  revenue,
  LAG(revenue, 1, 0) OVER (
    PARTITION BY product_id
    ORDER BY month
  ) AS prev_month_revenue,
  revenue - LAG(revenue, 1, 0) OVER (
    PARTITION BY product_id
    ORDER BY month
  ) AS revenue_change
FROM monthly_revenue
ORDER BY product_id, month;

Result (sample rows):

month	product_id	revenue	prev_month_revenue	revenue_change
2026-01-01	101	12000.00	0.00	12000.00
2026-02-01	101	15400.00	12000.00	3400.00
2026-03-01	101	13800.00	15400.00	-1600.00

Notice the default_value of 0 prevents the first row from showing NULL. That's a small touch that makes downstream reporting much cleaner.

Example 2: Detecting Customer Churn Risk with LEAD

You want to flag customers who placed an order but then didn't order again within 90 days:

SELECT
  customer_id,
  order_date,
  LEAD(order_date) OVER (
    PARTITION BY customer_id
    ORDER BY order_date
  ) AS next_order_date,
  CASE
    WHEN LEAD(order_date) OVER (
           PARTITION BY customer_id ORDER BY order_date
         ) IS NULL
      OR LEAD(order_date) OVER (
           PARTITION BY customer_id ORDER BY order_date
         ) - order_date > 90
    THEN 'At Risk'
    ELSE 'Active'
  END AS churn_status
FROM orders;

This is the kind of query that would previously require a self-join on orders o1 JOIN orders o2 ON o1.customer_id = o2.customer_id AND o2.order_date > o1.order_date — messy, slow, and hard to read.

RANK vs DENSE_RANK vs ROW_NUMBER: Choosing the Right Ranking Function

These three functions all assign row numbers, but they handle ties very differently. Picking the wrong one is a silent bug that poisons your reports.

Here's the core difference on a single dataset:

SELECT
  employee_name,
  department,
  salary,
  ROW_NUMBER()   OVER (PARTITION BY department ORDER BY salary DESC) AS row_num,
  RANK()         OVER (PARTITION BY department ORDER BY salary DESC) AS rnk,
  DENSE_RANK()   OVER (PARTITION BY department ORDER BY salary DESC) AS dense_rnk
FROM employees;

Result (Engineering department):

employee_name	salary	row_num	rnk	dense_rnk
Alice	120000	1	1	1
Bob	105000	2	2	2
Carol	105000	3	2	2
Dan	98000	4	4	3

ROW_NUMBER always gives unique numbers — ties are broken arbitrarily. Use this for pagination.
RANK skips numbers after ties — Bob and Carol are both rank 2, so Dan is rank 4 (not 3). Use this for competition-style rankings ("top 3 scores").
DENSE_RANK never skips — Bob and Carol are both rank 2, and Dan is rank 3. Use this when gaps in ranking would be confusing or misleading.

Example 3: Top-Earning Employee Per Department

A very common analytics requirement — "show me the top earner in each department":

WITH ranked_employees AS (
  SELECT
    employee_name,
    department,
    salary,
    DENSE_RANK() OVER (
      PARTITION BY department
      ORDER BY salary DESC
    ) AS salary_rank
  FROM employees
)
SELECT employee_name, department, salary
FROM ranked_employees
WHERE salary_rank = 1;

Why DENSE_RANK instead of RANK here? Because if two people tie for top salary, you want both to appear. RANK = 1 and DENSE_RANK = 1 both return tied rows, but with DENSE_RANK you avoid the counterintuitive gap at rank 2.

NTILE: Slicing Data into Buckets

NTILE(n) divides ordered rows into n roughly equal groups and assigns each row a bucket number (1 through n). This is the window function for percentile analysis, quartiles, and cohort segmentation.

Syntax:

NTILE(n) OVER (PARTITION BY ... ORDER BY ...)

Example 4: Segmenting Customers by Purchase Value

Imagine you want to label customers as Bronze, Silver, Gold, or Platinum based on their total annual spend:

WITH customer_spend AS (
  SELECT
    customer_id,
    SUM(order_total) AS annual_spend
  FROM orders
  WHERE EXTRACT(YEAR FROM order_date) = 2025
  GROUP BY customer_id
),
customer_tiers AS (
  SELECT
    customer_id,
    annual_spend,
    NTILE(4) OVER (ORDER BY annual_spend ASC) AS spend_quartile
  FROM customer_spend
)
SELECT
  customer_id,
  annual_spend,
  CASE spend_quartile
    WHEN 1 THEN 'Bronze'
    WHEN 2 THEN 'Silver'
    WHEN 3 THEN 'Gold'
    WHEN 4 THEN 'Platinum'
  END AS tier
FROM customer_tiers
ORDER BY annual_spend DESC;

Result (sample):

customer_id	annual_spend	tier
C-4821	48320.00	Platinum
C-3019	31005.00	Gold
C-7733	12400.00	Silver
C-2210	1890.00	Bronze

One important caveat: if the total number of rows isn't evenly divisible by n, the higher-numbered buckets get one fewer row. This is mathematically correct but can surprise you if you expect perfectly even groups.

Example 5: Identifying Performance Outliers with NTILE

In a data pipeline, you want to flag the slowest 25% of queries for investigation:

SELECT
  query_id,
  execution_ms,
  NTILE(4) OVER (ORDER BY execution_ms DESC) AS slowness_quartile
FROM query_logs
WHERE logged_at >= NOW() - INTERVAL '7 days';

Filter WHERE slowness_quartile = 1 and you have your slowest 25% — no hardcoded thresholds, no percentile math in application code.

Common Mistakes and Gotchas

1. Missing ORDER BY inside OVER()

LEAD, LAG, RANK, and DENSE_RANK are meaningless without an ORDER BY inside OVER(). The database will either error out or return unpredictable results. Always specify it explicitly.

2. Confusing the outer ORDER BY with the window ORDER BY

The ORDER BY inside OVER(...) determines the window ordering — which row is "previous" or "next." The ORDER BY at the end of the query determines the display order. These are independent. Your results might look sorted even without an outer ORDER BY, but never rely on that.

3. RANK vs DENSE_RANK for filtering

Using WHERE rank = 2 after RANK() might return zero rows if rank 2 was skipped due to a tie. DENSE_RANK avoids this. When filtering by rank in a CTE, think carefully about which behavior you want.

4. NULLs in LAG/LEAD

The first row for each partition has no previous row, so LAG returns NULL by default. Always provide a sensible default value (the third argument) if downstream calculations can't handle NULL.

5. NTILE and uneven groups

NTILE(3) on 10 rows gives groups of sizes 4, 3, 3 — not 3.33 each. Document this behavior if you're presenting "even segments" to stakeholders.

Key Takeaways

All five of these functions share the same OVER() structure you already know — what changes is what they compute. In summary:

LAG / LEAD — access values from neighboring rows, ideal for time-series deltas and gap detection
RANK — competition-style ranking with gaps after ties, best for "top N" filtering
DENSE_RANK — ranking without gaps after ties, best when missing rank numbers would be confusing
ROW_NUMBER — always unique, best for pagination and deduplication
NTILE(n) — bucket rows into n groups, best for percentiles and cohort segmentation

The real superpower here is combining them. Use LAG to calculate MoM change, then NTILE to segment products into growth quartiles, then DENSE_RANK to rank within each quartile — all in a single query using CTEs.

What's Next?

Try rewriting a query you currently use that relies on a self-join for row comparison. Chances are LAG or LEAD collapses it into something half the length. Share what you built in the comments — I'd love to see real-world use cases you've solved with these functions!

DEV Community