Gowtham Potureddi
Senior SQL: Advanced Joins, Window Analytics, Plans, Indexing & Production Mindset

Senior SQL is not a longer SELECT — it is scale-aware relational engineering: you can state grain, predict cardinality, read a planner, choose indexes and partitions, and reason about correctness under concurrency while keeping SQL maintainable for the next teammate. Hiring loops for senior data engineers, analytics engineers, and backend owners increasingly assume that PostgreSQL, SQL Server, Snowflake, BigQuery, or Redshift are all “just dialects” around the same invariants.

The shift from junior to senior is the shift from “make this dataset” to “how does this behave at *tens or hundreds of millions* of rows, under real isolation, with observable plans?” The fastest lever is still keyboard time on joins, windows, and EXPLAIN-driven refactors:


Browse practice hub →, open SQL language practice →, sharpen joins →, deepen window functions →, and reinforce CTEs →.




1. Junior vs senior — the mindset and the bar

From syntax fluency to production responsibility

Invariant: Junior SQL answers “what row shape?” — senior SQL answers “what row shape, what cost, and what failure modes?”

Detailed explanation. A junior-ready script filters and aggregates correctly on sample data. Senior SQL implies you can defend index use, spot join fan-out, choose window frames deliberately, and articulate transaction trade-offs — the stack companies run on Snowflake, BigQuery, Redshift, Postgres, or SQL Server rewards that depth with stable night jobs and non-deadlocking noon dashboards.

Pro tip: When a principal asks “what would you check first?” for a slow query, answer with grain + predicates + join graph + plan diff + stats freshness before mentioning “add an index.”

What junior coverage usually stops at

Detailed explanation. Baseline competency is SELECT / WHERE / GROUP BY / ORDER BY, basic JOIN, and INSERT / UPDATE / DELETE hygiene. That is enough to be productive on small tables and tutorials — insufficient when one-to-many edges multiply rows explosively or when NULL semantics invalidate NOT IN patterns across production feeds.

Worked example.

| signal | junior-heavy answer | senior-shaped answer |
| --- | --- | --- |
| slow report | “add DISTINCT” | “measure join width; maybe semijoin or pre-aggregate” |

What senior coverage adds

Detailed explanation. Seniors lean on advanced joins with explicit cardinality stories, window analytics with correct frames, recursive CTEs for org/dependency graphs, execution plans (EXPLAIN, EXPLAIN ANALYZE where available), index strategy (composite, covering, selective partials), partition pruning, isolation levels, locking/deadlock narratives, star/snowflake modeling literacy, staged CTE ETL readability, and data-quality probes your pipeline can run daily.

How seniors decompose a “suddenly slow” query

Detailed explanation. Production regressions are rarely random: a stats refresh, a code deploy that widens predicates, a fan-out join introduced in a refactor, or a warehouse reschedule that starves slots all show up as plan or wall-clock shifts. Seniors time-box triage into repro → grain → predicates → join graph → plan diff → data skew so each hypothesis is falsifiable in minutes, not days.

Worked example.

| stakeholder question | junior-heavy reflex | senior-shaped triage |
| --- | --- | --- |
| “Dashboard blew up” | guess one new index | compare yesterday vs today plan; check partition predicates; confirm foreign-key join did not become M:N |

Observable signals worth naming in interviews

Detailed explanation. You do not need perfect telemetry to sound senior — you need explicit observables: buffer/cache hit patterns, spill to disk in sort/hash nodes, rows out vs rows in at each join, remote vs local bytes in warehouses, and whether late-arriving facts changed window cohort sizes. Pair those nouns with what you would change (predicate, index leading key, pre-aggregation, or isolation boundary) and you map execution reality to engineering action.

Common beginner traps

  • Myth: senior = more nested subqueries — often flatter CTEs + clearer grain beats clever tortured SQL.
  • Treating DISTINCT as deodorant — masks join explosions instead of fixing keys.
  • Ignoring dialect session settings — the same text runs different plans with different work_mem / warehouse slots.

2. Join mastery — cardinality, order, and physical strategies

Joins are algebra and physics

Invariant: every join multiplies or filters row sets predictably — seniors narrate one-to-one, one-to-many, and many-to-many edges before typing JOIN.

Detailed explanation. Textbook joins look symmetric; optimizers treat them as physical operators: nested loop (probe), hash (build + probe), merge (sorted streams). Cardinality estimates, predicate selectivity, index alignment, and memory budgets determine which operator wins. Interview credibility comes from linking schema diagram → join graph → expected operator family, not reciting definitions.

*Figure: SQL join strategies — nested loop, hash join, merge join — with notes on when the planner picks each.*

Cardinality consciousness

Detailed explanation. Before aggregating, ask: if I join customers to orders, how many rows per customer appear? If the business question is per customer but your join returns per order line, downstream SUM scans the wrong multiset. Seniors stabilize with pre-aggregation, EXISTS semijoins, or deduping keys before attaching wide fact tables.

Worked example.

| relationship | join result width |
| --- | --- |
| 1 customer : N orders | N rows per customer |
| accidental M:N | bridge explosion |

Physical strategies (how interviewers phrase it)

Detailed explanation. Nested loop shines with tiny outer sides or selective index nested loops. Hash join often wins for large equi-joins without helpful sort orders. Merge join needs sorted inputs — cheap when indexes provide order, expensive when sorts spill. Saying when each appears beats naming them alone.

Semijoins, antijoins, and row multiplication

Detailed explanation. EXISTS and IN (semi-correlated) patterns answer membership without duplicating the right-hand side — when you only need “is there a matching order?” you should not inner join orders and then DISTINCT your way back to customer grain. NOT EXISTS expresses antijoin with sane NULL semantics where NOT IN over nullable columns becomes a footgun. Interviewers listen for that distinction because it separates “I can write joins” from “I can guard cardinality.”
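
A minimal sketch of that distinction, assuming a nullable orders.customer_id; table and column names are illustrative:

-- Semijoin: answer membership without multiplying rows
SELECT c.customer_id, c.name
FROM customers c
WHERE EXISTS (
    SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id
);

-- Antijoin with sane NULL semantics
SELECT c.customer_id, c.name
FROM customers c
WHERE NOT EXISTS (
    SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id
);

-- Footgun: if ANY orders.customer_id is NULL, this returns zero rows
-- SELECT ... WHERE customer_id NOT IN (SELECT customer_id FROM orders);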

Outer joins and predicates: where the filter lives

Detailed explanation. Predicates on the nullable side of a LEFT JOIN behave differently in the ON clause vs the WHERE clause: in WHERE, you often null out preserved rows and accidentally convert a left join into an inner join; in ON, you shape the match before preservation. Seniors say aloud which semantics the business question needs (include non-matching parents vs only parents with qualifying children), then place predicates deliberately.
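
One hedged sketch of the difference, with illustrative names:

-- ON: shape the match, preserve non-matching parents (stays a left join)
SELECT c.customer_id, o.order_id
FROM customers c
LEFT JOIN orders o
  ON o.customer_id = c.customer_id
 AND o.status = 'SHIPPED';

-- WHERE: the predicate on the nullable side discards preserved rows
-- (NULL status fails the test) — silently behaves like an inner join
SELECT c.customer_id, o.order_id
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.customer_id
WHERE o.status = 'SHIPPED';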

Numeric fan-out: why “only ten accounts” still explodes

Detailed explanation. Three 1:N joins in a row multiply: 10 accounts × 200 orders × 5 line items is 10,000 fact-shaped rows before a single SUM. If the dashboard question is account grain, that join order without pre-aggregation is wrong, not just slow.

Worked example.

| step | relationship | rows per surviving account (illustrative) |
| --- | --- | --- |
| accounts → orders | 1:N | 200 |
| orders → lines | 1:N | ×5 → 1,000 |
| accidental tag bridge | N:M | ×k → thousands+ |
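
One way to keep account grain honest is to collapse each 1:N edge before attaching it — a sketch with illustrative names:

-- Pre-aggregate lines to order grain, then orders to account grain,
-- so no SUM ever scans a multiplied multiset
WITH order_totals AS (
    SELECT o.order_id, o.account_id, SUM(l.amount) AS order_amount
    FROM orders o
    JOIN order_lines l ON l.order_id = o.order_id
    GROUP BY o.order_id, o.account_id
)
SELECT a.account_id, SUM(t.order_amount) AS account_revenue
FROM accounts a
LEFT JOIN order_totals t ON t.account_id = a.account_id
GROUP BY a.account_id;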

Common beginner mistakes

  • SELECT projections that widen fact grain before GROUP BY — expensive and ambiguous.
  • Outer joins with predicates on the outer side in the wrong clause — accidentally turning them into inner joins or duplicating rows.
  • Assuming the optimizer will “figure it out” without verifying stats, histograms, or session limits.

3. Window analytics — partitions, orders, and frames

Keep row grain while computing comparative metrics

Invariant: GROUP BY collapses; OVER() decorates — seniors pick the right one before typing.

Detailed explanation. Ranking (ROW_NUMBER, RANK, DENSE_RANK), offsets (LAG/LEAD), running totals, and moving averages are standard in analytics pipelines. Senior mastery is PARTITION BY discipline (correct cohort boundaries), ORDER BY inside windows (ties handled deliberately), and frame clauses (ROWS vs RANGE vs GROUPS) that match business time semantics.

*Figure: PARTITION BY plus a ROWS BETWEEN frame over a time-ordered sales series — running-total visualization.*

Ranking patterns

Detailed explanation. ROW_NUMBER breaks ties arbitrarily unless you add tie-break columns — great for dedup keep-one. RANK leaves gaps after ties; DENSE_RANK does not. Interview prompts often hide tie-break requirements; name them aloud.

Frames — where running metrics go wrong

Detailed explanation. Default frames differ by function; aggregates over ORDER BY windows often accumulate from partition start through current row, while LAG ignores frames. For 7-day moving averages you usually want an explicit ROWS BETWEEN 6 PRECEDING AND CURRENT ROW (or calendar-aware RANGE in warehouses that support it well).

Worked example — frame semantics in one line of business logic. Suppose events share a user_id partition and an event_ts order. A 7-row moving click count uses ROWS when “seven events” is the contract; use RANGE INTERVAL '7 day' PRECEDING when “seven calendar days of irregular events” is the contract — mixing these quietly changes cohort sizes and downstream KPIs.
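
A sketch of both contracts side by side (Postgres-flavored RANGE interval syntax; engine support varies):

-- "Seven events" contract: fixed row neighbors
SELECT user_id, event_ts,
       COUNT(*) OVER (
           PARTITION BY user_id ORDER BY event_ts
           ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
       ) AS clicks_7_rows
FROM user_events;

-- "Seven calendar days" contract: business time over irregular events
SELECT user_id, event_ts,
       COUNT(*) OVER (
           PARTITION BY user_id ORDER BY event_ts
           RANGE BETWEEN INTERVAL '7 days' PRECEDING AND CURRENT ROW
       ) AS clicks_7_days
FROM user_events;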

LAG / LEAD and session boundaries

Detailed explanation. Offsets compare each row to its neighbors inside the same PARTITION BY. That is how seniors build sessions (“gap > 30 minutes starts a new session”), previous-value deltas, and trip completion flags without correlated subqueries. The footgun is NULL on the partition’s first row — decide whether IGNORE NULLS (where supported) or a COALESCE story matches the spec.

Worked-example solution.

-- Reveal gaps between consecutive events per user (sessionization primitive)
SELECT user_id,
       event_ts,
       LAG(event_ts) OVER (PARTITION BY user_id ORDER BY event_ts) AS prev_ts,
       event_ts - LAG(event_ts) OVER (PARTITION BY user_id ORDER BY event_ts) AS gap
FROM user_events;

Rule of thumb: if the problem statement says “compared to the previous row in some ordering,” reach for LAG/LEAD before self-joins — fewer duplicate sorts, clearer intent.

Common beginner mistakes

  • Forgetting PARTITION BY — “global” ranks across unrelated cohorts.
  • Over-wide SELECT projections inside CTEs feeding windows — unnecessary width tanks sort spill cost.
  • Using windows where GROUP BY already expresses the same collapse — doubles work.

SQL Interview Question on top three salaries per department

Tables: employees(id, name, department_id, salary) — ties possible. Prompt: Return at most three employees per department by salary descending, breaking ties by lower id first. Emit department_id, name, salary, rn.

Solution Using ROW_NUMBER with tie-break ORDER BY

SELECT department_id, name, salary, rn
FROM (
    SELECT department_id,
           name,
           salary,
           ROW_NUMBER() OVER (
               PARTITION BY department_id
               ORDER BY salary DESC, id ASC
           ) AS rn
    FROM employees
) x
WHERE rn <= 3;

Step-by-step trace

Input:

| id | department_id | name | salary |
| --- | --- | --- | --- |
| 1 | 10 | Ada | 90000 |
| 2 | 10 | Bob | 90000 |
| 3 | 10 | Chi | 85000 |
| 4 | 10 | Dan | 80000 |
| 5 | 20 | Eve | 70000 |

  1. Partition by department_id — dept 10 and dept 20 rank independently.
  2. ORDER BY salary DESC, id ASC — among salary ties, smaller id wins rn = 1 (Ada before Bob).
  3. ROW_NUMBER assigns 1…4 within dept 10; outer filter keeps rn ≤ 3 → Ada, Bob, Chi.

Output:

| department_id | name | salary | rn |
| --- | --- | --- | --- |
| 10 | Ada | 90000 | 1 |
| 10 | Bob | 90000 | 2 |
| 10 | Chi | 85000 | 3 |
| 20 | Eve | 70000 | 1 |

Why this works — concept by concept:

  • Partition boundary — window resets per department_id, mirroring “top within group” specs.
  • Deterministic ordering — the id tie-break prevents non-deterministic rank picks across engines.
  • ROW_NUMBER vs RANK — we need exactly N rows even with ties; RANK/DENSE_RANK can emit more than three logical ties depending on wording.
  • Predicate after window — compute rank once, filter rn; avoids correlated subquery patterns.
  • Cost — window sort is O(n log n) per partition in typical implementations; indexes on (department_id, salary DESC, id) help warehouse engines avoid full resorts when data is clustered.

Practice on PipeCode: Window SQL drills → and Join-heavy SQL drills →.


4. Recursive CTEs — hierarchies and graph-shaped data

Trees, org charts, and bill-of-materials patterns

Invariant: recursive CTEs walk a graph defined by a base case + inductive join — seniors prove cycle avoidance or accept termination rules.

Detailed explanation. Classic pattern: seed roots (manager_id IS NULL), iteratively attach children by joining the working set to the base table. BOM explosions and dependency queues reuse the same skeleton. Interviews probe depth limits, cycle detection, and whether you should push heavy graph work to graph engines instead of SQL when edges explode.

Worked example — verbal shape.

| leg | action |
| --- | --- |
| anchor | pick root rows |
| recursive | join children to frontier |
| guard | optional WHERE depth < 50 |

Concrete org-chart skeleton (ANSI shape)

Detailed explanation. Most dialects compile anchor UNION ALL recursive member into iterative operators; you should still think in rounds: each recursive leg extends the frontier one hop. Keep the recursive member join purely structural (parent id to child manager_id) and push business filters either into the anchor or into a final WHERE so you do not accidentally starve valid branches.

Worked-example solution.

-- Depth-limited reporting tree: dialect-specific RECURSIVE keyword may be required
WITH RECURSIVE subordinates AS (
    SELECT id,
           manager_id,
           name,
           1 AS depth
    FROM employees
    WHERE manager_id IS NULL           -- anchor: executives; swap for :boss_id in interviews
    UNION ALL
    SELECT e.id,
           e.manager_id,
           e.name,
           s.depth + 1
    FROM employees e
    JOIN subordinates s ON e.manager_id = s.id
    WHERE s.depth < 20                  -- guardrail against runaway depth
)
SELECT * FROM subordinates;

Cycles, uniqueness, and when SQL is the wrong tool

Detailed explanation. Undirected or cyclic graphs need visit tracking: maintain a path string, array of ids, or a visited bitmap column in the recursive leg and abort when you would revisit a node. Without that, a single back-edge can recurse until the engine stops you. Even with guards, very deep hierarchies on hot OLTP paths may be the wrong layer — materialized paths, closure tables, or graph services exist because repeated recursion is CPU- and lock-heavy.
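
A Postgres-flavored sketch of visit tracking with an id array (some engines offer a built-in CYCLE clause instead):

WITH RECURSIVE walk AS (
    SELECT id, manager_id, ARRAY[id] AS path, false AS is_cycle
    FROM employees
    WHERE manager_id IS NULL
    UNION ALL
    SELECT e.id, e.manager_id, w.path || e.id, e.id = ANY(w.path)
    FROM employees e
    JOIN walk w ON e.manager_id = w.id
    WHERE NOT w.is_cycle               -- stop extending once a back-edge appears
)
SELECT * FROM walk
WHERE NOT is_cycle;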

Bill-of-materials and explosion factors

Detailed explanation. BOM joins are recursive in business language: each part may decompose into sub-parts with quantities. Seniors track multiplicative quantities through levels (parent qty × child qty) and watch for diamond structures where the same sub-assembly appears twice — dedupe keys or DAG modeling prevents double-counting rollups.
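
A sketch of quantity propagation, assuming a hypothetical bom(parent_part, child_part, qty) edge table:

WITH RECURSIVE explosion AS (
    SELECT child_part, qty AS total_qty, 1 AS lvl
    FROM bom
    WHERE parent_part = 'WIDGET-1'      -- hypothetical root assembly
    UNION ALL
    SELECT b.child_part, e.total_qty * b.qty, e.lvl + 1
    FROM bom b
    JOIN explosion e ON b.parent_part = e.child_part
    WHERE e.lvl < 10                    -- depth guardrail
)
SELECT child_part, SUM(total_qty) AS required_qty  -- sum per part so diamonds are not double-counted
FROM explosion
GROUP BY child_part;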

Common beginner mistakes

  • Missing uniqueness — duplicate edges cause exponential blowups.
  • No cycle guard on adjacency with back-edges — recursion runs away.
  • Deep graphs in OLTP hot paths — offload or materialize paths offline.

5. Plans, indexes, and partitions — observability meets physics

EXPLAIN is the senior’s debugger

Invariant: plans make I/O and CPU explicit — seniors diff estimated vs actual rows and watch for seq scans, spills, bad nested loops, and stale stats.

Detailed explanation. A junior sees “slow query.” A senior checks selectivity, projection width, join order, and whether indexes match predicate leading columns. On warehouses, translate the same instinct to partition pruning, cluster keys, and slot contention — different nouns, same skepticism about full reads.

*Figure: execution plan tree — Seq Scan vs Index Scan branches with cost labels and estimated rows.*

Index strategy (B-tree mental model)

Detailed explanation. Composite indexes follow left-prefix use: (customer_id, order_date) helps WHERE customer_id = ? and range order_date within that customer — not arbitrary order_date alone. Covering indexes include projections to avoid heap lookups when MVCC engines make that worthwhile. Always weigh write amplification on hot ingestion tables.

Worked example.

| DDL | intent |
| --- | --- |
| CREATE INDEX ON orders(customer_id, order_date) | seek customer timeline |
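
A PostgreSQL-flavored sketch (SQL Server shares the INCLUDE keyword; other engines differ):

-- Composite left-prefix: seek one customer, range-scan their timeline
CREATE INDEX idx_orders_customer_date
    ON orders (customer_id, order_date)
    INCLUDE (total_amount);             -- covering: an index-only scan can satisfy the SELECT

SELECT order_date, total_amount
FROM orders
WHERE customer_id = 42
  AND order_date >= DATE '2025-01-01';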

Partitioning — prune, don’t pray

Detailed explanation. Range partitions on order_date let engines skip cold files or table segments. Seniors write predicates that align to partition keys (half-open ranges help). Anti-pattern: function-wrapped partition columns that hide prune (WHERE YEAR(dt)=2025 instead of range on dt).

Worked example.

| predicate on event_date | partition prune? |
| --- | --- |
| event_date >= DATE '2025-04-01' AND event_date < DATE '2025-05-01' | yes — engine can eliminate irrelevant segments |
| EXTRACT(YEAR FROM event_date) = 2025 | often no — function masks the column |
| join on surrogate only, filter on dimension date later | risky — fact scans may widen before filter |

Reading a plan like a diff

Detailed explanation. Treat EXPLAIN (ANALYZE, BUFFERS) (Postgres) or vendor equivalents as a before/after diff: did estimated rows diverge from actual by 10× (hinting stale stats or correlated predicates)? Did a hash join spill? Did a nested loop suddenly execute billions of inner probes? Those questions map to histogram refresh, predicate rewrite, index leading key, or join order hints — pick one lever per iteration.
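
In Postgres terms, the diff loop looks roughly like this (vendor equivalents differ in syntax, not intent):

EXPLAIN (ANALYZE, BUFFERS)
SELECT o.customer_id, SUM(o.total_amount)
FROM orders o
WHERE o.order_date >= DATE '2025-04-01'
  AND o.order_date <  DATE '2025-05-01'
GROUP BY o.customer_id;
-- Read each node: estimated rows vs actual rows (10x divergence hints stale stats),
-- Sort/Hash nodes spilling to disk, and buffer hits vs reads.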

Selective partial indexes and write amplification

Detailed explanation. Partial indexes (WHERE status = 'OPEN') shrink index size on skewed status columns and speed hot paths that always filter the same slice — at the cost of planner surprises if ORMs omit the same predicate. Covering indexes add include-columns to satisfy SELECT lists in index-only scans but increase VACUUM/maintenance surface area on write-heavy tables.
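
A PostgreSQL-flavored sketch; note the query must repeat the index predicate to qualify:

CREATE INDEX idx_orders_open
    ON orders (created_at)
    WHERE status = 'OPEN';              -- indexes only the hot slice

-- Matches the partial predicate, so the planner can use the small index
SELECT order_id, created_at
FROM orders
WHERE status = 'OPEN'
  AND created_at >= now() - INTERVAL '1 day';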

Common beginner mistakes

  • Indexing every column — harms writes; weak selectivity on indexed columns hurts planner choices.
  • Blaming the planner before checking vacuum/analyze, AUTO STATS, or histogram freshness.
  • Micro-benchmarking on empty tables — plans change radically at scale.

6. Isolation, transactions, and locking — correctness under concurrency

Isolation is a contract, not a vibe

Invariant: isolation levels trade anomalies for throughput — seniors pick with eyes open.

Detailed explanation. Know the textbook quadrilogy — dirty reads, non-repeatable reads, phantoms, serialization anomalies — and which levels suppress which on your engine defaults (read committed vs repeatable read vs serializable / snapshot). Locks (row, predicate, deadlocks) are how databases enforce those stories under write contention.

Locking & deadlocks

Detailed explanation. Deadlocks arise from opposite lock order on two resources — mitigation is consistent lock acquisition order, smaller transactions, and retries on 40001-class errors where supported. Seniors capture deadlock graphs instead of guessing.

Isolation levels vs anomalies (memory aid)

Detailed explanation. Different engines implement snapshot, MVCC, and predicate locks differently, but interviewers still expect you to name anomalies and which level tolerates them.

Worked example.

| isolation level (typical names) | dirty read | non-repeatable read | phantom read |
| --- | --- | --- | --- |
| Read uncommitted | allowed | allowed | allowed |
| Read committed | blocked | possible | possible |
| Repeatable read / snapshot | blocked | blocked | engine-dependent |
| Serializable | blocked | blocked | blocked (often at throughput cost) |

A pragmatic concurrency playbook

Detailed explanation. Default to short transactions, ordered lock acquisition on shared resources, SELECT … FOR UPDATE only when you mean it, and idempotent retry logic for serialization failures. For analytics, read-only replicas or warehouse sessions isolate heavy scans from OLTP lock pressure — another form of isolation, just at the architecture layer.
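
A sketch of ordered acquisition in Postgres-flavored SQL (account ids are illustrative):

BEGIN;
-- Lock both rows in ascending id order so concurrent transfers
-- cannot grab them in opposite order and deadlock
SELECT id FROM accounts
WHERE id IN (1001, 2002)
ORDER BY id
FOR UPDATE;

UPDATE accounts SET balance = balance - 50 WHERE id = 1001;
UPDATE accounts SET balance = balance + 50 WHERE id = 2002;
COMMIT;
-- On SQLSTATE 40001 (serialization/deadlock class), retry the whole transaction.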

Common beginner mistakes

  • Long transactions holding locks while calling HTTP services — stalls the whole store.
  • Implicit READ UNCOMMITTED “for speed” — surprises downstream with dirty reads.
  • Assuming ORMs manage boundary lines — you still own batch boundaries.

7. Modeling, ETL SQL, quality checks, and anti-patterns

Readable pipelines and honest schemas

Invariant: modeling decides which SQL is even possible — stars/snowflakes, surrogate keys, SCD strategies, and facts at grain.

Detailed explanation. Seniors design fact tables at immutable event grain and dimensions for attributes that change slowly. CTEs stage raw → cleaned → conformed → aggregated layers so diffs read like dataflow, not wall-of-text SQL. DQ checks (GROUP BY duplicate detectors, NULL rate scans) belong beside transforms, not after CFO escalations.

*Figure: simplified star schema — central fact table with foreign keys out to dimension tables.*

ETL SQL readability

Detailed explanation. Prefer WITH chains with named intents (cleaned_events, daily_revenue) over nested opaque subqueries. Warehouse runners still care — maintainers git blame your CTE names at 2 AM.
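
A compact sketch of the layering, with illustrative table and column names:

WITH cleaned_events AS (                -- raw → cleaned: types and basic filters
    SELECT user_id, CAST(event_ts AS date) AS event_date, amount
    FROM raw_events
    WHERE amount IS NOT NULL
),
daily_revenue AS (                      -- cleaned → aggregated: one row per day
    SELECT event_date, SUM(amount) AS revenue
    FROM cleaned_events
    GROUP BY event_date
)
SELECT event_date, revenue
FROM daily_revenue
ORDER BY event_date;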

Keys, grain, and slowly changing dimensions

Detailed explanation. Natural keys (email, SKU) feel convenient until merges, typos, or vendor changes arrive — surrogate keys stabilize joins but require disciplined ETL to preserve history. SCD Type 1 overwrites attributes (easy, history lost); Type 2 versions rows with valid_from / valid_to (truthful, joins heavier); Type 3 keeps limited prior columns (rare, simplified). Seniors pick per attribute: addresses often Type 2, corrected typos sometimes Type 1 with audit trails elsewhere.

Worked example.

| SCD flavor | when seniors choose it | SQL consequence |
| --- | --- | --- |
| Type 1 | truth today only | simple dimension join |
| Type 2 | legal/finance needs history | join on as-of or current flag |
| Type 3 | “last previous region” reporting | extra columns, lighter than full Type 2 |
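
The Type 2 “joins heavier” consequence in sketch form, assuming hypothetical valid_from / valid_to columns with half-open ranges:

-- As-of join: pick the dimension version in effect when the fact occurred
SELECT f.order_id, f.order_date, d.address
FROM fact_orders f
JOIN dim_customer d
  ON d.customer_id = f.customer_id
 AND f.order_date >= d.valid_from
 AND f.order_date <  d.valid_to;        -- half-open range avoids boundary overlap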

DQ probes beside transforms, not after escalations

Detailed explanation. Lightweight checks catch contract breaks early: GROUP BY natural key HAVING COUNT(*) > 1 finds duplicates; NULL rate SUM(CASE WHEN col IS NULL THEN 1 END) / COUNT(*) on critical columns flags ingestion drift; referential probes (LEFT JOIN dimension WHERE fact key not matched) catch orphan facts before CFO reviews.
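
The three probes in sketch form (names illustrative):

-- 1. Duplicate natural keys
SELECT email, COUNT(*) AS n
FROM customers
GROUP BY email
HAVING COUNT(*) > 1;

-- 2. NULL rate on a critical column
SELECT AVG(CASE WHEN customer_id IS NULL THEN 1.0 ELSE 0.0 END) AS null_rate
FROM orders;

-- 3. Orphan facts: fact rows with no matching dimension
SELECT COUNT(*) AS orphans
FROM fact_orders f
LEFT JOIN dim_customer d ON d.customer_sk = f.customer_sk
WHERE d.customer_sk IS NULL;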

Anti-patterns seniors refuse

Detailed explanation. SELECT-star in hot paths widens IO. Functions wrapped around indexed columns (LOWER(email)) can suppress index use — prefer expression indexes, computed / persisted columns, or case-folded canonical fields. Correlated subqueries can be fine — or catastrophic — verify plans.

Materialized views / rollups

Detailed explanation. Materialized views (where supported) precompute heavy aggregates for stable dashboards — trade staleness for latency. Document refresh semantics; seniors don’t hide hourly lag behind a “live” button label.
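
A PostgreSQL-flavored sketch; refresh mechanics and incremental options differ widely across engines:

CREATE MATERIALIZED VIEW daily_revenue_mv AS
SELECT order_date, SUM(total_amount) AS revenue
FROM orders
GROUP BY order_date;

-- Unique index enables CONCURRENTLY (non-blocking reads during refresh)
CREATE UNIQUE INDEX ON daily_revenue_mv (order_date);

-- Scheduled refresh: document this staleness window for dashboard owners
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_revenue_mv;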

Common beginner mistakes

  • Denormalizing “because warehouses love joins” without SCD strategy — creates retroactive lies.
  • DQ as an afterthought — duplicates discovered monthly should be detected daily.

Tips to stay senior under interview clocks

  • Start every join question with cardinality — who is 1, who is N, what is the output grain?
  • Default to half-open time windows for reporting — fewer off-by-one month bugs.
  • Say “I’d diff the plan” — then list stats, indexes, data skew, predicate bake-in.
  • Know when SQL stops — deep cyclic graphs may belong in graph tools, not 90-line CTE battle.
  • Name frame units aloud — ROWS (fixed neighbors) vs RANGE (business time) prevents silent KPI drift.
  • Where to practice on PipeCode — chain window functions →, joins →, CTEs →, and aggregation → until window + join stories feel automatic.

Frequently asked questions

What is “senior SQL” in hiring terms?

Senior SQL means you can ship correct, efficient, maintainable relational workloads — reading plans, designing indexes/partitions, mastering windows and recursive patterns, and debugging concurrency issues — not only writing syntactically valid queries on toy tables. Interviewers listen for explicit cardinality stories, failure modes (locks, skew, bad stats), and refactor discipline: can you improve a query without hiding problems behind DISTINCT?

How is senior SQL different from knowing a specific warehouse?

Dialects differ (BigQuery vs Snowflake vs Postgres), but grain, join cardinality, frames, pruning, and isolation transfer. Seniors learn local plan vocabulary fast because the invariants repeat. The differentiator is not memorizing QUALIFY or CLUSTER BY alone — it is mapping each feature back to fewer bytes read, fewer shuffles, or clearer semantics.

When should I prefer RANK over ROW_NUMBER?

Use RANK/DENSE_RANK when tie groups must share standing (e.g., “top quartile bands”). Use ROW_NUMBER when you need deterministic dedup or exactly N rows with explicit tie-break columns. If the prompt says “top three salaries” but ties may exceed three people, RANK can overshoot row count — say that aloud and clarify requirements before coding.

Do I always need an index for fast queries?

No — tiny tables or analytical scans may be cheaper sequential; write-heavy tables pay index maintenance. Seniors choose based on selectivity, predicate shape, and observed plans — not folklore. Sometimes the winning move is narrower projections, pre-aggregation, or better stats rather than a new B-tree.

What is the biggest modeling mistake in analytics SQL?

Accidental grain shift — joining dimensions or facts so one event becomes many rows, then aggregating as if grain were still one row per event. Fix the join graph, not the DISTINCT. The durable fix is usually staging at the correct grain (e.g., per user-day) before attaching wide dimensions.

How do I practice senior patterns safely?

Work on larger realistic slices — partitioned time series, skewed keys, and multi-step CTE pipelines — and always inspect plans after rewrites. Supplement reading with timed reps on the SQL topics → you're weakest at. Rotate filtering → and join → sets when questions hide fan-out in plain English.

Practice on PipeCode

PipeCode ships 450+ interview-grade problems spanning joins, aggregation, window analytics, CTEs, and filtering in SQL. Start from Explore practice →, narrow to language SQL →, and drill harder sets on SQL topic hub →. Unlock plans → when you want unrestricted runs.
