CTE in SQL — a Common Table Expression introduced with WITH — is how you turn a brittle wall of nested subqueries into a readable, debuggable pipeline. In data engineering interviews — the same lanes as basic sql interview questions, joins in sql interview questions, and sql interview questions with answers rounds — reviewers use CTEs as a readability signal: if you can name intermediate results and chain them cleanly, you will survive the live whiteboard refactor.
You will build the pattern from scratch: single-CTE anatomy, chained CTE pipelines, CTE + sql window functions for rank-then-filter (top‑N per group), WITH RECURSIVE org-chart and counting sequences, and the classic board question triad (CTE vs subquery vs temporary table) with blunt trade-offs. Every interview-style beat ends as sql interview questions with answers: runnable Postgres-flavoured SQL, a traced execution, printed output tables, and a concept-by-concept why this works map — without repeating the boilerplate headline as keyword stuffing.
When you want hands-on reps immediately after reading, browse CTE (SQL) practice →, dive practice SQL hub →, sharpen window function SQL →, rehearse join interview SQL →, or widen coverage on the general CTE topic →.
On this page
- Why CTEs matter in interviews and pipelines
- Single CTE — anatomy of
WITH … AS - Multiple chained CTEs — read top-down like dbt staging
- CTE with joins and aggregations
- CTE + sql window functions — rank, then filter
- Recursive CTE — hierarchies without leaving SQL
- CTE vs subquery vs temp table — interview trade-offs
- Choosing CTE usage (checklist)
- Frequently asked questions
- Practice on PipeCode
1. Why CTEs matter in interviews and pipelines
The readability invariant interviewers optimise for
The core invariant you should state aloud: a CTE binds a disposable, named relational expression to one outer SELECT, INSERT, UPDATE, or DELETE; it disappears when the enclosing statement completes—no DDL, no session lease—and it reads cleaner than tortured nested parentheses. That wording alone answers half of the "what is CTE" prompts you will hear in screening loops.
Detailed explanation.
-
Bound to one statement — every
WITHchain is part of a single top-level statement. Nothing "lives" afterCOMMIT/ROLLBACKof that unit the way a temp table does; you cannotSELECTa CTE alias from a different query in the same session without copy-pasting the definition. - Algebraic, not physical — the engine may inline a CTE into the outer query, merge predicates, or hoist joins. You name relations for humans and maintainers; the planner still rewrites. Senior answers separate authoring ergonomics from execution guarantees (see §7 for materialization nuance).
-
Same relational type as a subquery — a CTE produces a relation (bag or set depending on
UNIONvsUNION ALL); every row has a schema fixed by the innerSELECTlist. That is why you can stack CTEs like typed pipeline stages.
When an interviewer asks why you reached for WITH instead of a derived table, cite three forces:
-
Nameability —
avg_salaryreads better than "subselect#3" and signals intent in code review. - Refactor‑friendliness — comment out a CTE while debugging without rewiring parentheses; reorder stages when the story changes.
- Composition — later CTEs may reference earlier ones in the same chain; that mirrors dbt / SQL transformation layers without leaving the warehouse editor.
Pro tip: When someone says "walk me through this query," start at the last CTE in the chain and narrate backward to data sources—mirrors how many optimisers inline, and shows you control the algebraic story end-to-end.
Common beginner mistakes
- Treating CTE results as persisted tables—they are logical scopes, not caches (mention materialized CTE hints only when you genuinely know your engine exposes them).
- Inlining seventeen anonymous subqueries to "save lines" — interviewers downgrade readability instantly.
- Hiding mutating statements inside so-called readability layers — keep each CTE a pure relational expression unless the prompt explicitly mixes DML patterns.
- Reusing a CTE name as if it were a view registered in the catalog—only the enclosing statement can see it; cross-statement reuse needs a real view, temp table, or ORM layer.
2. Single CTE — anatomy of WITH … AS
Name the intermediate relation, then consume it once
Every non-recursive CTE has the same silhouette:
WITH alias AS (
SELECT ...
)
SELECT ... FROM alias;
Warehouses disagree on microscopic optimisation trivia; they agree on this grammar.
WITH binds scope, not storage
Think of WITH as lexical, not physical: engines may inline the CTE, merge filters, reorder joins—the author job is expressing intent cleanly so downstream planners can.
Detailed explanation.
-
Column list optional — you may write
WITH cte (a, b) AS (SELECT …)to rename projected columns explicitly; handy when outer queries should not depend on brittle inner aliases. -
Multiple references — the outer statement may reference the same CTE twice (
FROM cte c1 JOIN cte c2 …). That is legal but can surprise optimizers into scanning twice unless merged; mention that aloud if the interviewer asks about cost. -
Mutual exclusion with some clauses — dialect details vary, but in interviews assume the CTE's inner
SELECTfollows normal rules (no bare aggregates withoutGROUP BY, no illegal forward references to outer query columns—correlated subqueries still need explicit correlation).
Worked example. Filter highly paid rows before projecting columns:
| Concept | Meaning |
|---|---|
| Alias | reusable handle big_earn inside the outer query |
| Scope | disappears after outer SELECT finishes |
| Contrast vs inline | same relational idea, sharper whiteboard narration |
Step-by-step.
- Define
big_earnasSELECT emp_id, name, salary FROM employees WHERE salary > 50000. - Outer query selects
SELECT name, salary FROM big_earn ORDER BY salary DESC. - Add
LIMIT 3knowing the filtration already happened upstream logically — note:ORDER BY+LIMITapply to the outer grain after the CTE is bound, which matches how you narrate debugging ("sort the named relation, then truncate").
Worked-example solution.
WITH big_earn AS (
SELECT emp_id, name, salary
FROM employees
WHERE salary > 50000
)
SELECT name, salary
FROM big_earn
ORDER BY salary DESC
LIMIT 3;
Common beginner traps
- Forgetting that outer
WHEREcannot "reach into" the CTE's hidden predicates unless you expose columns—push stable filters into the CTE when they shrink working sets early.
Rule of thumb: if the subquery has a business meaning ("high earners," "valid orders"), name it.
3. Multiple chained CTEs — read top-down like dbt staging
Each new CTE may reference any prior CTE in the chain
The multi-CTE invariant: order matters only for name resolution—cte_n may reference cte_{1..n-1} but never forward-declare aliases you have not defined yet. This mirrors layered SQL transformations: staging → intermediates → marts.
Detailed explanation.
-
Acyclic name graph — imagine an edge from
daily_totals→raw_ordersbecause the former reads the latter. The textual order you write is a valid topological sort of that dependency DAG; swap two CTEs blindly and names break. - Grain discipline — each stage should have a one-sentence grain statement ("one row per order," "one row per calendar day per store"). When the grain shifts, rename the CTE so reviewers see the dimensional contract.
-
Debugging workflow — during a live interview, you can stub later CTEs as
SELECT * FROM prior_cte LIMIT 10to validate shapes before adding aggregates—chaining rewards that incremental reveal.
Worked example. Three-step revenue SLA filter:
| Stage | Responsibility |
|---|---|
raw_orders |
normalise ingest projections |
daily_totals |
aggregate at calendar-day grain |
flagged_days |
predicate on revenue thresholds |
Step-by-step.
-
raw_ordersprojectsevent_date,usd_revenue. -
daily_totalsaggregatesSUM(revenue)grouped by calendar date — watch for fan-out: ifraw_ordersaccidentally joins to a dimension before this step, yourSUMmultiplies; keep joins that change cardinality either explicit in a dedicated CTE or after aggregation when semantics demand it. -
flagged_dayskeeps SLA-busting days only — pure filter on an already-collapsed daily relation.
Worked-example solution.
WITH raw_orders AS (
SELECT DATE(order_ts)::date AS d, revenue_usd
FROM ingest.shopify_orders
WHERE refunded IS FALSE
),
daily_totals AS (
SELECT d, SUM(revenue_usd)::numeric(14,2) AS rev
FROM raw_orders
GROUP BY d
),
flagged_days AS (
SELECT *
FROM daily_totals
WHERE rev >= 25000
)
SELECT *
FROM flagged_days
ORDER BY d DESC;
Common chaining pitfalls
- Hidden Cartesian products — joining two wide CTEs without keys duplicates rows; verbalize keys whenever you fuse layers.
-
Leaky filters — applying
WHEREonly in the outermostSELECTwhile earlier CTEs still scan massive history; push time windows and partitioning predicates as early as the schema allows.
Rule of thumb: rename each layer after the grain (per_order, per_customer_day) so graders see dimensional thinking.
4. CTE with joins and aggregations
Join inside the CTE when the join is the reusable story
Pulling joins in sql interview questions into a CTE tells the room you know how to isolate dimension wiring before applying business filters.
Detailed explanation.
-
Freeze aggregates once — the pattern
WITH cohort_metrics AS (… GROUP BY key …)computes each group's summary exactly once in the narrative. The outer query then behaves like a probe: attach metrics to detail rows and filter on those scalars. -
Cardinality contract — after
dept_avg, you expect one row perdept_id(assumingdept_idis a key indepartments). Joining that CTE back toemployeeson the same key should not duplicate employees unless the aggregate CTE itself was built from a many-to-many path—if it was, split "explode" joins out of the aggregate stage. - Interview narration — say aloud: "first I collapse to department grain, then I join the employee spine at employee grain, then I filter." That three-beat story maps directly to the SQL.
Department averages with a reusable join CTE
Worked example. Employees enriched with departmental averages prior to outperform filters.
Step-by-step.
- Aggregate salaries by department ID.
- Join employees back to those aggregates.
- Filter rows beating their department averages.
Worked-example solution.
WITH dept_avg AS (
SELECT d.dept_id, AVG(e.salary)::numeric(12,2) AS dept_avg_salary
FROM employees e
JOIN departments d ON d.dept_id = e.dept_id
GROUP BY d.dept_id
)
SELECT e.name,
e.salary,
d.dept_name,
a.dept_avg_salary,
e.salary - a.dept_avg_salary AS spread
FROM employees e
JOIN departments d USING (dept_id)
JOIN dept_avg a USING (dept_id)
WHERE e.salary > a.dept_avg_salary;
Common beginner mistakes
- Omitting
GROUP BYstability rules while referencing raw columns beside aggregates inside the same projection. - Accidentally multiplying row counts via join cardinalities prior to aggregates—solve with guarded sub-aggregates in earlier CTEs.
Alternative sketch — correlate without a JOIN in the aggregate CTE
You can compute AVG(...) OVER (PARTITION BY dept_id) in a dedicated CTE instead of grouping first; grouped CTE + join is easier to reason about aloud when WITH readability is graded.
SQL interview question — employees beating their department average
Assume employees(emp_id, name, dept_id, salary) plus departments(dept_id, dept_name). Return every teammate whose salary exceeds that department's average, including the departmental average beside each teammate row.
Solution Using a join-friendly aggregate CTE
Code solution.
WITH dept_avg AS (
SELECT dept_id, AVG(salary)::numeric(12,2) AS dept_avg_salary
FROM employees
GROUP BY dept_id
)
SELECT e.emp_id,
e.name,
d.dept_name,
e.salary,
a.dept_avg_salary
FROM employees e
JOIN dept_avg a USING (dept_id)
JOIN departments d USING (dept_id)
WHERE e.salary > a.dept_avg_salary
ORDER BY d.dept_name, e.salary DESC;
Step-by-step trace.
| step | relation | outcome |
|---|---|---|
| 1 | Scan employees inside dept_avg
|
input multiset for averaging |
| 2 |
GROUP BY dept_id + AVG(salary)
|
department-grain relation: one dept_avg_salary scalar per dept |
| 3 | JOIN employees … USING (dept_id) |
employee-grain relation again; each worker row carries parent dept's average unchanged |
| 4 | WHERE e.salary > a.dept_avg_salary |
anti-regression filter — removes rows at or below cohort mean |
Output:
| emp_id | name | dept_name | salary | dept_avg_salary |
|---|---|---|---|---|
| 12 | Ava | Retail | 92,500 | 78,320 |
| 4 | Mei | Insights | 130,400 | 110,980 |
(Demonstrative salaries — graders care about algebraic shape.)
Why this works — concept by concept:
-
CTE dept_avg — freezes department-grain aggregates exactly once per cohort; avoids repeating
AVGsubqueries inline on every employee row in the outer text. -
JOIN USING (dept_id) — stitches scalar averages back without exploding beyond employee cardinality when
dept_avgstayed at true dept key grain. - Filter after aggregation — separates compute departmental truth vs evaluate individuals; interviewers listen for that separation as a signal you understand two different grains in one question.
-
Numeric cast —
::numeric(12,2)(or equivalent) prevents float drift in panel walkthroughs when money-like fields appear. - Cost — hash aggregate typically Θ(n) on the employee spine for the grouped leg, plus Θ(n) equi-join cost to reattach; dominated by scans unless indexed paths short-circuit.
SQL
Topic — CTE (SQL)
CTE‑focused SQL problems
SQL
Topic — joins
Join interview patterns
SQL
Topic — aggregation
Aggregation drills
5. CTE + sql window functions — rank, then filter
Separate ranking logic from predicates—two CTE beats one clever monster
Most sql window functions arcs follow rank-within partitions → predicate on ranks; folding both layers into one derived table hides mistakes under pressure.
Detailed explanation.
-
Logical processing order — window functions attach to rows after
FROM/WHERE/GROUP BY/HAVINGin the relational pipeline for that innerSELECT; you typically cannot reference a window alias in the sameSELECT'sWHEREin standard SQL—you project ranks in one CTE, filter in the next (or nest a subquery). That restriction is precisely why graders like CTE + window combos: the shape matches the semantics. -
Partition = rivalry scope —
PARTITION BY dept_idmeans "within this department bucket only, reorder rows"; rows in other departments never compete forrn = 1. -
ORDER BYinsideOVER— resolves ties;emp_idas trailing sort key buys determinism forROW_NUMBERwhen salaries collide.
ROW_NUMBER() vs cousins (pick aloud in-panel)
| function | ties at same ORDER BY keys | skips rank values after ties? |
|---|---|---|
ROW_NUMBER() |
breaks arbitrarily per sort key completeness | never — strictly 1..N per partition |
RANK() |
equal keys share rank | yes — gaps after a tie (1, 1, 3) |
DENSE_RANK() |
equal keys share rank | no gaps (1, 1, 2) |
Demand deterministic slicing ⇒ ROW_NUMBER + exhaustive ORDER BY. Reward parity for equal salaries ⇒ RANK or DENSE_RANK per business rule.
ROW_NUMBER() is deterministic for mechanical top‑N slicing
Different from RANK()—choose deliberately when duplicates must share honours.
Worked example. Produce top‑3 salaries per department with deterministic ties.
Step-by-step.
- Decorate employees with
ROW_NUMBER() OVER (PARTITION BY dept_id ORDER BY salary DESC, emp_id ASC). - Filter
rn <= 3in downstream scope—the second CTE (or outerSELECT) is your predicate stage isolated from analytic decoration.
Worked-example solution.
WITH ranked AS (
SELECT dept_id,
emp_id,
name,
salary,
ROW_NUMBER() OVER (
PARTITION BY dept_id
ORDER BY salary DESC, emp_id ASC
) AS rn
FROM employees
)
SELECT *
FROM ranked
WHERE rn <= 3
ORDER BY dept_id, rn;
SQL interview question — top two salaries whenever a department staffs at least two people
Assume employees(dept_id, emp_id, name, salary). Return rows only for departments with population ≥ 2, showing the highest two salaries in each (ROW_NUMBER ordering, break ties on emp_id).
Solution Using stacked CTEs for readability
Code solution.
WITH ranked AS (
SELECT dept_id,
emp_id,
name,
salary,
ROW_NUMBER() OVER (
PARTITION BY dept_id
ORDER BY salary DESC, emp_id ASC
) AS rn,
COUNT(*) OVER (PARTITION BY dept_id) AS dept_pop
FROM employees
),
top_two AS (
SELECT *
FROM ranked
WHERE dept_pop >= 2
AND rn <= 2
)
SELECT dept_id, emp_id, name, salary
FROM top_two
ORDER BY dept_id, rn;
Step-by-step trace.
| step | action |
|---|---|
| 1 |
ranked decorates rows with deterministic rn; grain unchanged vs base employees — one output row still per (dept_id, emp_id). |
| 2 |
COUNT(*) OVER (PARTITION BY dept_id) is a pure scalar broadcast inside each dept — computes headcount without self-joining aggregates back. |
| 3 |
top_two enforces BOTH “big enough dept” plus podium depth simultaneously — logically equivalent to HAVING on dept size after hypothetical GROUP BY, but keeps employee-level projection intact. |
Output:
| dept_id | emp_id | name | salary |
|---|---|---|---|
| 10 | 4 | Ava | 98,400 |
| 10 | 7 | Omar | 95,050 |
| 20 | 1 | Zoe | 120,010 |
Why this works — concept by concept:
- Window PARTITION BY — constrains rivalry to departmental peers exclusively; cross-department ordering is irrelevant noise.
-
ROW_NUMBER semantics — hands you distinct ranks per partition for deterministic slicing machinery when tie-break columns are explicit (
emp_id). -
COUNT(*) window — encodes interviewer guardrails (“only multi-person teams”) without collapsing rows pre-rank (
GROUP BYwould destroy per-employeern). -
Stacked CTEs — separates decorate (
ranked) from predicate (top_two), mirroring dialect rules about filtering window outputs. -
Cost intuition — window sort/heaps approach Θ(n log n) dominance per partition in typical merge-sort planners; mention index-friendly
ORDER BYkeys (dept_id, salary DESC) as an optimization avenue if indexes exist.
SQL
Topic — CTE
CTE topic lane
SQL
Topic — window functions
Window-function SQL problems
SQL
Topic — CTEs
Broader CTE practice set
6. Recursive CTE — hierarchies without leaving SQL
WITH RECURSIVE stitches an anchor SELECT to an inductive UNION ALL spine
Recursive patterns answer organisation charts, bill-of-material explosions, controlled sequence generation—for acyclic hierarchies WITH RECURSIVE lands squarely inside CTE in SQL interviewer expectations across Postgres-first DE loops.
Detailed explanation.
On each evaluation round, SQL engines conceptually compute:
-
Anchor (non-recursive
SELECT) — initial working set (frontier, generation 0). -
Recursive member —
SELECT … JOIN recursive_cte_alias …; the join references theWITH RECURSIVEalias (tree,seq, …). Results areUNION ALL-appended unless you explicitly demand dedupe withUNION. - Fixed point — iteration stops when the recursive member contributes no new rows under the engine's recursive evaluation rules (graphs with cycles need explicit guarding — anchors + cycle columns or rewriting — otherwise recursion becomes pathological).
Invariant checklist:
- Anchor SELECT gathers starting frontier rows (often roots where parent keys are NULL).
- Recursive SELECT joins the prior frontier back to driving base rows.
-
UNION ALLappends successive generations (UNIONwhen dedupe is mandated explicitly and you accept DISTINCT cost). -
Cycles blow up naive trees — state how you would detect or prevent them (
CYCLE, path arrays,UNIONdedupe semantics, or procedural escape hatches).
WITH RECURSIVE seq(n) AS (
SELECT 1
UNION ALL
SELECT n + 1
FROM seq
WHERE n < 5
)
SELECT * FROM seq;
This toy emits {1…5} — it isolates recursion mechanics before you attach employees edges.
SQL interview question — enumerate every descendant under VP vp_id
Given employees(emp_id, name, manager_id) with an acyclic tree, vp_id acts as subtree root. Emit every reachable employee with hierarchical level numbering starting at 0 for the VP herself.
Solution Using Postgres WITH RECURSIVE
Code solution.
WITH RECURSIVE tree AS (
SELECT emp_id,
name,
manager_id,
0 AS level
FROM employees
WHERE emp_id = :vp_id
UNION ALL
SELECT e.emp_id,
e.name,
e.manager_id,
tree.level + 1
FROM employees e
JOIN tree ON e.manager_id = tree.emp_id
)
SELECT emp_id,
name,
level,
REPEAT(' ', level) || name AS indent_name
FROM tree
ORDER BY level, emp_id;
Step-by-step trace.
| frontier | expands by |
|---|---|
| depth 0 | VP seed row from anchor — subtree root anchored at interviewer parameter |
| depth k ≥ 1 | every employee whose manager_id references someone already resident in tree (inductive join) |
Expansion halts when the recursive SELECT would append only rows already reachable—under cycle-free graphs, breadth grows until leaf managers fail to recruit new hires. Worst-case work proportional to |subtree| edges traversed modulo engine batching semantics.
Output:
| emp_id | name | level | indent_name |
|---|---|---|---|
| 100 | Quinn | 0 | Quinn |
| 110 | Ravi | 1 | ··Ravi |
Why this works — concept by concept:
-
Anchor picks root identity — pins recursion context to interviewer-supplied
vp_id; switching roots is a literal parameter tweak, no structural rewrite. -
JOIN aligns generations — children attach only to parents already enumerated, matching org-tree edge direction (
employee.manager_id → parent.emp_id). -
Level accumulator —
0at VP, increments per hop—communicates depth for indented displays, maximum-depth caps (WHERE level <= …), or post-filtering slices in outer queries. -
UNION ALL composition — appends frontier rows without collapsing duplicates that
UNION DISTINCTmight hide prematurely on messy data (distinctness still comes fromemp_iduniqueness in a clean HR dimension). - Complexity intuition — visits each node in the reached subtree once along each valid parent link in acyclic settings; cycles break this story—call that out explicitly in senior panels.
SQL
Topic — CTE/SQL hub
Deep CTE‑SQL drills
Python
Topic — recursion
Recursion practice (adjacent muscle)
SQL
Language — SQL
All SQL problems
7. CTE vs subquery vs temp table — interview trade-offs
Know which tool implies which lifecycle story
Three-way grading grid interviewers memorize:
| Tool | Lifetime | Typical debugging | Replay story |
|---|---|---|---|
CTE (WITH) |
enclosing statement only | annotate pipeline layers mentally | rerun full statement blob |
| Inline derived table | same statement span | cramped syntax | rerun entire mega-expression |
CREATE TEMP TABLE |
session-bound | rerun arbitrary downstream slices | iterative analyst workflow |
Detailed explanation.
-
Inlining vs forcing materialization — on PostgreSQL 12+,
WITH foo AS (...) SELECT …usually inherits optimisation like an inline derived table; prependMATERIALIZED(orNOT MATERIALIZED) once you deliberately shape evaluation order (control duplicate work) or tame estimated rows quirks the optimizer misses. Naming the dialect matters: Snowflake / BigQuery / other warehouses each implement hints differently — never claim portability without hedging. - Subquery multiplicity — identical inline subqueries in one statement may execute more than once unless the planner deduplicates identical fragments — CTE naming makes reuse intent obvious to collaborators even when plans merge.
-
Temp tables redeem exploration — you can
CREATE INDEX ON tmp_…for repeated joins inside a detective session; that is awkward for ephemeral CTEs. Temps also interoperate across ORM/console round-trips in one session when you slice-and-dice predicates interactively.
Interview sound bite: "CTEs optimise communication during one deterministic statement; **TEMP tables optimise iterative exploration where you rerun predicates interactively.**"
Worked-example solution.
-- Exploration inside psql → TEMP wins iteration ergonomics:
CREATE TEMP TABLE tmp_high_value_orders AS
SELECT *
FROM staging.orders
WHERE revenue_usd > 500;
SELECT customer_id,
AVG(revenue_usd) avg_rev
FROM tmp_high_value_orders
GROUP BY 1;
-- vs production SQL model favouring readability:
WITH high_value_orders AS (
SELECT *
FROM staging.orders
WHERE revenue_usd > 500
)
SELECT customer_id,
AVG(revenue_usd) avg_rev
FROM high_value_orders
GROUP BY 1;
Choosing in one breath
Use CTEs for published pipelines and pairing windows with filters; reserve TEMP for sandbox hypotheses, brute-force cardinality introspection, and multi-query rehearsals; reserve opaque inline subqueries for one-shot predicates where naming hurts more than helping.
Rule of thumb: articulate lifetime + collaborator audience plainly—students often chant "plans identical" without naming engine nuances or session economics.
SQL
Topic — subqueries
Subquery practice lane
Choosing CTE usage (checklist)
| Situation | Reach for … | Avoid … |
|---|---|---|
| Multi-phase authored SQL destined for repos / reviews | chained CTEs | one mega SELECT nobody dares refactor |
| Window ranking + nuanced filters | sequential CTEs | deeply nested analytic soup |
| Acyclic hierarchies entirely inside warehouse dialect |
WITH RECURSIVE pathways |
bouncing back to procedural-only apps prematurely |
| Ad-hoc investigation with iterative reruns |
TEMP TABLE + helpful indexes |
retyping massive CTE trees each tweak |
Frequently asked questions
What are typical cte in sql interview questions?
Expect definition ("what token opens a CTE?" → WITH), lifecycle (statement scope vs session objects), readability refactors (nested subquery → named pipeline), window + CTE combos rank-then-filter, and trees via WITH RECURSIVE. Panels often wedge one optimisation empathy question ("would you force materialisation? why?"). Answer each with two crisp sentences: mechanism first, dialect caveat second—not acronym dumps.
How is a CTE different from a temporary table?
A CTE is logical sugar glued to one enclosing statement: no catalog object, disappears when the batch finishes unless you wrap it differently. CREATE TEMP TABLE … allocates session-scoped physical storage, survives until disconnect or DROP, and supports indexes / repeated downstream queries comfortably. Reach for whichever matches lifetime and whether collaborators need iterative rerun ergonomics—not whichever "feels fancier".
When should WITH RECURSIVE surrender to procedural graph code?
Stay in SQL while the graph stays moderate, acyclic (or you have explicit CYCLE / path handling patterns for your dialect), and results feed relational BI directly. Pivot to procedural / graph libs when cycles are common without clean detection, branching explodes runtimes, validations span systems outside the warehouse contract, or you need fine-grained backtracking/search primitives SQL does not express cleanly.
Do CTEs automatically materialise intermediate results?
No blanket guarantee. Many optimisers inline / fold ordinary CTEs like named subqueries. PostgreSQL exposes MATERIALIZED / NOT MATERIALIZED to steer evaluation; cite those only alongside the engine name. Recursive CTEs follow different planner stories—mention termination semantics if the interviewer probes performance cliffs.
How should sql interview questions with answers narratives flow?
Lead with legible layering (stage name → grain), paste minimal runnable SQL, then narrate Step-by-step trace from base relations outward, freeze an Output table—even toy numbers—and finish with Why this works tying grain, join cardinality, aggregation vs window semantics, and coarse cost intuition. Mirrors how this article stitches panel answers together.
What one-line summary should stick in recall?
"WITH* names intermediate relations so collaborators reason about algebraic stages the same way dbt exposes staging → mart layers—but never confuse naming with persisted storage unless you pinned it there yourself."*
Practice on PipeCode
PipeCode ships 450+ data-engineering interview problems—including PostgreSQL-first SQL practice keyed to WITH chains, joins, aggregation, CTE + sql window functions, and branching recursive reasoning.
Kick off via Explore practice →; drill the dedicated CTE(SQL) lane →; fan out across CTE topic → or CTE(s) bucket →; deepen window-functions SQL practice →; rehearse join SQL drills →; reinforce aggregation practice → whenever grouped metrics underpin your predicates.





Top comments (0)