Gowtham Potureddi

SQL for Developers: Relational Foundations, Safe CRUD, Joins, Aggregates & Performance Muscle Memory

SQL for developers is how you read and write the systems you already ship — user accounts, orders, feature flags, observability tables. Backend engineers, full-stack builders, and data engineers share the same primitives: relational tables, stable keys, honest JOIN semantics around NULL, explicit grain for aggregates, and ACID discipline when concurrency hits.

What follows mirrors how teams onboard ICs — schema literacy, guarded CRUD, predicate hygiene, joins without silent row multiplication, GROUP BY / HAVING versus window analytics, then indexes, transactions, and EXPLAIN as your debugging lingua franca. Every numbered skill block ends like SQL interview questions with answers: runnable Postgres SQL, traced execution, and a terse why. Dive straight into reps when you crave keyboard time:


Browse practice hub →, open SQL practice →, deepen joins →, reinforce filters →, or widen with database fundamentals →.




1. Why SQL matters for developers and data engineers

The relational contract hiding behind every HTTP handler

Invariant: SQL is the persisted half of almost every SaaS workload — signup rows, entitlement tables, payout ledgers. Fluency here separates engineers who prototype quickly from teammates who confidently answer "why did this counter disagree with finance?" without escalating.

Detailed explanation. Application tiers think in verbs (POST, PATCH, enqueue) while relational engines expose predicates, constraints, joins, and transactions. When those layers disagree, outages look like phantom bugs but read as inconsistent reads, missing indexes, duplicate grain, or unscoped transactions underneath.

Pro tip: Name the grain aloud ("exactly one row means one _______") before you accept a JOIN plan or BI metric.

Why backends, analytics, and SRE converge on SQL

Everybody eventually asks Postgres the same primitive questions: correlations, aggregates, cardinality checks. Showing up fluent collapses Slack threads into scripts you can rerun, diff, commit to sql/ snippets, and instrument in CI smoke tests — even if warehouses later optimise the OLAP clones.

Worked example.

| persona | recurrent question type | payoff from SQL literacy |
| --- | --- | --- |
| backend | UPDATE fan-out / orphaned FK rows | deterministic migrations |
| DS / analytics | reproducible cohort filters | parameterized queries |
| SRE | blast-radius queries during incidents | EXPLAIN-aware rollbacks |

Step-by-step.

  1. Incident alert references revenue drift — replicate via a JOIN spanning payments → refunds.
  2. Count rows twice — once naive, once with explicit DISTINCT grain_key assertions (see the sketch below).
  3. Validate indexes cover WHERE + JOIN predicates in staging before prod deploy.
  4. Document literal-free SQL snippets for analytics parity.
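
Step 2 as a hedged sketch, assuming hypothetical payments and refunds tables where refunds.payment_id references payments.id:

```sql
-- Naive count: inflated whenever the join fans out (several refunds per payment).
SELECT COUNT(*)
FROM payments p
JOIN refunds r ON r.payment_id = p.id;

-- Grain assertion: count distinct payments, immune to refund multiplicity.
SELECT COUNT(DISTINCT p.id)
FROM payments p
JOIN refunds r ON r.payment_id = p.id;
```

If the two counts diverge, the join multiplied rows and any SUM downstream is suspect.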

OLTP versus analytic workloads

| workload | emblematic SQL | optimise for |
| --- | --- | --- |
| OLTP (INSERT, UPDATE critical rows) | UPDATE … WHERE id = $1 RETURNING … | short latch-friendly txn windows |
| Warehouse / OLAP | SUM(x) GROUP BY day (+ optional windows) | columnar parallelism, partition pruning |
| Replica analytics bridging both | parameterized slice queries | reproducible predicates + timeouts |

Detailed explanation. Mixed-mode Postgres instances still differentiate by predicate selectivity, txn duration, and hardware headroom. Front-line CRUD hates long reads holding locks; heavyweight BI tolerates eventual freshness but demands honest scan plans.

Common beginner traps

  • Mirroring spreadsheets — unstructured columns creep into DDL without reviewers.
  • Trusting dashboards over source tables — BI layers often coerce grain silently.
  • ORM-only troubleshooting — you still need emitted SQL logs for hidden N+1 joins.
  • Ignoring isolation trade-offs — READ COMMITTED anomalies appear only under concurrency realism.

2. Tables, keys, and the shape of relational data

Tables are unordered multisets guarded by declarative constraints

Invariant: a row-oriented table expresses one record type; keys pin identity while foreign keys articulate relationships declaratively rather than scattering pointer logic exclusively in Ruby/Java services.

Diagram: users and orders tables with primary keys and a foreign-key arrow from orders.user_id to users.id.

Primary versus natural identifiers

Natural keys mirror business artefacts (ISO country code). Surrogate IDs (BIGSERIAL) stay stable across refactors yet carry no domain meaning alone. Postgres typically combines both: surrogate PK for joins, UNIQUE(email) to enforce human-facing identity.

Worked example.

| design | advantage | caveat |
| --- | --- | --- |
| BIGSERIAL PRIMARY KEY | immutable join edges | meaningless for humans |
| UNIQUE(email) | legible dashboards | brittle if mergers rename emails |

Worked-example DDL commentary. BIGSERIAL draws values from a sequence, giving stable surrogate keys with cheap index inserts (roughly monotonic, with gaps after rollbacks). NOT NULL plus UNIQUE on email encode onboarding invariants that Postgres enforces deterministically, versus optional app-side validation.

```sql
CREATE TABLE users (
    id          BIGSERIAL PRIMARY KEY,
    email       TEXT NOT NULL UNIQUE,
    city        TEXT,
    created_at  TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE TABLE orders (
    order_id    BIGSERIAL PRIMARY KEY,
    user_id     BIGINT NOT NULL REFERENCES users(id) ON DELETE RESTRICT,
    total_usd   NUMERIC(14,2) NOT NULL CHECK (total_usd >= 0),
    placed_at   TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX orders_user_idx ON orders (user_id);
```

Foreign-key delete semantics matter in production churn

The difference between ON DELETE CASCADE (deletes propagate downstream) and ON DELETE RESTRICT (deletes are blocked while child rows still reference the parent) dictates safe admin tooling workflows. Prefer explicit policies over implicit defaults guessed during incidents.

```sql
-- Example: disallow deleting buyers with unpaid orders lingering
ALTER TABLE orders
DROP CONSTRAINT orders_user_id_fkey,
ADD CONSTRAINT orders_user_id_fkey
    FOREIGN KEY (user_id)
    REFERENCES users(id)
    ON DELETE RESTRICT;
```

Step-by-step.

  1. Model parent (users), child (orders) cardinality first.
  2. Select delete policy matching business law — finance rarely cascades blindly.
  3. Index child FK columns (CREATE INDEX on orders.user_id).
  4. Add CHECK constraints early — cheaper than patching corrupt rows later.

Common beginner mistakes

  • Deferring FK creation → silent orphan rows creep under concurrent writers.
  • Currency as floats — use NUMERIC(p,s) for monetary columns.
  • Timestamp without zone interpreted as UTC — prefer TIMESTAMPTZ + explicit TZ policy.
  • Overloading JSONB exclusively — structured columns keep optimiser hints honest.

Quick integrity checklist

| check | Postgres hook |
| --- | --- |
| uniqueness | UNIQUE, partial unique indexes |
| domain logic | CHECK, domain types |
| referential coupling | declarative FK + chosen ON DELETE |
| auditing | triggers or append-only ledger tables |

Rule of thumb: if two services disagree on cardinality, converge on DDL truth before arguing in Slack.


3. Reading and writing rows — SELECT, INSERT, UPDATE, DELETE safely

Every writer path needs identity — explicit filters and returning projections

Invariant: INSERT, UPDATE, DELETE must name which rows mutate (WHERE), prefer parameters ($1) over interpolated strings, and return affected rows (RETURNING) when callers need confirmations without another round-trip.
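
A minimal sketch of all three guardrails in one statement, against the users table defined in section 2:

```sql
-- Scoped by key ($1), parameterized ($2), and confirmed without a second round-trip.
UPDATE users
SET    city = $2
WHERE  id   = $1
RETURNING id, email, city;
```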

Transactions wrap multi-leg business truths

Transfers, swaps, and entitlement downgrades rarely satisfy the business invariant by touching one isolated row. Wrap the coordinated statements (BEGIN … COMMIT) so partial states never linger.

```sql
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
```

Why this works:

  • BEGIN opens a snapshot / locking context per the chosen isolation level.
  • Two ordered UPDATEs express money conservation — auditors expect the atomicity tie.
  • COMMIT publishes both deltas together; ROLLBACK; rewinds catastrophes.
  • Cost — transactional overhead is negligible versus the fallout of financial inconsistency.

Parameterized predicates prevent injection and cache reuse

Never splice user strings manually — placeholders keep plans cacheable and thwart SQL injection.

```sql
DELETE FROM sessions
WHERE user_id = $1
  AND expires_at < NOW();
```

INSERT patterns developers lean on daily

Bulk ingest + upsert choreography appears constantly — understand both single-row ergonomics (RETURNING) and batch paths.

```sql
INSERT INTO users (email, city)
VALUES ('dev@corp.com', 'Chennai')
RETURNING id, created_at;

INSERT INTO audit_log(event, payload)
VALUES
  ('password_reset', '{"user_id": 42}'::jsonb),
  ('mfa_challenge',  '{"user_id": 42}'::jsonb);
```

Worked example — idempotent onboarding upserts (ON CONFLICT).

```sql
INSERT INTO user_profiles (user_id, display_name)
VALUES ($1, $2)
ON CONFLICT (user_id)
DO UPDATE SET display_name = EXCLUDED.display_name,
              updated_at   = NOW();
```

Detailed explanation.

  • On conflict, the proposed row is exposed through the EXCLUDED pseudo-table, so the update can reuse the incoming values.
  • Pair with partial unique indexes for conditional uniqueness scenarios (invite tokens, soft deletes) — see the sketch below.
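
A sketch of that pairing, assuming a hypothetical invite_tokens table where used_at marks consumed tokens:

```sql
-- Uniqueness applies only to live tokens; consumed tokens may repeat.
CREATE UNIQUE INDEX invite_tokens_active_ux
    ON invite_tokens (token)
    WHERE used_at IS NULL;

-- ON CONFLICT can target the partial index by repeating its predicate.
INSERT INTO invite_tokens (token, invited_email)
VALUES ($1, $2)
ON CONFLICT (token) WHERE used_at IS NULL
DO NOTHING;
```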

Safe UPDATE hygiene checklist

| step | rationale |
| --- | --- |
| dry-run SELECT clone | verifies row cardinality before mutation |
| LIMIT + key filter | avoids blanket table rewrite |
| RETURNING old.* | auditing emits before/after diffs programmatically |
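
The checklist above as a hedged sketch, reusing the users table and batching via a keyed subquery (Postgres has no native UPDATE … LIMIT):

```sql
-- 1. Dry-run clone: verify cardinality before mutating anything.
SELECT COUNT(*) FROM users WHERE city = 'Madras';

-- 2. Batched, key-filtered mutation with an audit projection.
UPDATE users u
SET    city = 'Chennai'
WHERE  u.id IN (
    SELECT id FROM users
    WHERE city = 'Madras'
    ORDER BY id
    LIMIT 1000            -- bound lock time per batch
)
RETURNING u.id, u.email, u.city;
```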

Rule of thumb: if deleting “temp junk,” still scope by timestamp + TTL — surprise full-table wipes bankrupt trust faster than slow queries.


4. Filtering, NULLs, sorting, and LIMIT

Logical evaluation order is not top-to-bottom textual order

Diagram: SQL logical processing pipeline, staged left to right (FROM → JOIN → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT).

Invariant: Engines conceptually apply clauses in this order: FROM → JOIN → WHERE → GROUP BY → HAVING → windowing → SELECT expressions → DISTINCT → ORDER BY → LIMIT/OFFSET. Textual SQL order differs deliberately; misplaced expectations cause “phantom” aggregates or illegal references.

Detailed explanation.

  • Predicates in WHERE see raw row grain before collapsing via GROUP BY.
  • SELECT aliases rarely appear inside WHERE (Postgres exceptions exist for subqueries, not shortcuts).
  • ORDER BY runs after projection — you may sort computed expressions (example below).
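
A quick illustration of alias visibility, sketched against the users table:

```sql
-- Legal: ORDER BY runs after projection, so the alias exists by then.
SELECT email, LENGTH(email) AS email_len
FROM users
ORDER BY email_len DESC;

-- WHERE runs before SELECT computes aliases, so repeat the expression there.
SELECT email, LENGTH(email) AS email_len
FROM users
WHERE LENGTH(email) > 10;
```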

NULL is tri-valued logic (TRUE, FALSE, UNKNOWN)

Comparisons involving UNKNOWN ripple through compound predicates unpredictably unless you memorize De Morgan interactions with AND/OR.

```sql
SELECT * FROM users WHERE city IS NULL;  -- correct
SELECT * FROM users WHERE city = NULL;   -- ALWAYS UNKNOWN → zero rows (pitfall)

SELECT *
FROM experiments
WHERE status IS DISTINCT FROM outcome;   -- treats NULL knowingly
```

Worked example.

| predicate | evaluates to (when city IS NULL) |
| --- | --- |
| city = 'Chennai' | UNKNOWN |
| city IS NULL | TRUE |

Worked-example pattern — COALESCE bridging optional columns.

```sql
SELECT COALESCE(city, 'unknown') AS city_label
FROM users;
```

Combine with NULLIF to coerce sentinel blanks into canonical NULL semantics.
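
For example, a sketch treating empty strings as missing before labelling:

```sql
-- NULLIF collapses '' to NULL; COALESCE then substitutes the canonical label.
SELECT COALESCE(NULLIF(TRIM(city), ''), 'unknown') AS city_label
FROM users;
```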

Composing predicates responsibly

Prefer explicit parentheses when mixing AND/OR:

```sql
SELECT id, email
FROM users
WHERE (city IN ('Hyderabad', 'Chennai') OR vip IS TRUE)
  AND suspended IS FALSE;
```

ORDER BY, LIMIT:

```sql
SELECT id, email
FROM users
WHERE city IN ('Hyderabad', 'Chennai')
ORDER BY email ASC
LIMIT 25 OFFSET 50;
```

Detailed explanation.

  • Large OFFSETs scan and discard skipped rows wastefully — keyset pagination (WHERE id > $cursor ORDER BY id LIMIT …) is often cheaper at scale even if ergonomically heavier (sketch below).
  • Stable sorts duplicate tie-break columns (ORDER BY created_at DESC, id DESC) to defeat nondeterministic pages.
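
A minimal keyset sketch, assuming the client echoes back the final (created_at, id) pair of the previous page as the cursor:

```sql
-- Row-value comparison resumes exactly after the cursor and can use a
-- composite index on (created_at DESC, id DESC); the id tie-break keeps pages stable.
SELECT id, email, created_at
FROM users
WHERE (created_at, id) < ($1, $2)
ORDER BY created_at DESC, id DESC
LIMIT 25;
```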

Common beginner traps

  • NOT IN (...) collapses unexpectedly when inner list harbors NULL (entire predicate UNKNOWN).
  • BETWEEN inclusive endpoints surprise folks expecting half-open intervals.
  • ILIKE '%foo%' cannot exploit plain B-tree indexes without pg_trgm / expression indexes.

Optional pattern — existential filters with EXISTS

Prefer semijoins when probing presence without caring about multiplicity:

```sql
SELECT u.id
FROM users u
WHERE EXISTS (
    SELECT 1 FROM orders o WHERE o.user_id = u.id
);
```

Rule of thumb: when debugging filters, SELECT COUNT(*) before and after layering predicates — divergence isolates offending clause fast.


5. Joins — INNER, LEFT, and when rows multiply

Joins reshape cardinality — tame fan-out before aggregates

Invariant: JOIN combines row sets via predicates (typically equality on keys). When either side is one-to-many, the result repeats left rows once per matching right row unless you stabilize grain first.

Diagram: INNER JOIN keeps matching pairs only; LEFT JOIN keeps every left row plus orphans.

Detailed explanation.

  • INNER JOIN emits only pairs that satisfy the ON clause — unmatched rows on either side disappear.
  • LEFT [OUTER] JOIN keeps every row from the left spine; unmatched right columns become NULL (sentinel absence, not “empty string”).
  • FULL OUTER is rarer in app code but useful when reconciling two feeds where either side might be orphaned (sketch after the code block below).

```sql
-- Buyer activity: users who ordered at least once (one row per matching order pair)
SELECT u.name, o.order_id
FROM users u
JOIN orders o ON o.user_id = u.id;

-- Anti-join — users present on the LEFT with NO matching orders on the RIGHT
SELECT u.name
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE o.order_id IS NULL;
```
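
And the FULL OUTER reconciliation shape, sketched against hypothetical billing_feed and ledger_feed tables sharing order_id:

```sql
-- Keep rows from both feeds; surface whichever side is orphaned.
SELECT COALESCE(b.order_id, l.order_id) AS order_id,
       b.amount AS billed,
       l.amount AS ledgered
FROM billing_feed b
FULL OUTER JOIN ledger_feed l ON l.order_id = b.order_id
WHERE b.order_id IS NULL OR l.order_id IS NULL;
```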

INNER vs LEFT in one rehearsal table

| join | survives without match on opposite side | read it as |
| --- | --- | --- |
| INNER | no | “pairs only — inner intersection” |
| LEFT | yes (left survives) | “keep cohort A, optionally attach B” |

Fan-out rehearsal (why COUNT(*) lied)

| spine | facts | INNER join rows if 3 matching orders |
| --- | --- | --- |
| 1 Alice | 3 orders for Alice | 3 rows named Alice |

COUNT(*), SUM(amount) downstream now count result rows, not necessarily distinct users. Fix upstream with distinct keys, sub-aggregates, or semi-joins (EXISTS / IN) depending on semantics.
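
One upstream fix, sketched against the users/orders schema from section 2: collapse the many side first, then join the pre-aggregated result so grain stays one row per user.

```sql
-- Orders are aggregated before the join, so no fan-out reaches the user grain.
SELECT u.id,
       COALESCE(o.order_count, 0) AS order_count,
       COALESCE(o.total_usd, 0)   AS total_usd
FROM users u
LEFT JOIN (
    SELECT user_id, COUNT(*) AS order_count, SUM(total_usd) AS total_usd
    FROM orders
    GROUP BY user_id
) o ON o.user_id = u.id;
```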

Developer SQL interview question — users who never ordered

Tables: users(id, name) and orders(order_id, user_id) with referential integrity. List dormant accounts (users who never placed an order), ordered deterministically.

Solution: LEFT JOIN anti-join

Code solution.

```sql
SELECT u.id, u.name
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE o.order_id IS NULL
ORDER BY u.id;
```

Step-by-step trace.

| step | planner story |
| --- | --- |
| 1 | Build left spine — one output row candidate per user. |
| 2 | For each user, seek/probe matching orders on orders.user_id. |
| 3 | If no row qualifies, o.* becomes NULL, including surrogate order_id. |
| 4 | WHERE o.order_id IS NULL keeps only unmatched left rows (an anti-pattern if order_id could be NULL absent the FK — integrity prevents that here). |

Output:

| id | name |
| --- | --- |
| 404 | dormant |

Companion pattern — NOT EXISTS (duplicate-safe semi-join).

```sql
SELECT u.id, u.name
FROM users u
WHERE NOT EXISTS (
    SELECT 1 FROM orders o WHERE o.user_id = u.id
)
ORDER BY u.id;
```

Why this works — concept by concept:

  • LEFT preservation — every user survives until filtered; dormant cohort remains visible.
  • NULL sentinel — with a real surrogate key on orders, order_id NULL reliably means no attachment, not ambiguous data void.
  • NOT EXISTS — logically ignores duplicate orders per user (semijoin); no accidental multiplier from exploding the right side.
  • Index leverage — a B-tree on orders(user_id) turns probes into seeks instead of scans.
  • Cost — hash or merge-family join tends toward Θ(n + m) cardinality with healthy selectivity versus nested-loop blowups on accidental cross products.
Practice: Join-heavy SQL drills → · SQL language library →


6. GROUP BY, HAVING, and analytics-style windows

Collapse grain or stay row-wise — aggregates vs windows split the job

Invariant: GROUP BY collapses rows into buckets, producing one output row per group after aggregation. OVER() keeps input grain — every source row survives while you decorate it with comparative metrics (rank, running sum).

Diagram: GROUP BY collapses rows into buckets; OVER() partitions decorate every row with analytics like running totals.

Detailed explanation.

  • After JOIN/WHERE, the relational engine optionally groups by the listed expressions. SELECT may reference either grouping keys or aggregate functions evaluated inside each bucket — stray base columns violate the contract unless functionally dependent (Postgres errs early; beware lenient dialects permitting hidden ambiguity).
  • HAVING filters post-aggregation predicates (COUNT(*) > 10), while WHERE trims rows before bucketing (cannot cite aliases from SELECT aggregates — repeat the aggregate or nest a subquery).
  • Window frames (ROWS / RANGE / GROUPS) default per function; analytic ranking (ROW_NUMBER, RANK, DENSE_RANK) ignores frames — they only need PARTITION BY + ORDER BY (contrast sketched below).
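
The contrast as a sketch, assuming a hypothetical payments(user_id, paid_at, amount) table:

```sql
SELECT user_id,
       paid_at,
       amount,
       -- Frame-driven: running total over rows up to and including the current one.
       SUM(amount) OVER (
           PARTITION BY user_id
           ORDER BY paid_at
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS running_total,
       -- Frame-free ranking: PARTITION BY + ORDER BY is all it needs.
       RANK() OVER (PARTITION BY user_id ORDER BY amount DESC) AS amount_rank
FROM payments;
```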

Stock patterns side by side

```sql
-- Collapse city grain — one row per city after aggregation
SELECT city, COUNT(*) AS n
FROM users
GROUP BY city
HAVING COUNT(*) > 5;

-- Decorate employee grain — ranking within department without collapsing rows
SELECT emp_id,
       dept_id,
       salary,
       ROW_NUMBER() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS rn
FROM employees;
```

Dedup companion — deterministic keeper rows

Grouped counts diagnose violators; windows remediate duplicates while picking one canonical row (partition by the dedupe key, ORDER BY audit columns, keep rn = 1 as the survivor; selecting rn > 1 targets the offenders, as below):

```sql
WITH ranked AS (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY email
               ORDER BY updated_at DESC NULLS LAST, id ASC
           ) AS rn
    FROM users
)
SELECT *
FROM ranked
WHERE rn > 1; -- offenders to DELETE or archive in a batched migration
```

Developer SQL interview question — duplicate emails awaiting cleanup

List normalized emails appearing more than once with violation counts.

Solution: grouped counts

Code solution.

```sql
SELECT email, COUNT(*) AS dup_count
FROM users
GROUP BY email
HAVING COUNT(*) > 1
ORDER BY dup_count DESC, email;
```

Step-by-step trace.

| step | computation |
| --- | --- |
| 1 | Normalize upstream if needed (LOWER(TRIM(email)) in ingestion or projection). |
| 2 | GROUP BY email emits one accumulator per distinct key. |
| 3 | HAVING COUNT(*) > 1 drops singleton buckets — only violators survive. |

Output:

| email | dup_count |
| --- | --- |
| dev@corp | 3 |

Why this works — concept by concept:

  • GROUP BY bucket — each email becomes its own multiset; COUNT(*) measures multiplicity after predicates.
  • HAVING vs WHERE — WHERE cannot express COUNT(*) > 1 without a subquery; HAVING runs on aggregated state.
  • Operational pairing — follow with ROW_NUMBER() partitioning on the same dedupe key to choose survivors without arbitrary ties.
  • Cost profile — hash aggregate typically linear in spilled row volume; spilled sorts add I/O proportional to work_mem pressure.
Practice: Aggregation drills → · Window SQL drills → · GROUP BY drills →


7. Indexes, transactions, ACID, and EXPLAIN-aware debugging

Indexes + planners + transactional contracts separate “queries” from “systems”

Detailed explanation.

  • Default B-tree indexes accelerate equality / range predicates (WHERE city = 'Hyderabad' or WHERE created_at BETWEEN …) via seeks instead of sequential scans once selectivity warrants them.
  • Indexes are derived state — every insert/update/delete touches matching index entries (write amplification); partial indexes prune maintenance when predicates cover a cohort (WHERE churned IS FALSE).
  • CREATE INDEX CONCURRENTLY avoids long ACCESS EXCLUSIVE locks on Postgres hot tables during build — trade-off is longer DDL and retry bookkeeping if build fails midway.

```sql
CREATE INDEX CONCURRENTLY idx_users_city ON users (city);

EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM users
WHERE city = 'Hyderabad';
```

Interview-grade EXPLAIN reading hints.

  • Prefer EXPLAIN (ANALYZE, BUFFERS) for truth about buffers hit vs read, loops, actual row counts — textual plans alone omit runtime skew.
  • Watch estimated vs actual row mismatches (bad stats ⇒ wrong join order, surprise nested loops on big inner sides).
  • Seq Scan acceptable on narrow tables / cold caches; punitive when millions of qualifying rows funnel through filter after scan.

ACID anchors

| letter | mnemonic | what to say in-panel |
| --- | --- | --- |
| A | Atomicity | all statements commit together, or ROLLBACK rewinds observable effects |
| C | Consistency | constraints + declarative checks hold on commit boundaries |
| I | Isolation | levels trade phantom reads vs concurrency — READ COMMITTED, the Postgres default, exposes committed deltas each statement; REPEATABLE READ / MVCC snapshots tame drift for longer read transactions |
| D | Durability | WAL persists committed work across crashes |

Practical transaction hygiene

Declare explicit boundaries (BEGIN / COMMIT) when orchestrating multi-table invariants — ORMs emitting autocommit per statement fracture money-movement narratives. Serialization failures (SQLSTATE 40001) under SERIALIZABLE / snapshot conflicts signal retry opportunities, not “random database bugs.” Pair schema changes with CONCURRENT index creations and phased backfills whenever traffic tolerates eventual consistency stages.
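
A minimal sketch of explicit boundaries at the stricter level; the retry loop itself belongs in application code, re-running the whole block when COMMIT raises SQLSTATE 40001:

```sql
BEGIN ISOLATION LEVEL SERIALIZABLE;

UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;

-- On serialization_failure (SQLSTATE 40001): ROLLBACK, back off, retry the block.
COMMIT;
```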


Choosing SQL skills for your stack (checklist)

| horizon | competency | practice lane |
| --- | --- | --- |
| Week 1 | DDL + FK stories | /explore/practice/topic/database → |
| Week 2 | predicates + paging | /explore/practice/topic/filtering/sql → |
| Week 3 | joins + cardinality | /explore/practice/topic/joins/sql → |
| Week 4 | aggregates/windows | /explore/practice/topic/aggregation/sql → |

Frequently asked questions

Which dialect first?

PostgreSQL is the pragmatic default — rich standard SQL surface, expressive JSONB, strong isolation story, ubiquitous in modern stacks, and interview panels often cite it explicitly. Treat SQLite as excellent for correctness drills and EXPLAIN QUERY PLAN intuition, MySQL where legacy hiring signals demand it — but defer dialect trivia until relational mechanics feel automatic.

LEFT JOIN … IS NULL versus NOT EXISTS?

Both express relational difference (anti-semijoins). LEFT JOIN + NULL filter is readable when order_id is a dependable surrogate and you want symmetry with exploratory outer joins. NOT EXISTS scales mentally when duplicates on the right explode intermediate row counts — multiplicity does not falsify existential absence. Mention both in reviews; pick based on cardinality and planner friendliness verified with EXPLAIN.

When do aggregates beat windows?

Collapse to metric tables or cohort summaries ⇒ GROUP BY + aggregates + HAVING. Preserve row grain while ranking/deduping/running deltas ⇒ OVER() partitions. Mixed patterns stack CTE layers: aggregate subtotals upstream, JOIN keyed aggregates back, then window across enriched rows once uniqueness returns.
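
That stacking, sketched against the users/orders schema from section 2:

```sql
WITH per_user AS (          -- aggregate subtotals upstream: one row per user
    SELECT user_id, SUM(total_usd) AS lifetime_usd
    FROM orders
    GROUP BY user_id
)
SELECT u.id,
       u.city,
       p.lifetime_usd,
       -- window across the enriched rows once uniqueness has returned
       RANK() OVER (PARTITION BY u.city ORDER BY p.lifetime_usd DESC) AS city_rank
FROM users u
JOIN per_user p ON p.user_id = u.id;
```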

Are indexes unequivocally good?

No — each index consumes storage, lengthens mutation paths (insert/update hotspots), and complicates migrations. Prefer narrow partial indexes, covering composites aligning with ORDER BY when read savings dominate, but baseline with metrics (p95 latency, churn on write-heavy queues) rather than speculative indexing folklore.
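
For instance, a hedged sketch of a composite aligned with a hot read path on the orders table (the INCLUDE column is an assumption about what the query projects):

```sql
-- Serves WHERE user_id = $1 ORDER BY placed_at DESC LIMIT n without a sort,
-- and covers total_usd so index-only scans can skip the heap.
CREATE INDEX CONCURRENTLY orders_user_recent_idx
    ON orders (user_id, placed_at DESC)
    INCLUDE (total_usd);
```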

What proves seniority fastest?

Fluent EXPLAIN storytelling (buffer churn, mis-estimates), articulating isolation anomalies you have actually chased (lost updates, phantom reads), and disciplined schema evolution (locking, backfills, concurrency-safe DDL). Reading alone plateaus — annotate past incidents with reproduction SQL and planner diffs during debrief loops.

Do ORMs replace raw SQL literacy?

ORMs scaffold CRUD ergonomics quickly, yet N+1, implicit transaction scopes, brittle migrations, deadlock graphs, and hot-index regressions still surface raw SQL realities. Seniors jump between declarative mappings and handwritten SQL confidently because production incidents rarely respect abstraction boundaries exclusively.


Practice on PipeCode

PipeCode ships 450+ interview-grade problems spanning SQL joins, aggregates, transactional logic, analytics windows, subqueries, and pragmatic schema debugging. Anchor on Explore practice →, escalate through language SQL →, grab sets like topic SQL → or subqueries →, and unlock plans → whenever you want unrestricted runs.
