Choosing the right SQL data types is one of the quiet decisions that shapes storage, correctness, and query behavior in PostgreSQL. In a tight SQL screen, interviewers often follow up on why you picked a type—not only whether the query returns rows. This guide walks through the main families, common pitfalls (rounding, time zones, type mismatches), and how to reason about casts—using PostgreSQL syntax, the same dialect PipeCode uses for practice.
If you want hands-on reps after you read, explore practice →, drill SQL problems →, browse SQL by topic →, or open Zero to FAANG SQL (full fundamentals) → for a structured path.
On this page
- Why column types matter
- Numeric types
- Text and binary
- Boolean and NULL
- Date and time
- Semi-structured and other types
- Casting and comparison rules
- Choosing types (checklist)
- Frequently asked questions
- Practice on PipeCode
1. Why column types matter
Storage, comparisons, indexes, and the cost of silent coercion
"Why did you pick that type?" is the single most common SQL-screen follow-up — and the cleanest answer is that a column's type controls four downstream things at once: how the value is laid out on disk, which operators compare it correctly, which indexes the planner can actually use, and when PostgreSQL has to silently coerce data behind your back. Get the type right and joins are fast, comparisons are unambiguous, and disk pages are dense. Get it wrong and you ship a schema that runs but quietly returns the wrong answer or scans 10× more pages than it should.
Pro tip: When you walk an interviewer through a CREATE TABLE, say the grain and the type in the same breath: "one row per order, order_id is BIGINT, total is NUMERIC(14,2)." That single habit signals to the interviewer that you think about column types as design decisions, not afterthoughts.
Storage footprint and on-disk layout
The storage invariant: fixed-width integer and timestamp types occupy a known number of bytes (4 or 8) and never expand; variable-width types (TEXT, NUMERIC, JSONB) carry a length prefix and grow with the value; choosing a tighter type packs more rows per 8 KB page and improves cache locality on every read. A wider type is rarely free — even when the extra bytes look harmless, planner statistics and TOAST behavior shift with them.
- INTEGER — 4 bytes, range ±2.1 B; the default for counts and small surrogate keys.
- BIGINT — 8 bytes; required when row counts cross ~2 B or for user-facing IDs.
- NUMERIC(p, s) — variable (~2 bytes overhead + 2 bytes per 4 digits); cost grows with precision.
- TEXT / VARCHAR(n) — variable; no storage penalty for TEXT vs VARCHAR with the same content.
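A quick way to sanity-check these widths in a live session is pg_column_size(), which reports the on-disk size of a value; the literals below are arbitrary examples, not rows from the worked table that follows.
SELECT pg_column_size(42::integer)    AS int_bytes,         -- 4
       pg_column_size(42::bigint)     AS bigint_bytes,      -- 8
       pg_column_size(now())          AS timestamptz_bytes, -- 8
       pg_column_size(42.00::numeric) AS numeric_bytes,     -- grows with the number of digits
       pg_column_size('42'::text)     AS text_bytes;        -- string length plus a small header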
Worked example. A 100 M-row events table sized two ways:
| design | per-row bytes | total |
|---|---|---|
| event_id BIGINT, ts TIMESTAMPTZ, user_id BIGINT | 8 + 8 + 8 = 24 | ~2.4 GB |
| event_id BIGINT, ts TIMESTAMPTZ, user_id TEXT (avg 18 chars) | 8 + 8 + 22 ≈ 38 | ~3.8 GB |
Step-by-step.
- The fixed-width row (BIGINT, TIMESTAMPTZ, BIGINT) is 24 bytes of column data regardless of values.
- Replacing the integer user_id with TEXT for an 18-character string swaps 8 bytes for a 4-byte length header plus the bytes of the text itself (~22 bytes).
- With ~100 M rows, the variable-width design adds roughly 1.4 GB to the table heap alone, before indexes.
- The wider rows also fit fewer per 8 KB page → fewer buffer-cache hits → more I/O per query.
- Net: same data, roughly 1.6× the disk and worse cache behavior.
Worked-example solution. Pick the tightest correct type:
CREATE TABLE events (
event_id BIGINT PRIMARY KEY,
ts TIMESTAMPTZ NOT NULL,
user_id BIGINT NOT NULL -- not TEXT
);
Rule of thumb: if a value is a count or an internal identifier, it is an integer; reach for TEXT only when the value is a real human-readable string.
Equality and comparison semantics
The comparison invariant: PostgreSQL compares values within a type cleanly, but mixing types forces an implicit cast that can produce surprises — string '10' compares lexicographically ('10' < '2'), numeric 10 compares mathematically (10 > 2), and timestamps compare instant-to-instant only if both sides are TIMESTAMPTZ. The right type makes <, =, and BETWEEN behave the way humans expect.
- '10' < '2' is TRUE when both are TEXT — string compare reads left-to-right.
- 10 < 2 is FALSE when both are INTEGER — numeric compare.
- TIMESTAMP vs TIMESTAMPTZ — PostgreSQL will compare them only after coercing one side; the answer depends on the session time zone.
- Collations on TEXT — 'abc' = 'ABC' is FALSE with the default C collation, possibly TRUE with a case-insensitive collation.
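The first two bullets in one line of SQL, runnable anywhere:
SELECT '10' < '2' AS text_compare,  -- TRUE: byte-wise, '1' (0x31) sorts before '2' (0x32)
       10 < 2     AS int_compare;   -- FALSE: numeric comparison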
Worked example. A 4-row table where the sort order flips based on whether score is TEXT or INTEGER (positions are ascending sort order).
| score (as TEXT) | sort position | score (as INT) | sort position |
|---|---|---|---|
| "10" | 1 | 10 | 3 |
| "100" | 2 | 100 | 4 |
| "2" | 3 | 2 | 1 |
| "9" | 4 | 9 | 2 |
Step-by-step.
- Stored as TEXT: ORDER BY score compares character-by-character; '1' (0x31) sorts before '9' (0x39), so '100' sorts before '2'.
- Stored as INTEGER: ORDER BY score compares the numeric value; 2 < 9 < 10 < 100 — the human-expected order.
- The query is identical in both cases; only the column type changed the answer.
- The bug is invisible until someone audits the leaderboard and notices "9" ranked above "100".
Worked-example solution. Always store ordinal-comparable values in a numeric type:
CREATE TABLE leaderboard (
player_id BIGINT PRIMARY KEY,
score INTEGER NOT NULL CHECK (score >= 0)
);
SELECT player_id, score FROM leaderboard ORDER BY score DESC;
Rule of thumb: if you ever compare values with <, >, or BETWEEN, the type must support those operators natively — never rely on string sort for numbers or dates.
Index operator classes and planner statistics
The index invariant: a B-tree index is built against an operator class tied to a specific type; when a query wraps the indexed column in a cast (explicit or implicit), the index no longer matches the expression being compared, so the planner usually has to scan instead of seek. The right type matches the index; the wrong type silently disables it.
- CREATE INDEX … ON t (col) — default B-tree, uses the type's default operator class.
- col = $1 with a matching type — index seek.
- col = $1::other_type — index seek when the cast is on the literal side.
- col::other_type = $1 — sequential scan; you cast the column, not the value.
Worked example. A user_id BIGINT column with a B-tree index, queried three ways.
| predicate | plan | rows scanned |
|---|---|---|
| WHERE user_id = 42 | Index Scan | ~1 |
| WHERE user_id = '42' | Index Scan (literal cast) | ~1 |
| WHERE user_id::text = '42' | Seq Scan | full table |
Step-by-step.
- WHERE user_id = 42 — both sides are BIGINT; the planner uses the B-tree directly.
- WHERE user_id = '42' — PostgreSQL resolves the untyped literal '42' to BIGINT (the indexed side's type); the index is still usable.
- WHERE user_id::text = '42' — the cast is on the column; PostgreSQL would have to apply the ::text conversion to every row to compare; the B-tree on user_id cannot help.
- The third predicate triggers a full sequential scan even though an index "exists on user_id."
- Diagnosis is an EXPLAIN away: Seq Scan on … Filter: ((user_id)::text = '42'::text) is the giveaway.
Worked-example solution. Keep casts on the literal side:
-- good: cast literal, index used
SELECT * FROM events WHERE user_id = '42'; -- literal '42' coerced to BIGINT
-- bad: cast column, index killed
SELECT * FROM events WHERE user_id::text = '42';
Rule of thumb: if you see a :: on a column inside a WHERE or JOIN, expect a seq scan and ask whether the underlying type should change.
Common beginner mistakes
- Declaring every text column as VARCHAR(255) "just in case" — wastes nothing on storage but lies in the schema about the real constraint.
- Storing numeric IDs as TEXT because the source CSV had quotes — every downstream comparison and index becomes a hazard.
- Mixing TIMESTAMP and TIMESTAMPTZ in joins — comparison depends on the session time zone; you have written a query that returns different rows for different users.
- Treating implicit coercion as free — the planner often hides the cost behind a seq scan and an unremarkable EXPLAIN summary.
- Skipping CHECK constraints because "the application handles it" — types and constraints together are the only durable schema.
SQL Interview Question on Picking Types for an Orders Schema
A junior teammate sends a CREATE TABLE orders script: order_id VARCHAR(255), total FLOAT, customer_id TEXT, placed_at TIMESTAMP. The orders application is global, has ~5 M orders per day, and is joined daily to dim_customer (customer_id BIGINT, …). Identify every type-level risk in this schema and rewrite it so reports stay correct, joins stay indexed, and storage doesn't bloat.
Solution Using Tight Native Types + NUMERIC + TIMESTAMPTZ + CHECK Constraints
Code solution.
CREATE TABLE orders (
order_id BIGSERIAL PRIMARY KEY,
customer_id BIGINT NOT NULL REFERENCES dim_customer(customer_id),
total NUMERIC(14,2) NOT NULL CHECK (total >= 0),
placed_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX ON orders (customer_id);
CREATE INDEX ON orders (placed_at);
Step-by-step trace of the four problems:
| original type | risk | fix |
|---|---|---|
| order_id VARCHAR(255) | lexicographic sort; wide rows; index mismatch | BIGSERIAL / BIGINT |
| total FLOAT | binary rounding (0.1 + 0.2 ≠ 0.3); aggregates drift | NUMERIC(14,2) |
| customer_id TEXT | cross-type join with dim_customer.customer_id BIGINT; seq scan | BIGINT + FK |
| placed_at TIMESTAMP | wall-clock semantics; reports differ per session TZ | TIMESTAMPTZ |
Output: a typed, constrained schema. The daily customer-join now uses a B-tree seek on customer_id; revenue rollups are exact to the cent; "orders placed today" is unambiguous regardless of the analyst's session time zone.
Why this works — concept by concept:
- BIGSERIAL PK — monotonic, 8-byte integer; supports range scans, packs tight, and matches every downstream join.
- BIGINT customer_id with FK — joins are type-identical, the index is usable, and orphan rows are rejected at write time.
- NUMERIC(14, 2) for money — exact decimal arithmetic; aggregates over millions of rows produce the same total a calculator would.
- TIMESTAMPTZ for placed_at — every value is stored as a UTC instant; display converts to the session TZ; reports never silently shift by 24 h after a deploy.
- CHECK (total >= 0) — durable invariant; even a buggy ETL run cannot insert negative revenue.
- Cost — per-row storage is the same or smaller than the original; each daily customer join drops from an O(N) seq scan (forced by the type mismatch) to an O(log N) index seek.
Inline CTA: drill the SQL practice page for type-fluency reps and the aggregation topic for grain-correct rollups.
2. Numeric types
Integers for counts, NUMERIC for money, FLOAT for measurements
PostgreSQL splits numeric types into three families: exact integers (SMALLINT, INTEGER, BIGINT), arbitrary-precision exact decimals (NUMERIC(p, s) / DECIMAL), and binary floating point (REAL, DOUBLE PRECISION). The choice is rarely about precision in the abstract — it's about which arithmetic errors are acceptable. Integers never lose precision; NUMERIC is exact at a fixed scale; floats trade precision for speed and are the wrong default for currency.
Pro tip: When asked "what type is
revenue?", sayNUMERIC(p, s)and namepandsout loud —NUMERIC(14, 2)for cents up to ~$100 B,NUMERIC(18, 4)for FX rates and basis points. Knowing the scale is what separates "I know decimals exist" from "I have shipped a ledger."
INTEGER / BIGINT — surrogate keys and counts
The integer invariant: INTEGER is 4 bytes (range ±2.1 B) and BIGINT is 8 bytes (range ±9.2 quintillion); use INTEGER for small/medium counts and BIGINT for surrogate keys, monotonically increasing IDs, and anything that might ever cross 2 billion. Overflow is silent in some languages but is a hard error in PostgreSQL — once the sequence exceeds the type's range, every insert fails.
- SMALLINT — 2 bytes; rarely used outside tightly packed enum-like values.
- INTEGER — 4 bytes; default for row counts, scores, age, quantities.
- BIGINT — 8 bytes; default for primary keys on growing tables.
- BIGSERIAL / GENERATED AS IDENTITY — 8-byte auto-incrementing PK.
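A quick sketch of the hard-error behaviour: pushing INTEGER arithmetic past its range fails immediately, while the same arithmetic in BIGINT has room to spare.
SELECT 2147483647::integer + 1;  -- ERROR: integer out of range
SELECT 2147483647::bigint + 1;   -- 2147483648: BIGINT absorbs it without complaint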
Worked example. An events table grows from 1 M to 3 B rows.
| year | events | INTEGER PK? | BIGINT PK? |
|---|---|---|---|
| 2024 | 1 M | ✓ | ✓ |
| 2025 | 500 M | ✓ | ✓ |
| 2026 | 2.5 B | ✗ overflow | ✓ |
Step-by-step.
- Start with event_id INTEGER — fits 2.1 B values.
- Daily growth at 5 M / day reaches 2.1 B by mid-2026.
- The next INSERT fails: ERROR: integer out of range.
- Migration requires ALTER TABLE … ALTER COLUMN event_id TYPE BIGINT; — this rewrites the entire table, and the locks scale with table size.
- Doing this at 2.1 B rows means hours of downtime; doing it at table creation is free.
Worked-example solution. Use BIGINT for any growing PK:
CREATE TABLE events (
event_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
user_id BIGINT NOT NULL,
ts TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
Rule of thumb: every primary key on a table that "might be big someday" is BIGINT from day one. The 4 extra bytes per row is the cheapest insurance you can buy.
NUMERIC(p, s) — exact decimal for currency
The decimal invariant: NUMERIC(p, s) stores p total digits with s of them after the decimal point; arithmetic is exact at that scale; SUM(NUMERIC) over millions of rows produces the byte-identical result a careful accountant would compute by hand. The cost is performance — NUMERIC math is slower than integer or float — but for currency the trade-off is settled: exact wins.
- NUMERIC(14, 2) — up to 12 digits before the decimal, 2 after; ~$1 T.
- NUMERIC(18, 4) — FX rates, fractional cents (interest, allocations).
- NUMERIC(38, 6) — analytics-warehouse scale; matches the Snowflake / BigQuery default.
- Storage — ~2 bytes overhead + 2 bytes per 4 digits; cheap up to ~$1 T.
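A throwaway query (generate_series stands in for a real invoice table) that reproduces the drift described in the worked example below:
SELECT SUM(0.1::double precision) AS float_sum,  -- not exactly 100: representation error accumulates
       SUM(0.1::numeric(14,2))    AS exact_sum   -- exactly 100.00
FROM generate_series(1, 1000);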
Worked example. Summing 1,000 invoice lines of $0.10 each:
| storage type | SUM(amount) |
|---|---|
| DOUBLE PRECISION | 100.00000000000007 |
| NUMERIC(14, 2) | 100.00 |
Step-by-step.
- 0.1 cannot be represented exactly in binary floating point; the stored value is 0.1000000000000000055511….
- Adding 1,000 of these in DOUBLE PRECISION accumulates 1000 × tiny_error; the result drifts.
- NUMERIC(14, 2) stores 0.10 literally and adds with decimal arithmetic; 1,000 × 0.10 is exactly 100.00.
- The float error is invisible until a finance lead notices a $0.00000007 discrepancy on a reconciliation report.
- Once the column type is NUMERIC, the drift is impossible by construction.
Worked-example solution. Currency columns always use NUMERIC:
CREATE TABLE invoice_lines (
line_id BIGSERIAL PRIMARY KEY,
quantity INTEGER NOT NULL CHECK (quantity > 0),
unit_price NUMERIC(12, 4) NOT NULL,
line_total NUMERIC(14, 4) GENERATED ALWAYS AS (quantity * unit_price) STORED
);
Rule of thumb: anything that touches money, tax, allocations, basis points, or a regulated ledger is NUMERIC(p, s) — never FLOAT or DOUBLE PRECISION.
REAL / DOUBLE PRECISION — binary floating point and rounding
The float invariant: REAL (4 bytes, ~7 decimal digits) and DOUBLE PRECISION (8 bytes, ~15 digits) follow IEEE 754; they're fast and compact but inexact at decimal fractions; their natural home is measurements where the underlying quantity is itself approximate (sensor reading, ML feature, scientific magnitude). Floats are not "lossy currency" — they are the right type for things that were never exact to begin with.
- REAL — 4 bytes; ~7 decimal digits of precision.
- DOUBLE PRECISION — 8 bytes; ~15 digits; what PostgreSQL's FLOAT maps to by default.
- 0.1 + 0.2 ≠ 0.3 in both; DOUBLE PRECISION shows 0.30000000000000004.
- Use cases — physical measurements, geographic coordinates, ML scores, neural-net outputs.
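A small sketch of the equality bullet; note that an unadorned 0.1 + 0.2 literal is NUMERIC in PostgreSQL, so the casts are what force float behaviour.
SELECT 0.1::float8 + 0.2::float8 = 0.3::float8               AS exact_eq,   -- FALSE
       abs((0.1::float8 + 0.2::float8) - 0.3::float8) < 1e-9 AS approx_eq,  -- TRUE: tolerance compare
       0.1 + 0.2 = 0.3                                       AS numeric_eq; -- TRUE: plain literals are NUMERIC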
Worked example. The same three values stored two ways:
| value | REAL | DOUBLE PRECISION |
|---|---|---|
| 23.7 | 23.7 | 23.7 |
| 0.1 + 0.2 | 0.3 (~0.30000001) | 0.30000000000000004 |
| 3.14159265358979 | 3.1415927 | 3.141592653589793 |
Step-by-step.
- REAL rounds aggressively after ~7 digits; fine for a temperature gauge, wrong for a price.
- DOUBLE PRECISION keeps ~15 digits — enough for almost any measurement.
- Neither stores 0.1 + 0.2 as exactly 0.3, because base-2 cannot represent base-10 tenths.
- Equality (=) on floats is unsafe; use a tolerance (abs(a - b) < 1e-9) for "approximately equal."
- For currency, both are wrong — use NUMERIC.
Worked-example solution. Use floats for genuinely approximate measurements:
CREATE TABLE sensor_readings (
reading_id BIGSERIAL PRIMARY KEY,
device_id BIGINT NOT NULL,
temp_celsius DOUBLE PRECISION NOT NULL,
ts TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
Rule of thumb: if you would compare the value with = and care about the result, it is not a float.
Common beginner mistakes
- Defaulting all PKs to SERIAL (32-bit) and discovering the overflow in production years later.
- Storing money in DOUBLE PRECISION because NUMERIC "is slow" — the slowdown is invisible to humans; the rounding is not.
- Using NUMERIC with no declared precision — works, but you lose the documentation value of stating the scale.
- Comparing floats with = instead of a tolerance window.
- Using INTEGER for cents (total_cents) instead of NUMERIC(14, 2) — works but burdens every read with a /100.0.
SQL Interview Question on Reconciling a Drifting Invoice Total
The CFO reports that the monthly invoice total in the dashboard disagrees with the source-of-truth ledger by $0.0000034 on average. The dashboard sums an invoice_lines.amount column declared as DOUBLE PRECISION. Identify the cause and propose a schema fix that makes the totals byte-identical to the ledger from now on.
Solution Using NUMERIC(14, 4) + a Generated line_total Column
Code solution.
ALTER TABLE invoice_lines
ALTER COLUMN amount TYPE NUMERIC(14, 4) USING amount::NUMERIC(14, 4);
ALTER TABLE invoice_lines
ADD COLUMN line_total NUMERIC(14, 4)
GENERATED ALWAYS AS (quantity * amount) STORED;
-- nightly reconciliation
SELECT SUM(line_total) AS dash_total
FROM invoice_lines
WHERE invoice_date = DATE '2026-04-13';
Step-by-step trace of the drift:
| step | value | running sum (DOUBLE PRECISION) | running sum (NUMERIC) |
|---|---|---|---|
| 1 | 0.1 | 0.1 | 0.10 |
| 2 | 0.1 | 0.2 | 0.20 |
| 3 | 0.1 | 0.30000000000000004 | 0.30 |
| … | … | accumulating error | exact |
| 1000 | 0.1 | 100.00000000000007 | 100.00 |
Output: dashboard total per day now matches the ledger to the cent (or to the basis point, given scale 4). No silent drift; finance closes the books without manual adjustment.
Why this works — concept by concept:
- NUMERIC(14, 4) exact decimal arithmetic — every addition stays exact at four decimal places; no IEEE 754 representation error.
- Generated line_total column — eliminates a class of bugs where the application computes qty * price and the database computes a slightly different number.
- STORED generation — the value is materialised once at write time; reads are plain column reads with no per-row recomputation.
- Tolerance check on the ETL side — even with NUMERIC, reconciliation should compare against the source-of-truth ledger with a 0-tolerance gate.
- One-time ALTER TABLE … USING — converts existing rows in place; from then on the type system makes drift impossible.
- Cost — a single rewrite at migration; per-row NUMERIC math is ~3× slower than DOUBLE PRECISION but invisible compared to network and disk costs.
Inline CTA: for the structured currency-and-aggregation path see SQL for Data Engineering Interviews — From Zero to FAANG.
3. Text and binary
CHAR vs VARCHAR vs TEXT, collations, and BYTEA
PostgreSQL has three character types — CHAR(n), VARCHAR(n), and TEXT — and one binary type, BYTEA. The decision rule is short: use TEXT unless you have a hard reason to enforce a length cap, and store files outside the database with a URL or object-store key in the column. Most "text" bugs are not about storage at all — they are about collations, which control how text compares and sorts.
Pro tip: Two strings that look identical can compare unequal under a different collation. When a join "returns no rows" on string keys, your first check after EXPLAIN is SHOW lc_collate; and SELECT pg_collation_for(col1) on both columns.
CHAR vs VARCHAR vs TEXT — pick TEXT unless you need fixed-width
The text invariant: TEXT and VARCHAR(n) have the same on-disk representation in PostgreSQL — no padding, no length penalty; the only difference is the (n) constraint that throws an error on overflow. CHAR(n) pads with spaces to the declared length, costing both storage and surprise (CHAR-to-CHAR comparison ignores the trailing pad, but the padding resurfaces once a value is cast to TEXT or joined against a VARCHAR column).
- CHAR(n) — fixed-width; pads with spaces; storage ≈ n characters (plus a length header).
- VARCHAR(n) — variable-width; rejects values longer than n.
- TEXT — variable-width; no length limit (up to 1 GB).
- citext extension — case-insensitive comparisons via the citext type.
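A short sketch of the padding surprise: CHAR comparison ignores the trailing pad, but the same strings compared as TEXT do not.
SELECT 'abc'::char(5) = 'abc  '::char(5) AS char_eq,  -- TRUE: trailing pad is not significant for CHAR
       'abc'::text    = 'abc  '::text    AS text_eq;  -- FALSE: every byte counts for TEXT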
Worked example. Storing "abc" three ways:
| type | stored bytes | trailing pad |
|---|---|---|
CHAR(5) |
abc (5 bytes) |
yes |
VARCHAR(5) |
abc (3 bytes) |
no |
TEXT |
abc (3 bytes) |
no |
Step-by-step.
- CHAR(5) stores abc plus two spaces, padding to the declared length.
- VARCHAR(5) stores abc; it would reject abcdef with a length-violation error.
- TEXT stores abc; it would accept abcdef.
- Equality semantics differ: CHAR(5) 'abc' = VARCHAR(5) 'abc' may be TRUE, but joining a CHAR column to a VARCHAR column from another table can still fail when one side preserved trailing whitespace.
- Default to TEXT — it is the simplest and never accumulates these padding surprises.
Worked-example solution. Schema for a free-form bio field:
CREATE TABLE profiles (
user_id BIGINT PRIMARY KEY REFERENCES users(user_id),
bio TEXT NOT NULL DEFAULT ''
);
Rule of thumb: use VARCHAR(n) only when you genuinely want the database to enforce a maximum length (e.g., regulator-imposed description VARCHAR(280)); otherwise reach for TEXT.
Collations and locale-aware equality
The collation invariant: a collation is a tuple of (alphabet, sort order, case-sensitivity, accent-sensitivity) that the database applies to every text comparison; the default is usually "C" (binary) or the OS locale; case-insensitive matching requires either an explicit ICU collation or the citext extension. Two databases with different locales can disagree on whether 'café' = 'cafe'.
- C collation — byte-by-byte; fastest; case- and accent-sensitive.
- en_US.UTF-8 — locale-aware; sorts 'a' < 'B' < 'c' (case is only a secondary difference).
- und-x-icu — ICU root locale; consistent across platforms.
- citext — case-insensitive text type; 'ABC' = 'abc' is TRUE automatically.
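Assuming the citext extension is available (it ships with PostgreSQL as a contrib module), the case-insensitivity lives in the type rather than in every query:
CREATE EXTENSION IF NOT EXISTS citext;
SELECT 'ABC'::text   = 'abc'::text   AS text_eq,    -- FALSE under the default collation
       'ABC'::citext = 'abc'::citext AS citext_eq;  -- TRUE: the type lower-cases both sides to compare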
Worked example. Joining users by email under different collations:
| left email | right email | join match (C) | join match (citext) |
|---|---|---|---|
| alice@x.com | alice@x.com | ✓ | ✓ |
| Alice@X.com | alice@x.com | ✗ | ✓ |
| ' alice@x.com' (leading space) | alice@x.com | ✗ | ✗ (whitespace, not case) |
Step-by-step.
- Default
Ccollation does a byte compare;'A'(0x41) is not equal to'a'(0x61). - Same string with mixed case fails to join in
Ceven though humans see them as the same email. - Switching the column type to
citextmakes the database compare case-insensitively, and the second row matches. - Whitespace differences still cause mismatches —
citextdoes not trim; that requiresBTRIM(col)in ETL. - Pick one normalization rule (lowercase + trim at write time) and apply it consistently rather than relying on collation alone.
Worked-example solution. Use citext for emails and usernames:
CREATE EXTENSION IF NOT EXISTS citext;
CREATE TABLE users (
user_id BIGINT PRIMARY KEY,
email CITEXT NOT NULL UNIQUE
);
Rule of thumb: if you ever want 'Foo' = 'foo' to be TRUE, set that contract at the column type, not at every LOWER(...) call site.
BYTEA for binary blobs vs URL-in-SQL for files
The binary invariant: BYTEA stores raw bytes (hashes, signatures, compressed payloads, small binary tokens); large blobs (images, PDFs, ML model weights) belong in object storage (S3, GCS) with a TEXT URL or key in SQL. Databases are not file systems — every byte stored in BYTEA slows backups, replication, and query cache.
- BYTEA — variable-length binary; up to 1 GB but typically used for ≤ 10 KB tokens.
- SHA-256 hash — 32 bytes; a perfect BYTEA use case.
- Large files — store in S3; keep s3_key TEXT in SQL.
- pg_largeobject — legacy API; rarely worth the complexity vs object storage.
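A quick sketch of the hash bullet using the built-in sha256() function (available since PostgreSQL 11): the digest is a fixed 32-byte value, a natural BYTEA fit.
SELECT sha256('hello'::bytea)               AS digest,       -- 32-byte bytea value
       octet_length(sha256('hello'::bytea)) AS digest_bytes; -- 32, matching the CHECK in the solution below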
Worked example. A documents table with two design choices:
| design | per-row storage | backup time |
|---|---|---|
| body BYTEA (10 MB PDFs, 1 M rows) | ~10 TB (TOASTed) | hours |
| s3_key TEXT (URL only, 1 M rows) | < 100 MB | seconds |
Step-by-step.
- Storing 10 MB PDFs in BYTEA puts all the bytes in TOAST; the table grows to ~10 TB.
- Every pg_dump reads all 10 TB; backups become days, not minutes.
- Replication lag grows; HA failover slows.
- Object storage (S3) is purpose-built for large files; the database keeps only a ~50-byte s3_key.
- Reads still feel like "one query" — the application fetches the URL from SQL, then streams the file from S3.
Worked-example solution. Store files externally; keep the key in SQL:
CREATE TABLE documents (
document_id BIGSERIAL PRIMARY KEY,
user_id BIGINT NOT NULL,
sha256 BYTEA NOT NULL CHECK (octet_length(sha256) = 32),
s3_key TEXT NOT NULL,
uploaded_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
Rule of thumb: the rough threshold is 100 KB — anything above that belongs in object storage; anything below is fine as BYTEA.
Common beginner mistakes
- Declaring text columns as VARCHAR(255) everywhere — a legacy habit from the MySQL world; meaningless in modern PostgreSQL.
- Using CHAR(n) and being surprised that 'abc' = 'abc ' is TRUE for CHAR comparisons but FALSE once one side is cast to TEXT in a join.
- Storing emails as case-sensitive TEXT and writing LOWER(email) = LOWER($1) everywhere — set citext once at the column.
- Putting megabyte payloads in BYTEA and discovering the cost only when pg_dump runs for six hours.
- Forgetting to trim whitespace at ingest — ' alice@x.com' and 'alice@x.com' are different strings to the database.
SQL Interview Question on Reconciling Case-Sensitive Email Joins
A signup flow stores users.email as TEXT. The marketing dashboard joins events.email (also TEXT) to users.email to count signed-up users. Roughly 8% of events fail to match even though the user definitely signed up. Diagnose the cause and propose a column-level fix that prevents recurrence.
Solution Using citext + Normalised Write-Time Email
Code solution.
CREATE EXTENSION IF NOT EXISTS citext;
ALTER TABLE users ALTER COLUMN email TYPE CITEXT USING LOWER(BTRIM(email));
ALTER TABLE events ALTER COLUMN email TYPE CITEXT USING LOWER(BTRIM(email));
-- joins now match regardless of case; rejoin to verify
SELECT COUNT(*) FROM events e
JOIN users u ON u.email = e.email;
Step-by-step trace of the 8% miss:
| event email | user email | TEXT join | CITEXT join |
|---|---|---|---|
| Alice@x.com | alice@x.com | ✗ | ✓ |
| bob@x.com | bob@x.com | ✓ | ✓ |
| ' carol@x.com' (leading space) | carol@x.com | ✗ | ✗ (whitespace) |
Output: the case-sensitivity portion of the miss disappears (≈ 7%); the remaining ≈ 1% is whitespace, fixed by BTRIM in the USING clause at migration and a BEFORE INSERT trigger going forward.
Why this works — concept by concept:
- CITEXT columns — case-insensitive by construction; downstream queries never have to wrap LOWER(...) and indexes still work.
- LOWER(BTRIM(email)) in the USING clause — a one-shot normalisation of existing rows during the type change.
- Trigger or CHECK enforcement going forward — keeps future inserts canonical.
- No more LOWER(...) at every query site — every analyst joins safely without remembering the casing rule.
- Existing indexes rebuild automatically — ALTER COLUMN TYPE rebuilds the index against the new operator class.
- Cost — one rewrite at migration; per-row equality cost is essentially the same as TEXT.
Inline CTA: for the string-fluency syllabus see SQL for Data Engineering Interviews — From Zero to FAANG.
4. Boolean and NULL
Three-valued logic and the WHERE flag trap
PostgreSQL has a real BOOLEAN type with three values: TRUE, FALSE, and NULL. The third value is the source of nearly every "where did my rows go?" bug — NULL is not false; it is unknown. Filters like WHERE flag silently exclude NULL rows, and WHERE NOT flag excludes them too, so a "true-or-not-true" pair of queries can together miss rows entirely.
Pro tip: Whenever you write a boolean predicate, name the third bucket out loud. "Active users are is_active = TRUE; bots are is_bot = TRUE; unknown is IS NULL and goes into the needs-investigation drawer." That habit catches the silent-exclusion bug before it ships.
BOOLEAN literals, IS TRUE / IS FALSE / IS NULL
The boolean invariant: WHERE flag returns rows where the predicate is TRUE; rows where flag is NULL (unknown) are also excluded; to include or exclude them deliberately you must use IS NULL / IS NOT NULL / IS DISTINCT FROM. Standard SQL three-valued logic treats NULL = anything as NULL, which is neither true nor false — and a WHERE clause keeps only rows that evaluate to TRUE.
- TRUE / FALSE — the two non-null boolean values.
- NULL — unknown; not equal to anything (including itself).
- IS TRUE / IS FALSE — three-valued aware; never returns NULL.
- IS DISTINCT FROM — treats two NULLs as equal; useful for join keys.
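A minimal sketch of the three-valued rules; note which expressions can return NULL and which never do.
SELECT NULL::boolean                  AS unknown,        -- NULL
       NULL::boolean IS TRUE          AS is_true,        -- FALSE (never NULL)
       NULL::boolean IS NULL          AS is_null,        -- TRUE
       NULL = NULL                    AS eq_null,        -- NULL: unknown = unknown is unknown
       NULL IS NOT DISTINCT FROM NULL AS nulls_as_equal; -- TRUE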
Worked example. A 5-row events table with a nullable is_bot flag:
| event_id | is_bot |
|---|---|
| 1 | TRUE |
| 2 | FALSE |
| 3 | NULL |
| 4 | TRUE |
| 5 | NULL |
| predicate | rows kept |
|---|---|
| WHERE is_bot | 1, 4 (only TRUE) |
| WHERE NOT is_bot | 2 (only FALSE) |
| WHERE is_bot IS NOT TRUE | 2, 3, 5 |
| WHERE is_bot IS NULL | 3, 5 |
Step-by-step.
- WHERE is_bot keeps rows where the predicate is TRUE; rows 3 and 5 (NULL) are silently dropped.
- WHERE NOT is_bot keeps rows where NOT is_bot evaluates to TRUE; NOT NULL evaluates to NULL, so rows 3 and 5 are still silently dropped.
- The dashboard's "bots vs non-bots" pair (is_bot true / NOT is_bot) sums to 3 rows, not 5 — two rows are missing in plain sight.
- IS NOT TRUE is three-valued aware: it returns TRUE for rows 2, 3, 5 — both the false ones and the nulls.
- Pick the form that matches your intent and audit any dashboard that splits a column on a boolean.
Worked-example solution. Three-valued-aware predicates:
-- bots
SELECT COUNT(*) FROM events WHERE is_bot IS TRUE;
-- non-bots, including unknown
SELECT COUNT(*) FROM events WHERE is_bot IS NOT TRUE;
-- only unknown
SELECT COUNT(*) FROM events WHERE is_bot IS NULL;
Rule of thumb: never write WHERE flag or WHERE NOT flag on a nullable boolean column without consciously deciding what NULL means.
NOT col vs col = FALSE with NULLs
The negation invariant: col = FALSE and NOT col are logically the same when col is TRUE or FALSE, but both evaluate to NULL when col IS NULL — and a WHERE clause keeps only TRUE, so both forms silently drop nulls. The fix is COALESCE(col, FALSE) or IS NOT TRUE, which collapse NULL into a definite answer.
- WHERE col = FALSE — keeps rows where col is literally FALSE.
- WHERE NOT col — same; both drop NULL rows.
- WHERE COALESCE(col, FALSE) = FALSE — treats NULL as FALSE; keeps both.
- WHERE col IS NOT TRUE — treats NULL as not-true; keeps both.
Worked example. Same events table; an analyst asks for "all non-bot events":
| query | rows | comment |
|---|---|---|
| WHERE is_bot = FALSE | 1 | row 2 only — silent miss |
| WHERE NOT is_bot | 1 | identical; same bug |
| WHERE is_bot IS NOT TRUE | 3 | rows 2, 3, 5 — correct |
| WHERE COALESCE(is_bot, FALSE) = FALSE | 3 | also correct |
Step-by-step.
- Marketing asks "how many non-bot events?"; the analyst writes WHERE NOT is_bot.
- The result is 1; marketing thinks bots account for 4 of 5 events.
- A second analyst writes WHERE is_bot IS NOT TRUE and gets 3; the difference is the NULL rows.
- The dashboard's "bot vs non-bot" pie chart silently undercounts by 40%.
- The fix is either a COALESCE at query time or a NOT NULL DEFAULT FALSE constraint at schema time — both make the NULL case explicit.
Worked-example solution. Default boolean columns to a known value at write time:
ALTER TABLE events
ALTER COLUMN is_bot SET DEFAULT FALSE,
ALTER COLUMN is_bot SET NOT NULL;
-- queries are now safe
SELECT COUNT(*) FROM events WHERE NOT is_bot;
Rule of thumb: if a boolean has no "unknown" business meaning, declare it NOT NULL DEFAULT FALSE and remove the third bucket entirely.
COALESCE and explicit NULL handling
The COALESCE invariant: COALESCE(a, b, c) returns the first non-NULL argument; it is the simplest way to replace NULL with a default in WHERE, ORDER BY, and aggregations — but use it deliberately, because hiding NULL is the same as throwing away information. The right pattern is to decide whether NULL means "no answer" or "definitely false," then code that intent.
- COALESCE(col, default) — first non-NULL argument.
- NULLIF(a, b) — returns NULL when a = b; useful for "treat empty string as NULL."
- a IS DISTINCT FROM b — TRUE when values differ, treating NULL as a real value.
- SUM(col) ignores NULLs; COUNT(col) ignores NULLs; COUNT(*) includes them.
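The first three bullets in one runnable line:
SELECT COALESCE(NULL, NULL, 3) AS first_non_null,  -- 3
       NULLIF('', '')          AS empty_to_null,   -- NULL: both arguments are equal
       NULLIF('x', '')         AS unchanged,       -- 'x'
       1 IS DISTINCT FROM NULL AS distinct_check;  -- TRUE: NULL is treated as a real value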
Worked example. Summing score where some rows are NULL:
| row | score |
|---|---|
| 1 | 10 |
| 2 | NULL |
| 3 | 20 |
| expression | result |
|---|---|
| SUM(score) | 30 |
| SUM(COALESCE(score, 0)) | 30 |
| AVG(score) | 15 (n=2) |
| AVG(COALESCE(score, 0)) | 10 (n=3) |
Step-by-step.
- SUM ignores NULLs by SQL convention; you get the same answer with or without COALESCE.
- AVG divides by the count of non-NULL values; ignoring NULL gives 15, treating NULL as zero gives 10.
- The "right" answer depends on what NULL means — missing measurement (use 15) vs zero score (use 10).
- Always make the choice explicit; do not let a downstream consumer guess.
- IS DISTINCT FROM is the safe way to compare keys that may be NULL: a IS DISTINCT FROM b is TRUE when one is NULL and the other is not.
Worked-example solution. Choose the aggregation rule that matches the business question:
-- "average of measurements we have"
SELECT AVG(score) FROM responses;
-- "average where missing means zero"
SELECT AVG(COALESCE(score, 0)) FROM responses;
Rule of thumb: every COALESCE should answer the question "what should the missing row contribute?" in one sentence — if you cannot answer, do not coalesce.
Common beginner mistakes
- Writing WHERE flag = FALSE and assuming it includes NULL rows.
- Pairing WHERE flag with WHERE NOT flag and expecting the row counts to sum to the table size.
- Storing booleans as 'Y'/'N' strings — every comparison becomes a LOWER(...) hazard; use a real BOOLEAN.
- Forgetting that NULL = NULL is NULL, not TRUE — join keys with NULL need IS NOT DISTINCT FROM or pre-coalesced values.
- Using AVG over a nullable column without deciding whether missing means zero or excluded.
SQL Interview Question on a Dashboard Missing 12% of Rows
An events.is_bot BOOLEAN column is nullable. The dashboard splits "bots vs humans" with WHERE is_bot and WHERE NOT is_bot. The two row counts sum to 88% of the table; nobody can explain where the missing 12% went. Identify the cause and produce a single query pair that correctly partitions every row.
Solution Using IS TRUE / IS NOT TRUE + a Schema-Level NOT NULL Fix
Code solution.
-- short-term query-side fix
SELECT
COUNT(*) FILTER (WHERE is_bot IS TRUE) AS bots,
COUNT(*) FILTER (WHERE is_bot IS NOT TRUE) AS humans_or_unknown
FROM events;
-- long-term schema fix
UPDATE events SET is_bot = FALSE WHERE is_bot IS NULL;
ALTER TABLE events
ALTER COLUMN is_bot SET NOT NULL,
ALTER COLUMN is_bot SET DEFAULT FALSE;
Step-by-step trace.
| step | predicate | rows |
|---|---|---|
| 1 | WHERE is_bot (old) | 12,000 |
| 2 | WHERE NOT is_bot (old) | 76,000 |
| 3 | sum | 88,000 of 100,000 |
| 4 | missing | 12,000 rows where is_bot IS NULL |
| 5 | WHERE is_bot IS NOT TRUE | 88,000 — both FALSE and NULL |
| 6 | bots + humans_or_unknown | 100,000 ✓ |
Output: the two-bucket dashboard sums to 100% of rows. Schema-level NOT NULL DEFAULT FALSE makes future regression impossible.
Why this works — concept by concept:
- IS TRUE / IS NOT TRUE are three-valued safe — they never return NULL; the WHERE clause keeps exactly the rows the analyst expects.
- COUNT(*) FILTER (WHERE …) — single-pass two-bucket aggregation; faster than running two queries.
- UPDATE … WHERE is_bot IS NULL + SET NOT NULL — one-shot remediation of historical NULLs.
- DEFAULT FALSE — guarantees new rows start in a definite state.
- No surprise on rerun — the dashboard's "missing 12%" cannot reappear because the column constraints now rule it out.
- Cost — one UPDATE; the FILTER form has the same cost as two separate COUNTs combined into one scan.
Inline CTA: for the safe-NULL drill set see SQL practice page.
5. Date and time
DATE, TIME, TIMESTAMP, and TIMESTAMPTZ — instants vs wall clocks
PostgreSQL splits time into calendar dates (DATE), local wall-clock times (TIME), wall-clock timestamps (TIMESTAMP WITHOUT TIME ZONE), and absolute instants (TIMESTAMP WITH TIME ZONE, abbreviated TIMESTAMPTZ). The two-line mental model: TIMESTAMP is what a wall clock reads at a particular spot; TIMESTAMPTZ is a point on the global timeline. Every cross-region bug comes from picking the first when you wanted the second.
Pro tip: Default every event-instant column to TIMESTAMPTZ and use TIMESTAMP only when the time is intentionally local (a "9:00 AM recurring meeting" in the user's locale). Reporting that crosses regions becomes obviously correct or obviously wrong, with no middle ground.
TIMESTAMP without time zone — local wall-clock semantics
The wall-clock invariant: TIMESTAMP stores the literal datetime you gave it with no time-zone metadata; "2026-04-13 09:00:00" means 9:00 local wherever you happen to be; comparing two TIMESTAMP values is correct only if both came from the same time zone. It is the right type for "9:00 morning meeting in the user's local time" — and the wrong type for "the moment the user clicked Pay."
- TIMESTAMP storage — 8 bytes; no time-zone info.
- NOW() returns TIMESTAMPTZ — coercing it to TIMESTAMP strips the zone.
- Comparison — two TIMESTAMPs compare by literal value, regardless of which zones they came from.
- Use case — "every Monday at 09:00 local time" recurring schedules.
Worked example. Storing a 09:00 morning meeting for two users in different zones:
| user | wall-clock time | TIMESTAMP value |
|---|---|---|
| Alice (NYC) | 9:00 AM EDT | 2026-04-13 09:00:00 |
| Bob (Tokyo) | 9:00 AM JST | 2026-04-13 09:00:00 |
Step-by-step.
- Both rows look identical because the type carries no zone — the database just stores the digits the application sent.
- Both meetings happen at "9:00 AM local"; they are not the same UTC instant (13 hours apart).
- A query like
SELECT * WHERE start_at = '2026-04-13 09:00:00'returns both rows; that is the right answer for a "9 AM morning meetings" report. - If the same column had been
TIMESTAMPTZ, the two values would have been stored as different UTC instants and the report would have returned one of them or neither, depending on session settings. - Pick
TIMESTAMPonly when the wall-clock semantics are the actual business rule.
Worked-example solution. Recurring local-time schedule:
CREATE TABLE recurring_meetings (
meeting_id BIGSERIAL PRIMARY KEY,
user_id BIGINT NOT NULL,
local_tz TEXT NOT NULL, -- 'America/New_York'
start_at TIMESTAMP NOT NULL -- intentional wall-clock
);
Rule of thumb: if your column answers the question "what should the clock on the wall read?", use TIMESTAMP; otherwise use TIMESTAMPTZ.
TIMESTAMPTZ — UTC instant, session display
The instant invariant: TIMESTAMPTZ stores every value as a UTC instant internally (8 bytes), regardless of the time-zone literal in the INSERT; output is converted to the session's TimeZone at read time; comparison is always instant-to-instant. Same data ships to every region and every report agrees on "when did this happen."
- TIMESTAMPTZ storage — 8 bytes; the internal representation is UTC.
- INSERT … TIMESTAMPTZ '2026-04-13 09:00 EDT' — stored as 13:00 UTC.
- SET TimeZone = 'Asia/Tokyo' then SELECT ts — outputs 2026-04-13 22:00:00+09.
- AT TIME ZONE — converts between zones in a query.
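A hedged interactive sketch of the display conversion (the instant is arbitrary; run it in psql):
SET TimeZone = 'Asia/Tokyo';
SELECT TIMESTAMPTZ '2026-04-13 09:00:00-04' AS shown_in_tokyo;  -- 2026-04-13 22:00:00+09
SET TimeZone = 'America/New_York';
SELECT TIMESTAMPTZ '2026-04-13 09:00:00-04' AS shown_in_ny;     -- 2026-04-13 09:00:00-04, same stored instant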
Worked example. Same UTC instant viewed from three zones:
| session TimeZone | what SELECT ts FROM events WHERE id = 1 shows |
|---|---|
| UTC | 2026-04-13 13:00:00+00 |
| America/New_York | 2026-04-13 09:00:00-04 |
| Asia/Tokyo | 2026-04-13 22:00:00+09 |
Step-by-step.
- The instant 2026-04-13 13:00:00 UTC was inserted once into the table.
- The on-disk representation is a single 8-byte number — microseconds since the epoch, in UTC.
- Each session reads the same row, but the output function converts that instant to the session's TimeZone.
- The underlying data is identical; only the rendering differs.
- Cross-region reports stay correct because every comparison happens on the stored UTC value, not the displayed string.
Worked-example solution. Event-instant column:
CREATE TABLE clicks (
click_id BIGSERIAL PRIMARY KEY,
user_id BIGINT NOT NULL,
ts TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
SELECT user_id, COUNT(*) FROM clicks
WHERE ts >= NOW() - INTERVAL '24 hours'
GROUP BY user_id;
Rule of thumb: every "when did the event happen?" column is TIMESTAMPTZ; never TIMESTAMP.
AT TIME ZONE conversions and DATE_TRUNC pitfalls
The conversion invariant: ts AT TIME ZONE 'America/New_York' converts a TIMESTAMPTZ to a wall-clock TIMESTAMP in that zone, and the reverse (TIMESTAMP AT TIME ZONE 'America/New_York') interprets the wall-clock value as belonging to that zone and returns the corresponding instant; DATE_TRUNC('day', ts) buckets by midnight in the session's TimeZone (UTC on most ETL servers) unless you convert first. The pattern for "daily count in the user's local time" is DATE_TRUNC('day', ts AT TIME ZONE 'America/New_York').
- TIMESTAMPTZ AT TIME ZONE 'zone' → TIMESTAMP (the wall clock in that zone).
- TIMESTAMP AT TIME ZONE 'zone' → TIMESTAMPTZ (the corresponding instant).
- DATE_TRUNC('day', ts) — buckets by midnight in the session's TimeZone (UTC on most ETL servers); usually not what a regional report wants.
- DATE_TRUNC('day', ts AT TIME ZONE 'zone') — buckets by local midnight in that zone, regardless of session settings.
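A small sketch of the conversion pair, using click A from the worked example below: 03:00 UTC is still the previous evening in New York.
SELECT TIMESTAMPTZ '2026-04-13 03:00:00+00'
         AT TIME ZONE 'America/New_York' AS ny_wall_clock,  -- 2026-04-12 23:00:00 (a TIMESTAMP)
       DATE_TRUNC('day',
         TIMESTAMPTZ '2026-04-13 03:00:00+00' AT TIME ZONE 'America/New_York')::date AS ny_day;  -- 2026-04-12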
Worked example. Daily clicks for a US dashboard (session TimeZone = UTC):
| click | UTC ts | UTC day | NY day |
|---|---|---|---|
| A | 2026-04-13 03:00 UTC | 2026-04-13 | 2026-04-12 (still 23:00 the previous day in NY) |
| B | 2026-04-13 14:00 UTC | 2026-04-13 | 2026-04-13 |
| C | 2026-04-14 02:00 UTC | 2026-04-14 | 2026-04-13 (still 22:00 in NY) |
Step-by-step.
- With a UTC session, DATE_TRUNC('day', ts) groups by UTC midnight; click A goes into UTC 2026-04-13.
- But the user in NY clicked at 11 PM on April 12; the dashboard credits the wrong calendar day.
- ts AT TIME ZONE 'America/New_York' converts the instant to NY wall-clock: A becomes 2026-04-12 23:00, C becomes 2026-04-13 22:00.
- DATE_TRUNC('day', ts AT TIME ZONE 'America/New_York') then buckets by NY midnight; A goes into April 12, B and C into April 13.
- Daily counts now match the user's perception of "yesterday."
Worked-example solution. Daily report in NY local time:
SELECT
DATE_TRUNC('day', ts AT TIME ZONE 'America/New_York')::date AS day_ny,
COUNT(*) AS clicks
FROM clicks
GROUP BY 1
ORDER BY 1;
Rule of thumb: if a report says "daily" or "monthly," ask whose calendar — and then AT TIME ZONE before truncating.
Common beginner mistakes
- Defaulting to TIMESTAMP "because it's shorter to type" — silently breaks cross-region comparisons after the first deploy abroad.
- Storing TIMESTAMP and then "adding the time zone in the app" — the database lost the original zone the moment you stored the value.
- DATE_TRUNC('day', ts) straight on stored instants for a regional dashboard — daily counts shift by hours.
- Using NOW() interchangeably with CURRENT_DATE — NOW() is TIMESTAMPTZ, CURRENT_DATE is a DATE in the session's zone.
- Forgetting daylight saving — INTERVAL '24 hours' is not always "next day at the same wall-clock time."
SQL Interview Question on a Dashboard That Shifted 24 Hours After Deploy
The team deploys their analytics pipeline to a new region; the next morning the "orders today" dashboard shows yesterday's total. Storage is placed_at TIMESTAMP (without time zone). Diagnose the cause and propose a schema + query fix that survives any future deploy.
Solution Using TIMESTAMPTZ + AT TIME ZONE in the Reporting View
Code solution.
ALTER TABLE orders
ALTER COLUMN placed_at TYPE TIMESTAMPTZ
USING placed_at AT TIME ZONE 'America/New_York';
CREATE VIEW v_daily_orders AS
SELECT
DATE_TRUNC('day', placed_at AT TIME ZONE 'America/New_York')::date AS order_day,
COUNT(*) AS orders,
SUM(total) AS revenue
FROM orders
GROUP BY 1;
Step-by-step trace.
| step | observation |
|---|---|
| 1 | original placed_at TIMESTAMP — interpreted in the application's local zone |
| 2 | the redeploy moves the app to a server running in UTC; the same wall-clock value it writes now means UTC, not NY |
| 3 | rows inserted post-deploy look 4 hours older to the dashboard's NY-day buckets |
| 4 | ALTER COLUMN … TYPE TIMESTAMPTZ USING … AT TIME ZONE 'America/New_York' reinterprets all existing rows |
| 5 | the new TIMESTAMPTZ column stores UTC; the view's AT TIME ZONE converts back to NY for display |
| 6 | the dashboard buckets daily counts by NY midnight; results are stable across redeploys |
Output: "orders today" matches the operations team's intuition regardless of where the application server lives. Future deploys cannot reintroduce the 24-hour shift because the column type now stores instants, not wall clocks.
Why this works — concept by concept:
- TIMESTAMPTZ stores UTC — the on-disk value is the same regardless of session or server zone.
- USING … AT TIME ZONE 'America/New_York' — a one-shot reinterpretation of legacy rows during the type migration.
- AT TIME ZONE in the view, not the table — every report stays explicit about whose calendar it uses.
- DATE_TRUNC on the local wall-clock — daily buckets align to the user's perception of "today."
- Stable across redeploys — server moves do not change the displayed daily count.
- Cost — one table rewrite for the migration; per-row AT TIME ZONE is essentially free (microseconds).
Inline CTA: drill the date-functions practice topic and the filtering practice topic for time-aware predicates.
6. Semi-structured and other types
JSONB, UUID, and arrays for flexible attributes
PostgreSQL is a "relational with side quests" database — it has first-class JSONB (binary, indexable JSON), UUID (opaque distributed IDs), and array types (INTEGER[], TEXT[], JSONB[]) that make schema-flexible patterns possible without giving up SQL. The discipline is to use them deliberately: JSONB for truly sparse attributes, UUID for public/distributed identifiers, arrays for short bounded lists. Reach for them often and the schema becomes hard to query; reach for them never and you write more tables than you need.
Pro tip: Any column that becomes a frequent filter or join key belongs in a real typed column, not nested inside JSONB. Use JSONB as the "everything else" bucket for attributes that vary by row.
JSON vs JSONB — when binary indexing matters
The JSONB invariant: JSON stores the input text exactly (whitespace, key order, duplicate keys preserved) and reparses on every read; JSONB stores a binary-decoded representation that is faster to query, supports GIN indexes, and collapses duplicate keys (the last value wins) — pay the small write-time cost for read-time speed. For event payloads, application config, and flexible user attributes, JSONB is the default.
- JSON — text-faithful; preserves whitespace and duplicate keys; slower to query.
- JSONB — binary; faster reads; canonical (no insignificant whitespace, no duplicate keys).
- -> — returns JSON / JSONB.
- ->> — returns TEXT.
- @> containment — '{"a": 1}'::jsonb @> '{"a": 1}'::jsonb is TRUE.
- GIN index — CREATE INDEX … USING GIN (jsonb_col jsonb_path_ops).
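A self-contained sketch of the operator bullets; the inline VALUES row stands in for a real events table.
SELECT payload -> 'plan'                   AS plan_jsonb,  -- "pro" (still jsonb)
       payload ->> 'plan'                  AS plan_text,   -- pro (text)
       payload @> '{"plan": "pro"}'::jsonb AS has_pro      -- TRUE: containment
FROM (VALUES ('{"plan": "pro", "seats": 5}'::jsonb)) AS t(payload);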
Worked example. Searching event payloads for {"plan": "pro"}:
| design | predicate | plan |
|---|---|---|
| payload JSON | payload->>'plan' = 'pro' | Seq Scan |
| payload JSONB | payload @> '{"plan":"pro"}'::jsonb (with GIN) | Index Scan |
Step-by-step.
- With plain JSON, every row must be parsed at query time to extract the plan key.
- The planner cannot use an index because the parse-and-extract step happens per row.
- Switching the column to JSONB lets you create a GIN index on the document.
- The containment query @> is index-eligible — PostgreSQL probes the GIN structure for documents that contain the requested subtree.
- On a 50 M-row table, the difference is a full table scan vs a sub-second lookup.
Worked-example solution. Indexed JSONB column:
CREATE TABLE events (
event_id BIGSERIAL PRIMARY KEY,
user_id BIGINT NOT NULL,
payload JSONB NOT NULL,
ts TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX ON events USING GIN (payload jsonb_path_ops);
SELECT COUNT(*) FROM events WHERE payload @> '{"plan":"pro"}';
Rule of thumb: default to JSONB for any "flexible attributes" column; default to a real typed column for any attribute you filter on more than a few times a week.
UUID — opaque IDs for distributed systems
The UUID invariant: UUID is a 16-byte fixed-width identifier that does not leak ordering or count; ideal for public IDs, multi-region writes, and any context where you don't want consumers inferring growth rate from the sequence; trade-off vs BIGINT is ~2× storage and worse B-tree locality for monotonic insert patterns. Use UUIDs at the boundary (URLs, foreign systems) and BIGINTs internally if performance is critical.
- UUID storage — 16 bytes; generate with gen_random_uuid() (built-in since PostgreSQL 13, previously via pgcrypto).
- v4 (random) — uniform random; great privacy, bad B-tree locality.
- v7 (time-ordered) — sortable by creation time; better cache behavior.
- UUID vs TEXT — always declare as UUID; TEXT UUIDs lose validation and index efficiency.
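A quick sketch of the size trade-off: the same identifier as a native UUID and as its text rendering.
SELECT gen_random_uuid()                 AS public_id,   -- e.g. 8c3b7e2a-…
       pg_column_size(gen_random_uuid()) AS uuid_bytes,  -- 16
       length(gen_random_uuid()::text)   AS text_chars;  -- 36, before any storage header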
Worked example. Two ways to model a public order ID:
| design | bytes | URL example |
|---|---|---|
| order_id BIGINT | 8 | /orders/12345678 (leaks volume) |
| order_id UUID | 16 | /orders/8c3b7e2a-… (opaque) |
Step-by-step.
- BIGINT is monotonic — scraping a few order URLs lets a competitor infer your daily volume.
- UUID v4 is unguessable; 8c3b7e2a-… carries no information.
- Storage cost: 8 extra bytes per row × millions of rows is meaningful but rarely decisive.
- B-tree locality: random UUIDs spread inserts across the index; v7 (time-ordered) restores append-friendly behavior.
- For most "public ID" use cases, UUID v7 is the clean middle ground.
Worked-example solution. Internal BIGINT + public UUID:
CREATE EXTENSION IF NOT EXISTS pgcrypto;
CREATE TABLE orders (
order_id BIGSERIAL PRIMARY KEY,
public_id UUID NOT NULL UNIQUE DEFAULT gen_random_uuid(),
customer_id BIGINT NOT NULL
);
Rule of thumb: expose UUIDs at the API boundary; keep BIGINT joins inside the database.
Arrays — INTEGER[], TEXT[], and the UNNEST pattern
The array invariant: PostgreSQL arrays are first-class typed columns; common operations are ANY (arr) for membership, arr @> arr for containment, and UNNEST(arr) to flatten an array column into rows — useful when the list is short (≤ ~10 items) and bounded per row. For unbounded or queried-often lists, a child table is the better design.
- INTEGER[] — array of integers; literal '{1,2,3}'::int[].
- ANY (arr) — x = ANY ('{1,2,3}'::int[]) is TRUE if x is in the array.
- @> — '{1,2,3}'::int[] @> '{2}' is TRUE.
- UNNEST(arr) — produces one row per array element; pivots a row of N elements into N rows.
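A minimal sketch of the three operators on throwaway literal arrays:
SELECT 20 = ANY ('{10,20}'::int[])             AS is_member,  -- TRUE
       '{10,20,30}'::int[] @> '{10,20}'::int[] AS contains;   -- TRUE
SELECT UNNEST('{10,20,30}'::int[]) AS role_id;                -- three rows: 10, 20, 30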
Worked example. A users.role_ids INTEGER[] column:
| user_id | role_ids |
|---|---|
| 1 | {10, 20} |
| 2 | {20, 30} |
| 3 | {10} |
| query | rows |
|---|---|
| WHERE 20 = ANY (role_ids) | 1, 2 |
| WHERE role_ids @> '{10, 20}' | 1 |
| SELECT user_id, UNNEST(role_ids) | (1,10), (1,20), (2,20), (2,30), (3,10) |
Step-by-step.
- Storing roles as INTEGER[] keeps the user table compact — no separate user_roles table for a small bounded set.
- ANY is the array-side IN: it tests membership of one value against the column.
- @> tests whether the column array contains every element of the right-hand array.
- UNNEST flattens the column into rows; joining UNNEST(role_ids) to dim_role produces a per-role row.
- For unbounded role sets (10 K+) the array column gets slow and a child table wins; for typical "a user has 1-5 roles" cases, arrays are clean.
Worked-example solution. A small bounded list:
CREATE TABLE users (
user_id BIGINT PRIMARY KEY,
role_ids INTEGER[] NOT NULL DEFAULT '{}'::INTEGER[]
);
SELECT u.user_id, r.role_name
FROM users u
JOIN dim_role r ON r.role_id = ANY (u.role_ids);
Rule of thumb: arrays for short, bounded, rarely-filtered lists; child tables for everything else.
Common beginner mistakes
- Storing everything as JSONB because "schemas are hard" — you trade type safety and indexability for write-time convenience.
- Indexing JSON instead of JSONB — JSON cannot use GIN; the index won't help.
- Picking UUID v4 PKs on a high-write table and watching B-tree fragmentation degrade write throughput.
- Treating TEXT UUIDs the same as UUID columns — same data, different operator class, broken index expectations.
- Storing unbounded lists in arrays — once the array exceeds a few dozen entries, every read has to detoast the column and queries slow down.
SQL Interview Question on Searching JSONB Payloads at 50 M-Row Scale
A 50 M-row events.payload JSONB column holds variable payloads. Marketing wants to count events where {"plan": "pro"} appears in the payload, and the query takes 60 seconds. Make it return in under 100 ms without changing the storage shape.
Solution Using a GIN Index with jsonb_path_ops + @> Containment
Code solution.
CREATE INDEX events_payload_gin_idx
ON events
USING GIN (payload jsonb_path_ops);
SELECT COUNT(*)
FROM events
WHERE payload @> '{"plan":"pro"}'::jsonb;
Step-by-step trace.
| step | action | time |
|---|---|---|
| 1 | initial query payload->>'plan' = 'pro' | 62 s, Seq Scan |
| 2 | switch the predicate to payload @> '{"plan":"pro"}'::jsonb (no index yet) | 60 s, still Seq Scan |
| 3 | CREATE INDEX … USING GIN (payload jsonb_path_ops) (~5 min build) | — |
| 4 | rerun the containment query | 85 ms, bitmap scan via the GIN index |
| 5 | EXPLAIN ANALYZE confirms Bitmap Heap Scan on events | one pass |
Output: 60 s → 85 ms — three orders of magnitude faster — with no schema change, no application change, and no data rewrite. EXPLAIN ANALYZE shows the GIN index handling the containment lookup.
Why this works — concept by concept:
- @> containment is index-eligible — ->> text extraction is not; the operator choice unlocks the index.
- jsonb_path_ops — a specialised GIN operator class for containment-only queries; smaller and faster than the default jsonb_ops.
- No row rewrite — CREATE INDEX builds a new index without touching the table heap; existing reads are uninterrupted.
- Generalises to other keys — any future payload @> '{"key":"val"}' query benefits; no per-key index needed.
- Trade-off — write throughput drops slightly (GIN updates are heavier than B-tree); usually invisible.
- Cost — the index build is O(N) one-time; reads become index probes instead of full scans.
Inline CTA: drill the filtering practice page and the SQL practice page for JSON-flavoured patterns.
7. Casting and comparison rules
Implicit coercion, explicit CAST, and index-friendly predicates
PostgreSQL silently resolves some type mixes (an untyped literal like '42' compared to an INTEGER column), refuses others, and lets you make the conversion explicit with CAST(x AS type) or its shorthand x::type. The high-leverage rule is where the cast lands: a cast on a literal is free and index-friendly; a cast on a column usually disables the index. Mixed-type joins are the canonical cause of "the query returns no rows" and "the query is suddenly 100× slower."
Pro tip: When EXPLAIN reveals Seq Scan on … on a column you indexed, scan the Filter: line for a ::type cast. The fix is usually to cast the other side or — better — to change the source column's type so no cast is needed.
Implicit coercion — when PostgreSQL guesses
The coercion invariant: PostgreSQL has a graph of allowed implicit casts (e.g., INTEGER → BIGINT, INTEGER → NUMERIC) plus rules for resolving untyped string literals against the other side's type, and applies them silently when the two sides of a binary operator differ; when no conversion exists, the query fails with an operator does not exist error. Implicit coercion is convenient until it produces a different answer than expected.
- INTEGER = BIGINT — implicit widen; no surprise.
- TEXT = INTEGER — works for untyped literals (WHERE id = '42'); fails for column-to-column compares (WHERE t.id = b.id).
- DATE = TIMESTAMPTZ — implicit widen via the session zone; results can shift.
- BOOLEAN = INTEGER — not allowed; you must cast.
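A short sketch of the difference between an untyped literal and a genuinely TEXT value:
SELECT 42::integer = 42::bigint AS int_widen;   -- TRUE: implicit widen to BIGINT
SELECT 42 = '42'                AS literal_ok;  -- TRUE: the untyped literal is resolved to INTEGER
-- SELECT 42 = '42'::text;                      -- ERROR: operator does not exist: integer = text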
Worked example. Joining a TEXT user_id to a BIGINT user_id:
| left.user_id (TEXT) | right.user_id (BIGINT) | join works? |
|---|---|---|
| '42' | 42 | error, or a Seq Scan once a column-side cast is added |
| '042' | 42 | mismatch when compared as text (lexicographic ≠ numeric) |
| ' 42' | 42 | mismatch when compared as text (whitespace) |
Step-by-step.
- PostgreSQL needs both sides of = to share a comparison operator; there is no implicit TEXT-to-BIGINT conversion for a column-to-column compare, so the raw join errors.
- Coercing TEXT → BIGINT is possible per value ('42'::BIGINT), but the usual "fix" is to cast the column side — which disables the index.
- Leading zeros, whitespace, and non-digit characters either fail the cast mid-query or make values that look equal compare unequal as text.
- The result is either a hard error or a slow seq scan.
- The fix is upstream: align the source column types so no cross-type compare is needed.
Worked-example solution. Avoid mixed-type joins:
-- if you must cast, cast at write time, not query time
ALTER TABLE staging_users
ALTER COLUMN user_id TYPE BIGINT USING NULLIF(user_id, '')::BIGINT;
Rule of thumb: never store an identifier as text on one side and as integer on the other side of a join. Pick one type at the warehouse contract level.
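A quick demo of why the mismatch rows in the table above go wrong; the values are taken from the worked example, and the point is that text equality and numeric equality disagree:
SELECT '042' = '42' AS text_eq, '042'::bigint = 42 AS numeric_eq;        -- false, true
SELECT ' 42' = '42' AS text_eq, BTRIM(' 42')::bigint = 42 AS numeric_eq; -- false, true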
CAST(x AS type) vs x::type shorthand
The CAST invariant: CAST(x AS type) and x::type produce identical output; the longhand is SQL-standard and self-documenting; the shorthand is PostgreSQL idiomatic and shorter in expression-heavy queries. Both fail with a clear error when the conversion is illegal.
- `CAST(x AS type)` — ANSI SQL; works in every dialect.
- `x::type` — PostgreSQL shorthand.
- Failure modes — same for both: `invalid input syntax for type integer` etc.
- `NULLIF` + `CAST` — `NULLIF(x, '')::INT` collapses an empty string to NULL before casting.
Worked example. Two equivalent expressions:
SELECT CAST('42' AS BIGINT), '42'::BIGINT;
SELECT CAST('not a number' AS BIGINT); -- ERROR
SELECT NULLIF('','')::BIGINT; -- NULL
Step-by-step.
- Both `CAST` and `::` produce the same output type and the same value.
- Failing input (a non-digit string) raises the same error in both forms.
- `NULLIF(x, '')::TYPE` is the canonical "treat empty string as NULL" pattern.
- In multi-expression SELECTs, `::` keeps lines short; in code-review-heavy contexts, `CAST` is more legible.
- Use whichever your team's house style prefers; do not mix unnecessarily.
Worked-example solution. Safe cast for messy ETL data:
SELECT
NULLIF(BTRIM(user_id), '')::BIGINT AS user_id,
raw_payload
FROM staging_events;
Rule of thumb: BTRIM + NULLIF + ::type is the three-step safe-cast pattern for noisy inputs.
Index-killing casts on indexed columns
The index-killer invariant: a WHERE predicate that wraps an indexed column in a function — including an implicit cast — usually forces a sequential scan, because the B-tree is ordered on the raw column values and cannot be matched against a transformed value unless a matching expression index exists; the planner falls back to scanning every row. The same query rewritten to cast the literal instead is index-eligible.
- `col::type = $1` — bad; the column cast disables the index.
- `col = $1::type` — good; literal cast, index used.
- `LOWER(col) = $1` — bad unless you build a functional index on `LOWER(col)` (see the sketch below).
- `col = LOWER($1)` — good; literal-side function call.
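If you genuinely need the transformed lookup, an expression index restores index eligibility. A minimal sketch, assuming a hypothetical users(email TEXT) table; the index and column names are illustrative:
-- expression index: the B-tree is built on LOWER(email), so the predicate below can use it
CREATE INDEX users_email_lower_idx ON users (LOWER(email));

SELECT *
FROM users
WHERE LOWER(email) = LOWER('Alice@Example.com');  -- index-eligible via the expression index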
Worked example. A `user_id BIGINT` column, indexed; three predicates:
| predicate | plan |
|---|---|
| `WHERE user_id = 42` | Index Scan |
| `WHERE user_id::text = '42'` | Seq Scan |
| `WHERE user_id = '42'::bigint` | Index Scan |
Step-by-step.
- `user_id = 42` matches the type of the indexed column directly.
- `user_id::text` applies a function to every row; the B-tree on the original value cannot be used.
- Rewriting as `user_id = '42'::bigint` casts the literal once and reuses the existing index.
- If you genuinely need to query by the casted form, create a functional index: `CREATE INDEX ON users ((user_id::text))`.
- The cheapest fix is almost always to change the data type so no cast is needed.
Worked-example solution. Cast the literal, never the column:
-- good
SELECT * FROM events WHERE user_id = '42'; -- literal coerced
-- bad
SELECT * FROM events WHERE user_id::text = '42'; -- column cast kills the index
Rule of thumb: every :: on the indexed side of a WHERE or JOIN is a code smell. Investigate before merging.
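A quick way to audit this before merging, using the events table from the example; the plan text in the comments is the expected shape and varies with version and row counts:
EXPLAIN SELECT * FROM events WHERE user_id = '42'::bigint;  -- expect: Index Scan, Index Cond on user_id
EXPLAIN SELECT * FROM events WHERE user_id::text = '42';    -- expect: Seq Scan with Filter: ((user_id)::text = '42')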
Common beginner mistakes
- Joining a `TEXT user_id` to a `BIGINT user_id` and adding `::text` on the BIGINT side — works but disables the index.
- Treating `'042' = 42` as TRUE everywhere — leading zeros are preserved in TEXT and lost in INTEGER.
- Mixing `TIMESTAMP` and `TIMESTAMPTZ` in joins — answers depend on session TZ.
- Using `LIKE` against a numeric column — it errors until you add an explicit `::text` cast, and that cast disables any index on the column.
- Forgetting to handle empty strings before casting — `''::INT` is a hard error; use `NULLIF` (see the snippet below).
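The leading-zero and empty-string mistakes are easy to reproduce in a scratch session; a minimal demo:
SELECT ''::int;              -- ERROR: invalid input syntax for type integer: ""
SELECT NULLIF('', '')::int;  -- NULL instead of an error
SELECT '042'::int = 42;      -- true after the cast, but as TEXT '042' = '42' is false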
SQL Interview Question on a Cross-Type Join Returning Zero Rows
staging_users.user_id TEXT joined to dim_users.user_id BIGINT returns 0 rows even though both tables contain user_id = 42. The planner reports a Seq Scan on dim_users. Identify every contributing cause and propose a fix that produces a sound result and keeps the dim's primary-key index usable.
Solution Using a Single-Type Schema + Explicit Literal-Side Cast
Code solution.
-- short-term: cast on the staging side so the dim's PK index stays usable
SELECT d.*
FROM staging_users s
JOIN dim_users d
ON d.user_id = NULLIF(BTRIM(s.user_id), '')::BIGINT;
-- permanent fix: rewrite staging to BIGINT once
ALTER TABLE staging_users
ALTER COLUMN user_id TYPE BIGINT
USING NULLIF(BTRIM(user_id), '')::BIGINT;
Step-by-step trace.
| step | symptom | cause |
|---|---|---|
| 1 | `WHERE d.user_id = s.user_id` errors with operator-does-not-exist | type mismatch (BIGINT vs TEXT) |
| 2 | analyst rewrites as `WHERE d.user_id::text = s.user_id` | "fixes" the error |
| 3 | query returns 0 rows | leading whitespace in `s.user_id` (`' 42'`) breaks the lexicographic compare |
| 4 | `EXPLAIN` shows Seq Scan on `dim_users` | column cast on `d.user_id` killed the PK index |
| 5 | rewrite with `BTRIM` + `NULLIF` + `::BIGINT` on the staging side | index restored, whitespace tolerated |
| 6 | row count matches `dim_users.user_id` cardinality | sound result |
Output: join now returns the expected rows, the dim's primary-key index is back in the plan, and the permanent ALTER COLUMN removes the per-query cast for good.
Why this works — concept by concept:
- Single-type schema — after the `ALTER`, both sides are `BIGINT`; no cross-type compare ever runs.
- Staging-side `BTRIM` + `NULLIF` + `::BIGINT` — handles real-world dirty input without disabling the dim's index (a pre-flight check for values that will not cast cleanly is sketched below).
- Index on `dim_users.user_id` preserved — because the cast is on the staging side, not the dim side.
- Whitespace-tolerant — `BTRIM` eliminates the silent-zero-rows mode.
- Empty-string-safe — `NULLIF(x, '')::BIGINT` returns NULL instead of erroring.
- Cost — one rewrite at the staging layer; per-query cost drops from a full table scan to an `O(log N)` PK seek.
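Before running the permanent ALTER, it is worth finding staging values that will not cast cleanly. A minimal pre-flight check, assuming the staging_users table above; the regex only accepts plain digit strings, so adjust it if your IDs allow signs or other characters:
SELECT user_id
FROM staging_users
WHERE NULLIF(BTRIM(user_id), '') !~ '^\d+$'  -- empty strings become NULL and drop out of the filter
LIMIT 20;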
Inline CTA: for the join-fluency syllabus see the joins practice page and the SQL practice page.
8. Choosing types (checklist)
| If you are storing… | Prefer… | Watch out for… |
|---|---|---|
| Surrogate keys, row counts | `BIGINT` / `INTEGER` | Overflow, unnecessary `BIGSERIAL` everywhere |
| Money, rates, basis points | `NUMERIC(p, s)` | Float rounding in aggregates |
| Labels, names, free text | `TEXT` or `VARCHAR(n)` | Collation, padding with `CHAR` |
| Instants in distributed systems | `TIMESTAMPTZ` | Mixing with `TIMESTAMP` in joins |
| Nested / sparse attributes | `JSONB` | Huge documents without indexes |
| Public opaque IDs | `UUID` | Stringly-typed UUIDs in joins |
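For the last row of the checklist, a minimal sketch of a native-UUID key; the table and column names are illustrative, and gen_random_uuid() is built in from PostgreSQL 13 (earlier versions need the pgcrypto extension):
CREATE TABLE api_keys (
  key_id     uuid        PRIMARY KEY DEFAULT gen_random_uuid(),  -- native uuid, not TEXT
  created_at timestamptz NOT NULL DEFAULT now()
);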
Pro tip: When you explain a schema in a live screen, say the grain and the type together: "one row per order, `order_id` is `BIGINT`, `total` is `NUMERIC(14,2)`."
9. Frequently asked questions
Should I use TEXT or VARCHAR(255)?
In PostgreSQL there is no storage penalty for TEXT vs varchar with the same contents. Use VARCHAR(n) when you want the database to enforce a maximum length; otherwise TEXT is simple and common.
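A minimal illustration of the length check that VARCHAR(n) adds; the table name is illustrative:
CREATE TABLE tags (label VARCHAR(16));
INSERT INTO tags VALUES (repeat('x', 20));  -- ERROR: value too long for type character varying(16)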
Is SERIAL still OK for primary keys?
SERIAL / BIGSERIAL are convenient; GENERATED ... AS IDENTITY is the standards-preferred spelling in modern PostgreSQL. Know both in interviews.
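Both spellings side by side; the table names are illustrative:
CREATE TABLE orders_legacy (order_id BIGSERIAL PRIMARY KEY);                            -- convenient, still common
CREATE TABLE orders_modern (order_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY);  -- standards-preferred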
Why is my join returning no rows when the IDs "look the same"?
Check types and whitespace on string keys. Compare plans with EXPLAIN: mismatched types can prevent index use or change the semantics of the comparison. Then rehearse on SQL-tagged problems →.
When must I use NUMERIC instead of float?
Whenever exact decimal behavior is required—currency, tax, allocations—or when you must match a ledger or regulatory rule. Floats are for measured magnitudes where error bounds are acceptable.
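A two-line demo of the difference; the exact float output varies by platform, but the comparison results hold:
SELECT 0.1::float8 + 0.2::float8 = 0.3::float8;    -- false: binary floats accumulate representation error
SELECT 0.1::numeric + 0.2::numeric = 0.3::numeric; -- true: exact decimal arithmetic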
10. Practice on PipeCode
PipeCode ships 450+ data engineering practice problems—SQL uses the PostgreSQL dialect, with editorials and topics aligned to what strong companies ask. Start from Explore practice →, open SQL practice →, filter by joins → or aggregations →, and see plans → when you want the full library.





Top comments (0)