If you have 30 seconds. "Don't duplicate data" is a rule that fails in production: caches are legitimate, and you ship them anyway. The rule that actually holds is finer-grained — every stored value that resembles a duplication is either Live, Snapshot, or Cache, and each category has a distinct implementation. Skip the classification and you ship the same kind of bug I just found multiplied by 560 rows: a column quietly going stale because nobody decided whether it was supposed to track its source or not. Useful if you have backend tables you stopped trusting.
A column nobody updated
Saturday April 18th, 2026, mid-morning. I'm reading the dashboard with Hélène, who has kept the school's books for nineteen years. Her notebook is open, mine is the screen. She points at one row — a ceramics student, third year, a perfectly ordinary case — and frowns. Rembrandt says she still owes us €1,159. Hélène's payment schedule says €2,262. We cross-check on paper. Eleven monthly installments at €205.65. Hélène is right.
The column my dashboard is reading is contacts.montant_total. It was populated at import in March from a single sum at a single point in time, and nothing in the codebase updates it. Three weeks later, an installment is added — Hélène recorded it manually in the new echeances table — and the column doesn't move. The dashboard reads what it was given in March, which was correct in March.
The audit that follows shows 72 contacts diverging and roughly €18,000 of phantom non-debt. « Et combien d'autres comme ça » — And how many others — Hélène asks, calmer than I am, because this is exactly what she expects from software that decides things on its own.
The bug isn't in the SUM, in the migration script, or in the cron. The bug is that nobody ever decided what contacts.montant_total was supposed to be — a snapshot of a moment, a cache of a sum, or a value that should always reflect the current schedule. The column was inserted because it was needed then. The mechanism that would have kept it true now was never written, because the question wasn't asked.
This is the article about the question.
Why "don't duplicate" doesn't work
The naive rule don't duplicate data also forbids the caches you actually want — materialized views, synchronized counters. The rule that holds: every stored value that resembles a duplication is Live, Snapshot, or Cache, classified before being created. A duplication without a category is a bug in waiting.
The three categories
Live
The value must always reflect the current state of the system. There is no business reason to freeze it at a particular moment.
Implementation: don't store it. Read it directly from a SQL view (v_*) or a query on the source tables.
In Rembrandt: v_reste_du_contact.montant_prevu_total — the dynamic sum of a contact's installments.
The anti-pattern is storing a copy "to avoid the join." Divergence is guaranteed without a refresher, and the refresher never gets written.
Snapshot
The value must remain frozen at the moment of a business event. Modifying it retroactively is a functional fault.
Implementation: store it, never recalculate. Protection in writing after creation — a CHECK constraint, a trigger that forbids UPDATE, or simple documented discipline.
In Rembrandt: inscriptions.tarif_applique — the price at the date of enrollment, which doesn't move if the course is repriced later.
The anti-pattern is "recalculating retroactively for consistency." That steals history. If a price revision must apply, it applies through a new event — credit note plus new invoice — not by editing the existing snapshot.
Cache
The value is derivable from other data but expensive to compute on every read. You accept storing it for performance, on the condition that you declare an explicit refresher in the same commit as the column.
Implementation: store it plus declare the mechanism. Three mechanisms are admitted: GENERATED ALWAYS AS (...) for intra-row derivation, a SQL trigger named trg_* on the feeding tables, or a materialized view mv_* with a scheduled or post-bulk REFRESH.
No cache survives without its named refresher. If you can't guarantee refreshment, you fall back to Live.
In Rembrandt: cours.places_prises — cache refreshed by trigger on inscriptions.
The three-question decision
In front of every new field that resembles a duplication, three questions in order:
- Must the value evolve with the upstream data? If no — it's a frozen past event → Snapshot. Store, document "snapshot at creation."
-
Is computing it on the fly acceptable performance-wise? If yes → Live. Don't store. Create a
v_*view. -
Otherwise it's a Cache:
trg_*trigger,GENERATED ALWAYS AS, or materialized viewmv_*— declared in the same commit as the column. If none of the three is tenable, fall back to Live and accept the compute cost.
What I found in my own schema
I ran the audit a week after the Hélène incident. Fourteen divergences, six categories. The archetype is contacts.montant_total — a Live in disguise treated as a Snapshot, by negligence. The fix is not "recalculate periodically" — that re-establishes divergence at the next installment created. The fix is to delete the column and route reads through v_reste_du_contact. Migration of category, not patch. The issue isn't the storage. It's that the storage was decided without declaring the contract that would have kept it true.
A small naming discipline
Every Cache column carries a SQL comment at migration time:
COMMENT ON COLUMN cours.places_prises IS 'CACHE: refreshed by trg_inscriptions_sync_places';
Without this comment, the next reader of the schema cannot distinguish a managed Cache from a Live that diverged silently. When a migration review surfaces a column without a category, you stop, you classify, or you remove.
What you can copy into your project
Two concrete elements you can apply tomorrow:
- Run a categorization audit on the columns you stopped trusting. Pick the five most-suspect-feeling columns, classify each as Live, Snapshot, or Cache. Any column you cannot place is a bug already shipped.
- Refuse the patch that re-establishes the bug. When a Live-disguised-as-Snapshot diverges, the fix isn't "recalculate ponctually." That postpones the next incident by exactly one upstream event. The fix is migration of category — drop the column, route reads through the view.
The rule reads in one line: every stored value that resembles a duplication is Live, Snapshot, or Cache, with a contract declared in the same commit.
Coda
Hélène closed her notebook at the end of that morning. She didn't say « je vous l'avais dit » — I told you so — that's not her style; she said « vous voyez bien que ça ne suffit pas » — you see clearly that it isn't enough. That's the kind of victory you can't measure, only assume.
Companion code: rembrandt-samples/live-snapshot-cache/ — the 3-question decision checklist plus four SQL patterns (Live view, Snapshot protection, Cache trigger, category migration), MIT, copy-pastable.

Top comments (0)