
Mary Olowu


AI Can Write the Code. It Still Forgets the Decisions That Matter.

The session boundary and architectural drift

A lot of AI coding advice quietly assumes the same thing:

If the output is bad, you probably need a better model, a better prompt, or more tooling.

Sometimes that is true.

But one AI coding failure keeps showing up for me, and I do not think a better model is the real fix.

In one session, we make a decision that is supposed to guide the rest of the project.

Then in a later session, the model answers that same question differently and starts nudging the project down another path.

Usually it is more subtle than "the code is wrong."

We already decided that deprecated paths stay backward compatible for a reason, that receivers fan out to downstream consumers instead of owning business logic inline, and that idempotency gets enforced before side effects fire. Then a later session solves the local task as if those decisions were optional because it only sees the immediate diff.

Nothing is obviously broken right away.

The code still looks competent.
It still compiles.
It still sounds reasonable.

But the project starts to feel scattered.

It no longer feels like one person with memory has been carrying the work forward.

That changed how I think about AI coding.

On an ongoing project, the bigger issue is often not generation quality. It is continuity.

The model does not know:

  • which decisions have already been made
  • which tradeoffs we have already accepted
  • which docs are still authoritative
  • what changed recently
  • what should not be changed again

That is not really an intelligence problem.

It is a memory problem.

I feel this most on a solo-dev monorepo, where I am not just using AI for one-off code generation. I am also using it for backlog triage, bug capture, planning, reports, and picking work back up across sessions.

The frustrating part is not that the model cannot code.

It is that it wakes up in every session without durable context, and codes anyway.

Sometimes the missing memory is shallow and local.

A simple rule in the codebase or in CLAUDE.md helps a lot:

  • follow the existing conventions
  • match the existing code patterns
  • use FIFO here, not LIFO
  • do not add a second library when the current one already covers the job
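In practice, that can be a few literal lines checked into the repo. A hypothetical CLAUDE.md fragment, with the rules invented for illustration:

```markdown
# Conventions (do not re-litigate these each session)
- Match the existing code patterns before introducing new ones.
- Work queues in this repo are FIFO. Do not switch them to LIFO.
- Do not add a second library for a job the current stack already covers.
```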

That kind of memory is useful and surprisingly high leverage.

But the harder problem is when the missing memory is much deeper than code style.

It is about remembering why the project should not go a certain direction again.

Things like:

  • deprecated behavior stays backward compatible until the migration path is actually complete
  • receivers fan out work instead of embedding downstream business logic directly
  • idempotency has to happen before side effects, not after them
  • this webhook should update the existing record, not create a second one
  • this state transition only happens after this other condition is true
  • duplicate events should be absorbed here, not after side effects have already fired
  • this source is authoritative for this field, so do not let another path quietly overwrite it
  • this module already has a helper for this logic, so do not bypass it and create a second path
  • do not bring in a new dependency to solve a problem the existing stack already solves
  • do not create a retry flow that can turn into an infinite loop
  • do not quietly undo an earlier system decision because the current session cannot see its history
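To make one of those concrete, here is a minimal sketch of the idempotency-before-side-effects rule in Python. The `processed_events` table, the event shape, and the `send_downstream` helper are all hypothetical stand-ins, not code from any real project:

```python
import sqlite3

def send_downstream(payload: dict) -> None:
    """Hypothetical stand-in for the real fan-out to downstream consumers."""
    print("fanning out:", payload)

def handle_event(conn: sqlite3.Connection, event_id: str, payload: dict) -> bool:
    """Absorb duplicate events BEFORE side effects fire, not after."""
    try:
        # The primary key on event_id is the idempotency gate:
        # a second delivery of the same event fails this insert.
        conn.execute("INSERT INTO processed_events (event_id) VALUES (?)", (event_id,))
        conn.commit()
    except sqlite3.IntegrityError:
        return False  # duplicate absorbed here; no side effects fire
    send_downstream(payload)  # side effects only after the event is recorded
    return True

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed_events (event_id TEXT PRIMARY KEY)")
handle_event(conn, "evt_1", {"type": "invoice.paid"})  # first delivery: fans out
handle_event(conn, "evt_1", {"type": "invoice.paid"})  # retry: absorbed, returns False
```

A later session that only sees the local diff can "simplify" this by moving the fan-out above the insert, and nothing visibly breaks until a delivery gets retried in production.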

Those decisions are usually load-bearing.

They were made for a reason.

Forgetting why they exist is a bit like forgetting why a house has support pillars in the frame. Once the reason disappears, the pillar starts to look optional. Then removing it or building around it the wrong way starts to feel harmless, right up until the cost shows up somewhere else.

This is where AI-written code starts to feel different from human-guided code.

A person with memory usually carries more invisible continuity into the work.

They remember:

  • why the earlier choice was made
  • what problem we were trying to avoid
  • which convention is mandatory versus just common
  • which "reasonable" branch is actually the wrong one for this project

Without that continuity, AI can produce code that looks fine in isolation while introducing costly mistakes into the project over time.

If the model keeps re-litigating the same decision, reopening the same tradeoff, or proposing work that was already decided against, the problem is not just generation quality. The system has no reliable memory layer.

That is why I have become much more interested in boring project context than in prompt tricks.

What helped me was giving AI a few stable places to look:

  • short repo guardrails
  • maintainers docs for durable context
  • lightweight local memory for session continuity
  • real systems of record for backlog and releases
  • explicit notes about patterns to keep following and failure modes to avoid
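For the durable-context piece, the format matters less than writing the decision down next to its reason. A hypothetical decision record, with the ADR numbers and details invented for illustration:

```markdown
# ADR-012: Idempotency runs before side effects
Status: accepted (supersedes ADR-007)
Context: retried webhook deliveries were double-firing downstream jobs.
Decision: every receiver records the event id before any side effect runs.
Consequences: retries are safe; removing the gate reintroduces double-firing.
```

Pointing a new session at a handful of records like this is what stops load-bearing decisions from being quietly re-litigated.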

None of that is glamorous.

It is also what made the biggest difference.

Once I had that structure, the sessions stopped feeling like first contact every time.

The model still made mistakes. It still needed review. It still needed boundaries.

But the failures got more honest.

Instead of "the AI is useless," the problem became easier to diagnose:

  • the memory is stale
  • the docs are weak
  • the workflow has no source of truth
  • the instructions are doing the job that documentation should be doing
  • a deeper architectural rule is being treated like a surface-level style preference

That is a much better problem to have because you can actually fix it.

I think a lot of AI coding frustration is really project-memory failure wearing a model-shaped mask.

People keep trying to solve it with one more model upgrade or one more agent when the actual missing piece is memory that survives the chat window.

That does not mean model quality is irrelevant.

It means there is a ceiling on how useful any model can be if the project keeps forgetting its own load-bearing decisions.

The shift for me was simple:

I stopped asking, "How do I make the model smarter?"

I started asking, "How do I stop a later session from quietly taking the project in a different direction?"

The future of AI coding is not just better generation.

It is better memory around the decisions that hold the work up.

What breaks AI coding more often in your projects: weak generation, or weak continuity?

Top comments (12)

GnomeMan4201

This is the exact mindset I have about AI. From this post alone, you can tell you are very good at turning what I consider a complex concept into a clear narrative. That also suggests you have a lot of hands-on, “in the trenches” experience. I don’t really know what I’m getting at, but I just want to say this is a legit post to read.

Mary Olowu

Really appreciate that. A lot of this came from hitting the same failure mode over and over on real project work, so I wanted to explain it in plain terms instead of treating it like magic. Glad it landed for you.

Comment deleted
Mary Olowu

Yes, this is exactly the failure mode I keep seeing. I really like your framing of decisions as records with provenance plus explicit supersedes links, because that turns memory into something queryable instead of something buried in old chat logs. Even a small SQLite decisions table gets a team surprisingly far.

Mykola Kondratiuk

this is the session boundary problem - and better prompts don’t fix it. treating decisions as artifacts you explicitly pass in each time actually helps. ADRs as context prepend, spec files. the model isn’t forgetting - it was never told.

Mary Olowu

Exactly. “The model isn’t forgetting, it was never told” is probably the cleanest way to put it. ADRs, spec files, and other durable artifacts are what make context portable across sessions instead of trapping it in one chat window.

Mike Talbot ⭐

It's entirely a tooling problem. If you don't have the proper memory or the proper insights, it's like bringing in a new developer each time and asking them to take the next step.

Mary Olowu

That “bringing in a new developer each time” analogy is spot on. The painful part is not that the model can’t code, it’s that it doesn’t inherit the expensive decisions unless the tooling gives it durable state. That’s the gap I was trying to point at.

Andy Stewart

This hits the nail on the head. No matter how high the AI's IQ, without deterministic state management, it’s just "stochastic mediocrity." The solution isn't blindly stacking compute, but building a persistent context that outlives the chat window. An Agent without memory is just a code monkey; one that preserves engineering decisions is a true digital partner.

Mary Olowu

Persistent context that outlives the chat window is the key. Once engineering decisions are preserved somewhere stable, the agent stops feeling like a fast stateless assistant and starts feeling much closer to a real collaborator.

Elmar Chavez

AI context is expensive. Memory is expensive. That's why they enforce these limits. AI companies are bleeding money left and right, and from what I hear, revenue is not catching up.

Mary Olowu

I think cost is definitely part of why context limits exist. The part I keep coming back to, though, is that even with a bigger window, the workflow still breaks if the important decisions are not externalized somewhere durable. More tokens help, but better state management helps more.
