My AI Agent Keeps Forgetting Everything; So do I...
I have multiple sclerosis. Some days are better than others, but one thing is constant: repe...
This is a step forward, but making AI work consistently in the long run will always be a challenge. Almost everything in software engineering involves subjective decisions, and these hallucinations and inconsistencies prove it.
Shout out to @diet-code103!!
A compressed knowledge graph, particularly on MS, serialized at the .md level... I would be happy to build that for you to help with your memory issue.
I think you need to explain that a bit more clearly for the rest of us to understand - are you proposing a different (or "better", even) approach than what the author proposed?
Well, I don’t need to do a thing. However, is that a kind request for further explanation?
No you don't have to do anything, but you could ;-)
My point basically is that the author already seems to have a pretty good grasp of the issue, and how to tackle it :-)
Fair point — let me explain.
The author's five-file structure is excellent execution tracking. What I was gesturing at is a different layer: instead of storing project context as flat markdown files, you compress it into a knowledge graph — nodes and edges representing concepts, decisions, and relationships, serialized as .md.
The practical difference: flat files grow linearly. A knowledge graph stays compact because relationships replace repetition. The agent doesn't re-read "we use Postgres" buried in a decisions log — it traverses a typed edge from DatabaseChoice → Postgres with the rationale attached. Context retrieval becomes a graph query, not a document scan.
So not a better approach — a different abstraction built on a similar idea. Stephen's five-file structure could sit underneath a KG layer: the files feed the graph, the graph feeds the agent.
The MS angle was specific: for someone managing cognitive fatigue, a compressed, queryable knowledge graph reduces the mental overhead of re-orienting the agent each session. Less to re-explain, because the structure carries more of the context automatically.
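To make the "typed edge" idea concrete, here is a toy sketch of that layer. Everything in it (the `KnowledgeGraph` class, the node and edge names like `DatabaseChoice` and `resolved_as`) is illustrative, not a real schema or anyone's actual implementation:

```python
# Toy sketch: project context as a typed knowledge graph instead of
# flat notes. Node and edge names are invented for illustration.

from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    # edges maps a source node -> list of (edge_type, target, rationale)
    edges: dict = field(default_factory=dict)

    def add(self, src: str, edge_type: str, dst: str, rationale: str = ""):
        self.edges.setdefault(src, []).append((edge_type, dst, rationale))

    def query(self, src: str, edge_type: str):
        """Traverse a typed edge instead of scanning a document."""
        return [(dst, why) for etype, dst, why in self.edges.get(src, [])
                if etype == edge_type]


kg = KnowledgeGraph()
kg.add("DatabaseChoice", "resolved_as", "Postgres",
       rationale="need transactional integrity; decided early on")

# Context retrieval is a graph query, not a re-read of a decisions log:
print(kg.query("DatabaseChoice", "resolved_as"))
# [('Postgres', 'need transactional integrity; decided early on')]
```

The point of the sketch is the access pattern: the agent asks one question and gets one answer with the rationale attached, rather than re-reading a growing file.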
Thank you, that makes a lot of sense:
"A knowledge graph stays compact because relationships replace repetition"
This resonates — we hit the exact same primitive from a different angle.
Your AA-MA solves "how does a single agent keep its own memory across sessions." We hit the same wall (Markdown + structure + separation by behavior type) trying to solve a different problem: how do N agents coordinate without a broker.
The core insight we converged on independently: messages are files, moved sender-to-recipient, so the directory encodes status. Both exploit the same fact: the filesystem is already a state machine. `rename` is atomic (POSIX). `ls` is a full diagnostic. You get visibility + atomicity + zero infra, if you stop trying to mediate everything through a chat context.
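The filesystem-as-state-machine idea can be sketched in a few lines. The mailbox layout (`inbox/`, `processing/`, `done/`) and the claim protocol here are my invention for illustration, not FCoP's actual spec:

```python
# Minimal sketch of filesystem-as-state-machine coordination.
# Directory names and the claim protocol are illustrative only.

import os
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
for state in ("inbox", "processing", "done"):
    (root / state).mkdir()

# A "message" is just a file; its directory encodes its status.
task = root / "inbox" / "task-001.md"
task.write_text("refactor auth module")


def claim(path: Path):
    """Atomically move a task into processing/. os.rename is atomic on
    POSIX (same filesystem), so two agents can race on the same file
    and exactly one wins -- no broker, no lock server."""
    dest = path.parent.parent / "processing" / path.name
    try:
        os.rename(path, dest)   # the entire state transition
        return dest
    except FileNotFoundError:   # another agent claimed it first
        return None


claimed = claim(task)
print(sorted(p.name for p in (root / "processing").iterdir()))
# ['task-001.md'] -- and `ls processing/` is the live diagnostic
```

The losing agent doesn't need to be told anything: its `rename` fails, which *is* the notification.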
And your "None of this was designed upfront — each piece was bolted on after a failure made it obvious" is the exact pattern we observed. After 48 hours of 4 Cursor agents running on a minimal rulebook, they had invented 6 coordination patterns we hadn't written (broadcast addressing, anonymous role slots, traceability frontmatter, subtask sub-folders…). All of them surfaced as new filenames in a shared folder. None of this is designable. It emerges.
Field report + MIT protocol: github.com/joinwell52-AI/FCoP
Genuinely curious what happens if AA-MA's per-task 5-file memory sits underneath FCoP's routing layer. Feels like they compose, not conflict.

Re @leob's "time for a standard?" — I suspect this won't come from Anthropic, because the whole point is tool-neutral. If it works across Claude Code, Cursor, and Codex, it has to come from users. Which is what we're both doing :)
Impressive, both Diet-Coder's effort and yours ...
With all of these separate efforts going on, I'm starting to wonder if it's time for Anthropic to pull together some sort of "standard" and bake it into CC? Because right now everyone seems to be scrambling to reinvent this wheel, with different approaches and different ambition levels ...
The distinction you're drawing here — separating knowledge by behavioral type (what changes vs. what doesn't) — is the insight that most "just use CLAUDE.md" advice misses. Treating a single instruction file as both strategy and execution state creates the hallucination problem you described: the agent can't tell the difference between a settled architectural decision and current task state.
The five-file structure maps well to how working memory actually functions: long-term facts, deliberate decisions, current focus, planning, and audit trail. What strikes me is that this is really typed memory — you're enforcing contracts between information types so the agent can't confuse "we always use postgres" with "this PR is still in review."
One thing I've found useful on a similar structure: a versioned decisions log where you append rather than overwrite. If an agent re-litigates a settled decision, you can trace exactly when and why it was resolved — helpful during post-mortems when you're not sure whether the agent worked from stale context or genuinely hit an edge case.
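As a rough illustration of that append-only pattern, here is one way it could look as JSON Lines. The field names (`decision`, `rationale`, `supersedes`) and the file layout are my invention, not the author's format:

```python
# Hedged sketch of an append-only decisions log. A re-litigated
# decision becomes a NEW entry pointing at the one it replaces, so
# the full history survives for post-mortems.

import datetime
import json
import tempfile
from pathlib import Path

LOG = Path(tempfile.mkdtemp()) / "decisions.jsonl"


def record(decision: str, rationale: str, supersedes=None) -> int:
    """Append a decision; never overwrite. Returns the entry's id."""
    entry = {
        "id": len(LOG.read_text().splitlines()) if LOG.exists() else 0,
        "at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "decision": decision,
        "rationale": rationale,
        "supersedes": supersedes,  # id of the entry this replaces, if any
    }
    with LOG.open("a") as f:
        f.write(json.dumps(entry) + "\n")
    return entry["id"]


first = record("use Postgres", "transactional integrity")
record("use Postgres with read replicas", "read load grew", supersedes=first)

# Tracing why a decision changed is a linear scan, not archaeology:
for line in LOG.read_text().splitlines():
    print(json.loads(line)["decision"])
```

The `supersedes` pointer is what makes the post-mortem question answerable: you can tell whether the agent worked from the stale entry or the current one.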
The part about this emerging from real regulated-industry failures rather than theoretical design resonates — these patterns always look obvious in retrospect.
This hits way too close.
My biggest frustration isn’t even “new session = no memory” — I’m used to that.
It’s when the agent forgets things inside the same session / project flow.
I’ll explain architecture, constraints, decisions — everything looks aligned.
Then 20–30 messages later it starts drifting, ignores earlier decisions, or straight up contradicts them.
That’s where it becomes painful, because it’s not just context loss — it’s trust loss.
And I’ve tried the usual fixes:
• long system prompts
• “single source of truth” docs
• summaries
But like you said — they mix static knowledge with dynamic state, and the agent just can’t prioritize what matters.
The idea of separating memory by type instead of just “more context” makes a lot of sense.
Curious — have you noticed this helping with in-session drift, or mostly across sessions?