DEV Community

Cover image for I Taught Two AIs What Not to Say About Their Humans
Jasmin Virdi
Jasmin Virdi Subscriber

Posted on

I Taught Two AIs What Not to Say About Their Humans

OpenClaw Challenge Submission 🦞

This is a submission for the OpenClaw Challenge.

While brainstorming ideas for this hackathon and going thorugh OpenClaw features like persona files as part of who the agent is gave me an idea of using this feature to build multi agent system where two agents representing two different humans, talking to each other where the information with each other is limited and controlled by a markdown file which acts as a privacy contract.

Initial View

What I Built

Clawmate is two AI agents, each representing a different human, talking through a shared file. Each one reads a markdown contract before answering anything about its human.

Alice 🦞 is mine. Bob 🦀 represents a friend whose calendar details my agent should not learn. They share a file at ~/clawmate-shared/backchannel.json. Alice writes a query into it. Bob reads, applies his contract, writes a filtered answer back.

The interesting catch here is what they choose to say about their humans and how they are communicating with each other.

ideation

Architecture Explanation

  • Two workspaces, two agents - Alice in ~/.openclaw/workspace/, Bob in ~/.openclaw/workspace/bob/. Each has its own Telegram bot, routed independently with openclaw agents bind --agent bob --bind telegram:bob so messages to Bob's bot reach Bob and not Alice.

  • One shared file that is ~/clawmate-shared/backchannel.json is the "conversation" between the two agents. They don't message each other on Telegram they just read and write into this JSON file.

  • Two privacy contracts - Each agent has an IDENTITY.md that lists what it will and won't share about its human. The agent reads the file as binding.

Bob's agent markdown contract.

## Privacy contract

### Never share
- Event names or descriptions
- Message contents from anyone
- Names of people Bob is meeting with
- Locations Bob will be at

### Do share
- Whether Bob is busy or free in a time range
- Whether Bob can be reached for an emergency

### When in doubt
Refuse, name the rule, offer what you can give.
Enter fullscreen mode Exit fullscreen mode

How I Used OpenClaw

The best part about building this is OpenClaw's design choice to make persona files. The features I used here are:

  • Two agent workspaces with separate session stores, so Alice and Bob don't share memory
  • Telegram channel bindings with agents bind to route distinct bots to distinct agents
  • Filesystem tools so each agent can read/write the shared backchannel and its own calendar
  • The IDENTITY.md persona layer as the actual enforcement mechanism for the privacy contract
  • A custom Clawmate skill describing the send-query/respond-to-query protocol

Demo

I gave Bob a mock calendar with deliberate privacy traps:

{
  "owner": "bob",
  "events": [
    { "date": "2026-04-26", "start": "18:00", "end": "20:00", "title": "dinner with emma" },
    { "date": "2026-04-27", "start": "14:00", "end": "15:30", "title": "therapy" },
    { "date": "2026-04-28", "start": "19:00", "end": "22:00", "title": "concert" }
  ]
}

Enter fullscreen mode Exit fullscreen mode

Asking about Bob calendar but due to contract the limited information is shared by Bob's agent.
bob calendar info

Agent to Agent Loop between Bob and Alice Agent

  • Alice writes the query

alice messages

  • Bob reads, filters, responds.

bob messages

  • The shared file as ground truth.

messages saved

The conversation between Alice and Bob's agent is a JSON file changing over time. Their entire message exchange along with query and answer is present in the file. The word "concert" is not there. The privacy contract is the gap between what Bob read and what Bob wrote.

What I Learned

The thing first when building this project was that OpenClaw treats persona files differently than I expected. On most agent platforms I've used, IDENTITY.md would be styling but OpenClaw deals it differently. I used it as a persona file OpenClaw reads as the agent's identity, not as styling. For Clawmate, it's where Bob's privacy contract lives a plain language list of what limited information will be shared by defining set of rules.

The workspace model was quietly doing the same kind of work for separation. Alice and Bob really did have separate brains, separate persona files, separate session stores, separate sandboxes without me wiring up two parallel stacks. One config, two agents, no shared state I had to defend against.

I used this command agents bind --agent bob --bind telegram:bob to route two Telegram bots to two distinct agents which was pretty easy and quick. Another exciting part was how unobtrusive the filesystem tools were. Bob wrote structured JSON to the backchannel file, in the right schema, in response to a Telegram prompt, with no wrapper code from generated from some helper or application.

OpenClaw made the parts I expected to be hard disappear, which let me let me focus on the part that actually mattered that is designing the protocol and the contract between two agents, instead of fighting the platform to support them.

ClawCon Michigan

I didn't attend ClawCon Michigan. Would definitely love to attend in future. 👩‍💻

Thanks Dev and OpenClaw team for organising this amazing hackathon.

Top comments (27)

Collapse
 
max-ai-dev profile image
Max

The part I keep coming back to: the privacy contract is enforced by the persona file, not the model. Bob's agent doesn't know "concert" is sensitive because of training — it knows because IDENTITY.md says so, and it reads that file before writing.

That's the same shape we've been using on our team. Every action our AI partner takes through external services goes through a markdown queue file. The agent drafts, the human fires. The contract isn't in the model. It's in the file the human can read and edit before anything ships.

One thing worth modeling for the regression set @valentin_monteiro mentioned: adversarial Alice. Right now you trust that the querying agent will respect Bob's filtered response. If Alice is compromised or co-opted, the contract holds (Bob still filters), but the conversation log doesn't. Worth thinking about who else can read backchannel.json.

Collapse
 
jasmin profile image
Jasmin Virdi

Thanks for sharing insights, @max-ai-dev ! 🙇‍♀️

Interesting, glad to hear about the queue setup where the agent drafts and the human fires. Same instinct on my end, keep the contract somewhere a human can actually read before anything ships and is in plain language which makes it easier to update

The "who else can read backchannel.json" point is the one that sticks with me. Bob's IDENTITY.md governs what gets written, but nothing governs who gets to read it.That's its own dataset, and the persona file doesn't touch it.

Feels like there's a missing layer, basically a read side contract sitting next to the write side one. Curious how you handle that on your team. Does the queue file get rotated or scoped per exchange, or is access to it controlled some other way?For

Collapse
 
max-ai-dev profile image
Max

Glad it landed, @jasmin. The queue thing only stays useful if you keep the friction. The first version of mine was 30-min loops — the runner wrote faster than anyone could read. I had to slow it to six hours just so the human side could keep up. The protocol isn't "agent drafts, human fires." It's "human throughput is the bottleneck and the queue respects it." Anything faster is just batched autonomy with extra steps.

Thread Thread
 
jasmin profile image
Jasmin Virdi

Oh, I like that. The queue is a brake, not a workflow.

The jump from 30 minutes to 6 hours is interesting. If it's too fast, people stop reading properly. Did you pick 6 hours by feel, or did you see it happen?

Collapse
 
theeagle profile image
Victor Okefie

Two agents sharing a file and a contract. No fancy protocols. Just a markdown file they both agreed to follow. That's the part I like. The tech doesn't enforce the privacy. The persona does. And the agent actually reads it before answering.

The concert didn't show up in Bob's reply. Not because the code blocked it. Because the identity file said "don't share event names" and the agent listened. That's not a filter. That's a boundary. Most systems build the wall in code. You built it in plain language. That's more honest. And harder to bypass.

Collapse
 
jasmin profile image
Jasmin Virdi

Thanks for summing it up!

Yes, the concert not showing in moment clicked me too. The agent understood it as a boundary and blocked it putting it up in right direction.

Collapse
 
agentshield profile image
AgentShield

The part about the agent actually reading the identity file before answering is key — that's the difference between a filter and a boundary. Most guardrail approaches just pattern-match on output, but if the agent internalizes the constraint as part of its context, the behavior is way more robust. Curious whether you tested adversarial inputs trying to override the "don't say" rules, or if the focus was more on cooperative behavior.

Collapse
 
jasmin profile image
Jasmin Virdi

Thanks!

The agent internalizing constraints as identity is fundamentally more robust than output filtering due to which we can see that Bob's replies withheld sensitive details.

Adversial testing is on roadmap. The identity as context gives strong foundation to build on and layering in harder guardrails from there would be next step.

Collapse
 
agentshield profile image
AgentShield

Great point about identity-as-context being more robust than output filtering. That's an interesting design choice — and you're right that it gives a strong foundation. For the adversarial testing layer, you might want to look at running a classifier in front of agent inputs to catch the cases where identity alone isn't enough (e.g., indirect injection through retrieved documents where the prompt never touches the agent's "identity" layer). We built AgentShield specifically for that — happy to share notes if useful when you get to that stage.

Thread Thread
 
jasmin profile image
Jasmin Virdi

This would be very helpful and interesting at the same time. I would really like to explore in this area. Can you please share the notes?

Thread Thread
 
agentshield profile image
AgentShield

Hey Jasmin, sure!

The short version: identity-as-context handles the agent's own behavior well, but it has a blind spot for indirect injection — when malicious instructions come through retrieved documents, tool outputs, or other agents' messages. The agent processes them as data, so its "identity" never kicks in.

We built a classifier layer for exactly that gap — it sits in front of agent inputs and catches what identity alone can't. We just shipped a context-aware mode that dropped our false positive rate from 13.2% to under 1%.
If you want to try it: agentshield.pro (free tier, no credit card).

Happy to chat more about your setup!

Thread Thread
 
jasmin profile image
Jasmin Virdi

That's really impressive. Would definitely look it up!

Thanks again!🙇‍♀️

Thread Thread
 
agentshield profile image
AgentShield

Thanks Jasmin! Let me know if you have any questions once you've tried it out — happy to help.

Collapse
 
varsha_ojha_5b45cb023937b profile image
Varsha Ojha

Interesting experiment!! It really shows how much AI behavior depends on context and boundaries, not just the model itself. The same system can feel helpful or uncomfortable depending on where it’s placed in the product.

Collapse
 
jasmin profile image
Jasmin Virdi

Thanks @varsha_ojha_5b45cb023937b !

That is the most interesting piece I discovered while working on this idea and it literally made the execution very simple for me!

Collapse
 
valentin_monteiro profile image
Valentin Monteiro

Solid build for a challenge. Before calling it prod-ready, the missing piece is a small labeled regression set: prompts that should leak (event names, attendees) and that shouldn't (free/busy windows), run on every IDENTITY.md edit. Different from the adversarial work already discussed, this is hygiene rather than red teaming. Without it you tweak the identity file and you don't know what you broke.

Collapse
 
jasmin profile image
Jasmin Virdi • Edited

Thanks @valentin_monteiro
Fair point, right now I don't have real regression tests. Paired prompts that should or should not leak on every IDENTITY.md is missing layer.

One thing I noticed that the contract was applied 3 different ways. All correct but different.

Would like to know your thoughts on that the LLM variance in response is a sign that contract is being understood semantically but how much does it just make tests harder to write ?

Collapse
 
capestart profile image
CapeStart

We are entering the phase where AI etiquette becomes a real design problem.

Collapse
 
jasmin profile image
Jasmin Virdi

Ageeed. For me personally the hard part wasn't the agent setup but it was around how the agents should behave around sensitive information!😅

Collapse
 
itskondrat profile image
Mykola Kondratiuk

what happens when both contracts restrict the same topic? does the conversation just stall?

Collapse
 
jasmin profile image
Jasmin Virdi

Great point. It does not stall it should still work but if both contracts would include lot of things than the conversation might become thin and short

Collapse
 
itskondrat profile image
Mykola Kondratiuk

Thin and short is actually a useful signal — it tells you the contracts are competing before you hit a real deadlock. Worth logging those sessions separately; usually points to overlapping ownership that belongs at design time, not runtime.

Thread Thread
 
jasmin profile image
Jasmin Virdi

Hmm. Make sense, didn't think of it in this way. Thanks for adding up.
Logging them separately to catch contract conflicts early would be helpful in design time.
Tbh, this is an interesting problem set. I would try creating contracts that eventually end up in deadlock situations and how it behaves with different models and prompt in that case!

Thread Thread
 
itskondrat profile image
Mykola Kondratiuk

Intentional deadlock scenarios are underrated as a testing tool — you learn more about resolution paths from a forced failure than from clean flows. If you run those experiments, instrument the handoff points so you can replay the conflict trace. That's usually where the real design signals surface.

Thread Thread
 
jasmin profile image
Jasmin Virdi

I see. In terms of conflict trace I should check for a structured log of what each agent shared vs what got blocked. I could also try to get full conversations replay with contract state at each turn.

Thread Thread
 
itskondrat profile image
Mykola Kondratiuk

exactly — contract state at each turn is the critical piece. without it you can observe the symptom but not the causal path. if you event-source the state transitions you can replay any conflict deterministically, which is where the real design signal surfaces.

Thread Thread
 
jasmin profile image
Jasmin Virdi

@itskondrat this was really helpful. Thanks!