DEV Community

sun evan


I built a kill switch for runaway AI agents — Cost Firewall is MIT

The 3 AM incident

A few months ago, one of my AI agents got stuck in a retry loop overnight and quietly burned through a month of credits. The provider dashboard told me about it the next morning. My support ticket got back a polite "usage is final."

Provider dashboards are bills. I needed a brake.

What's actually missing in the stack

Once I looked at what already exists, the gap was clear:

  • AI gateways (LiteLLM, Portkey) — great at routing, not designed to stop you.
  • Observability (Helicone, Langfuse) — great at explaining, after the fact.
  • Provider dashboards — billing history, not real-time control.

Nothing was sitting between "the agent is making a call" and "the agent has already burned $500."

Cost Firewall

Cost Firewall is a local plugin for the OpenClaw gateway. It watches call metadata in real time, trips on four automatic signals, and also gives you a manual stop:

| Failure mode | Default threshold | Action |
|---|---|---|
| Retry loop | 3 consecutive failures from the same source | Trip + cooldown |
| Token storm | 100K tokens / 60 s | Global block |
| Call flood | 30 calls / 60 s | Global block |
| Daily budget cap | Your configured ceiling | Block until the next day |
| Manual panic | openclaw firewall stop | Pause every AI call |
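The three global rules in the table (token storm, call flood, daily budget) could be checked with something like the minimal sketch below. It assumes a 60-second sliding window over completed calls and a per-call USD cost supplied by the caller. GlobalLimiter, check, and record are hypothetical names, not Cost Firewall's API, and resetting the budget at the UTC date boundary is an assumption about what "next day" means.

```typescript
// Hypothetical sketch: names and structure are illustrative,
// not Cost Firewall's actual source.

interface Limits {
  maxTokensPerWindow: number; // token storm, e.g. 100_000
  maxCallsPerWindow: number; // call flood, e.g. 30
  windowMs: number; // sliding window length, e.g. 60_000
  dailyBudgetUsd: number; // your configured ceiling
}

type Verdict = "allow" | "token_storm" | "call_flood" | "daily_budget";

class GlobalLimiter {
  private limits: Limits;
  // One entry per completed call: when it finished and how many tokens it used.
  private calls: { at: number; tokens: number }[] = [];
  private spentUsd = 0;
  private day = ""; // UTC date (YYYY-MM-DD) that spentUsd belongs to (assumption)

  constructor(limits: Limits) {
    this.limits = limits;
  }

  // Drop calls that fell out of the window; reset spend when the UTC date changes.
  private prune(now: number): void {
    const cutoff = now - this.limits.windowMs;
    this.calls = this.calls.filter((c) => c.at > cutoff);
    const today = new Date(now).toISOString().slice(0, 10);
    if (today !== this.day) {
      this.day = today;
      this.spentUsd = 0;
    }
  }

  // Run before a call goes out; anything other than "allow" means block.
  check(now: number): Verdict {
    this.prune(now);
    if (this.spentUsd >= this.limits.dailyBudgetUsd) return "daily_budget";
    if (this.calls.length >= this.limits.maxCallsPerWindow) return "call_flood";
    const tokens = this.calls.reduce((sum, c) => sum + c.tokens, 0);
    if (tokens >= this.limits.maxTokensPerWindow) return "token_storm";
    return "allow";
  }

  // Run after a call completes, with its token count and cost.
  record(now: number, tokens: number, costUsd: number): void {
    this.prune(now);
    this.calls.push({ at: now, tokens });
    this.spentUsd += costUsd;
  }
}
```

Because only completed calls count toward the token total here, one very large request can still overshoot the window; a fuller version might also estimate tokens before sending.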

Retry loops are tracked per source, so one noisy agent trips and cools down on its own without taking everyone else down. Token storms and call floods, by contrast, are global limits.
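Per-source tracking for the retry-loop rule might look like the following sketch: a map from source to its current failure streak, where the third consecutive failure trips only that source. RetryLoopBreaker is a hypothetical name, and the five-minute cooldown is an illustrative value, not a documented default.

```typescript
// Hypothetical per-source retry-loop breaker; names are illustrative.

interface SourceState {
  consecutiveFailures: number;
  trippedUntil: number; // epoch ms; 0 means not tripped
}

class RetryLoopBreaker {
  private sources = new Map<string, SourceState>();
  private maxFailures: number;
  private cooldownMs: number;

  // Defaults: 3 consecutive failures (from the table), 5-minute cooldown (assumed).
  constructor(maxFailures = 3, cooldownMs = 5 * 60_000) {
    this.maxFailures = maxFailures;
    this.cooldownMs = cooldownMs;
  }

  // Each source gets its own state, created lazily on first use.
  private state(source: string): SourceState {
    let s = this.sources.get(source);
    if (!s) {
      s = { consecutiveFailures: 0, trippedUntil: 0 };
      this.sources.set(source, s);
    }
    return s;
  }

  // True if this source may make a call right now.
  isAllowed(source: string, now: number): boolean {
    return now >= this.state(source).trippedUntil;
  }

  // A success resets the streak; the Nth consecutive failure trips the source.
  report(source: string, ok: boolean, now: number): void {
    const s = this.state(source);
    if (ok) {
      s.consecutiveFailures = 0;
      return;
    }
    s.consecutiveFailures += 1;
    if (s.consecutiveFailures >= this.maxFailures) {
      s.trippedUntil = now + this.cooldownMs;
      s.consecutiveFailures = 0; // start a fresh count after the cooldown
    }
  }
}
```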

Two-mode workflow

This is the part I think matters more than the rules themselves:

openclaw firewall mode observe     # record only, do not block
openclaw firewall log --last 20    # see what would have been blocked
openclaw firewall mode protect     # flip the switch

Run observe for a day. The log alone is usually eye-opening — you'll find retry loops you didn't know existed and prompts using more tokens than you assumed. Then tune thresholds to your traffic, not someone else's blog post, and flip to protect.
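The observe/protect split is easiest to see as a single enforcement step: the rules run identically in both modes, and only the final decision changes. This sketch uses hypothetical names (enforce, Decision) and keeps the log in memory rather than in the plugin's real storage.

```typescript
// Hypothetical sketch: the same rules run in both modes; only enforcement differs.

type Mode = "observe" | "protect";

interface Decision {
  mode: Mode;
  reason: string | null; // which rule tripped, or null if none did
  allow: boolean; // whether the call actually proceeds
}

// `reason` comes from whatever checks ran before the call.
function enforce(mode: Mode, reason: string | null, log: Decision[]): Decision {
  const tripped = reason !== null;
  // In observe mode a tripped rule is recorded but the call still goes through.
  const decision: Decision = { mode, reason, allow: !tripped || mode === "observe" };
  // Log every trip in both modes, so observe mode shows what would have been blocked.
  if (tripped) log.push(decision);
  return decision;
}
```

Calling log.slice(-20) on the array returns the 20 most recent trips, a rough in-memory analogue of openclaw firewall log --last 20.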

Privacy posture

| Question | Answer |
|---|---|
| Does it need an account? | No |
| Does it phone home? | No |
| Does it store prompt text by default? | No |
| Where do events live? | Local JSONL on your gateway |
| Can I audit it? | Yes, it's MIT-licensed TypeScript |

The default is storePromptText: false. Runtime cost control belongs on the machine running the agent.
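One way to honor storePromptText: false is to build each JSONL record from call metadata only and attach the prompt on explicit opt-in. The CallEvent fields and the toJsonlLine helper below are hypothetical, not the plugin's actual schema.

```typescript
// Hypothetical event shape and serializer; field names are illustrative,
// not Cost Firewall's actual JSONL schema.

interface CallEvent {
  ts: string; // ISO timestamp
  source: string; // which agent made the call
  model: string;
  tokens: number;
  rule: string | null; // which rule tripped, if any
  promptText?: string; // present only when explicitly opted in
}

interface StoreOptions {
  storePromptText: boolean; // false by default, matching the post
}

// Build one JSONL line (one JSON object plus a newline).
// The prompt is dropped unless storePromptText is true.
function toJsonlLine(
  meta: Omit<CallEvent, "promptText">,
  prompt: string,
  opts: StoreOptions = { storePromptText: false },
): string {
  const record: CallEvent = opts.storePromptText ? { ...meta, promptText: prompt } : { ...meta };
  return JSON.stringify(record) + "\n";
}
```

Appending each returned line to a local file, for example with Node's fs.appendFileSync, yields a JSONL log that never leaves the machine.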

One-line install

curl -fsSL https://raw.githubusercontent.com/mapick-ai/cost-firewall/v0.2.12/install.sh | bash
openclaw firewall mode observe
openclaw firewall log --last 20

Then open the local dashboard at http://localhost:18789/mapick/dashboard.

Where it doesn't fit

I want to be explicit: it's not a gateway, not an observability platform, not a billing system. The shortest positioning I can give you:

Gateways route. Dashboards explain. Cost Firewall brakes.

Use it alongside LiteLLM / Helicone / your provider dashboard. It's the brake pedal that was missing between them.
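To make "brakes" concrete: the check has to run before the request is sent, which is what separates it from a dashboard that reports afterwards. The sketch below is a hypothetical wrapper (Brake, gate, call) with a paused flag standing in for the effect of openclaw firewall stop; it is not how the plugin actually hooks into the OpenClaw gateway.

```typescript
// Hypothetical sketch of a pre-call brake; not the plugin's real gateway hook.

// A check returns a block reason, or null if the call may proceed.
type PreCheck = (source: string) => string | null;

class Brake {
  private paused = false; // manual panic switch
  private checks: PreCheck[];

  constructor(checks: PreCheck[]) {
    this.checks = checks;
  }

  stop(): void {
    this.paused = true;
  }

  resume(): void {
    this.paused = false;
  }

  // Decide before anything is sent: manual stop first, then each rule in order.
  gate(source: string): string | null {
    if (this.paused) return "manual_stop";
    for (const check of this.checks) {
      const reason = check(source);
      if (reason !== null) return reason;
    }
    return null;
  }

  // `send` performs the real model request and runs only if the gate is clear.
  async call<T>(source: string, send: () => Promise<T>): Promise<T> {
    const reason = this.gate(source);
    if (reason !== null) throw new Error(`blocked: ${reason}`);
    return send();
  }
}
```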

Repo: https://github.com/mapick-ai/cost-firewall

⭐ if this saves you a billing screenshot. Issues and PRs welcome.
