The 3 AM incident
A few months ago one of my AI agents got stuck in a retry loop overnight and quietly burned through a month of credits. The provider dashboard told me about it the next morning. The support ticket got a polite "usage is final."
Provider dashboards are bills. I needed a brake.
What's actually missing in the stack
After I looked at what exists, the gap was clear:
- AI gateways (LiteLLM, Portkey) — great at routing, not designed to stop you.
- Observability (Helicone, Langfuse) — great at explaining, after the fact.
- Provider dashboards — billing history, not real-time control.
Nothing was sitting between "the agent is making a call" and "the agent has already burned $500."
Cost Firewall
Cost Firewall is a local plugin for the OpenClaw gateway. It watches call metadata in real time and trips on four automatic signals, plus a manual stop:
| Failure mode | Default threshold | Action |
|---|---|---|
| Retry loop | 3 consecutive failures from same source | Trip + cooldown |
| Token storm | 100K tokens / 60s | Global block |
| Call flood | 30 calls / 60s | Global block |
| Daily budget cap | Your configured ceiling | Block until next day |
| Manual panic | openclaw firewall stop | Pause every AI call |
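To make the windowed thresholds in the table concrete, here is a minimal sketch of a sliding-window counter. It is an illustration under assumptions, not the plugin's actual code: the class name `SlidingWindowCounter`, its method, and the choice to trip only when the total strictly exceeds the limit are all hypothetical.

```typescript
// Hypothetical sketch: not Cost Firewall's real implementation.
class SlidingWindowCounter {
  private events: Array<{ at: number; amount: number }> = [];
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  /**
   * Record `amount` at time `now` (milliseconds) and report whether the
   * total inside the window now exceeds the limit.
   */
  add(amount: number, now: number): boolean {
    this.events.push({ at: now, amount });
    // Drop everything that has aged out of the window.
    const cutoff = now - this.windowMs;
    this.events = this.events.filter((e) => e.at > cutoff);
    const total = this.events.reduce((sum, e) => sum + e.amount, 0);
    return total > this.limit; // assumption: strictly greater than trips
  }
}

// The two windowed defaults from the table, expressed with the same counter.
const tokenStorm = new SlidingWindowCounter(100000, 60000); // 100K tokens / 60s
const callFlood = new SlidingWindowCounter(30, 60000); // 30 calls / 60s
```

A token storm would feed the per-call token count into `tokenStorm.add(...)`, while a call flood would feed `1` per call into `callFlood.add(...)`. Passing `now` in explicitly keeps the sketch easy to test.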
Sources are tracked independently — one noisy agent doesn't take everyone else down.
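Per-source tracking for the retry-loop rule could look roughly like the sketch below. The names (`RetryLoopBreaker`, `record`, `isBlocked`) and the 60-second cooldown are assumptions for illustration; the post does not state the cooldown length or how the plugin stores its state.

```typescript
// Hypothetical sketch: not Cost Firewall's real implementation.
type SourceState = { consecutiveFailures: number; trippedUntil: number };

class RetryLoopBreaker {
  private readonly states = new Map<string, SourceState>();
  private readonly maxFailures: number;
  private readonly cooldownMs: number;

  // Defaults: 3 failures matches the table; the cooldown value is assumed.
  constructor(maxFailures: number = 3, cooldownMs: number = 60000) {
    this.maxFailures = maxFailures;
    this.cooldownMs = cooldownMs;
  }

  /** True while this source is inside its cooldown. */
  isBlocked(source: string, now: number): boolean {
    const state = this.states.get(source);
    return state !== undefined && now < state.trippedUntil;
  }

  /** Record one call outcome. A success resets the streak for that source only. */
  record(source: string, ok: boolean, now: number): void {
    const state = this.states.get(source) ?? { consecutiveFailures: 0, trippedUntil: 0 };
    if (ok) {
      state.consecutiveFailures = 0;
    } else {
      state.consecutiveFailures += 1;
      if (state.consecutiveFailures >= this.maxFailures) {
        state.trippedUntil = now + this.cooldownMs;
        state.consecutiveFailures = 0; // start a fresh streak after the cooldown
      }
    }
    this.states.set(source, state);
  }
}
```

Because state is keyed by source, three failures from one agent trip only that agent's entry; every other source keeps its own streak and cooldown.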
Two-mode workflow
This is the part I think matters more than the rules themselves:
openclaw firewall mode observe # record only, do not block
openclaw firewall log --last 20 # see what would have been blocked
openclaw firewall mode protect # flip the switch
Run observe for a day. The log alone is usually eye-opening — you'll find retry loops you didn't know existed and prompts using more tokens than you assumed. Then tune thresholds to your traffic, not someone else's blog post, and flip to protect.
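The difference between the two modes can be pictured as one small decision function. This is a sketch under assumptions: the `Decision` shape and the `decide` function are hypothetical names, not the plugin's API.

```typescript
// Hypothetical sketch: not Cost Firewall's real implementation.
type Mode = "observe" | "protect";

type Decision = {
  allowed: boolean; // does the call go through?
  wouldBlock: boolean; // did a rule fire? (what the log would show)
  reason?: string;
};

/** `violation` is the name of the rule that fired, or null if none did. */
function decide(mode: Mode, violation: string | null): Decision {
  if (violation === null) {
    return { allowed: true, wouldBlock: false };
  }
  // In observe mode the violation is recorded but the call still proceeds.
  // In protect mode the same violation stops the call.
  return { allowed: mode === "observe", wouldBlock: true, reason: violation };
}
```

The rule evaluation is identical in both modes; only the final `allowed` flag changes, which is why a day of observe logs is a fair preview of what protect would do.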
Privacy posture
| Question | Answer |
|---|---|
| Does it need an account? | No |
| Does it phone home? | No |
| Does it store prompt text by default? | No |
| Where do events live? | Local JSONL on your gateway |
| Can I audit it? | Yes, MIT TypeScript |
The default is storePromptText: false. Runtime cost control belongs on the machine running the agent.
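As an illustration of what metadata-only logging could look like, the sketch below serializes an event to one JSONL line and drops the prompt unless storage is explicitly enabled. Only `storePromptText` comes from the post; the event fields (`ts`, `source`, `rule`, `tokens`, `promptText`) and the function name are assumptions.

```typescript
// Hypothetical sketch: field names are illustrative, not the plugin's schema.
type FirewallEvent = {
  ts: string;
  source: string;
  rule: string;
  tokens: number;
  promptText?: string;
};

/** Serialize one event as a JSONL line, omitting prompt text unless enabled. */
function toJsonlLine(event: FirewallEvent, storePromptText: boolean = false): string {
  const { promptText, ...metadata } = event;
  const record =
    storePromptText && promptText !== undefined
      ? { ...metadata, promptText }
      : metadata;
  return JSON.stringify(record) + "\n";
}
```

With the default of `false`, the prompt never reaches the serialized line, so the file on disk contains call metadata only.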
One-line install
curl -fsSL https://raw.githubusercontent.com/mapick-ai/cost-firewall/v0.2.12/install.sh | bash
openclaw firewall mode observe
openclaw firewall log --last 20
The local dashboard is then available at http://localhost:18789/mapick/dashboard.
Where it doesn't fit
I want to be explicit: it's not a gateway, not an observability platform, not a billing system. The shortest positioning I can give you:
Gateways route. Dashboards explain. Cost Firewall brakes.
Use it alongside LiteLLM / Helicone / your provider dashboard. It's the brake pedal that was missing between them.
Repo: https://github.com/mapick-ai/cost-firewall
⭐ if this saves you a billing screenshot. Issues and PRs welcome.