I caught myself walking back to my laptop at 11pm for the third time that night, just to check what the OpenClaw agent was doing on it. So I built the iOS app I kept wishing existed. It went live on the App Store last week.
This is the technical writeup — the architecture, the WebSocket pipeline, and the bug that made me question my entire life for two days.
What I built
Aerostack — a phone-native control plane for OpenClaw / Claude Code-style agents running on your own machine (laptop, desktop, home server, VPS). Live thinking stream, swipe-to-approve with glob policies, edit-args-before-approve, agent chat from your phone, MCP / skills / plugins / channels manageable from the device.
Stack:
- Mobile: Flutter 3.27 (Dart 3.11) + Riverpod 3 + GoRouter
- WebSocket: web_socket_channel (Dart side) ↔ Cloudflare Durable Object (relay side)
- Daemon: TypeScript / Node (@aerostack/gateway on npm, MIT) — runs on your machine
- Relay: Cloudflare Workers + Durable Objects — stateless, no DB writes for prompts/transcripts
- Storage on phone: flutter_secure_storage (JWT) + hive_flutter (offline approval queue)
The thing that defines the entire architecture is the local-first constraint. Let's start there.
The architectural bet — relay, not a data sink
The dominant pattern for "AI agent control plane" startups is: pipe transcripts through your cloud DB, render a dashboard. Aerostack does the inverse.
The contract:
```
[Your machine]                       [Aerostack relay]            [Your phone]

@aerostack/gateway  ←─ WebSocket ─→  Durable Object  ←─ WS ─→     iOS app
(open source,                        (stateless,
 on your laptop)                      pass-through)
```
The Durable Object holds connection state — which machine is paired with which phone, last-seen timestamps, the active session ID. It does not write the bodies of the frames it routes. Your prompts, tool calls, model outputs, command args — none of it touches D1, R2, or any persistent store on our side.
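To make that concrete, here is a minimal sketch of the pass-through pattern, assuming Cloudflare's WebSocketPair API. RelayRoom and the role-keyed peer map are illustrative names, not the actual Aerostack source:

```ts
// Minimal sketch of a pass-through relay room (illustrative, not the real source).
// Connection metadata lives in memory on the Durable Object; frame bodies are
// forwarded to the other side and never written to storage.
export class RelayRoom {
  private peers = new Map<string, WebSocket>();  // "daemon" | "phone" -> socket
  private lastSeen = new Map<string, number>();  // connection state we DO keep

  async fetch(request: Request): Promise<Response> {
    const role = new URL(request.url).searchParams.get("role") ?? "phone";
    const pair = new WebSocketPair();
    const [client, server] = Object.values(pair);
    server.accept();
    this.peers.set(role, server);

    server.addEventListener("message", (event) => {
      this.lastSeen.set(role, Date.now());
      const other = this.peers.get(role === "daemon" ? "phone" : "daemon");
      other?.send(event.data as string);         // routed, never persisted
    });
    server.addEventListener("close", () => this.peers.delete(role));

    return new Response(null, { status: 101, webSocket: client });
  }
}
```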
Concrete consequences:
- Disconnect = nothing to delete. When you tap "unpair," the worker broadcasts a workspace_unpaired WS frame; the daemon receives it and tears its connection state down in ~25ms. No retention period. No "your data will be deleted within 30 days." There is no your-data on our side.
- Compliance becomes trivial. GDPR data-subject access request? Easy: nothing to return. SOC 2 audit scope? The relay processes ephemeral pub/sub frames; no PII at rest.
- No competitive-intelligence motive. When you don't store the prompts, you can't be tempted to mine them. Investors hate this take. Users love it.
The one thing we do persist: approval-decision metadata. Timestamp, accepted/rejected, which glob rule matched, who was the actor (for team workspaces). The audit log on your phone is reconstructed from these — never from the prompt body itself.
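For illustration, the persisted record looks something like this; the field names are my guesses at expressing the metadata described above, not the real schema:

```ts
// Illustrative shape of the only thing that persists (field names assumed):
// decision metadata, never the prompt or tool-call body.
interface ApprovalDecision {
  timestamp: number;                   // when the user swiped
  decision: "accepted" | "rejected";
  matchedRule: string;                 // e.g. "git push *", the glob that gated the call
  actorId: string;                     // who decided, for team workspaces
  workspaceId: string;
  // Deliberately absent: tool args, prompt text, model output.
}
```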
Implementation note for anyone building similar: the temptation to "just persist the last 50 frames for debugging" is real and you must resist it. The moment you do, you've broken the local-first contract and you can't un-break it without a public commitment that nobody trusts.
The bug that almost killed it
Phase 2 of the build was the LIVE pane — a real-time view of what the agent on your laptop is currently thinking. Channel comes in (Slack message, webhook, scheduled trigger), agent starts streaming reasoning + tool calls, you watch them land on your phone.
It worked beautifully. For one channel.
The moment a second channel was active in the same workspace, the LIVE pane on my phone would freeze. Not crash. Not error. Just... stop streaming. The agent on my laptop was clearly still doing things — daemon logs flying, CPU pinned — but my phone was stuck on the last token from 90 seconds ago.
Where I went looking first (and was wrong):
| Layer | What I assumed | Reality |
| --- | --- | --- |
| WebSocket transport | "frames are being dropped" | nope, all delivered |
| Durable Object broadcast | "fan-out logic is buggy" | nope, every subscriber got every frame |
| Mobile WS client | "Flutter is silently swallowing frames" | nope, frames received, parsed, emitted to Riverpod stream |
| Riverpod state | "stream got disposed" | nope, listeners attached, state class instance alive |
I rewrote the broadcast logic three times. I added per-frame tracing at every hop. Every frame that existed was being delivered and rendered the moment it arrived. The problem was that new frames simply weren't being generated by the LLM.
The actual bug — one layer up from where I was looking:
The LLM had saturated its context window. Each new channel message in a multi-channel workspace appends to the same dmHistory array on the daemon. Two active channels = two streams of inbound user messages stuffed into the same history buffer. By message ~40, the request payload to the model was approaching the context limit, and the model was timing out on token generation before producing any output. From the LLM's perspective: nothing to send. From my perspective: "the WebSocket pipeline is broken."
The fix was four lines in a config:
```yaml
channels:
  default:
    dmHistoryLimit: 200   # was unset = unbounded
    historyLimit: 50      # cap per-channel context
```
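Mechanically, the cap turns the shared history into a bounded buffer. A hypothetical sketch of the daemon-side append (HistoryEntry and appendBounded are invented names for illustration):

```ts
// Hypothetical sketch of the daemon-side history append with the new cap.
// Without a limit, two active channels interleave messages into the same
// buffer and the request payload grows until the model saturates its context.
interface HistoryEntry {
  channel: string;
  role: "user" | "assistant";
  text: string;
}

function appendBounded(
  history: HistoryEntry[],
  entry: HistoryEntry,
  dmHistoryLimit = 200,
): void {
  history.push(entry);
  // Drop the oldest entries once the shared buffer exceeds the cap, so the
  // prompt assembled from it stays under the context window.
  while (history.length > dmHistoryLimit) history.shift();
}
```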
Two days. Four lines. A whole layer of the stack I'd been ignoring because I'd assumed the problem was the layer I knew best.
The debugging signal I was missing — the heuristic worth keeping:
When a streaming-LLM frontend looks frozen, the right diagnostic isn't "is the WebSocket healthy?" It's "what's the token-rate-per-second at the model layer right now?" If frames-per-second on the WebSocket is zero AND the model's tokens-per-second is also zero, the LLM is the problem, not the wire. If frames-per-second is zero but the model is producing tokens, then it's the wire.
I now log token-emission rate as a peer signal alongside WS frame rate. Should have done it from day one.
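As a sketch, with made-up counter names, the peer-signal logging amounts to something like:

```ts
// Sketch of logging token-emission rate alongside WS frame rate (names assumed).
let wsFrames = 0;     // incremented wherever a frame is sent to the relay
let modelTokens = 0;  // incremented wherever the LLM client yields a token

function suspect(frameRate: number, tokenRate: number): string {
  if (frameRate === 0 && tokenRate === 0) {
    return "model layer (nothing is being generated)";
  }
  if (frameRate === 0) {
    return "the wire (tokens exist, frames don't)";
  }
  return "healthy";
}

setInterval(() => {
  console.log(
    `ws=${wsFrames}/s tokens=${modelTokens}/s -> ${suspect(wsFrames, modelTokens)}`,
  );
  wsFrames = 0;
  modelTokens = 0;
}, 1_000);
```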
What it does today
Feature surface, in case you want to compare what you'd build if you were solving the same problem:
- Live thinking stream — every reasoning step + tool call + decision streamed via WebSocket, no polling
- Swipe-to-approve with glob policies (gmail__*, git push *, shell *) — only dangerous moves gate, the rest pass through automatically (a sketch of the matching follows this list)
- Edit-args-before-approve — UI shows the JSON args, you tap a field, edit, then approve; agent re-runs with your edited args (no re-prompt round-trip)
- Agent chat from phone — persistent thread with full session context preserved across messages, inline approval cards in-thread when sensitive tools fire
- Cron for the agent — schedule, watch, cancel agent runs from anywhere
- MCP server / skill / plugin / channel management — install, attach, manage credentials, all from phone
- Usage analytics — token totals, per-model cost breakdown, daily bars, hour-of-day heatmap, top sessions, cache split
- Multi-machine pairing — laptop + home server + VPS under one workspace; mDNS discovery on LAN, 6-digit code fallback for cellular / headless
- Push alerts — the moment OpenClaw needs you, lock-screen approve
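The glob gating mentioned above is, in spirit, a one-matcher routing decision. A sketch using picomatch (my choice for illustration; the actual daemon may match differently):

```ts
// Hypothetical sketch of glob-gated approvals using picomatch for matching.
// Policy semantics as described above: only matching calls gate on the phone.
import picomatch from "picomatch";

const gatedPatterns = ["gmail__*", "git push *", "shell *"];
const isGated = picomatch(gatedPatterns);

function routeToolCall(callSignature: string): "gate-on-phone" | "auto-approve" {
  // e.g. "git push origin main" matches "git push *" and gates;
  // "git status" matches nothing and passes through automatically.
  return isGated(callSignature) ? "gate-on-phone" : "auto-approve";
}
```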
Install
If you run a local AI agent and you've ever walked away from your desk wondering what it's doing right now, you can have this set up in about three minutes:
```bash
# On the box you want to control:
npm install -g @aerostack/gateway
aerostack init
```
Then scan the QR or type the 6-digit code on your phone.
- iOS: App Store
- Android: invite-only while we shake out edge cases — DM if you want in
- Daemon (open source, MIT): npmjs.com/package/@aerostack/gateway
Things I haven't shipped yet
In the spirit of #showdev honesty, what's not in v1:
- Per-tool latency p95s. I have raw call durations; haven't aggregated.
- CSV export for usage. Asked-for, easy, just not prioritized.
- LAN-only mode. The relay currently needs internet. Self-hosted relay for fully air-gapped setups is on the roadmap — let me know if that's a hard requirement for you.
- Voice approvals ("hey Aero, what's it doing?") — v1.1.
- Plugin marketplace UI — exists in CLI; phone surface is browse-only today.
What I'd love feedback on (technical, not business)
Two specific things I'm still wrestling with — would love takes from anyone who's been here:
1. The workspace_unpaired teardown race. When you tap unpair from the phone, both the cloud worker AND the local daemon need to drop the connection. Today: the phone calls /api/openclaw/disconnect (hits the worker) AND /api/cli/machines/:id/unpair (the daemon listens for the WS frame). Both broadcast the frame; the second one is an idempotent no-op. This works, but it's two RPCs for one logical action. Has anyone solved this with a single endpoint and clean failure semantics? I keep half-designing it and abandoning the design.
2. WebSocket frame schema versioning. As the daemon ships across 0.24.x versions, the WS frame schema evolves. Right now I'm doing best-effort additive evolution — never remove a field, treat unknowns as optional. But the daemon and the phone version separately, and I can already see the day where a phone running a 30-day-old build will misinterpret a new daemon's frames. Standard answers (versioned envelope, capability negotiation handshake on connect) all add complexity I'd rather not pay until I have to. What's actually working for you?
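On question 1, the idempotent no-op amounts to dedupe on a per-action ID. A sketch of the shape of it, with all names hypothetical:

```ts
// Sketch of the idempotent no-op from question 1 (names hypothetical):
// both the worker and the daemon may broadcast workspace_unpaired for the
// same action, so each side drops repeats keyed on a per-action ID.
const seenUnpairs = new Set<string>();

function onWorkspaceUnpaired(frame: {
  type: "workspace_unpaired";
  unpairId: string;
}): void {
  if (seenUnpairs.has(frame.unpairId)) return; // second broadcast: no-op
  seenUnpairs.add(frame.unpairId);
  teardownConnectionState();
}

function teardownConnectionState(): void {
  // close the WS, clear in-memory pairing state
}
```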
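On question 2, the versioned-envelope answer I keep circling is small enough to sketch. SUPPORTED_MAJOR and the degrade path here are assumptions, not shipped code:

```ts
// One common answer to question 2, sketched: a versioned envelope with
// additive-only payloads, so an old phone can at least detect "too new".
interface FrameEnvelope {
  v: number;         // schema major version; bump only on breaking change
  type: string;      // frame type, e.g. "token" | "tool_call"
  payload: unknown;  // additive-only fields inside
}

const SUPPORTED_MAJOR = 1;

function decode(raw: string): FrameEnvelope | null {
  const frame = JSON.parse(raw) as FrameEnvelope;
  if (frame.v > SUPPORTED_MAJOR) {
    // Old client, new daemon: degrade explicitly instead of misinterpreting.
    showUpdateBanner();
    return null;
  }
  return frame; // unknown payload fields are simply ignored (additive evolution)
}

function showUpdateBanner(): void {
  // "update the app to see this session"
}
```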
I'll be in the comments for the next 24-48h replying to everything.
Code & links
- Daemon (MIT, open): github → aerostack/gateway
- iOS app: App Store
- Architecture deep-dive: the design doc lives in the daemon repo at docs/AEROSTACK_OPENCLAW_BRIDGE_ARCHITECTURE.md
- Original IH post (the build-story version): [link after IH ships]
If you build something interesting on top of the daemon, send it — I want to learn from how people use this in ways I didn't design for.