That's not a bug. It's a missing owner.
The user closed the browser 30 seconds ago.
Your logs show the response was never delivered. But the LLM stream is still running. The vector search is still scanning. The reranker is still scoring. The tool calls are still executing.
The invoice will arrive tomorrow.
This is not hypothetical. It is the default behaviour of many AI backends today.
```ts
app.post("/chat", async (req, res) => {
  const stream = await openai.chat.completions.create({
    model: "gpt-4.1",
    stream: true,
    messages: req.body.messages,
  });

  for await (const chunk of stream) {
    res.write(chunk.choices[0]?.delta?.content ?? "");
  }

  res.end();
});
```
That code looks reasonable.
It even works — until the user refreshes the page, closes the tab, loses signal, or navigates away.
At that moment:
- the HTTP response is dead
- the client no longer exists
- the user no longer cares
But the async work continues anyway.
The LLM keeps generating tokens. The vector search keeps scanning. Background tasks run with no remaining consumer.
The work outlived the reason it existed.
That sentence is the real problem.
The real root cause: ownership is missing
Most async systems treat cancellation as an optional convention rather than a runtime guarantee.
You can pass an AbortController if you remember.
You can manually wire cleanup if every developer remembers.
You can hope every dependency correctly propagates cancellation.
But structurally, there is no single owner for the tree of async work created by a request.
A single AI request may spawn:
- LLM streaming
- vector search
- reranking
- tool execution
- background audit writes
- metrics
- cleanup handlers
- retries
- observability traces
Every one of these should stop when the user disconnects.
But native Promise does not provide ownership semantics.
So the work survives.
The hidden cost of abandoned AI work
At small scale this is invisible.
At production scale it is expensive.
100,000 abandoned requests/day
× 3–5 seconds of unnecessary downstream execution
= millions of wasted tokens
= unnecessary GPU time
= avoidable API spend
= infrastructure pressure
This is not a code-quality issue.
This is infrastructure waste.
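The arithmetic is easy to check. A back-of-the-envelope sketch, taking the midpoint of the 3–5 seconds above and assuming roughly 50 generated tokens per second per stream (an assumption for illustration, not a measurement):

```ts
// Back-of-the-envelope check of the figures above.
// Both throughput and duration are assumptions, not measurements.
const abandonedPerDay = 100_000;  // abandoned requests/day, from the scenario above
const wastedSeconds = 4;          // midpoint of the 3–5 s of orphaned execution
const tokensPerSecond = 50;       // assumed decode throughput per stream

const wastedTokensPerDay = abandonedPerDay * wastedSeconds * tokensPerSecond;
console.log(wastedTokensPerDay);  // 20000000 tokens generated with no consumer
```

Twenty million tokens a day, billed, generated, and delivered to no one.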
The fix: one scope owns the work
@workit/core introduces one ownership boundary: the scope. Every child task belongs to it. When the scope cancels, everything below it stops with a typed reason and runs registered cleanup before the scope resolves.
```ts
import { run } from "@workit/core";

app.post("/chat", async (req, res) => {
  await run.scope(async (scope) => {
    // Client disconnects → cancel everything underneath
    req.signal.addEventListener("abort", () =>
      scope.cancel({ kind: "manual", tag: "client_disconnected" })
    );

    const llm = scope.spawn(async (ctx) =>
      streamLLM(req.body.messages, { signal: ctx.signal }),
      { name: "llm-stream", kind: "llm" });

    const tools = scope.spawn(async (ctx) =>
      runTools(req.body.input, { signal: ctx.signal }),
      { name: "tools", kind: "tool" });

    const vector = scope.spawn(async (ctx) =>
      searchVectorDB(req.body.query, { signal: ctx.signal }),
      { name: "vector-search", kind: "io" });

    const [text, toolResult, sources] = await Promise.all([llm, tools, vector]);
    res.json({ text, toolResult, sources });
  });
});
```
Now the request owns the work.
When the client disconnects:
- `scope.cancel({ kind: "manual", tag: "client_disconnected" })` fires.
- Every child task receives `ctx.signal.aborted` with the typed reason.
- The OpenAI stream call sees its `signal` abort and stops at the TCP layer.
- Tool execution stops.
- Vector search stops.
- Registered `ctx.defer(...)` cleanup runs LIFO.
- The scope resolves deterministically.
No orphaned work. No zombie tasks. No invisible token burn.
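The LIFO cleanup step can be illustrated without the library. This standalone sketch mirrors what the article describes `ctx.defer(...)` doing; `DeferStack` is an illustration, not the WorkIt API:

```ts
// Standalone illustration of LIFO cleanup, mirroring what the article
// describes ctx.defer(...) doing. DeferStack is not the WorkIt API.
class DeferStack {
  private cleanups: Array<() => void> = [];

  defer(fn: () => void): void {
    this.cleanups.push(fn);
  }

  runAll(): void {
    // Last registered runs first, like unwinding nested resources.
    while (this.cleanups.length > 0) this.cleanups.pop()!();
  }
}

const order: string[] = [];
const ctx = new DeferStack();
ctx.defer(() => order.push("close-db-connection")); // registered first, runs last
ctx.defer(() => order.push("flush-metrics"));       // registered last, runs first
ctx.runAll();
console.log(order); // ["flush-metrics", "close-db-connection"]
```

LIFO ordering matters because later resources usually depend on earlier ones: you flush what was written before you close the connection it was written to.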
`streamLLM`, `runTools`, and `searchVectorDB` above are your application's wrappers. Anything that accepts an `AbortSignal` (the official OpenAI SDK, the Vercel AI SDK, `pg`, `mongodb`, `fetch`) plugs straight in.
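Such a wrapper can be a few lines. This is a hypothetical sketch of `streamLLM` with the client injected behind a small interface so the sketch stays self-contained; with the official OpenAI SDK the same signal is passed as a request option, e.g. `openai.chat.completions.create(params, { signal })`.

```ts
// Hypothetical sketch of a streamLLM wrapper. The client is injected via a
// small interface so the sketch is self-contained; any real client that
// accepts an AbortSignal slots into the same shape.
type Message = { role: string; content: string };

interface ChatClient {
  stream(messages: Message[], signal: AbortSignal): AsyncIterable<string>;
}

async function streamLLM(
  client: ChatClient,
  messages: Message[],
  opts: { signal: AbortSignal },
): Promise<string> {
  let text = "";
  for await (const delta of client.stream(messages, opts.signal)) {
    if (opts.signal.aborted) break; // owner cancelled: stop consuming immediately
    text += delta;
  }
  return text;
}
```

The wrapper's only job is to forward the signal downward and stop consuming when it fires; the scope above it decides when that happens.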
This is not theory
The repository ships a runnable sample for exactly this scenario:
```js
// samples/stt-disconnect.sample.js — same shape, applied to live audio
const disconnect = new AbortController();

const iterator = transcribeStream(microphone, {
  async transcribe(chunk, ctx) {
    return provider.transcribe(chunk, { signal: ctx.signal });
  },
}, { signal: disconnect.signal })[Symbol.asyncIterator]();

await iterator.next();           // first chunk: "FIRST"
const pending = iterator.next(); // second chunk starts

disconnect.abort(new CancellationError({ kind: "manual", tag: "client_disconnect" }));
```
When the sample runs, it asserts and prints:
```json
{
  "sample": "stt-disconnect",
  "first": "FIRST",
  "providerCancelled": true,
  "sourceClosed": true,
  "reasonKind": "manual"
}
```
The receipt is what did not continue running:
- `providerCancelled: true` — the provider's HTTP request observed the abort.
- `sourceClosed: true` — the async generator's `finally` ran and the microphone source closed.
- `reasonKind: "manual"` — the cancel reason was typed end to end. Your dashboard can pivot a metric on it.
Reproduce with `npm run sample:stt-disconnect`. The sample is at `samples/stt-disconnect.sample.js`.
Why this matters
The AI ecosystem is rapidly moving toward:
- streaming
- agents
- tool execution
- multi-provider inference
- background workflows
- long-lived realtime sessions
All of these create trees of async work. Most of them still lack clear ownership semantics.
The result is:
- abandoned compute
- leaking streams
- runaway retries
- zombie tool execution
- incomplete shutdowns
- hidden infrastructure cost
WorkIt treats async work as something that must have an owner.
When the owner disappears, the work disappears with it.
The deeper idea
This is not really about cancellation.
It is about lifecycle ownership.
The request should own the work it creates.
The WebSocket should own its subscriptions.
The agent should own its tools.
The stream should own its producers.
Without ownership, async systems slowly leak compute and complexity.
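The rule can be expressed without any library at all. In this minimal sketch, a connection owns an `AbortController`, and every subscription it creates is tied to the owner's signal (the `Connection` class is illustrative, not part of WorkIt):

```ts
// Illustrative sketch of lifecycle ownership with no library at all:
// the Connection owns an AbortController, and every subscription it
// creates dies with the owner. Connection is hypothetical.
class Connection {
  private owner = new AbortController();
  readonly topics: string[] = [];

  subscribe(topic: string, onClosed: () => void): void {
    this.topics.push(topic);
    // When the owner goes away, the subscription is torn down with it.
    this.owner.signal.addEventListener("abort", onClosed, { once: true });
  }

  close(): void {
    this.owner.abort(new Error("connection_closed"));
  }
}
```

A library like WorkIt generalizes this pattern: instead of one controller per class, a scope tree where cancellation and cleanup flow downward automatically.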
Try it
```sh
npm install @workit/core
```
```ts
import { run } from "@workit/core";

await run.scope(async (scope) => {
  req.signal.addEventListener("abort", () =>
    scope.cancel({ kind: "manual", tag: "client_disconnected" })
  );

  const result = await scope.spawn(async (ctx) =>
    openai.chat.completions.create(
      { model: "gpt-4.1", stream: true, messages },
      // pass ctx.signal as a request option; the chain handles the rest
      { signal: ctx.signal }
    ),
    { name: "ai-call", kind: "llm" }
  );
});
```

Note that the official OpenAI SDK takes the signal in the request-options argument, not inside the request body.
That's the contract. One scope owns the work. The OpenAI client receives the signal. The tab closes; the bill stops.
The larger problem
Every senior engineer has seen some version of this question in production:
"Why is this still running?"
That question appears in:
- AI streaming
- WebSockets
- Kafka consumers
- background workers
- multiplayer game loops
- Discord bots
- server-side rendering
- agent runtimes
The problem is the same every time:
The work outlived the reason it existed.
WorkIt is an attempt to fix that at the runtime level.
The series
- Owned Async Work in TypeScript — Promise.race does not cancel your work
- You are here — Your AI is still billing after the user closed the tab
- Nine composables. One ownership contract. — coming this week
- AbortController cannot preempt a CPU loop. WorkIt uses a worker boundary. — coming this week
- A 1,000,000,000-row pipeline. 25 consumed. The producer noticed. — coming
- A 50¢ agent. A connection that closes on Ctrl-C. — coming
- 100K agent runs a day. Bounded observability cost without core bloat. — coming
- An agent loop in 12 lines. A typed tool contract. A 50¢ ceiling. — coming
Sources and reproducibility
- GitHub: github.com/WorkRuntime/workit
- The disconnect sample: `samples/stt-disconnect.sample.js`
- The bench suite: `node benchmarks/articles/run-all.mjs` reports `passed: 19, failed: 0`
- The production gate: `npm run verify` runs typecheck, 214 unit tests at 100% coverage, a no-network gate, vulnerability audit, SBOM, bundle gate, soak, exporter stress, and package-consumer fixtures across runtimes