DEV Community

AdmilsonCossa

Your AI Is Still Billing After the User Closed the Tab

That's not a bug. It's a missing owner.

The user closed the browser 30 seconds ago.

Your logs show the response was never delivered. But the LLM stream is still running. The vector search is still scanning. The reranker is still scoring. The tool calls are still executing.

The invoice will arrive tomorrow.

This is not hypothetical. It is the default behaviour of many AI backends today.

app.post("/chat", async (req, res) => {
  const stream = await openai.chat.completions.create({
    model: "gpt-4.1",
    stream: true,
    messages: req.body.messages,
  });

  for await (const chunk of stream) {
    res.write(chunk.choices[0]?.delta?.content ?? "");
  }

  res.end();
});

That code looks reasonable.

It even works — until the user refreshes the page, closes the tab, loses signal, or navigates away.

At that moment:

  • the HTTP response is dead
  • the client no longer exists
  • the user no longer cares

But the async work continues anyway.

The LLM keeps generating tokens. The vector search keeps scanning. Background tasks run with no remaining consumer.

The work outlived the reason it existed.

That sentence is the real problem.


The real root cause: ownership is missing

Most async systems treat cancellation as an optional convention rather than a runtime guarantee.

You can pass an AbortController if you remember.

You can manually wire cleanup if every developer remembers.

You can hope every dependency correctly propagates cancellation.
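The convention is easy to sketch and just as easy to break. In the simulation below (`simulateTask` is an illustrative stand-in for an LLM call or a vector search, not a real SDK), one call site remembers the signal and one forgets:

```typescript
// Manual cancellation: each task stops only if someone threaded the signal
// through. simulateTask is a stand-in for an LLM call or vector search.
function simulateTask(name: string, ms: number, signal?: AbortSignal): Promise<string> {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => resolve(`${name} done`), ms);
    signal?.addEventListener("abort", () => {
      clearTimeout(timer);
      reject(new Error(`${name} cancelled`));
    });
  });
}

const controller = new AbortController();
const llm = simulateTask("llm", 500, controller.signal); // signal remembered
const search = simulateTask("search", 500);              // signal forgotten

controller.abort(); // the client disconnects

llm.catch((err) => console.log(err.message)); // "llm cancelled"
// `search` still resolves half a second later — the forgotten signal means
// that branch of work outlives the request.
```

One forgotten parameter is all it takes, and neither the type system nor the runtime flags it.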

But structurally, there is no single owner for the tree of async work created by a request.

A single AI request may spawn:

  • LLM streaming
  • vector search
  • reranking
  • tool execution
  • background audit writes
  • metrics
  • cleanup handlers
  • retries
  • observability traces

Every one of these should stop when the user disconnects.

But native Promise does not provide ownership semantics.

So the work survives.
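That survival is observable with nothing but timers — a minimal sketch where the consumer stops waiting after 15 ms, but nothing stops the producer:

```typescript
// Native promises carry no ownership: abandoning the consumer does not
// stop the producer.
let ticks = 0;
const producer = (async () => {
  for (let i = 0; i < 5; i++) {
    await new Promise((r) => setTimeout(r, 10));
    ticks++; // keeps counting whether or not anyone still awaits this
  }
})();

// Promise.race only stops the waiting, never the work.
await Promise.race([producer, new Promise((r) => setTimeout(r, 15))]);
const atAbandonment = ticks; // fewer than 5 so far

await new Promise((r) => setTimeout(r, 100)); // give the orphan time to finish
console.log(atAbandonment < 5, ticks); // → true 5 — it ran to completion anyway
```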


The hidden cost of abandoned AI work

At small scale this is invisible.

At production scale it is expensive.

100,000 abandoned requests/day
× 3–5 seconds of unnecessary downstream execution
= millions of wasted tokens
= unnecessary GPU time
= avoidable API spend
= infrastructure pressure

This is not a code-quality issue.

This is infrastructure waste.
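Putting rough numbers on the arithmetic above — every figure here is an assumption for illustration, not a measurement; substitute your own traffic and pricing:

```typescript
// Back-of-envelope cost of abandoned generation. All inputs are assumed.
const abandonedPerDay = 100_000;  // abandoned requests/day, from the scenario above
const tokensPerAbandon = 150;     // ~3–5 s of post-disconnect output at ~40 tok/s
const usdPerMillionTokens = 8;    // assumed output-token price

const wastedTokensPerDay = abandonedPerDay * tokensPerAbandon;
const usdPerDay = (wastedTokensPerDay / 1_000_000) * usdPerMillionTokens;

console.log(wastedTokensPerDay, usdPerDay); // → 15000000 120
```

Fifteen million wasted tokens a day is the "millions of wasted tokens" line item above, and the dollar figure scales linearly with whichever assumption you change.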


The fix: one scope owns the work

@workit/core introduces one ownership boundary: the scope. Every child task belongs to it. When the scope cancels, everything below it stops with a typed reason and runs registered cleanup before the scope resolves.

import { run } from "@workit/core";

app.post("/chat", async (req, res) => {
  await run.scope(async (scope) => {
    // Client disconnects → cancel everything underneath.
    // (Express's req is a Node IncomingMessage with no AbortSignal,
    // so watch the connection closing instead.)
    res.on("close", () => {
      if (!res.writableEnded) {
        scope.cancel({ kind: "manual", tag: "client_disconnected" });
      }
    });

    const llm = scope.spawn(async (ctx) =>
      streamLLM(req.body.messages, { signal: ctx.signal }),
      { name: "llm-stream", kind: "llm" });

    const tools = scope.spawn(async (ctx) =>
      runTools(req.body.input, { signal: ctx.signal }),
      { name: "tools", kind: "tool" });

    const vector = scope.spawn(async (ctx) =>
      searchVectorDB(req.body.query, { signal: ctx.signal }),
      { name: "vector-search", kind: "io" });

    const [text, toolResult, sources] = await Promise.all([llm, tools, vector]);
    res.json({ text, toolResult, sources });
  });
});

Now the request owns the work.

When the client disconnects:

  1. scope.cancel({ kind: "manual", tag: "client_disconnected" }) fires.
  2. Every child task receives ctx.signal.aborted with the typed reason.
  3. The OpenAI stream call sees its signal abort and stops at the TCP layer.
  4. Tool execution stops.
  5. Vector search stops.
  6. Registered ctx.defer(...) cleanup runs LIFO.
  7. The scope resolves deterministically.

No orphaned work. No zombie tasks. No invisible token burn.
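Step 6's ordering is worth pausing on. Here is a toy model of LIFO deferred cleanup — just the ordering contract, not @workit/core internals: resources unwind in reverse acquisition order, so nothing is torn down before the things built on top of it.

```typescript
// Toy LIFO cleanup: last registered runs first, mirroring reverse
// acquisition order. Not library internals — only the ordering contract.
type Cleanup = () => void;
const cleanups: Cleanup[] = [];
const defer = (fn: Cleanup): void => { cleanups.push(fn); };

const order: string[] = [];
defer(() => order.push("flush metrics"));    // acquired first → released last
defer(() => order.push("close LLM stream")); // acquired last → released first

// On cancellation, unwind the stack:
while (cleanups.length > 0) cleanups.pop()!();

console.log(order); // → [ 'close LLM stream', 'flush metrics' ]
```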

streamLLM, runTools, and searchVectorDB above are your application's wrappers — anything that accepts an AbortSignal (the official OpenAI SDK, the Vercel AI SDK, pg, mongodb, fetch) plugs straight in.
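A hedged sketch of what such a wrapper can look like — `fakeProvider` below is a stand-in for any signal-aware SDK call, and `streamLLMSketch` is a hypothetical name, not the article's actual helper:

```typescript
// The wrapper's whole job: accept { signal } and thread it through to
// whatever does the real work. fakeProvider stands in for a signal-aware
// SDK call (OpenAI, pg, fetch, ...).
function fakeProvider(prompt: string, signal: AbortSignal): Promise<string> {
  return new Promise((resolve, reject) => {
    const t = setTimeout(() => resolve(`completion for: ${prompt}`), 20);
    signal.addEventListener("abort", () => {
      clearTimeout(t);
      reject(new Error("provider aborted"));
    });
  });
}

function streamLLMSketch(prompt: string, opts: { signal: AbortSignal }): Promise<string> {
  return fakeProvider(prompt, opts.signal); // forward, don't translate
}

const live = new AbortController();
const gone = new AbortController();
const a = streamLLMSketch("hi", { signal: live.signal });
const b = streamLLMSketch("hi", { signal: gone.signal });

gone.abort();      // b rejects with "provider aborted"; a still resolves
b.catch(() => {}); // swallow the expected rejection in this sketch
```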


This is not theory

The repository ships a runnable sample for exactly this scenario:

// samples/stt-disconnect.sample.js — same shape, applied to live audio
const disconnect = new AbortController();
const iterator = transcribeStream(microphone, {
  async transcribe(chunk, ctx) {
    return provider.transcribe(chunk, { signal: ctx.signal });
  },
}, { signal: disconnect.signal })[Symbol.asyncIterator]();

await iterator.next();                // first chunk: "FIRST"
const pending = iterator.next();      // second chunk starts
disconnect.abort(new CancellationError({ kind: "manual", tag: "client_disconnect" }));

When the sample runs, it asserts and prints:

{
  "sample": "stt-disconnect",
  "first": "FIRST",
  "providerCancelled": true,
  "sourceClosed": true,
  "reasonKind": "manual"
}

The receipt is what did not continue running:

  • providerCancelled: true — the provider's HTTP request observed the abort.
  • sourceClosed: true — the async generator's finally ran and the microphone source closed.
  • reasonKind: "manual" — the cancel reason was typed end to end. Your dashboard can pivot a metric on it.

Reproduce: npm run sample:stt-disconnect. The sample is at samples/stt-disconnect.sample.js.


Why this matters

The AI ecosystem is rapidly moving toward:

  • streaming
  • agents
  • tool execution
  • multi-provider inference
  • background workflows
  • long-lived realtime sessions

All of these create trees of async work. Most of them still lack clear ownership semantics.

The result is:

  • abandoned compute
  • leaking streams
  • runaway retries
  • zombie tool execution
  • incomplete shutdowns
  • hidden infrastructure cost

WorkIt treats async work as something that must have an owner.

When the owner disappears, the work disappears with it.


The deeper idea

This is not really about cancellation.

It is about lifecycle ownership.

The request should own the work it creates.

The WebSocket should own its subscriptions.

The agent should own its tools.

The stream should own its producers.

Without ownership, async systems slowly leak compute and complexity.


Try it

npm install @workit/core
import { run } from "@workit/core";

await run.scope(async (scope) => {
  req.signal.addEventListener("abort", () =>
    scope.cancel({ kind: "manual", tag: "client_disconnected" })
  );

  const result = await scope.spawn(async (ctx) =>
    openai.chat.completions.create(
      { model: "gpt-4.1", stream: true, messages },
      { signal: ctx.signal }         // request options take the signal — the chain handles the rest
    ),
    { name: "ai-call", kind: "llm" }
  );
});

That's the contract. One scope owns the work. The OpenAI client receives the signal. The tab closes; the bill stops.


The larger problem

Every senior engineer has seen some version of this question in production:

"Why is this still running?"

That question appears in:

  • AI streaming
  • WebSockets
  • Kafka consumers
  • background workers
  • multiplayer game loops
  • Discord bots
  • server-side rendering
  • agent runtimes

The problem is the same every time:

The work outlived the reason it existed.

WorkIt is an attempt to fix that at the runtime level.


The series

  1. Owned Async Work in TypeScript — Promise.race does not cancel your work
  2. You are here — Your AI is still billing after the user closed the tab
  3. Nine composables. One ownership contract. — coming this week
  4. AbortController cannot preempt a CPU loop. WorkIt uses a worker boundary. — coming this week
  5. A 1,000,000,000-row pipeline. 25 consumed. The producer noticed. — coming
  6. A 50¢ agent. A connection that closes on Ctrl-C. — coming
  7. 100K agent runs a day. Bounded observability cost without core bloat. — coming
  8. An agent loop in 12 lines. A typed tool contract. A 50¢ ceiling. — coming

Sources and reproducibility

  • GitHub: github.com/WorkRuntime/workit
  • The disconnect sample: samples/stt-disconnect.sample.js
  • The bench suite: node benchmarks/articles/run-all.mjs reports passed: 19, failed: 0
  • The production gate: npm run verify — typecheck, 214 unit tests at 100% coverage, no-network gate, vulnerability audit, SBOM, bundle gate, soak, exporter stress, package-consumer fixtures across runtimes
