Open Anthropic's pricing page — or OpenAI's, or any other major frontier provider's — and the unit you are charged in is the token. A token is a sub-word linguistic chunk, about four characters in English, give or take. For practical purposes, you are paying by the syllable.
This is the default unit of charge across the industry. It is also a strange unit. The thing you are actually trying to buy is a solved problem — a fixed migration, a generated invoice, a working summary, a correctly typed SQL query. What gets metered is the verbal output the model produces along the way.
A real example
Ask a frontier model to find the bug in a 600-line migration. With extended reasoning enabled, it generates roughly 8,000 tokens of internal deliberation and outputs 200 tokens of fix. The bill is 8,200 tokens. The work delivered is the fix. The other 8,000 tokens are the model thinking on the way to the answer.
Token pricing meters the model's verbal output along the way. It does not meter whether the work got done. The customer absorbs the difference between those two quantities every time the system runs.
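The arithmetic is worth making explicit. A minimal sketch of the migration example's bill, using a made-up per-token price (the rate below is an assumption for illustration, not any provider's actual pricing):

```python
# Token accounting for the migration example above.
# The per-token price is hypothetical; only the shape of the bill matters.

PRICE_PER_OUTPUT_TOKEN = 15 / 1_000_000  # assume $15 per million output tokens

reasoning_tokens = 8_000  # internal deliberation, billed as output
answer_tokens = 200       # the fix that was actually delivered

billed_tokens = reasoning_tokens + answer_tokens
bill = billed_tokens * PRICE_PER_OUTPUT_TOKEN

# Share of the bill that paid for tokens other than the delivered fix.
thinking_share = reasoning_tokens / billed_tokens

print(f"billed tokens: {billed_tokens}")               # 8200
print(f"bill: ${bill:.4f}")                            # $0.1230
print(f"spent on deliberation: {thinking_share:.0%}")  # 98%
```

At roughly 98 percent of the bill going to deliberation, the customer is paying almost entirely for the path, not the destination.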
This was always pulp-era pricing
Knowledge work used to be priced by the word. Dime-store novelists in the early twentieth century were paid per published word. So were many journalists. That pricing model produced a specific kind of distortion: bloat, padding, descriptive passages that earned the writer more because they were longer. "He replied in the negative" beat "no." The pulp era is remembered, partly, as the period when prose got paid by length and language got worse.
Knowledge work moved off this pricing over the next century. Journalism moved to staff salaries plus per-piece. Fiction moved to advances plus royalties. Law moved to hourly plus retainers. Copywriting moved to project rates. Even content marketing today is mostly priced per-piece — except in the SEO content-farm space, which is the modern equivalent of pulp.
The AI industry did not inherit any of these models. It shipped its first APIs with token pricing because tokens are what the provider's own costs scale with. The compute cost is roughly linear in tokens, so the bill is linear in tokens, so the customer absorbs the variance in linguistic length. The unit got picked from the provider's accounting and applied to the customer's value calculation without anybody noticing the substitution.
The structural distortion
The pricing model creates a tax on the practices that produce reliable agent systems.
Last week I argued that tests are the substrate underneath dependable AI — schema checks, assertions on the model's output, evals gating deploys. Every assertion is a model call. Every eval run is a stack of tokens. The team that wires in disciplined supervision is the team paying the highest token bill.
Chain-of-thought reasoning produces better answers on hard problems. It also produces an order of magnitude more output tokens than terse responses. Using it correctly costs more. Skipping it to save tokens ships worse decisions.
The same applies to dry-runs, blast-radius checks, proof chains, retrieval-augmented context — every artifact the supervision argument has been advocating for. They all cost tokens. The pricing model taxes the practice. Whatever you build, you build against an economic gradient that pulls toward fewer assertions, shorter prompts, less verification.
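To see the gradient in numbers, here is a toy model of per-task token spend under different supervision levels. Every token count in it is an assumed figure, chosen only to show how the artifacts stack:

```python
# Toy model: how supervisory artifacts multiply per-task token spend.
# All token counts are illustrative assumptions, not measurements.

def task_tokens(base_call: int, assertions: int, tokens_per_assertion: int,
                eval_runs: int, tokens_per_eval: int) -> int:
    """Total tokens billed for one task under a given supervision level."""
    return (base_call
            + assertions * tokens_per_assertion
            + eval_runs * tokens_per_eval)

# The bare agent: one call, no checks.
bare = task_tokens(base_call=2_000, assertions=0, tokens_per_assertion=0,
                   eval_runs=0, tokens_per_eval=0)

# The disciplined agent: schema checks, output assertions, evals gating deploy.
supervised = task_tokens(base_call=2_000, assertions=5, tokens_per_assertion=400,
                         eval_runs=3, tokens_per_eval=1_500)

print(bare, supervised, supervised / bare)  # 2000 8500 4.25
```

Under these assumptions the supervised pipeline bills over four times the bare call for the same task, which is the tax the section describes: the bill rises with discipline, not with outcomes.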
The technical work of making agents reliable is on one side of the gradient. The unit of charge is on the other.
Why outcome pricing is hard
The natural reply is: price by outcome, not by token. That is the right direction, and the reasons it has not happened yet are real.
Outcome pricing requires a contract on what counts as a successful outcome. For varied agent work — research, code, design, customer support — the acceptance criteria differ per task. Providers do not want to absorb the variance of "did the customer think this was good." Customers do not want to spend up-front time defining acceptance criteria for every call.
The provider charges in tokens because tokens are what their costs scale with. The customer pays in tokens because that is what is offered. Whether the work got done sits between them, with neither party on the hook.
This arrangement is not stable.
Where the market is actually moving
The most interesting pricing experiments are at the edges. Anthropic offers flat-rate Claude Code subscription tiers, with Max in particular sitting above the per-token consumption model and effectively capping the variable cost. GitHub Copilot has been per-seat from the beginning, with no per-token surcharge for the user. Cursor uses tiered per-seat plans with consumption guardrails. Cognition's Devin charges in Agent Compute Units — a consumption budget abstracted above tokens, getting closer to "per task" without yet being "per outcome."
None of these are outcome pricing. They are halfway houses. The direction is away from the syllable. The endpoint, however long the contract takes to work out, is some version of pay-for-task — a task defined narrowly enough that the provider can take the variance risk, and broadly enough that the customer can predict the bill.
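One way to picture such a contract is as data: a flat price attached to machine-checkable acceptance criteria, with the provider eating retries. This is a hypothetical sketch of the shape, not any vendor's actual API:

```python
# Hypothetical pay-for-task contract. Field names and the settlement rule
# are design sketches, not any provider's real interface.

from dataclasses import dataclass

@dataclass
class TaskContract:
    description: str          # what the customer is buying
    acceptance_checks: list   # machine-checkable criteria, e.g. test commands
    flat_price_usd: float     # the customer's predictable bill
    max_attempts: int = 3     # retries the provider absorbs: the variance risk

    def settle(self, checks_passed: bool) -> float:
        """The customer pays the flat price only when the checks pass."""
        return self.flat_price_usd if checks_passed else 0.0

contract = TaskContract(
    description="Fix the failing migration",
    acceptance_checks=["run the project's migration test suite"],
    flat_price_usd=5.00,
)
print(contract.settle(checks_passed=True))   # 5.0
print(contract.settle(checks_passed=False))  # 0.0
```

However the variance risk gets priced into the flat rate, the defining property is that token count never appears in the settlement.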
Whoever figures out that contract first gets a structural advantage in the agent market.
Where this leaves the practitioner
If you are running a serious AI stack today, the syllable tax is something you absorb whether you notice it or not. The practical move is not to retreat from the practices that cost tokens. The supervisory artifacts — the tests, the evals, the dry-runs, the proof chains — pay for themselves in incidents avoided. They are the right engineering even when they are the wrong economics.
The harder move is to keep an eye on what the pricing model is selecting against. Every team I have seen build disciplined supervision on top of a per-token API has also come under ongoing pressure to skip that supervision when the token bill came in. The first time you trim an eval because it costs too much to run, you have started letting the pricing model design your agent.
Build the artifacts anyway. Notice when the gradient is pulling against you. Push for pricing models that charge for something closer to the work being done.
The closing claim
The syllable will not be the unit of charge for AI in five years. It is too misaligned with the unit of value, and the misalignment is too visible — the prompt-engineering for terseness, the underinvestment in supervision, the constant tension between "use the model well" and "keep the token bill manageable." The economics are pulp-era for the same reasons pulp prose was pulp-era. The market has already moved off pricing of this shape for every other kind of knowledge work, in every prior generation. It will move off this one too.
A year from now, the most interesting piece in this space will be about whichever provider figured out the per-task contract first. The current one is structurally backwards.
Pick the practice first. The pricing model will catch up or get competed away.