The afternoon I learned what my AI subscription was actually doing, and the 200 lines that took my next bill down 41 percent.
I had been using Cursor for six months when I noticed the discrepancy. I was renaming a function. A short one. Three lines of body. One call site. The kind of refactor that takes, at most, six seconds of human attention.
I had two windows open. The Cursor chat panel where I had typed "rename getUser to fetchUser". And the Anthropic console in another tab, because I had been debugging a different project earlier and forgot to close it.
The Anthropic console refreshed while the Cursor request was in flight. I watched the token counter tick up live. The number it landed on for that single rename request was 8,400 input tokens. The actual prompt I had typed was eleven words.
I sat there for a moment. Then I opened a fresh terminal and made the same call directly through the Anthropic API with my own minimal prompt. Same model. Same intent. Same outcome.
The direct call used 1,900 input tokens. Cursor had sent 6,500 extra tokens of context to perform the same rename.
That observation was the start of the spreadsheet that ate the next four hours of my evening.
What was in those 6,500 tokens
I do not have inside knowledge of Cursor's internals. I have my own experiments and the public documentation, neither of which gives a complete picture. Here is what I have: when I ran the same prompt 50 times across different parts of my codebase, the input token count varied between roughly 4,000 and 14,000 with a median around 8,000. The variation correlated loosely with how many open buffers I had and how recently I had viewed files in the same directory.
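If you want to rerun the experiment, the billed counts come back on every API response. Here is a minimal sketch using the official @anthropic-ai/sdk package; the model alias and prompt are stand-ins for whatever you are actually measuring:

```ts
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// Send one prompt with no extra context and report what the API billed for it.
async function measure(prompt: string): Promise<number> {
  const response = await client.messages.create({
    model: "claude-3-5-sonnet-latest", // stand-in alias, use the model you are comparing against
    max_tokens: 256,
    messages: [{ role: "user", content: prompt }],
  });
  // usage.input_tokens is the billed input size for this request,
  // the same number the console dashboard aggregates.
  console.log(`input=${response.usage.input_tokens} output=${response.usage.output_tokens}`);
  return response.usage.input_tokens;
}

await measure("rename getUser to fetchUser");
```

Run that with your own minimal prompt, then run the same intent through the wrapper with the dashboard open. The gap between the two numbers is the routing layer's overhead on your usage pattern.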
The reasonable inference is that Cursor was sending a system prompt on my behalf, plus indexing context derived from my recent activity, plus tool definitions for the agent framework, plus the actual prompt I had typed. The first three are the routing layer doing its job. The fourth was the only part I could see in the chat panel.
Some of that context helps. When I ask for a refactor that touches several files, Cursor knowing about those files is the entire point. When I ask to rename a function whose call sites fit in three lines of context, sending 6,500 tokens of unrelated buffer state is the routing layer playing it safe on my behalf in a way that benefits Cursor more than it benefits me.
Cursor charges a fixed seat fee. Anthropic charges Cursor by the token. For a heavy user like me, the math runs the wrong direction: the marginal cost lands in my own direct API calls, which the seat does not cover, on top of the flat seat fee itself. Conservative context is cheap to ship and expensive to consume. The incentive lands on the wrong side of the table.
The bill that made me pay attention
My March bill arrived on April 2. The Anthropic line item had grown 50 percent month over month for three months running. Cursor at $20 was flat. Copilot at $10 was flat. The variable line was the API I was hitting from my own CLI for things Cursor was not the right tool for.
The growth was not from doing more work. I checked. My weekly logged hours were stable. The growth was from an increasing fraction of those hours involving AI calls that I was making more casually because the AI was getting more useful.
The trend line was straight. If I extrapolated, by August I was going to be paying more in direct Anthropic API spend than in Cursor seats, and my Cursor seat would still be carrying the same conservative context overhead on every chat panel turn. The bill was going to keep growing in two places at once.
I cancelled Cursor that weekend.
The 200 lines
The thing I built to replace the routing piece of Cursor was small enough to embarrass me for not having built it months earlier. A regex-based intent classifier with five rules. Trivial prompts route to Haiku. Code prompts route to Sonnet. Planning prompts route to Opus. Embedding-style classification prompts route to a cheap OpenAI model. Default to Sonnet if nothing matches.
That is the entire routing logic. Two hundred lines of TypeScript including imports, error handling, a pricing table, and a cost calculator that logs every call. The full file fits on one screen if you have a tall display.
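Condensed further, the shape looks something like this. The patterns and model aliases below are illustrative stand-ins, not the exact rules in my file; the point is how little machinery the routing actually needs:

```ts
type Route = { pattern: RegExp; model: string };

// Ordered rules, first match wins. The default below is the fifth rule.
// Patterns are illustrative stand-ins, not my exact regexes.
const ROUTES: Route[] = [
  { pattern: /\b(what is|define|summarize|explain briefly)\b/i, model: "claude-3-5-haiku-latest" }, // trivial
  { pattern: /\b(refactor|rename|fix|implement|debug|write a test)\b/i, model: "claude-3-5-sonnet-latest" }, // code
  { pattern: /\b(plan|design|architect|tradeoffs?)\b/i, model: "claude-3-opus-latest" }, // planning
  { pattern: /\b(classify|label|categorize)\b/i, model: "gpt-4o-mini" }, // embedding-style classification
];

const DEFAULT_MODEL = "claude-3-5-sonnet-latest"; // nothing matched

export function pickModel(prompt: string): string {
  return ROUTES.find((r) => r.pattern.test(prompt))?.model ?? DEFAULT_MODEL;
}
```

First match wins. The default is deliberately the mid-tier model: a wrong cheap answer costs a retry, a wrong expensive answer costs quietly.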
I tested it on a hundred prompts I had logged from the previous week. The breakdown shifted hard. Sonnet went from 70 percent of calls to 25 percent. Haiku went from zero to 60 percent. Opus stayed at 5 percent. The estimated cost reduction was 47 percent on the test set.
I did not believe the number. I assumed I had a bug. I instrumented the router to log the actual model picked per prompt and the cost in real time, and I ran my normal workflow for two weeks. The actual reduction came in at 41 percent on the May 2 bill, with 30 percent more total calls because the cheaper per call cost made me reach for AI more often.
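The instrumentation is nothing clever either. Here is a sketch of the pricing table and the per-call log, with placeholder prices since rates change; check the provider pricing pages before trusting any number below:

```ts
import { appendFileSync } from "node:fs";

// Price per million tokens. Placeholder numbers: rates change,
// so pull current ones from the provider pricing pages.
const PRICES: Record<string, { input: number; output: number }> = {
  "claude-3-5-haiku-latest": { input: 0.8, output: 4 },
  "claude-3-5-sonnet-latest": { input: 3, output: 15 },
  "claude-3-opus-latest": { input: 15, output: 75 },
  "gpt-4o-mini": { input: 0.15, output: 0.6 },
};

export function costUsd(model: string, inputTokens: number, outputTokens: number): number {
  const p = PRICES[model];
  if (!p) throw new Error(`no price entry for ${model}`);
  return (inputTokens * p.input + outputTokens * p.output) / 1_000_000;
}

// One JSON line per call. A month of usage stays tiny on disk
// and greps cleanly when the bill arrives.
export function logCall(model: string, inputTokens: number, outputTokens: number): void {
  const entry = {
    ts: new Date().toISOString(),
    model,
    inputTokens,
    outputTokens,
    costUsd: costUsd(model, inputTokens, outputTokens),
  };
  appendFileSync("ai-calls.jsonl", JSON.stringify(entry) + "\n");
}
```

That log file is where the 41 percent number came from: sum the costUsd column for the month and compare it to the bill.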
What I did not understand before
The routing layer is the most valuable part of the AI tool stack right now, and the wrappers want to own that part most of all.
Every coding tool I have looked at in the last 90 days has shipped a model dropdown. Cursor added one. GitHub Copilot added one. Windsurf added one. The story they tell is customer choice. The story underneath, I think, is that they have all noticed the same thing I noticed in April. The user can route their own calls. The user is starting to. If the wrappers do not own the routing layer, they own a chat panel and an autocomplete and not much else.
The chat panel is real value. The autocomplete is real value. They are not $20 a month of value for a heavy user who would prefer to route directly. They are maybe $5 to $10 a month of value, sized to the actual work they save.
I think we are two quarters away from a wave of users doing this exercise. The wrappers know that. The pricing pages are starting to reflect it.
What I would tell my March self
Three things, in the order they matter.
The first: open the Anthropic dashboard. Look at the input token count on three of your normal Cursor turns. If the number runs more than 3x your direct call baseline, the routing layer is not earning its cost on your usage pattern. That does not mean cancel. That means notice.
The second: log every AI call you make for one week. Cost per call, model picked, prompt length, output length. The log takes twenty lines of code per provider. The data will surprise you. There is no honest way to optimize a bill you cannot see.
The third: write the router. Two hundred lines. The first version does not have to be smart. Five regex rules for intent capture 70 percent of the savings. Iterate on the rules later.
The reason I would tell my March self these things: doing the same exercise three months earlier would have saved roughly $300 in subscription overlap. The cost of doing the exercise: one Saturday afternoon. The cost of not doing it: whatever your bill grows to in the next quarter, which for most people building with AI right now lands at a number bigger than they want to admit.
What this does not solve
The router does not give me a multi file editing agent. It does not index my codebase. It does not know about my open buffers. It does not autocomplete inline. None of that is its job.
I kept Copilot for the inline ghost text in VS Code, because that is a different product solving a different problem and the $10 is not the line that hurts. For the multi file agent work I would have used Cursor for, I now use Claude Code from the terminal, which I pay for separately through my Max plan. The total stack is cheaper than Cursor plus my old direct API spend.
If your usage pattern is different, your math will be different. If you live in the chat panel and rarely go outside it, Cursor is probably still a fair trade. If your AI work spans chat, agent loops, embedding pipelines, and one off CLI calls, the routing layer is the one piece worth owning yourself.
The closing
The bill came in on April 2. The new bill came in on May 2. The difference between them was 41 percent and 200 lines of code and one weekend afternoon I was going to spend half asleep in front of a movie.
The lesson should have been obvious. The wrappers have an incentive to send more tokens than necessary. The user has an incentive to send fewer. The routing layer is where the two incentives meet. Whoever owns the routing layer wins.
The routing layer can be 200 lines of yours.
What is the line on your AI bill that grew the fastest last month? Drop it in a reply. I read everything.
*Written by **GDS K S** (thegdsks.com), building Glincker.*
If this was useful, follow me on X / @thegdsks. I write about the parts of the AI stack vendors keep off the pricing page.
Top comments (1)
I laughed because I’ve felt that same silent budget hemorrhage. Thank you for ripping open the token details—it’s a warning everyone using AI coding tools needs. Most devs don’t realize the context re‑sent includes huge chunks of unrelated code. Can we teach such tools to send only the AST of what actually changed? If not, maybe we need a local proxy that slims the payload before it ever hits the network.
Imagine a “token budget” mode hard‑coded into the editor. If the projected context exceeds your cap, it simply refuses the request. That would force more modular coding habits from the start.