I Spent $5,000 On Tokens So That You Don't Have To (Part 2)

Someone left an upvote, so there seems to be one listener. Dear listener, let me continue...

First let me state my specific claims:

LLMs and GenAI and Agents are being whoppingly oversold.
The challenges are baked into the tech and are not “fixable” with more scale or more data or more time or more money.
As an “augmented intelligence” algorithm, in expert hands, in some narrow fields, they are amazing.
Yet not in any way that justifies the hype that “AI will replace all human work” on any timeline with any of the current LLM technologies.
I know nothing about creative jobs or any job that isn't what I do; I am talking about what I do, which is sling code at scale in finance.
I am not claiming that LLM/GenAI has no utility; I am claiming it will be a net negative return on investment for the majority of enterprises that try to deploy it as currently oversold.

The specific claim I refute is that LLM/GenAI is “replacing junior developers” in a way that is not gaming the benchmark, cooking the books, rolling the poop in glitter, or making a “we do AI too!” press release.

When someone is having a hard time getting great results, you are going to hear the "No true Scotsman" argument, also known as the appeal to purity.

The pundits will say that if I just had better tests, or better Product Requirements Documents (PRDs), or used a better brand of agentic tool, or the other labs' models, or just learnt to prompt-engineer better, I would have got better results.

Okay, well, on my last project, I used all of these US-based tools on a single client project:

Claude Code CLI
OpenAI Codex CLI
GitHub Copilot CLI
Cursor Agent CLI
OpenCode CLI
Copilot IDE
OpenCode Desktop
Codex Desktop
Aider Chat

I like working with multiple models and some features of each tool. I have patched Codex to add in both Claude and Gemini. I then added back the 'Ask' read-only mode, as I find it a good emergency feature when a model has gone a bit rogue.

You might think, "Gee, if the guy had just stuck to one tool to learn how to use it properly, maybe he could have got it to work!" At one point, I would burn through a $200 Max sub in the first week of the month. The new 5-hour token limits mean that to work a full day, I need two Max subs. That is why I needed all the tools to have enough subsidised Max subs to get through the month.

I now avoid the least reliable tool, Claude Code, until I have hit the weekly rate limits of the other tools. Yes, you read that correctly. I would rather use any of the other tools before Claude Code. Once again, that is not because I am unfamiliar with it, but because I have used it the most. I know from personal experience that I get better results when mixing Claude with other models across different tools. If you are not using Claude alongside GPT and others in something like Cursor Agent CLI or OpenCode, then you are missing out.

Surely you cannot prefer OpenAI Codex, I can hear you cry. Well, as I said, I have patched OpenAI Codex CLI, which is Apache-2.0 open-source, to run Claude Opus 4.6 (not 4.7!) and Gemini 3 Pro next to GPT-5.4 (not 5.5!). Why? Because you can spawn agents like tmux sessions and go work with each of them in parallel!

I can have the Main agent pass work to named agents. Yes, Claude Code has subagents, as does OpenCode. Yet neither lets you switch between them to work directly with each of them. In my build I can, and I can still go back to asking the Main agent to delegate work.

Remember, I patched Claude, Gemini, and GPT into my build. So, I switch between them during an agent session to get them to code-review or pair-program.

My current preferred setup is to have the Main model with two or three subagents. The less reliable latest models can be the managing Main model as well; management isn't coding, is it? Lolz.

I don't let the latest Opus or GPT write code, as they are far too erratic. In my setup, they have been promoted out of the way to be product managers. Lolz.
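To make the shape of that setup concrete, here is a minimal Python sketch. It is not the patched Codex build, whose internals are not shown in this post; every name in it (`Agent`, `Session`, `stub_model`, and the model labels) is hypothetical, and the "models" are stubs that just echo their input. It only illustrates the arrangement described above: a Main agent that manages but does not write code, named subagents that do, and the ability to attach to any subagent directly.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stand-in for a model call; a real build would call a provider here.
ModelFn = Callable[[str], str]


def stub_model(label: str) -> ModelFn:
    """Return a fake model that tags the prompt with its label."""
    def run(prompt: str) -> str:
        return f"[{label}] {prompt}"
    return run


@dataclass
class Agent:
    name: str
    model: ModelFn
    can_write_code: bool
    history: List[str] = field(default_factory=list)

    def ask(self, prompt: str) -> str:
        reply = self.model(prompt)
        self.history.append(reply)
        return reply


@dataclass
class Session:
    main: Agent
    workers: Dict[str, Agent] = field(default_factory=dict)

    def spawn(self, agent: Agent) -> None:
        """Register a named subagent."""
        self.workers[agent.name] = agent

    def attach(self, name: str) -> Agent:
        """Switch to a subagent to work with it directly."""
        return self.workers[name]

    def delegate(self, name: str, task: str) -> str:
        """Main writes the brief; the named subagent does the coding."""
        worker = self.workers[name]
        if not worker.can_write_code:
            raise ValueError(f"{name} is not allowed to write code")
        brief = self.main.ask(f"plan: {task}")
        return worker.ask(brief)


def build_session() -> Session:
    # The erratic latest model manages; older models do the work (assumed labels).
    session = Session(main=Agent("main", stub_model("latest-model"), False))
    session.spawn(Agent("coder", stub_model("older-model"), True))
    session.spawn(Agent("reviewer", stub_model("other-lab-model"), False))
    return session
```

Calling `build_session().delegate("coder", "fix bug")` routes the task through the Main agent to the coder, while `attach("reviewer")` hands back the reviewer so it can be prompted directly, which is the switching behaviour I care about.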

No one can say that I am simply not experienced with this tech. On the contrary, I have not heard of anyone who has used as many tools as aggressively as I have to try to get things to work without constant direct supervision.

That is enough for now. I need some sleep. I am due a catch-up.

End.
