I saw a video today that made me laugh, then made me a bit worried.
It was one of those jokes that is not really a joke because you can already se...
You're totally right, as summarized:
This resonates deeply with my recent piece on 'The Accountant'—where the goal isn't just to spend less, but to optimize for Insight-per-Dollar.
I love your breakdown of the token ratio (Large Input/Small Output). It reframes the AI's job from 'Generator' to 'Router/Filter.' When we 'Think Smaller,' we aren't just being cheap; we're being responsible. We’re moving complexity out of the prompt (where it breeds hallucinations) and into the architecture (where we can audit it). The 28-field JSON monster is the ultimate anti-pattern for production reliability.
Really well-articulated, especially the framing around “cognitive surface” instead of raw prompt size.
What stands out to me is that most LLM systems don’t fail because models are weak, they fail because we keep collapsing multiple reasoning steps into one opaque call. That’s where complexity quietly accumulates.
The real shift is moving from “one big intelligent prompt” to clear, minimal decision units stitched together as a system.
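To make that concrete, here's a minimal sketch of what "clear, minimal decision units" can look like in code. None of it is from the article: `call_llm`, the labels, and the 3-field schema are illustrative placeholders, and you'd wire the helper to whatever client you actually use.

```python
# Minimal sketch (not from the article): three single-purpose calls instead of
# one monolithic prompt. `call_llm` is a hypothetical helper you would replace
# with your own provider/client; everything else is plain Python.
import json

def call_llm(system: str, user: str) -> str:
    """Hypothetical one-shot completion helper; wire this to your LLM provider."""
    raise NotImplementedError("replace with a real client call")

def classify(ticket: str) -> str:
    # Tiny decision unit: one label out, nothing else.
    return call_llm(
        system="Answer with exactly one word: billing, bug, or other.",
        user=ticket,
    ).strip().lower()

def extract(ticket: str) -> dict:
    # Second unit: a 3-field schema instead of a 28-field monster.
    raw = call_llm(
        system='Return JSON with keys "product", "severity", "summary" only.',
        user=ticket,
    )
    return json.loads(raw)

def validate(fields: dict) -> bool:
    # Third unit is plain code: no model call needed to check shape.
    return {"product", "severity", "summary"} <= fields.keys()

def triage(ticket: str) -> dict:
    label = classify(ticket)
    fields = extract(ticket) if label != "other" else {}
    return {"label": label, "fields": fields, "ok": validate(fields) if fields else True}
```

Each unit has exactly one question to answer, so when the pipeline misbehaves you know which call to stare at.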
Love the 'haunted spreadsheet' line — that's exactly how it feels debugging a 20‑field JSON prompt at 1am. We've started splitting tasks into smaller cognitive steps too, and honestly the biggest win wasn't even token cost. It was finally being able to tell which part of the pipeline broke.
Exactly! Did you find that this approach creates a more coherent relationship between input and output? Or is this just a ghost I'm hunting??? XD
Managing agent tasks like sprints changed how I think about this. If the input scope is unclear, you get 3,000 tokens of confident-sounding garbage. The constraint isn't compute cost, it's task definition quality.
The pricing implication of this is uncomfortable. If you bill on tokens, you reward the workflows Marco's calling out as broken. The team that decomposes tasks looks like a light user. The team that throws everything at one giant prompt looks like a power user. Token-based pricing accidentally subsidizes architectural laziness. At tiun we keep running into this when teams ask how to price AI features - usage-based is the default answer, but usage of what is the actual question.
The framing of token ratios as diagnostic signals rather than cost metrics is the right abstraction. Measuring input/output balance per cognitive operation reveals design problems that raw spending dashboards hide completely.
One extension worth considering: when you decompose a monolithic prompt into atomic tasks (classification, extraction, validation), each sub-task produces its own token trace. Those traces become an audit trail of how the system reasoned. If a pipeline returns a wrong classification, you can inspect which atomic step drifted instead of staring at one giant prompt wondering where it went sideways. The token trace is both the cost signal and the debugging signal.
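Here's a rough sketch of what that per-step trace could look like in practice. The step names, the expected-ratio bands, and the `flag_drift` helper are my own assumptions, not anything from the article:

```python
# Record a per-step token trace so the input/output ratio becomes a debugging
# signal per cognitive operation, not just a line on a spending dashboard.
from dataclasses import dataclass

@dataclass
class StepTrace:
    step: str            # e.g. "classify", "extract", "validate"
    input_tokens: int
    output_tokens: int

    @property
    def ratio(self) -> float:
        return self.input_tokens / max(self.output_tokens, 1)

def flag_drift(trace: list[StepTrace], expected: dict[str, tuple[float, float]]) -> list[str]:
    """Return the steps whose input/output ratio left its expected band."""
    flagged = []
    for t in trace:
        lo, hi = expected.get(t.step, (0.0, float("inf")))
        if not (lo <= t.ratio <= hi):
            flagged.append(f"{t.step}: ratio {t.ratio:.1f} outside [{lo}, {hi}]")
    return flagged

# Example: classification should be large-in/small-out; extraction sits lower.
trace = [StepTrace("classify", 1200, 3), StepTrace("extract", 900, 400)]
print(flag_drift(trace, {"classify": (50, 2000), "extract": (3, 20)}))
# -> ['extract: ratio 2.2 outside [3, 20]']
```

The same record answers "what did this cost" and "which atomic step drifted", which is exactly the double duty described above.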
Thank you
Agree on "thinking smaller" but the cost surface I learned the hard way is the prefix, not the per-call tokens. Every separate Read/Grep round-trip re-pays the cached prefix at 10% — small individually, brutal at agent-loop scale. We collapsed N round-trips into one batched Bash call (./supertool 'op' 'op' 'op') and the per-task cost dropped ~40% on read-heavy work. Same model, same prompts — just fewer trips through the cache. The bug it took us a while to see: "low per-call cost" hid the real bill. Open source: github.com/Digital-Process-Tools/claude-supertool.
One side effect of your approach you're not claiming: decomposed sub-prompts naturally end up with a stable cacheable prefix (shared system + context) and a task-specific tail. That hits prompt caching on Anthropic and OpenAI without designing for it. The cost reduction you set aside shows up anyway, as a byproduct of the cognitive surface argument.
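A sketch of the layout being described, with made-up names (`SHARED_PREFIX`, `build_prompt`) rather than any particular SDK call. Anthropic needs the cacheable block marked explicitly, while OpenAI caches identical long prefixes automatically, so check current provider docs before relying on this:

```python
# Every decomposed sub-task sends the same stable prefix (shared system +
# context) first; only the task-specific tail changes. Byte-identical prefixes
# are what prompt caches key on.
SHARED_PREFIX = (
    "You are the triage assistant for ACME support.\n"
    "Context: <product docs, routing rules, style guide go here>\n"
)

def build_prompt(task_instruction: str, payload: str) -> list[dict]:
    return [
        {"role": "system", "content": SHARED_PREFIX},                     # stable, cacheable
        {"role": "user", "content": f"{task_instruction}\n\n{payload}"},  # changing tail
    ]

classify_msgs = build_prompt("Label this ticket: billing, bug, or other.", "...ticket text...")
extract_msgs  = build_prompt('Return JSON with keys "product", "severity", "summary".', "...ticket text...")
# Both calls share an identical prefix, so the second one hits the cache
# without the pipeline having been designed for it.
```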
Ha, just published today dev.to/dmitryame/the-token-tax-why...
It's all about minimalistic architecture -- more than ever!