Takayuki Kawazoe

Cutting MCP token bloat by 12x: what happened when we packed 31 tools into one server

Earlier this week @akshay_pachaar summarized a year of MCP-vs-CLI arguing into one sharp line:

"The MCP vs CLI debate. For most of 2025, AI Engineers argued about it. The skeptics had real numbers: Playwright MCP eats 13.7K tokens, Chrome DevTools MCP eats 18K. A 5-server setup burns 55K tokens before any work."

He is right. Those numbers are the steady drumbeat against MCP as a delivery format. If your agent burns 55K tokens just advertising capabilities, the protocol starts to look like a tax.

We just shipped a counter data point. codens-mcp is a single Python package that exposes 31 tools across five products (Purple, Red, Blue, Green, and Auth, plus a cross-product registration tool). I sat down with `wc -c` and a calculator and got a number I had to triple-check: the entire tool surface, descriptions and all, is ~4,720 tokens. That is roughly 12x less than the 5-server number in the tweet, and about 3x less than Playwright MCP alone.

This is not a "look how clever we are" post. It is the boring engineering answer: most of MCP's token cost is not the protocol, it is the loading strategy. Below I walk through how we measured it, the five architecture decisions that made the number small, and the real tradeoffs we ate to get there.

The measurement

Here is the actual byte count from the tool definition files, straight off disk:

```
auth_tools.py     1,555 chars
blue_tools.py     2,576 chars
cross_tools.py    3,913 chars
green_tools.py    6,160 chars
purple_tools.py   1,448 chars   # re-exports 16 tools from purple-codens-mcp
red_tools.py      3,231 chars
                 ───────
total            18,883 chars  ≈ 4,720 tokens
```

The 4 chars/token heuristic undercounts tokens for natural-language English (3.5 chars/token is closer to GPT/Claude tokenizers in practice), but it is a reasonable estimate for a registration payload that mixes Python identifiers, docstrings, and JSON-schema-ish hints. The MCP server sends a slightly inflated version of these definitions over the wire as tool descriptors, so the in-context cost the model sees is the same order of magnitude. I have done the apples-to-apples comparison with tiktoken on the rendered descriptors, and the number lands between 4.4K and 5.1K depending on whether you count the JSON schema framing. ~4,720 is the honest middle.
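If you want to redo the arithmetic yourself, the back-of-envelope version fits in a few lines (the char counts are the ones from the listing above; 4 chars/token is the heuristic, not a tokenizer):

```python
# Back-of-envelope token estimate for the tool definition files.
# Char counts are the on-disk numbers from the listing above.
CHAR_COUNTS = {
    "auth_tools.py":   1_555,
    "blue_tools.py":   2_576,
    "cross_tools.py":  3_913,
    "green_tools.py":  6_160,
    "purple_tools.py": 1_448,
    "red_tools.py":    3_231,
}

total_chars = sum(CHAR_COUNTS.values())
est_tokens = total_chars // 4          # 4 chars/token heuristic
print(total_chars, est_tokens)         # 18883 4720
```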

The 31 tools break down like this:

  • Purple (16, re-exported from purple-codens-mcp): purple_login, purple_whoami, purple_analyze_repo, purple_register_project, and twelve more covering projects, repos, instructions, workflows, and SSE.
  • Red (4): red_create_bug_report, red_get_bug_report, red_analyze_bug_report, red_submit_bug_fix_plan_to_purple.
  • Blue (4): blue_list_e2e_tests, blue_generate_e2e_test, blue_run_e2e_test, blue_get_e2e_test_results.
  • Green (4): green_create_consultation_with_message, green_send_consultation_message, green_convert_consultation_to_prd, green_create_kickoff.
  • Auth (2): auth_agent_signup, auth_get_pricing.
  • Cross (1): codens_register_project_unified.
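Two derived numbers worth sanity-checking from that list: the per-product counts really do sum to 31, and dividing the ~4,720-token total by 31 gives the average context cost per tool descriptor:

```python
# Per-product tool counts from the breakdown above.
counts = {"purple": 16, "red": 4, "blue": 4, "green": 4, "auth": 2, "cross": 1}
tools = sum(counts.values())
per_tool = 4_720 // tools    # average context cost per descriptor
print(tools, per_tool)       # 31 tools, ~152 tokens each
```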

Where this lands against the public reference points:

| Server | Tools | Approx. tokens |
| --- | --- | --- |
| Playwright MCP | many | 13,700 |
| Chrome DevTools MCP | many | 18,000 |
| 5-server stack (mixed) | varies | ~55,000 |
| codens-mcp (unified) | 31 | ~4,720 |

If we had shipped five separate MCPs, one per product, then even at a conservative ~13K per-server registration overhead (the low end of the public reference points above) the stack would have cost ~65K tokens of context before any tool ran. We did not, and that is the whole story.

Why one package works

Five decisions did the work. None of them are clever. All of them are boring tradeoffs that happen to compound.

1. Prefix namespacing instead of MCP-server-level scoping

Every tool carries its product prefix in the name. The flat namespace makes the file you saw above legal:

```
purple_login, purple_whoami, purple_analyze_repo, ...
red_create_bug_report, red_analyze_bug_report, ...
blue_generate_e2e_test, blue_run_e2e_test, ...
green_convert_consultation_to_prd, ...
auth_agent_signup, auth_get_pricing
codens_register_project_unified
```

We pay verbosity in the tool name. We get zero collision risk and one MCP process. I considered nested groupings (codens.red.create_bug_report style), but flat names render cleaner in tool-use traces and grep better in logs. Worth it.
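A minimal sketch of the idea, not the actual codens-mcp internals (`FlatRegistry` and its methods are illustrative names): the prefix is baked into the name, so a collision becomes a hard error at registration time rather than a cross-server surprise.

```python
from typing import Callable, Dict

class FlatRegistry:
    """Flat tool namespace; the product prefix is part of the name itself."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable] = {}

    def register(self, product: str, name: str, fn: Callable) -> str:
        full = f"{product}_{name}"   # purple_login, red_create_bug_report, ...
        if full in self._tools:
            # Two products can never silently shadow each other's tools.
            raise ValueError(f"tool name collision: {full}")
        self._tools[full] = fn
        return full

reg = FlatRegistry()
reg.register("purple", "login", lambda: "ok")
reg.register("red", "create_bug_report", lambda: "ok")
print(sorted(reg._tools))  # ['purple_login', 'red_create_bug_report']
```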

2. Shared client code

All five product clients live in one place:

```
src/codens_mcp/client/
  auth.py
  blue.py
  green.py
  red.py
  auth_helper.py    # JWT load/refresh, shared
```

This is the part that does not show up in the token count but matters for the maintenance story. Five separate MCP packages would mean five copies of auth_helper.py drifting independently. One package means one bug fix.

3. Single auth flow

Auth Codens is the SSO root for the family, so the MCP server only ever speaks one login dialect:

```shell
codens-mcp login        # Device Code Flow, runs once
# token persisted to ~/.purple-codens/credentials.json
# every product client reads the same file
```

The historical path is ~/.purple-codens/credentials.json because Purple shipped first and we did not want to break existing users by renaming. Cosmetic debt, zero functional cost.
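The shape of the shared helper, sketched (the function names and the `access_token` key are hypothetical, not the real auth_helper.py API): every product client takes the same loader, so the JWT logic exists exactly once.

```python
import json
from pathlib import Path

# Historical path: Purple shipped first, so the file name never changed.
CREDENTIALS = Path.home() / ".purple-codens" / "credentials.json"

def load_token(path: Path = CREDENTIALS) -> str:
    """The one place that knows where the JWT lives and how it is shaped."""
    return json.loads(path.read_text())["access_token"]

class RedClient:
    def __init__(self, token_loader=load_token):
        self._token_loader = token_loader  # same loader as every other client

class GreenClient:
    def __init__(self, token_loader=load_token):
        self._token_loader = token_loader
```

Fix a token-refresh bug in `load_token` and all five products pick it up in the same release, which is the whole maintenance argument in one function.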

4. Re-export pattern for Purple

This is the move that kept us honest. Purple already had a standalone MCP package on PyPI (purple-codens-mcp) before the unified server existed. We did not fork it. The unified package imports and re-registers Purple's tools:

```python
# src/codens_mcp/tools/purple_tools.py
from purple_codens_mcp.tools.project_tools import register_project_tools
from purple_codens_mcp.tools.repo_tools    import register_repo_tools
# ...four more imports

def register_purple_tools(mcp: FastMCP) -> None:
    register_project_tools(mcp, _purple_get_client)
    register_repo_tools(mcp, _purple_get_client)
    # ...
```

Existing users of purple-codens-mcp on PyPI keep working unchanged. codens-mcp adds Red, Blue, Green, Auth, and Cross on top. One package can be fully replaced by the other without breaking anyone, which gave us a safe rollout.

5. Lazy execution

The ~4,720 tokens are the registration cost. Claude Code sees all 31 tool descriptors at startup. Each tool's actual HTTP call only fires on invocation, and the per-call response is bounded by the tool's own response payload (usually a few hundred tokens of JSON). The thing that scales linearly with use is the conversation transcript, not the registration. Bloat at startup is the lever; we pulled it once, and the rest of the session is unaffected.
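Here is the lazy-execution point as a runnable toy (the names are illustrative, not the real server): registration stores only metadata and a callable, so the network cost shows up exactly when a tool is invoked and never before.

```python
# What the model pays for at startup is the descriptor dict; the
# (stubbed) HTTP call is deferred until someone actually invokes the tool.
calls = {"http": 0}

def fake_http_post(url: str, body: dict) -> dict:
    calls["http"] += 1
    return {"ok": True}

TOOLS: dict = {}

def register(name: str, description: str, fn) -> None:
    # Name + description is the part that lands in context at startup.
    TOOLS[name] = {"description": description, "fn": fn}

register(
    "red_create_bug_report",
    "Create a bug report in Red.",
    lambda title: fake_http_post("/bug-reports", {"title": title}),
)

assert calls["http"] == 0                        # advertising cost no network
TOOLS["red_create_bug_report"]["fn"]("login fails")
assert calls["http"] == 1                        # paid only on invocation
```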

The honest tradeoffs

Unified is not free. Three things we gave up:

One process is one failure mode. If codens-mcp crashes, all five product surfaces are gone simultaneously. With separate MCPs each product gets its own isolation boundary and a Red bug cannot take down Green tooling. We accepted this because we are a small shop, the package is small, and a crash in production would tell us we have a much bigger problem than tool routing.

Update cadence is coupled. Shipping a new Red tool means cutting a new version of the whole package. Users get every product's churn whether they wanted it or not. We considered semver-per-product subnamespacing and rejected it because our internal release cadence is already weekly and roughly synchronized; the imaginary user who wants Red on a daily cycle but Green frozen does not exist for us yet.

Permission boundary is coarse at the MCP layer. Authenticating once gives the user access to all 31 tools. You cannot tell Claude Code "allow Red but not Green" through the MCP descriptors alone. We solved this one level up: Auth Codens enforces role-based permissions on the server side, so even if the MCP exposes green_create_kickoff, the API call rejects users who do not have the Green entitlement. The MCP becomes the surface; the gate lives elsewhere.
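Sketched with hypothetical names, the server-side gate looks like this: the descriptor is visible to every authenticated user, but the API checks the entitlement on each call and rejects users who lack it.

```python
# Illustrative entitlement check on the API side (names are made up);
# the MCP descriptor is the surface, this function is the gate.
ENTITLEMENTS = {"alice": {"red", "blue"}, "bob": {"red", "blue", "green"}}

def handle_green_create_kickoff(user: str) -> dict:
    if "green" not in ENTITLEMENTS.get(user, set()):
        return {"status": 403, "error": "missing Green entitlement"}
    return {"status": 200, "kickoff_id": "k-1"}

print(handle_green_create_kickoff("alice")["status"])  # 403
print(handle_green_create_kickoff("bob")["status"])    # 200
```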

"Unified is always right" is not the conclusion here. If you ship one MCP per oncall team and the teams release on different cycles, you are paying the token tax for a reason, and the isolation buys you something real. The unified shape worked for us because the products were already coupled.

Where the token bloat actually comes from

Akshay's follow-up tweet closes the loop:

"The protocol was never the bottleneck. The loading strategy was."

That is the line I want every MCP author to internalize. The 55K-token figure is not what MCP-the-spec costs. It is what N separate handshakes plus N capability advertisements plus N redundant client preambles cost when you let your tools sprawl into N independent servers.

Look at the math from the other direction. If five separate MCPs each carry a 10–15K registration footprint (one server's worth of capability JSON, instructions, schema bundles), you are at 50–75K before the model has done anything useful. Collapse the five servers to one and the registration overhead collapses too, because there is only one capability list, one instruction blob, one schema bundle, and the per-tool descriptor cost is small.
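Spelled out as arithmetic (the per-server figures are the 10–15K range above; the unified figure is our measurement):

```python
# N separate servers each pay their own registration footprint;
# one unified server pays roughly one footprint.
servers = 5
per_server_low, per_server_high = 10_000, 15_000

separate = (servers * per_server_low, servers * per_server_high)
print(separate)                 # (50000, 75000)

unified = 4_720                 # measured registration cost of codens-mcp
print(separate[0] // unified)   # ~10x cheaper even at the low end
```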

The protocol is doing its job. The protocol is also fine with you stacking five copies of itself in your config file, because that is a user choice, not a spec smell. Treating MCP servers like microservices ("one per product, for isolation") is the analogue of running 30 Lambda cold starts where one process would do.

We did not invent a new transport. We did not strip schemas. We just stopped paying for five handshakes when one would do.

The principle

Partition your MCP surface by domain, not by tool class. If five tools share an auth root, a release cadence, and a user mental model, they belong in one server. If they do not, split. The token cost is a downstream signal of how well that partition matches reality.

codens-mcp is on PyPI: `pip install codens-mcp`. Code lives at github.com/codens-ai. If you want the user-facing pitch, that is at codens.ai/en.
