Last month I shipped MCP Spine v0.1 — a basic proxy that sat between Claude Desktop and MCP servers. It did schema minification and security basics.
Since then, it's grown into a full middleware stack. Here's everything in v0.2.5 and why each piece exists.
The Starting Point
57 tools. 5 servers. Claude Desktop config file with one entry pointing to Spine. Everything routes through the proxy.
pip install mcp-spine
mcp-spine init
The setup wizard detects your installed servers (npx, node, Python), asks what features you want, and writes a tailored config.
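Detection is mostly PATH probing; roughly:

import shutil

# Probe PATH for the runtimes MCP servers commonly need.
runtimes = {name: shutil.which(name) for name in ("npx", "node", "python")}
available = [name for name, path in runtimes.items() if path is not None]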
Schema Minification: 61% Fewer Tokens
Every tool call starts with the LLM reading tool schemas. With 57 tools, that's thousands of tokens before the conversation even begins.
Spine's minifier strips $schema, additionalProperties, parameter descriptions, titles, and defaults — keeping only what the LLM actually needs. Level 2 cuts 61% of schema tokens with zero information loss.
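The core of the minifier fits in a few lines. Simplified (the shipped minifier has levels and more rules than this):

# Keys that cost tokens without helping the LLM choose arguments.
NOISE_KEYS = {"$schema", "additionalProperties", "description", "title", "default"}

def minify(schema, parent_key=None):
    if isinstance(schema, dict):
        return {
            k: minify(v, k)
            for k, v in schema.items()
            # Under "properties", keys are parameter names, never noise.
            if parent_key == "properties" or k not in NOISE_KEYS
        }
    if isinstance(schema, list):
        return [minify(v, parent_key) for v in schema]
    return schema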
The web dashboard shows real-time savings.
State Guard: No More Stale Edits
In long coding sessions, Claude memorizes file contents from earlier in the conversation. Then it "edits" the old version — silently overwriting your current code.
State Guard watches your project files, computes SHA-256 hashes, and injects compact version pins into every tool response. When Claude's cached version doesn't match, it knows to re-read.
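Heavily simplified, the mechanism looks like this (the real pin format is more compact):

import hashlib
from pathlib import Path

def file_pin(path: str) -> str:
    # Hash the file as it exists on disk right now.
    digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
    return f"[state: {path}@{digest[:12]}]"

def inject_pin(response: dict, path: str) -> dict:
    # Append the pin as an extra text block so the model can compare it
    # against the version it remembers from earlier in the session.
    content = list(response.get("content", []))
    content.append({"type": "text", "text": file_pin(path)})
    return {**response, "content": content}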
Prompt Injection Detection
This one surprised me. Tool responses can contain text that looks like instructions to the LLM — "ignore previous instructions", "[SYSTEM]", or encoded payloads.
Spine now scans every tool response for 8 categories of injection patterns before it reaches the model. Detections are logged as security events and can trigger webhook alerts to Slack or Discord.
# spine/injection.py detects:
# - System prompt overrides
# - Role injection ("you are now a...")
# - Instruction hijacking
# - Jailbreak attempts (DAN, developer mode)
# - Data exfiltration URLs
# - Base64-encoded payloads
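The scan itself is pattern matching per category. Illustrative patterns only (the shipped rule set is larger):

import re

INJECTION_PATTERNS = {
    "system_override": re.compile(r"ignore (all )?previous instructions", re.I),
    "role_injection": re.compile(r"\byou are now a\b", re.I),
    "jailbreak": re.compile(r"\b(DAN|developer mode)\b", re.I),
    "base64_payload": re.compile(r"[A-Za-z0-9+/]{80,}={0,2}"),
}

def scan(text: str) -> list[str]:
    # Return every category that matched, for the security-event log.
    return [name for name, pattern in INJECTION_PATTERNS.items()
            if pattern.search(text)]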
Plugin System: The Compliance Layer
This is the feature I'm most excited about. Spine plugins are Python files that hook into the tool call pipeline:
from spine.plugins import SpinePlugin

class SlackFilter(SpinePlugin):
    name = "slack-filter"
    deny_channels = ["hr-private", "exec-salary"]

    def on_tool_response(self, tool_name, arguments, response):
        if "slack" not in tool_name:
            return response
        # Filter messages from denied channels
        content = response.get("content", [])
        filtered = [b for b in content
                    if not any(ch in b.get("text", "").lower()
                               for ch in self.deny_channels)]
        return {**response, "content": filtered}
Drop it in your plugins/ directory, enable in config, done. The LLM never sees messages from those channels.
Four hook points: on_tool_call (transform args or block calls), on_tool_response (filter responses), on_tool_list (hide tools), and lifecycle hooks.
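For example, an on_tool_call plugin that blocks destructive calls outright (simplified; the exact signature and blocking convention are richer in practice):

from spine.plugins import SpinePlugin

class DestructiveGuard(SpinePlugin):
    name = "destructive-guard"
    blocked = {"delete_file", "drop_table"}

    def on_tool_call(self, tool_name, arguments):
        # Simplified convention for illustration:
        # raise to block, return the (possibly transformed) args to proceed.
        if tool_name in self.blocked:
            raise PermissionError(f"{tool_name} blocked by {self.name}")
        return arguments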
Web Dashboard
Zero-dependency browser dashboard at localhost:8777:
mcp-spine web --db spine_audit.db
Shows tool calls, security events, token budget usage, schema token savings, server latency, request log, and client sessions. Auto-refreshes every 3 seconds.
Tool Response Caching
Read-only tools like read_file and list_directory often get called with the same arguments multiple times in a conversation. Spine now caches these responses:
[tool_cache]
enabled = true
cacheable_tools = ["read_file", "read_query", "list_directory"]
ttl_seconds = 300
Cache hits skip the downstream server call entirely. LRU eviction with TTL expiration.
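Under the hood it's the classic pattern. A simplified version:

import time
from collections import OrderedDict

class TTLCache:
    """Simplified LRU-with-TTL cache for illustration."""

    def __init__(self, maxsize=256, ttl=300):
        self.maxsize, self.ttl = maxsize, ttl
        self._store = OrderedDict()  # key -> (expires_at, response)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None or entry[0] < time.monotonic():
            self._store.pop(key, None)  # miss, or expired: drop it
            return None
        self._store.move_to_end(key)  # mark as most recently used
        return entry[1]

    def put(self, key, value):
        self._store[key] = (time.monotonic() + self.ttl, value)
        self._store.move_to_end(key)
        if len(self._store) > self.maxsize:
            self._store.popitem(last=False)  # evict least recently used

The cache key is the tool name plus a canonical serialization of the arguments, so identical calls hit and everything else misses.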
Everything Else in v0.2.5
- Token budget: daily limits, per-server limits, warn/block actions, persistent tracking, spine_budget meta-tool
- Tool aliasing: create_or_update_file → edit_github_file
- Config hot-reload: edit config while running, changes apply in seconds
- Webhook notifications: Slack/Discord/JSON alerts on security events
- Multi-user audit: session-tagged entries, mcp-spine audit --sessions
- Analytics export: CSV/JSON with time and event filtering
- Streamable HTTP: MCP 2025-03-26 transport support
- Interactive wizard: mcp-spine init detects your setup
- Latency monitoring: per-server tracking with degradation alerts
The Numbers
- 20 source files
- 190+ tests
- CI on Windows + Linux, Python 3.11-3.13
- AAA score on Glama
- Approved on mcpservers.org
- MIT licensed
Try It
pip install mcp-spine
mcp-spine init
mcp-spine doctor --config spine.toml
mcp-spine serve --config spine.toml
mcp-spine web --db spine_audit.db
GitHub: https://github.com/Donnyb369/mcp-spine
What would you build with a plugin system for MCP tool calls?

Top comments (6)
Building a middleware stack for MCP tool calls is exactly what the ecosystem needs right now to move past the 'discovery bottleneck.' I’m a big fan of the 'Spine' metaphor—we need a central nervous system to handle things like rate-limiting and context-shaping before the LLM even sees the tool output.
In my own work on the 'Sovereign Synapse,' I’ve been looking at similar 'Thin Proxy' architectures to prevent context rot. Curious—how are you handling the latency overhead as the middleware stack grows? Looking forward to following the progress on v0.3.
Prompt injection detection at the proxy layer (your spine/injection.py approach) solves the input side. The gap that stays open: once a tool call passes through and an action gets executed, there is no immutable record of what the model received, what it decided, and what actually ran. State Guard with SHA-256 is a good step toward version integrity - curious whether you log those state transitions in a way that survives outside the local session. We ran into exactly this audit-trail problem building Trust Layer for multi-agent workflows (arkforge.tech), where proving what an agent did post-hoc matters as much as filtering what it sees pre-execution.
The State Guard approach works well for single-agent sessions, but there's a TOCTOU gap worth flagging: the SHA-256 is computed at tool-response time and injected as a version pin, but in multi-agent or concurrent-session setups, another process can modify the file between when State Guard hashes it and when the LLM acts on that pin. The LLM sees a "consistent" hash that was already stale before the write arrived. A simple mitigation is including a monotonic read timestamp alongside the hash so the write tool can reject not just content mismatches but also pins that are older than a configurable threshold.
the middleware approach is elegant until you’re debugging a tool call that fails silently three layers deep. 61% token reduction is real but every proxy adds a failure mode that’s harder to trace in production.
Solid systems work. Adding a middleware layer for tool calls improves control, observability, and reliability in complex agent pipelines. That's exactly where scalable AI tooling starts to matter.