Ricardo Rodrigues
The MCP Discovery Problem: Why 7,500+ Servers Is Both a Victory and a Warning

The ecosystem is growing. But finding the right server is getting harder, not easier.


There is a number that gets thrown around a lot in the MCP ecosystem right now: 20,000+.

That is roughly how many MCP servers exist as of April 2026. It is an impressive number. A year ago, there were a few dozen. The growth curve looks exactly like npm circa 2012–2015 — exponential, messy, and full of potential.

But there is a problem nobody is talking about loudly enough.

The discovery problem did not get smaller when the ecosystem grew. It got bigger.


From "does it exist?" to "which one doesn't break in production?"

In early 2025, the question developers asked was simple: is there an MCP server for Postgres? For Slack? For GitHub?

The answer was usually no, or "sort of, check this GitHub repo."

By April 2026, the answer to almost every "is there an MCP for X?" question is yes — often with four to eight options. The new question is harder: which one is maintained? Which one has an install config that actually works? Which one is still actively developed?

A developer covering the MCP ecosystem at miaoquai.com framed this shift better than I have seen anywhere else — paraphrasing from the original Chinese: users are no longer asking whether an MCP server exists for something. They are asking which one is actually good. That move, from "does it exist?" to "which one is trustworthy?", is how you know an ecosystem is maturing.

This is the shift from scarcity to noise. And it is the hardest phase of any ecosystem to navigate.


The signal problem

When developers browse the 7,561 servers indexed at MCPNest, the most common question is not "is there a server for X?" — it is "which of these four options for X should I actually use?"

The default answer most people fall back on is GitHub stars. Stars are visible, comparable, and familiar. The problem is that stars measure historical interest. They tell you how many people were excited about a server at some point in the past. They tell you very little about whether it will work today.

A server can accumulate thousands of stars and then go unmaintained. The stars stay. The maintenance does not.


What quality actually means for an MCP server

We built a Quality Score (A–F) for every server in the MCPNest registry. Not because scores are fun — but because without a better signal, developers keep defaulting to star counts.

The factors we look at:

Maintenance velocity. When was the last commit? A server updated two weeks ago is categorically different from one updated six months ago, even if the code looks identical.

Config completeness. Does the server have a working install config for Claude Desktop, Cursor, or VS Code? A server without a valid install config is not really usable by most developers, regardless of what the README says.
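For reference, this is roughly the shape of a working Claude Desktop entry — the server name, package, and environment variable here are illustrative, not a specific real server:

```json
{
  "mcpServers": {
    "postgres": {
      "command": "npx",
      "args": ["-y", "@example/mcp-server-postgres"],
      "env": { "DATABASE_URL": "postgresql://localhost/mydb" }
    }
  }
}
```

A surprising number of READMEs stop at "clone the repo" and never provide a copy-pasteable block like this, which is exactly what the config-completeness check looks for.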

Verification status. Is it listed in the official Anthropic registry? Not a quality guarantee, but a meaningful baseline signal.

Documentation depth. Does the README explain what the server actually does, what tools it exposes, and what credentials it needs?

The principle is simple: a well-maintained server with 300 stars should score higher than an abandoned one with 3,000. That is what we are trying to make visible.


The npm parallel

npm crossed 100,000 packages in 2015. The JavaScript community went through a long reckoning about package quality, maintenance, and trust — left-pad, node_modules bloat, abandoned dependencies pulling production apps down with them.

The MCP ecosystem is smaller and moving faster. A similar reckoning will happen. The question is whether the tooling to handle it gets built proactively or reactively.


What comes next

Quality scoring is a start, but it is a static snapshot. What matters more is dynamic health — knowing when a server you depend on stops being maintained, or when a previously low-scoring server improves significantly.

The goal is not to gatekeep the ecosystem. Every server deserves to be discoverable. The goal is to give developers the context to make informed decisions quickly, so they spend less time debugging abandoned configs and more time building.

7,561 servers indexed is a milestone. But the milestone that actually matters is: how many of those are good, maintained, and ready to use today?

That is the number we are working on making transparent.


MCPNest (mcpnest.io) is a marketplace for MCP servers with Quality Scores, one-click install for Claude, Cursor, Windsurf and VS Code, and an enterprise Gateway for teams.

Top comments (7)

Ken W Alger

Excellent breakdown of the scale problem. We’ve reached the 'USB-C moment' for connectivity, but we’re quickly hitting the 'Instruction Overload' wall. 7,500+ servers is a massive victory for the ecosystem, but if an agent has to scan even a fraction of those, selection accuracy collapses and token costs balloon.

I've been looking at this as a Governance vs. Discovery problem. In my own work, I'm finding that we have to move away from 'Total Discovery' and toward a Thin Proxy or Routing Layer. We need to treat MCP servers like third-party dependencies—they require vetting, sandboxing, and a 'least-privilege' context. Discovery is great for a weekend project, but curation is the only way this scales in the enterprise.

Ricardo Rodrigues

@kenwalger — you described exactly what we've been building against.

The 'Governance vs Discovery' framing is right. Discovery gets you to awareness. But curation, sandboxing, and least-privilege are where enterprise adoption actually happens.

We shipped three layers for this:

→ Gateway: one endpoint per workspace, all approved servers behind it. Agents don't scan 7,500 servers — they see only what you've curated.

→ Layer 2: per-member Bearer tokens with tool allowlists. The 'least-privilege context' is enforced at the call level: the Gateway returns JSON-RPC error -32003 if a tool isn't permitted.

→ Hosted containers: each server runs isolated (cap_drop ALL, no-new-privileges, localhost-only bind). The sandboxing you described.
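A minimal sketch of the allowlist check from Layer 2, under the assumption of a JSON-RPC-shaped response; the token store, tool names, and error message are illustrative, with -32003 sitting in JSON-RPC's implementation-defined server-error range:

```python
# Illustrative per-token allowlist check. TOKEN_ALLOWLISTS stands in
# for a real per-member token store.
TOKEN_ALLOWLISTS = {
    "tok_alice": {"github.search_issues", "postgres.query"},
}

def check_tool_call(bearer_token: str, tool_name: str, request_id: int):
    """Return None if the call is permitted, else a JSON-RPC error response."""
    allowed = TOKEN_ALLOWLISTS.get(bearer_token, set())
    if tool_name in allowed:
        return None  # proceed to dispatch the tool call
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "error": {
            "code": -32003,
            "message": f"Tool not permitted: {tool_name}",
        },
    }
```

The useful property of enforcing this at the gateway rather than in the agent prompt is that it is non-bypassable: the model never holds a credential broader than its allowlist.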

The 'Instruction Overload' problem is real too — our Orchestrator namespaces tools across servers and caches the unified list, so agents aren't scanning everything on every call.
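The namespacing-plus-cache idea can be sketched roughly like this — the fetch callback, server IDs, and TTL are illustrative placeholders, not the Orchestrator's real interface:

```python
# Rough sketch: namespace tools as "<server>.<tool>" and cache the
# unified list so agents don't re-scan every server on every call.
import time

class ToolCache:
    def __init__(self, fetch_tools, ttl_seconds: float = 30.0):
        self.fetch_tools = fetch_tools  # callable: server_id -> list of tool names
        self.ttl = ttl_seconds
        self._cache = None
        self._fetched_at = 0.0

    def unified_tool_list(self, server_ids):
        """Return the namespaced tool list, refetching only after the TTL expires."""
        now = time.monotonic()
        if self._cache is None or now - self._fetched_at > self.ttl:
            self._cache = [
                f"{sid}.{tool}"
                for sid in server_ids
                for tool in self.fetch_tools(sid)
            ]
            self._fetched_at = now
        return self._cache
```

Namespacing also resolves collisions: two servers can each expose a `query` tool without ambiguity, because the agent sees `postgres.query` and `analytics.query`.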

Would be curious what you're seeing in your own work on the routing layer side.

Ken W Alger

Great to see this level of granularity. Your Gateway approach mirrors exactly what I’ve been advocating for—treating the agent-server relationship as a curated handshake rather than a broad broadcast.

On the routing side, I’ve been looking at Semantic Routing to handle the 'Instruction Overload.' Instead of providing a unified list (even a cached one), I’ve been experimenting with a 'pre-flight' classifier that exposes only the specific server-namespace relevant to the intent. It keeps the context window lean and preserves the agent's 'reasoning budget' for the actual task. That enforcement is the 'Forensic Receipt' I love to see—security and auditability as first-class citizens.

Ricardo Rodrigues

@kenwalger — "curated handshake" and "Forensic Receipt" are better than anything in our marketing copy. Borrowing both, with credit.

Semantic Routing for the pre-flight classifier is exactly where we're heading. Right now we cache the full unified list (30s TTL), but you're right that intent-based namespace exposure is the next step — it keeps the context window lean and turns tool selection from O(N) to O(relevant).

Would love to compare notes properly. Are you publishing on the pre-flight classifier work? If yes, it would help us shape v1.14 around it. Either way — appreciate the deep engagement here.
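To make the routing idea concrete, here is a toy pre-flight classifier under the assumption of plain keyword matching — a real semantic router would use embeddings, and these namespaces and keyword sets are purely illustrative:

```python
# Toy pre-flight classifier: map an intent string to the single relevant
# server namespace instead of broadcasting the full tool list.
NAMESPACE_KEYWORDS = {
    "postgres": {"database", "sql", "query", "table"},
    "github": {"repo", "issue", "pull", "commit"},
    "slack": {"message", "channel", "notify"},
}

def route_intent(intent: str):
    """Return the namespace with the most keyword hits, or None if no match."""
    words = set(intent.lower().split())
    best, best_hits = None, 0
    for namespace, keywords in NAMESPACE_KEYWORDS.items():
        hits = len(words & keywords)
        if hits > best_hits:
            best, best_hits = namespace, hits
    return best
```

Even this crude version shows the payoff: the agent receives one namespace's tools instead of all of them, which is the O(N) to O(relevant) shift in miniature.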

Ken W Alger

Haha, I’m glad those terms stuck! Consider them a contribution to the 'Decentralized Scribe' of the MCP ecosystem. I’ll waive the royalties in precious metals for now, but I might hold you to a 'finder's fee' of a successful v1.14 launch.

In all seriousness, moving from O(N) to O(relevant) is the exact inflection point for enterprise-grade MCP. Caching is a great bridge, but as you scale, that 'unified list' eventually becomes its own form of noise. Intent-based namespacing is the only way to protect the agent's reasoning budget.

I haven't released the full deep-dive on the Pre-flight Classifier work yet—it’s slated for my 'Sovereign Synapse' series in late August—but I have the architectural framework and early benchmarking ready. I’d be happy to 'open the vault' early and compare notes. If it helps shape v1.14, it’s a win for the ecosystem.

Reach out via DM and I'm happy to share the repo that I've been working on.

Ricardo Rodrigues

Ken, the "finder's fee" is officially logged in the audit trail — timestamped, attributed, and non-repudiable.

The Pre-flight Classifier is exactly the direction we need to go. The current approach — a cached unified list per workspace — works at the scale we're at, but you're right that it becomes its own form of noise as server count grows. Intent-based namespacing is the correct abstraction.

What you're describing maps well to something we've been thinking about on the Gateway side: instead of exposing the full tools/list to every request, route only the relevant namespace based on the inferred intent of the call. The "reasoning budget" framing is a useful forcing function for that design decision.

Would genuinely value seeing the architectural framework before the Sovereign Synapse series drops. Sending you a DM now.

Ken W Alger

You've hit on the exact scaling wall I’m seeing. As soon as you move past a handful of servers, the 'context tax' of providing a unified tools/list becomes a liability rather than a feature. The goal with the Pre-flight Classifier is to move from a broadcast model to a unicast model—giving the agent exactly what it needs for the specific intent and nothing more. It’s the difference between handing an expert a whole library versus opening the right book to the right page. Looking forward to diving deeper into this logic in the upcoming series!