How I integrated a LangGraph-based autonomous research agent as an MCP server for Claude Code, and the real output it produced on agentic AI frameworks.
A few weeks ago I finished building a LangGraph-based research agent that searches 10 sources in parallel — Tavily, Wikipedia, arXiv, Semantic Scholar, GitHub, Hacker News, Stack Overflow, Reddit, YouTube, and a local RAG store — and synthesizes everything into a cited report with a fact-checker pass.
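Conceptually the fan-out is the simple part: one query submitted to every source at once. Here's the shape of it as a minimal sketch (the search_* callables are hypothetical stand-ins, not the repo's actual functions):

from concurrent.futures import ThreadPoolExecutor

def search_all(topic: str, searchers: list) -> list:
    """Submit the same topic to every source searcher concurrently."""
    with ThreadPoolExecutor(max_workers=len(searchers)) as pool:
        futures = [pool.submit(search, topic) for search in searchers]
        results = []
        for future in futures:
            try:
                results.append(future.result(timeout=120))
            except Exception:
                continue  # one flaky source shouldn't sink the whole run
        return results

# usage: search_all("agentic AI frameworks", [search_tavily, search_arxiv, ...])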
The problem: to use it I had to switch to a Streamlit tab, paste a topic, and wait. That workflow friction adds up fast.
Today I fixed it. The agent is now a native tool inside Claude Code via MCP. I can just say "research X" mid-conversation and Claude calls it automatically.
My Setup (for context)
This runs on my WSL2 Ubuntu box. The research agent uses Ollama + qwen3:14b locally (no API costs). The model lives on the same machine, and since there's no GPU in this rig, inference is CPU-only. A full research run takes 5–6 minutes on CPU; it would be under 2 on an RTX card.
The homelab also has a Jenkins controller on another machine, Gitea on a Raspberry Pi 5, and a bunch of other stuff — but for this post, none of that matters. What matters is: everything runs locally, offline-capable.
What is MCP?
Model Context Protocol (MCP) is a JSON-RPC 2.0 protocol; over the stdio transport used here, you write a server that implements three methods: initialize, tools/list, and tools/call. Claude Code (and other MCP clients like Cursor or Continue) launches your server at startup and treats its tools as first-class, right alongside the built-ins.
No HTTP server, no ports, no auth tokens. Just stdin/stdout and ~300 lines of stdlib Python.
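To make that concrete, here's a minimal sketch of the whole server shape, assuming the newline-delimited JSON-RPC framing MCP uses over stdio; run_research is a hypothetical stub standing in for the real agent:

#!/usr/bin/env python3
import json
import sys

def run_research(topic: str) -> str:
    return f"(report for {topic!r})"  # hypothetical stub for the real agent

def handle(req: dict):
    method, params = req.get("method"), req.get("params", {})
    if method == "initialize":
        result = {"protocolVersion": "2024-11-05",
                  "capabilities": {"tools": {}},
                  "serverInfo": {"name": "research-agent", "version": "0.1"}}
    elif method == "tools/list":
        result = {"tools": [{
            "name": "research",
            "description": "Run the multi-source research agent on a topic.",
            "inputSchema": {"type": "object",
                            "properties": {"topic": {"type": "string"}},
                            "required": ["topic"]}}]}
    elif method == "tools/call":
        report = run_research(params["arguments"]["topic"])
        result = {"content": [{"type": "text", "text": report}]}
    else:
        return None  # ignore notifications we don't handle
    return {"jsonrpc": "2.0", "id": req.get("id"), "result": result}

for line in sys.stdin:  # one JSON-RPC message per line
    if line.strip():
        resp = handle(json.loads(line))
        if resp and resp["id"] is not None:
            print(json.dumps(resp), flush=True)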
The Integration
The agent already had an mcp_server.py I wrote last month and never wired up. Two bugs blocked it:
Bug 1 — wrong venv path
# was (never worked):
VENV_PYTHON = PROJECT_ROOT / ".venv" / "bin" / "python"
# actual venv:
VENV_PYTHON = PROJECT_ROOT / "venv" / "bin" / "python"
Bug 2 — API keys not reaching the subprocess
Claude Code launches the MCP server as a child process. Without explicitly loading .env, the keys (TAVILY_API_KEY, YOUTUBE_API_KEY, etc.) are missing:
# The MCP server is a child process of Claude Code, so it inherits
# Claude Code's environment, not your shell's. Load .env explicitly.
from dotenv import load_dotenv

load_dotenv(PROJECT_ROOT / ".env", override=False)
Then ~/.mcp.json:
{
  "mcpServers": {
    "research-agent": {
      "command": "/home/roberto/repos/Research-Agent/venv/bin/python",
      "args": ["/home/roberto/repos/Research-Agent/mcp_server.py"]
    }
  }
}
Restart Claude Code, done.
Quick Win:
~/.mcp.json is user-level: the server is available in every project, not just this one.
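An alternative to calling load_dotenv inside the server: Claude Code's MCP config also accepts an env map per server, so the keys can live in the config instead (values elided):

{
  "mcpServers": {
    "research-agent": {
      "command": "/home/roberto/repos/Research-Agent/venv/bin/python",
      "args": ["/home/roberto/repos/Research-Agent/mcp_server.py"],
      "env": {
        "TAVILY_API_KEY": "…",
        "YOUTUBE_API_KEY": "…"
      }
    }
  }
}

I stuck with load_dotenv since the keys already live in the repo's .env, but the env map avoids touching the server code at all.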
Real Test Output
I ran it on "agentic AI frameworks LangGraph AutoGen CrewAI 2025". Here's what came back:
LangGraph + CrewAI convergence (the Aha! moment)
Found a paper (arXiv:2411.18241) documenting a real integration of both: LangGraph handles the graph-based state machine, CrewAI handles intelligent task assignment with live performance metrics. This is exactly the architecture pattern I've been building toward, validated by accident.
The RAG security problem no one talks about
arXiv:2505.05287, May 2025. In multi-tenant agentic systems, RAG retrieval ranks by semantic similarity, not authorization, so user A's query can surface user B's confidential document if it happens to be the best semantic match. This isn't a config mistake; it's an architectural flaw in how most RAG pipelines are built.
I wasn't researching security. The agent surfaced this because it searches arXiv in parallel with everything else. That's the value: you find what you didn't know to search for.
The fact-checker caught hallucinated statistics
The synthesis LLM added "30% efficiency improvement", "70% of security issues", "60% of commits" — all precise-sounding numbers attached to real findings. The fact-checker pass (a second LLM call specifically for this) flagged all three as unverified.
The pattern: LLMs add numerical specificity to make summaries feel more authoritative. The fact-checker's job is to catch exactly this. If you're building a research pipeline, this second-pass evaluation is not optional.
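If you're sketching that second pass yourself, it can be as small as this (the regex, the prompt wording, and the llm callable are my assumptions, not the agent's actual implementation):

import re

def fact_check(summary: str, sources: list[str], llm) -> list[str]:
    """Flag percentage claims in the summary that no source states."""
    claims = [c.strip() for c in re.findall(r"[^.]*\d+(?:\.\d+)?\s*%[^.]*", summary)]
    flagged = []
    for claim in claims:
        prompt = ("Do the source excerpts below state this claim, with the same "
                  f"number? Answer YES or NO.\n\nClaim: {claim}\n\nSources:\n"
                  + "\n---\n".join(sources))
        if "YES" not in llm(prompt).upper():  # llm: any text-in, text-out callable
            flagged.append(claim)
    return flagged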
Two More Bugs I Fixed Along the Way
Session memory path hardcoded to Docker:
import os

# was (broken outside the container):
_SESSION_MEMORY_DIR = "/app/data/session_memory"

# fixed: env var override with a repo-relative default
_SESSION_MEMORY_DIR = os.environ.get(
    "SESSION_MEMORY_DIR",
    os.path.join(os.path.dirname(__file__), "..", "..", "data", "session_memory"),
)
bleach was a hard dependency for HTML reports, crashing when it wasn't installed; now it degrades gracefully:
import html

def sanitize_html(html_content: str) -> str:
    try:
        import bleach
    except ImportError:
        return html.escape(html_content)  # safe fallback, no crash
    ...
What This Unlocks
The agent now runs during any Claude Code conversation. Practical example:
"Before we refactor the cache layer, research current best practices for distributed caching in Python 2025."
Claude calls research, reads the report, and informs the refactor. No tab switching. No copy-pasting. The research is cited and fact-checked before it influences the code.