Cara Jung
From Scrapers to MCP Server: Serving Korean Entertainment Data to AI Agents

Korean entertainment data is surprisingly fragmented. Information about a single drama or film is often scattered across multiple platforms.

To solve that, I built a unified Korean entertainment database powered by APIs, web scrapers, and automated sync pipelines. By the end of the project, I had a Supabase database containing nearly 10,000 Korean movies, 3,500 TV shows, per-episode Nielsen Korea ratings, award histories, and streaming availability across four regions.

The next problem was figuring out how to expose it to AI agents in a way that was actually useful, secure, monetizable, and maintainable.

This is the story of building the MCP server and the errors I encountered.


Designing the Tools

Before writing any code, I thought carefully about what an AI agent would actually need from a Korean entertainment database. The answer wasn't "expose every database column as a query parameter." That's an API, not a tool. MCP tools should be opinionated about what they return and why.

I ended up with 17 tools organized into three categories:

Discovery tools answer "what should I watch?" — get_trending_dramas, browse_by_genre, browse_by_tag. The tag tool is the most distinctive: MyDramaList's community taxonomy ("Bromance", "Enemies to Lovers", "Time Travel", "CEO Male Lead") doesn't exist anywhere else in structured form, and it's exactly how K-drama fans actually think about recommendations.

Detail tools answer "tell me everything about this title" like get_movie, get_drama, get_episode_ratings, get_ost_albums. The episode ratings tool is the one I'm most proud of: it returns Nielsen Korea per-episode viewership percentages scraped from SVG chart elements on Naver. No English-language API has this data.

Utility tools answer cross-cutting questions — find_where_to_watch, get_weekly_boxoffice, get_actor_filmography, compare_ratings. The compare_ratings tool is genuinely novel: it shows you Naver's verified Korean ticket buyer score, MDL's international fan score, TMDB's global community score, and RT's Western critic score side by side, with labels explaining what each audience represents.
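To make the compare_ratings idea concrete, here is a minimal sketch of its labeling step. mdl_rating, naver_audience_rating, and rt_tomatometer are field names used elsewhere in this post; tmdb_rating and the exact return shape are assumptions for illustration:

```python
# Sketch of compare_ratings' labeling step. tmdb_rating and the output
# shape are assumptions; the other field names appear in the schema.
RATING_LABELS = {
    "naver_audience_rating": "Korean verified ticket buyers (0-10)",
    "mdl_rating": "International K-drama fans (0-10)",
    "tmdb_rating": "Global TMDB community (0-10)",
    "rt_tomatometer": "Western critics (0-100)",
}

def label_ratings(row: dict) -> list[dict]:
    """Pair each available score with a label explaining whose opinion it represents."""
    return [
        {"source": field, "score": row[field], "audience": label}
        for field, label in RATING_LABELS.items()
        if row.get(field) is not None
    ]
```

The labels are the point: a bare 8.9 means nothing until the agent knows which audience produced it.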


Building the Server with FastMCP

FastMCP makes building MCP servers surprisingly clean. Each tool is a decorated Python function with a docstring that becomes the tool description:

from fastmcp import FastMCP

mcp = FastMCP(
    name="Korean Entertainment",
    instructions="""
You have access to a comprehensive database of Korean movies and TV shows.
Rating fields and what they mean:
- mdl_rating: International K-drama fans (0-10)
- naver_audience_rating: Korean verified ticket buyers (0-10)
- naver_latest_rating: Nielsen Korea latest episode viewership (%)
""",
)

@mcp.tool
def browse_by_tag(tag: str, limit: int = 20) -> list[dict]:
    """
    Browse Korean dramas by MyDramaList community tag.
    Common tags: "Bromance", "Time Travel", "CEO Male Lead",
    "Enemies to Lovers", "Revenge", "Found Family"
    """
    return _supabase.table("tv_shows") \
        .select("id, title_english, title_korean, year, mdl_rating, tags") \
        .contains("tags", [tag]) \
        .order("mdl_rating", desc=True) \
        .limit(limit) \
        .execute().data or []

The tools call db/queries.py directly, the same query layer the pipeline uses to write data. No intermediate API layer needed. When Claude calls browse_by_tag(tag="Revenge", limit=5), it goes straight to Supabase.


Choosing Authentication: Descope

For a monetizable MCP server, I needed real OAuth 2.1 authentication. Without auth, anyone with the URL can use your server. That's fine for testing, but not for marketplace listings where you might want to gate access or track usage per user.

I chose Descope for three reasons:

  1. FastMCP has a first-class DescopeProvider integration
  2. Descope supports Dynamic Client Registration (DCR), which lets MCP clients like Claude register automatically without manual configuration
  3. Their free tier is generous enough for an early-stage project

The final auth setup in server.py is just four lines:

from fastmcp.server.auth.providers.descope import DescopeProvider

_auth = DescopeProvider(
    config_url=os.environ["DESCOPE_CONFIG_URL"],
    base_url=os.environ["SERVER_URL"],
)

mcp = FastMCP(name="Korean Entertainment", auth=_auth)

Getting to those four lines took about seven failed deployments.


The Deployment Errors Worth Knowing About

Every deployment has gotchas. Here are the ones that will actually save you time if you're building something similar.

Corrupted file from incremental edits

When making multiple edits to server.py, one automated string replacement accidentally jammed code into the middle of an import block:

from db.queries import (
    get_movie_by_tmdb_id,
    get_movie_by_title,auth=DescopeProvider(  # ← corrupted by bad replacement
    get_movies,

Railway reported this as SyntaxError: '(' was never closed, a confusing message that pointed nowhere near the real problem: a botched edit 20 lines earlier.

The lesson: when making multiple changes to the same file, regenerate the whole file from scratch rather than applying incremental patches. A clean file beats a patched one every time.

Missing https:// prefix

The SERVER_URL environment variable was set to kr-movie-tv-mcp-production.up.railway.app without the https:// prefix. Pydantic rejected it immediately:

Input should be a valid URL, relative URL without a base
[type=url_parsing, input_value='kr-movie-tv-mcp-production.up.railway.app']

Simple fix, but it cost a full Railway deployment cycle (about 3 minutes) to discover. Always include the scheme when setting URL environment variables.
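One way to make this class of mistake impossible is to normalize the variable before handing it to the provider. This guard is a hypothetical addition, not part of the actual server:

```python
import os

def normalized_server_url(raw: str) -> str:
    """Prepend https:// when the scheme is missing, so URL parsing passes."""
    if not raw.startswith(("http://", "https://")):
        return f"https://{raw}"
    return raw

# Hypothetical usage at startup:
# server_url = normalized_server_url(os.environ["SERVER_URL"])
```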

How Descope's token validation actually works

One thing worth knowing from Descope's documentation: the MCP Server URL field in the console is optional. When set, it adds an aud (audience) claim to access tokens, scoping them to your specific server. When left unset, no audience claim is added and tokens are validated purely against the .well-known config URL.
This means you can get a fully working OAuth setup (complete handshake, Dynamic Client Registration, and tool discovery) with just the .well-known URL configured. The audience field is an additional security layer for production environments where you want strict token scoping, not a prerequisite for the integration to work.
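If you want to check whether your tokens actually carry an aud claim, you can decode the payload segment with the standard library alone. This is an unverified decode for debugging only; it does not validate the signature:

```python
import base64
import json

def jwt_claims(token: str) -> dict:
    """Decode a JWT's payload segment WITHOUT verifying the signature (debug only)."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))

# "aud" in jwt_claims(access_token) tells you whether
# the MCP Server URL field was set in the Descope console.
```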


Deploying to Railway

I chose Railway over the free alternatives for one reason: no cold starts. Render's free tier spins down after 15 minutes of inactivity and takes 30-60 seconds to restart. For an MCP server that needs to respond to tool calls quickly, a cold start on the first request would produce a bad user experience and potentially cause claude.ai to show a timeout error.

Railway at $5/month gives you an always-on container with automatic deploys from GitHub. The configuration is minimal: a railway.toml that specifies the start command:

[build]
builder = "nixpacks"

[deploy]
startCommand = "python server.py"
restartPolicyType = "always"

The server runs with streamable-http transport, which is what claude.ai and other MCP clients expect for remote servers:

if __name__ == "__main__":
    port = int(os.environ.get("PORT", 8000))
    mcp.run(
        transport="streamable-http",
        host="0.0.0.0",
        port=port,
    )

Railway injects its own PORT environment variable, in this case 8080, so the server binds to whatever port Railway assigns. The 8000 default in the code is only used when running locally.


Testing End-to-End

Once the server was running, I tested it through Claude Code locally first:

claude mcp add korean-entertainment -- python server.py
claude

The first query I ran:

"Which Korean drama had the biggest Nielsen viewership jump between its first and last episode?"

Claude called get_top_dramas to get a list of candidates, then called get_episode_ratings for each one, computed the delta, and returned:

Crash Landing on You jumped from 6.1% → 21.7% (+15.6 percentage points), beating My Love from the Star's 15.6% → 28.1% jump (+12.5pp).

That's real Nielsen Korea data, pulled from SVG chart elements on Naver, stored in Supabase, served through FastMCP, reasoned over by Claude. The full pipeline working end-to-end.
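The delta computation Claude performed reduces to a one-liner. A sketch using only the endpoint figures quoted in the answer above:

```python
def viewership_jump(ratings: list[float]) -> float:
    """Change between the first and last episode, in percentage points."""
    return round(ratings[-1] - ratings[0], 1)

# The two trajectories from the answer above (first and last episodes only)
assert viewership_jump([6.1, 21.7]) == 15.6   # Crash Landing on You
assert viewership_jump([15.6, 28.1]) == 12.5  # My Love from the Star
```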

After connecting to claude.ai directly, I tested a more complex query:

"Find me a Korean thriller drama rated above 8.5 with the Revenge tag, and show me where I can watch it in the US"

The response correctly identified The Glory (더 글로리) at 8.9 MDL rating, confirmed it streams on Netflix US, and included context about the writing and direction. Cross-source, cross-tool reasoning working exactly as designed.


What the MCP Server Unlocks

The combination of data sources creates queries that weren't previously possible from any single API:

"Find dramas where Korean audiences loved it but Western audiences didn't":
This requires naver_audience_rating (Korean verified buyers) vs rt_tomatometer (Western critics): two fields from two different scrapers on the same row.

"Show me the episode rating trajectory for currently airing dramas":
This requires get_trending_dramas + get_episode_ratings: airing status from MDL, viewership from Naver's SVG charts.

"What Korean films won awards and are now on Netflix?":
This requires joining awards, movies, and streaming_availability: three tables from three different sources.

None of these queries are possible against any existing Korean entertainment API, because no existing API has all three pieces. That's the value proposition.
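To make the third query concrete, here's a toy, in-memory version of that three-table join. The row shapes and sample data are made up for illustration; they are not the real schema:

```python
# Toy stand-ins for the movies, awards, and streaming_availability tables
movies = [{"id": 1, "title": "Film A"}, {"id": 2, "title": "Film B"}]
awards = [{"movie_id": 1, "award": "Best Film"}]
streaming = [{"movie_id": 1, "platform": "Netflix", "region": "US"}]

# Titles that appear in all three tables: awarded AND on Netflix
award_winners_on_netflix = [
    m["title"]
    for m in movies
    if any(a["movie_id"] == m["id"] for a in awards)
    and any(s["movie_id"] == m["id"] and s["platform"] == "Netflix" for s in streaming)
]
```

In production this runs as a single query with joins across the three tables; the point is that all three tables have to live in one database for the join to exist at all.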


Getting It In Front of Users

With the server running, the next priority was distribution: getting it listed everywhere developers and AI users look for MCP servers.

Smithery

Smithery was the fastest listing. Paste your Railway URL, complete the OAuth flow, and their inspector automatically discovers all 17 tools and generates the listing. The whole process took under 10 minutes. Smithery is worth listing on because it's where Claude Code users browse for MCP servers: developers who are already in an agentic workflow and actively looking for new tools to add.

Glama

Glama has over 23,000 MCP servers listed and is one of the most-searched directories in the ecosystem. Submitting is straightforward, just add your URL and GitHub repo. They have a private notes field for their review team; since the server uses OAuth, I left instructions explaining how to connect via claude.ai rather than providing API keys. Glama is worth listing on because it indexes for search relevance and recent usage, so active servers rise in rankings over time.

mcp.so

mcp.so has over 20,000 servers and accepts submissions via GitHub issue. The server config field expects the standard MCP JSON format:

{
  "mcpServers": {
    "korean-entertainment": {
      "url": "https://kr-movie-tv-mcp-production.up.railway.app/mcp"
    }
  }
}

Worth listing because it's one of the most-linked directories in MCP ecosystem articles and gets significant organic search traffic.

Cline Marketplace

Cline is a popular AI coding assistant with millions of users and their own MCP marketplace backed by a GitHub repo. Submission is a GitHub issue with your server name, repo URL, endpoint, tool list, and what makes it unique. The review team checks for code quality and documentation before approving. Worth the effort because Cline users are developers who tend to build on top of tools they discover: potential integrators, not just users.

MCP-Hive

MCP-Hive launched May 11, 2026 as the first marketplace with actual pay-per-call revenue sharing for server providers. MCP-Hive is the main monetization play in this ecosystem right now; every other directory is free discovery with no revenue component.


What's Next

The distribution is in place. What comes next:

  1. Complete initial population — ~6,500 TV shows still need TMDB sync, paced by GitHub Actions minutes that reset monthly
  2. Add per-user billing — Descope supports scope-based access control, enabling a free tier (basic search) vs paid tier (Nielsen data, awards)
  3. KMDb integration — API membership pending, will add Korean film archive data when approved

The database gets richer every night. The server is listed where it needs to be. Time to let the data do the work.
