How I designed a sermon AI pipeline that bridges a 20-year-old ASP Classic system with modern STT, LLM structuring, and vector search — without touching the legacy core.
The Problem
I work at a Korean Christian broadcasting company (GOODTV) serving over 900,000 users. Every week, 140 pastors upload sermon MP3s through an internal Windows-based CMS built on ASP Classic — a system that has been running for over two decades.
The ask from leadership: "Make it searchable by AI."
Specifically, they wanted a RAG (Retrieval-Augmented Generation) system where staff could ask natural language questions like "What did Pastor Kim preach about forgiveness last year?" and get accurate, sourced answers from the sermon archive.
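The retrieval half of that loop follows a standard pattern: embed the question, pull the nearest sermon chunks from a vector store, and let an LLM answer only from those chunks. A generic sketch of the idea, not this system's actual code; embed, index, and llm here are hypothetical stand-ins injected by the caller:

def answer(question, index, embed, llm, top_k=5):
    # 1. Embed the question with the same model used at ingestion time
    q_vec = embed(question)
    # 2. Retrieve the nearest sermon chunks from the vector store
    hits = index.query(vector=q_vec, top_k=top_k, include_metadata=True)
    context = "\n\n".join(m["metadata"]["text"] for m in hits["matches"])
    # 3. Answer strictly from the retrieved, sourced excerpts
    return llm(f"Using only these sermon excerpts:\n{context}\n\nQuestion: {question}")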
Simple enough in theory. The challenge was the infrastructure I had to work with.
Mapping the Legacy Stack
Before writing a single line of new code, I spent time fully understanding the existing system. This step turned out to be the most important one.
Windows CMS Server (192.168.1.228)
├── ASP Classic (.asp files, IIS)
├── MSSQL database
└── UI for 140 pastors uploading weekly sermons

Linux APM Servers × 2 (192.168.1.226 / 192.168.1.227)
├── Apache + PHP transcoding pipeline
├── MySQL (job queue)
└── C binary: handles ffmpeg encoding + FTP upload

Developer Local PC (192.168.1.138)
├── Python (Whisper, Ollama, Pinecone client)
└── NVIDIA RTX 3060 12GB
The transcoding pipeline worked like this: when an admin submitted content in the CMS, the ASP page HTTP-POSTed the content index (idx) to a PHP file on one of the Linux servers. The PHP file queried the MSSQL database, built ffmpeg commands, inserted job records into MySQL, and conditionally invoked a compiled C binary via exec(). The C binary handled the actual media processing.
The key detail: the C binary already had the MP3 URL as a command-line argument.
That was my integration point.
The Core Design Constraint: Don't Break What Works
The most important architectural decision I made was also the most conservative: inject into the existing pipeline at its boundaries without modifying any core logic.
The reasoning was straightforward. The CMS had accumulated years of edge cases, business logic, and undocumented behavior. A 20-year-old ASP Classic system touching 140 active users has a blast radius that's hard to overestimate. Modifying the core meant risking sermon upload failures across the entire organization — not a trade-off worth making for a pilot feature.
So the architecture I designed respected these hard constraints:
- No changes to ASP Classic core files — only additive UI changes (one checkbox)
- No changes to the original C binary — a new _ai variant (transcoding_ai) handles the AI path
- No changes to the MySQL job schema — new behavior piggybacked on existing structure
- AI pipeline failure must never affect the encoding pipeline — graceful degradation was non-negotiable

The result:
[CMS 228] Admin checks "AI Learn" checkbox → submits content
│
▼ HTTP POST (async, non-blocking)
[PHP 226/227] transcoding_ai.php
├── Query MSSQL: ai_learn_YN = 'Y'?
├── Is output file .mp3?
└── If both true → MP3 path bypasses the 3-job queue limit
│
▼ exec() → background
[C binary] transcoding_ai
├── Existing ffmpeg encoding (unchanged logic)
├── FTP upload to CDN
└── NEW: curl POST → Python API (fire-and-forget)
│
▼ HTTP POST /process_url → 202 Accepted
[Python 138:8001] api_server.py
├── Download MP3 from CDN
├── Whisper large-v3 STT
├── Rule-based Bible name correction
├── LLM context correction (Ollama/gemma4)
├── LLM paragraph structuring (Ollama/llama3.1)
└── Pinecone vector upload
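Part 2 digs into api_server.py's internals, but the shape of the receiving end is worth sketching now. Here is a minimal Flask skeleton of the hand-off contract, assuming an in-memory job table: the route names and the 202/409/status behavior come from the design above, while the function and variable names are illustrative, not the real implementation.

import threading
import uuid
from flask import Flask, jsonify, request

app = Flask(__name__)
jobs = {}  # illustrative in-memory job table: job_id -> {file, status}

def run_pipeline(job_id, mp3_url):
    # Placeholder for the real stages: download, Whisper STT,
    # Bible-name correction, LLM structuring, Pinecone upload
    jobs[job_id]["status"] = "processing"
    # ... 20-40 minutes of work ...
    jobs[job_id]["status"] = "done"

@app.route("/process_url", methods=["POST"])
def process_url():
    mp3_url = request.form["mp3_url"]
    mp3_filename = request.form["mp3_filename"]
    # Duplicate job prevention: 409 on a second POST for the same file
    if any(j["file"] == mp3_filename and j["status"] != "done" for j in jobs.values()):
        return jsonify(error="job already active for this file"), 409
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"file": mp3_filename, "status": "queued"}
    # Fire-and-forget: respond 202 immediately so the C binary's curl
    # call returns before any heavy work starts
    threading.Thread(target=run_pipeline, args=(job_id, mp3_url), daemon=True).start()
    return jsonify(job_id=job_id), 202

@app.route("/status/<job_id>")
def status(job_id):
    return jsonify(jobs.get(job_id, {"status": "unknown"}))

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8001)  # reachable from 226/227

The daemon thread keeps the sketch dependency-free; a production version could swap in a proper task queue without changing the contract the C binary sees.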
The Gating Mechanism: ai_learn_YN
Not every sermon upload should trigger AI processing. Admins needed explicit control — for cost reasons, for content sensitivity, and because the AI pipeline takes 20–40 minutes per sermon.
I added a single column to the existing MSSQL content table:
ALTER TABLE content_list ADD ai_learn_YN CHAR(1) DEFAULT 'N';
ALTER TABLE program_list ADD ai_learn_YN CHAR(1) DEFAULT 'N';
And surfaced it as a checkbox in the ASP Classic UI. The PHP layer reads this flag before invoking the C binary:
// Queried once, outside the profile loop (cast guards against SQL injection)
$ai_learn_sql = "SELECT ai_learn_YN FROM content_list WHERE idx = " . intval($idx);
$ai_learn_result = mssql_fetch_array(mssql_query($ai_learn_sql, $con));
$ai_learn_YN = trim($ai_learn_result['ai_learn_YN']);

// Inside the while loop over content_platform_profile_list
if (substr($output_filename, -4) === '.mp3' && $ai_learn_YN === 'Y') {
    // MP3 + AI flag: bypass the queue limit, always execute
    $mp3_url = "https://online.goodtv.co.kr/" . $ftp_path . $output_filename;
    $shell_exec = "/home/cms/exec/transcoding_ai $list_idx $mp3_url $output_filename > /dev/null &";
    exec($shell_exec, $out, $err);
} else if ($cms_rows < 3) {
    // Everything else (non-MP3, or AI flag off): respect the existing
    // 3-job queue limit that protects the servers during video encoding.
    // "not_mp3" stands in for the URL argument so the binary skips the AI path.
    $mp3_url = "not_mp3";
    $shell_exec = "/home/cms/exec/transcoding_ai $list_idx $mp3_url $output_filename > /dev/null &";
    exec($shell_exec, $out, $err);
}
Two conditions gate the AI path. This prevents video sermons, thumbnails, and announcements from hitting the pipeline, and gives admins fine-grained control over what gets learned.
A Non-Obvious Decision: MP3 vs. Video Queue Handling
The original PHP code had a queue guard — if ($cms_rows < 3) — that prevented more than 3 concurrent ffmpeg jobs. This existed to protect the Linux servers from CPU overload during HD video encoding.
My initial implementation put the MP3 AI path inside this same guard. This turned out to be wrong: when the servers were busy encoding video (which they almost always are), the MP3 AI pipeline would never execute.
The fix required understanding why the guard existed. ffmpeg video encoding is CPU-intensive, taking 30–60 minutes per job at high bitrate. MP3 extraction is just audio demuxing — it's an order of magnitude cheaper:
# Video encoding (CPU-heavy, 30–60 min)
ffmpeg -i input.mpg -c:v libx264 -s 1920x1080 -b:v 4000k output.mp4

# MP3 extraction (lightweight, 2–5 min)
ffmpeg -i input.mpg -vn -c:a libmp3lame -b:a 128k output.mp3
Once I understood this, the fix was clean: MP3 + AI jobs bypass the queue limit entirely. Video jobs keep the existing protection. The server load profile is unchanged; the AI pipeline is no longer bottlenecked.
Why a Local PC, Not a Cloud Server?
The Whisper large-v3 model needs a GPU to transcribe at practical speed, and the Linux APM servers are CPU-only. Our organization wasn't ready to provision a GPU EC2 instance for a pilot project.
Running the Python API on my local workstation with an RTX 3060 was a deliberate trade-off:
- Pro: Zero infrastructure cost for the pilot; full control over the environment
- Pro: Faster iteration — code changes are instant, no deployment cycle
- Con: Single point of failure; PC shutdown kills the pipeline
- Con: Not production-ready; dependent on a developer's personal machine

This was explicitly scoped as a pilot. The Python API is network-accessible at 192.168.1.138:8001, reachable from both Linux servers. Moving to production means deploying the same Flask + Python code to any GPU-enabled instance and changing one IP constant in the C binary.
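For reference, the STT stage with faster-whisper (from the stack listed at the end) is compact: large-v3 loaded in float16 fits within the RTX 3060's 12 GB. A minimal sketch, assuming the MP3 has already been downloaded locally; the real api_server.py wraps this with the correction and structuring stages:

from faster_whisper import WhisperModel

# large-v3 in float16 fits comfortably in 12 GB of VRAM
model = WhisperModel("large-v3", device="cuda", compute_type="float16")

# Korean sermon audio; segments is a generator, so transcription
# streams as it runs rather than blocking until the end
segments, info = model.transcribe("sermon.mp3", language="ko")
for seg in segments:
    print(f"[{seg.start:7.1f}s] {seg.text}")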
What the Final Architecture Achieves
┌─────────────────────────────────────────────────────────────┐
│ PROPERTIES │
│ │
│ ✅ Zero-downtime deployment: PHP and C binary swapped │
│ while CMS served 140 active users │
│ │
│ ✅ Failure isolation: Python API failure → C binary logs │
│ the error and exits cleanly; encoding succeeds │
│ │
│ ✅ Idempotent uploads: Pinecone upsert semantics prevent │
│ duplicate vectors on retry │
│ │
│ ✅ Duplicate job prevention: 409 on second POST for the │
│ same mp3_filename while job is active │
│ │
│ ✅ Observable: job status queryable at GET /status/{id} │
│ │
│ ✅ Language-extensible: lang/ directory structure allows │
│ adding English pipeline without touching Korean code │
└─────────────────────────────────────────────────────────────┘
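The idempotency property deserves a closer look: Pinecone's upsert overwrites a vector when its ID already exists, so deriving IDs deterministically from the sermon filename and chunk position makes retries safe by construction. A sketch of that pattern; the index name and the embed() stub are placeholders, not the pipeline's real values:

from pinecone import Pinecone

pc = Pinecone(api_key="...")  # key elided
index = pc.Index("sermons")   # placeholder index name

def embed(text):
    # Stand-in for the real embedding model
    return [0.0] * 1536

def upload_chunks(mp3_filename, chunks):
    # Deterministic IDs: retrying the same sermon overwrites the
    # same vectors instead of inserting duplicates
    index.upsert(vectors=[
        (f"{mp3_filename}#{i}", embed(text), {"source": mp3_filename, "text": text})
        for i, text in enumerate(chunks)
    ])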
In Part 2, I'll cover the actual implementation: the debugging process of getting PHP → C binary → Python API to reliably hand off through a live encoding server, the multi-stage AI pipeline internals, and the non-obvious problems that showed up only under real production conditions.
Stack: ASP Classic · MSSQL · PHP · C (gcc) · Python · Flask · faster-whisper · Ollama · Pinecone