I Built a Fully Autonomous Coding Agent for Under $50/Month — Here's the Exact Setup

Three months ago, I watched an AI agent write, test, and deploy an entire microservice while I made coffee. That moment changed everything about how I work.

After months of experimenting, I've built a coding agent setup that handles 70% of my daily development tasks — bug fixing, code generation, testing, documentation — running 24/7 on my own infrastructure.

Total cost: $47/month. Here's exactly how I did it, and how you can replicate it in one afternoon.


Why Build Your Own Agent Instead of Using Copilot?

Don't get me wrong — GitHub Copilot is great. But it has limitations:

  • It only suggests within your IDE — no terminal access, no file system operations, no deployment
  • It can't run tests or validate its own output
  • It doesn't learn from your project's specific patterns beyond what's in the current file
  • You're limited to one model — what if Claude is better at refactoring while GPT is better at generating tests?

A custom agent gives you full control over the model, the tools, and the workflow.


The Architecture: 4 Components, $47 Total

```
┌─────────────────────────────────────────┐
│              ORCHESTRATOR               │
│         (Python + LangGraph)            │
│              $0/month                   │
├──────────┬──────────┬───────────────────┤
│  LLM 1   │  LLM 2   │      LLM 3        │
│  Claude  │  GPT-4o  │    Gemini Pro     │
│  $20/mo  │  $20/mo  │      $7/mo        │
├──────────┴──────────┴───────────────────┤
│           TOOL LAYER                    │
│   Terminal │ File System │ Browser      │
│   Git │ Docker │ npm/pip │ Linting      │
├─────────────────────────────────────────┤
│          KNOWLEDGE BASE                 │
│   Project docs │ Style guide │ Tests    │
│              $0/month                   │
└─────────────────────────────────────────┘
```

Component 1: The Orchestrator (Free)

The brain of the operation. I use LangGraph to build a state machine that routes tasks to the right model and tool combination.

```python
from langgraph.graph import StateGraph, END
from typing import TypedDict, Annotated
import operator

class AgentState(TypedDict):
    task: str
    context: str
    model_used: str
    code_output: str
    test_results: str
    iteration: int
    messages: Annotated[list, operator.add]

def route_task(state: AgentState) -> str:
    """Route to the best model based on task type."""
    task = state["task"].lower()

    if any(w in task for w in ["refactor", "optimize", "clean", "improve"]):
        return "claude"  # Claude excels at code quality
    elif any(w in task for w in ["test", "debug", "fix", "error"]):
        return "gpt4o"   # GPT-4o is great at debugging
    elif any(w in task for w in ["document", "explain", "summary"]):
        return "gemini"  # Gemini for documentation
    else:
        return "claude"  # Default for generation

def should_iterate(state: AgentState) -> str:
    """Decide if we need another iteration."""
    if state["iteration"] >= 3:
        return END
    if "PASS" in state.get("test_results", ""):
        return END
    return "generate"
```

The key insight? Different models excel at different tasks. Routing intelligently saves money and improves output quality.
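
To sanity-check the routing, you can call `route_task` directly with a stub state. Only the `task` key is read, so a plain dict works for a quick test:

```python
# Only the "task" field is read by route_task, so a stub dict suffices here.
print(route_task({"task": "Refactor the payment module for readability"}))  # -> "claude"
print(route_task({"task": "Debug the flaky login error in CI"}))            # -> "gpt4o"
print(route_task({"task": "Document the public API surface"}))              # -> "gemini"
```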

Component 2: Multi-Model Setup ($47/month)

Here's my exact API spending breakdown:

| Model | Provider | Cost/Month | Best For |
| --- | --- | --- | --- |
| Claude 3.5 Sonnet | Anthropic API | ~$20 | Code generation, refactoring |
| GPT-4o | OpenAI API | ~$20 | Debugging, test writing |
| Gemini 1.5 Pro | Google AI Studio | ~$7 | Documentation, large context |

Pro tip: Use Google AI Studio's free tier for Gemini — you get 60 requests/minute free, which is plenty for documentation tasks.

```python
import os

import anthropic
import openai
import google.generativeai as genai

class ModelRouter:
    def __init__(self):
        self.claude = anthropic.Anthropic()
        self.gpt = openai.OpenAI()
        genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))
        self.gemini = genai.GenerativeModel("gemini-1.5-pro")

    def generate(self, model: str, prompt: str, context: str = "") -> str:
        if model == "claude":
            response = self.claude.messages.create(
                model="claude-sonnet-4-20250514",
                max_tokens=4096,
                messages=[{"role": "user", "content": f"{context}\n\n{prompt}"}]
            )
            return response.content[0].text

        elif model == "gpt4o":
            response = self.gpt.chat.completions.create(
                model="gpt-4o",
                messages=[{"role": "system", "content": context},
                         {"role": "user", "content": prompt}]
            )
            return response.choices[0].message.content

        elif model == "gemini":
            response = self.gemini.generate_content(f"{context}\n\n{prompt}")
            return response.text
```

Component 3: The Tool Layer (Free)

This is where the magic happens. Your agent needs hands to interact with the codebase.

```python
import subprocess
from pathlib import Path

class DevTools:
    """Tools the agent can use to interact with the codebase."""

    def read_file(self, path: str) -> str:
        """Read a file from the project."""
        return Path(path).read_text()

    def write_file(self, path: str, content: str) -> str:
        """Write content to a file."""
        Path(path).parent.mkdir(parents=True, exist_ok=True)
        Path(path).write_text(content)
        return f"Written to {path}"

    def run_command(self, cmd: str, cwd: str = ".") -> str:
        """Execute a shell command with basic guardrails."""
        # Safety: block obviously dangerous commands (a denylist is a first
        # line of defense, not a real sandbox)
        blocked = ["rm -rf /", "sudo", "DROP TABLE", "> /dev/sda"]
        if any(b in cmd for b in blocked):
            return "BLOCKED: Dangerous command detected"

        try:
            result = subprocess.run(
                cmd, shell=True, capture_output=True,
                text=True, timeout=30, cwd=cwd
            )
        except subprocess.TimeoutExpired:
            return f"TIMEOUT: command exceeded 30 seconds: {cmd}"
        return result.stdout + result.stderr

    def run_tests(self, project_path: str) -> str:
        """Run project test suite."""
        return self.run_command(
            f"cd {project_path} && npm test 2>&1 || pytest {project_path} 2>&1"
        )

    def git_diff(self, project_path: str) -> str:
        """Show uncommitted changes."""
        return self.run_command(f"cd {project_path} && git diff")
```

Component 4: Knowledge Base (Free)

Feed your agent context about your project so it writes consistent code:

```python
class KnowledgeBase:
    def __init__(self, project_path: str):
        self.project_path = Path(project_path)
        self.context = self._build_context()

    def _build_context(self) -> str:
        parts = []

        # Read README
        readme = self.project_path / "README.md"
        if readme.exists():
            parts.append(f"# Project Overview\n{readme.read_text()[:2000]}")

        # Read config files for tech stack info
        for config in ["package.json", "pyproject.toml", "Cargo.toml"]:
            config_file = self.project_path / config
            if config_file.exists():
                parts.append(f"# Dependencies ({config})\n{config_file.read_text()[:1000]}")

        # Sample existing code for style
        src_dir = self.project_path / "src"
        if src_dir.exists():
            for py_file in list(src_dir.glob("**/*.py"))[:5]:
                parts.append(f"# Style Reference: {py_file.name}\n{py_file.read_text()[:1500]}")

        return "\n\n".join(parts)
```
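
Building the context is cheap enough to do once per run. Here's a quick way to eyeball what the agent will actually be fed (`./my-project` is a placeholder path):

```python
kb = KnowledgeBase("./my-project")  # placeholder path to your repo
print(f"{len(kb.context)} characters of project context")
print(kb.context[:500])  # spot-check the README and style samples
```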

Putting It All Together: The Agent Loop

Here's the complete agent that ties everything together:

```python
class CodingAgent:
    def __init__(self, project_path: str):
        self.tools = DevTools()
        self.router = ModelRouter()
        self.kb = KnowledgeBase(project_path)
        self.project_path = project_path
        self.graph = self._build_graph()

    def _build_graph(self):
        graph = StateGraph(AgentState)

        # The route node just records which model was chosen; the actual
        # branching happens in the conditional edges below.
        graph.add_node("route", lambda state: {"model_used": route_task(state)})
        graph.add_node("generate", self._generate_code)
        graph.add_node("test", self._run_tests)
        graph.add_node("review", self._review_code)

        graph.set_entry_point("route")
        graph.add_conditional_edges("route", route_task, {
            "claude": "generate",
            "gpt4o": "generate",
            "gemini": "generate"
        })
        graph.add_edge("generate", "test")
        graph.add_edge("test", "review")
        graph.add_conditional_edges("review", should_iterate, {
            "generate": "generate",
            END: END
        })

        return graph.compile()

    def execute(self, task: str) -> dict:
        """Run the agent on a task."""
        result = self.graph.invoke({
            "task": task,
            "context": self.kb.context,
            "model_used": "",
            "code_output": "",
            "test_results": "",
            "iteration": 0,
            "messages": []
        })
        return result

    def _generate_code(self, state: AgentState) -> dict:
        model = route_task(state)
        prompt = f"""
        Task: {state['task']}
        Project context: {state['context'][:3000]}

        Generate clean, tested code. Follow existing project patterns.
        Include any necessary imports and error handling.
        """

        code = self.router.generate(model, prompt, state["context"])

        # Auto-write generated code to appropriate file
        self.tools.write_file(
            f"{self.project_path}/generated_{state['iteration']}.py",
            code
        )

        return {
            "code_output": code,
            "model_used": model,
            "iteration": state["iteration"] + 1,
            "messages": [{"role": "assistant", "content": code}]
        }

    def _run_tests(self, state: AgentState) -> dict:
        results = self.tools.run_tests(self.project_path)
        return {"test_results": results}

    def _review_code(self, state: AgentState) -> dict:
        """Use a different model for review to catch blind spots."""
        review_prompt = f"""
        Review this code for bugs, security issues, and style problems.
        Code:\n{state['code_output']}
        Test results:\n{state['test_results']}

        If tests fail, explain what's wrong concisely.
        """
        review = self.router.generate("gpt4o", review_prompt)
        return {"messages": [{"role": "reviewer", "content": review}]}
```
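
For reference, the workflows later in this post assume an agent constructed like this (`./my-project` is a placeholder path):

```python
agent = CodingAgent("./my-project")  # placeholder path to your repo
result = agent.execute("Add input validation to the signup endpoint")
print(result["model_used"], "| iterations:", result["iteration"])
```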

Real Results After 3 Months

Here's what this setup actually handles for me:

| Task | Time Saved/Week | Quality |
| --- | --- | --- |
| Bug fixes | 8 hours | 85% resolved without human review |
| Unit tests | 5 hours | 90% pass rate on first run |
| Documentation | 4 hours | Needs minor editing |
| Code generation | 6 hours | Good starter code, needs refinement |
| **Total** | **23 hours/week** | |

That's essentially half a full-time job handed off to the agent.

My Favorite Workflows

1. "Fix all failing tests"

agent.execute("Run the test suite and fix all failing tests")
Enter fullscreen mode Exit fullscreen mode

The agent runs tests, reads error messages, identifies root causes, and iterates until tests pass. Works about 70% of the time without intervention.
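
Because `execute` returns the final graph state, you can check afterwards whether the loop actually converged or hit the three-iteration cap:

```python
result = agent.execute("Run the test suite and fix all failing tests")
print("Iterations used:", result["iteration"])             # capped at 3
print("Suite passing:", "PASS" in result["test_results"])  # the loop's exit condition
```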

2. "Add tests for this module"

agent.execute("Write comprehensive tests for src/auth/handlers.py")
Enter fullscreen mode Exit fullscreen mode

The agent reads the existing code, understands the interfaces, and generates tests that cover edge cases I often miss.

3. "Document this API"

agent.execute("Generate OpenAPI docs for all endpoints in src/api/")
Enter fullscreen mode Exit fullscreen mode

The agent reads through all route handlers and produces accurate documentation. Gemini is especially good at this with its large context window.


What I Learned (The Hard Way)

✅ Do's

  1. Start with a narrow scope — don't try to replace your entire workflow on day one
  2. Use model routing — sending generation to Claude and debugging to GPT-4o works significantly better than using one model for everything
  3. Implement safeguards — always sandbox file writes and block dangerous commands (see the sketch after this list)
  4. Feed good context — the knowledge base is what separates a useful agent from a random code generator
  5. Log everything — you'll learn a lot from reviewing what the agent tried and failed at
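
Here's a minimal sketch of the file-write sandbox from item 3, assuming the agent should never touch anything outside the project root (`safe_write` and `PROJECT_ROOT` are illustrative names, not part of the classes above):

```python
from pathlib import Path

PROJECT_ROOT = Path("./my-project").resolve()  # placeholder project root

def safe_write(rel_path: str, content: str) -> str:
    """Refuse writes that resolve outside the project directory."""
    target = (PROJECT_ROOT / rel_path).resolve()
    if not target.is_relative_to(PROJECT_ROOT):  # Path.is_relative_to: Python 3.9+
        return f"BLOCKED: {rel_path} escapes the project root"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content)
    return f"Written to {target}"
```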

❌ Don'ts

  1. Don't let it run unattended on production repos — always review before merging
  2. Don't skip the iteration loop — the test → review → fix cycle is where the real quality comes from
  3. Don't underestimate token costs — start with smaller models (GPT-4o-mini) for simple tasks
  4. Don't ignore rate limits — implement queuing and backoff logic (a minimal sketch follows this list)
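
And for item 4, a minimal retry-with-backoff wrapper. `with_backoff` is an illustrative helper; in practice you'd narrow `Exception` to each SDK's rate-limit error:

```python
import random
import time

def with_backoff(fn, max_retries: int = 5):
    """Call fn(); on failure, retry with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception as exc:  # TODO: catch each SDK's RateLimitError instead
            if attempt == max_retries - 1:
                raise
            delay = 2 ** attempt + random.random()
            print(f"Request failed ({exc}); retrying in {delay:.1f}s")
            time.sleep(delay)

# usage: code = with_backoff(lambda: router.generate("claude", prompt, context))
```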

The Future: Where This Is Heading

The cost will keep dropping. Here's what I'm excited about:

  • Local models (Llama 3, Mistral) are getting good enough for simple tasks — that could bring the cost to nearly $0
  • Anthropic's batch API offers 50% savings for non-urgent tasks (sketch below)
  • Multi-agent collaboration — I'm experimenting with having a "planner" agent break big tasks into sub-tasks for specialist agents
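
For the batch-pricing point above, here's roughly what queuing a non-urgent documentation job looks like. This sketch assumes Anthropic's Message Batches API; verify the exact request shape against the current SDK docs:

```python
import anthropic

client = anthropic.Anthropic()

# Queue a non-urgent job at batch pricing; results are fetched later.
batch = client.messages.batches.create(
    requests=[{
        "custom_id": "docs-job-1",
        "params": {
            "model": "claude-3-5-sonnet-20241022",
            "max_tokens": 2048,
            "messages": [{"role": "user", "content": "Document src/api/users.py"}],
        },
    }]
)
print(batch.id)  # poll with client.messages.batches.retrieve(batch.id)
```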

Want to Try It?

The complete code is available on my GitHub. But honestly, the best way to start is:

  1. Week 1: Set up Claude API + LangGraph with just file read/write tools
  2. Week 2: Add GPT-4o for debugging and a test runner tool
  3. Week 3: Build the knowledge base and add Gemini for documentation
  4. Week 4: Connect it all together with the iteration loop

Start small, measure the time saved, and expand from there.

The age of autonomous coding agents isn't coming — it's here. The question is whether you'll build your own or wait for someone else to sell you one.


What tasks would you hand off to an AI coding agent? I'd love to hear your use cases in the comments.
