Running a Personal AI Assistant for $0 - Part 1 - Architecture

OpenClaw Challenge Submission 🦞

Introduction

A productivity tool that promised to change everything, charged monthly, and quietly became background noise. AI assistants are going the same way: another tab, another login, another $20/month for something you open twice a week. Most of us probably regret at least one subscription we're paying for today.

Welp! What if you didn't have to?

The infrastructure now exists to run a capable, always-on personal AI assistant that lives in the apps you already use every day (Telegram, WhatsApp), remembers you, browses the web, and handles real tasks, all for exactly zero dollars a month. Not a trial. Not a teaser. Permanently free, on infrastructure you control.

This article explains the architecture that makes it possible and why each piece matters.

The Subscription Trap

Most people's AI setup looks like this: Claude.ai, ChatGPT, or another provider in a browser tab or mobile app, opened when needed, closed when done. Conversations are saved, and you can scroll back to what you discussed last time if you're in the same thread. But that's passive history, not active memory. You have to go and find it. And the whole time, it can't reach out, take action, or do anything unless you open it first.

That's not an assistant. That's a very smart search box.

A real assistant is always on. It knows who you are. It operates in the apps you already use. It can take actions, not just generate text, and it doesn't charge you for existing.

Until recently, building that required either paying cloud AI bills or owning serious hardware, both out of reach for most people. That has changed.

Three Shifts That Make This Possible

1. Open-weight models are now genuinely capable

Meta's Llama, Google's Gemma, and others have closed the gap with proprietary models significantly over the past few years. A 3-8 billion parameter model running locally can handle the majority of tasks people actually use AI assistants for day to day: summarising, drafting, answering questions, and light reasoning.

2. Cloud providers offer permanently free compute

Oracle Cloud's Always Free tier gives you up to 4 ARM CPU cores and 24GB of RAM — permanently, with no expiry date. Not a 12-month trial like AWS. Not credits that run out. A real server running 24/7 at zero cost, forever, as long as you keep the account active.

That's enough to run Ollama with a capable local model.
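
To make that concrete, here's what a single request to that local model looks like once Ollama is up. This is a minimal sketch, assuming Ollama is listening on its default port (11434) and that a small model has already been pulled; the model name below is illustrative:

```python
import requests

# Ask the local Ollama server for a completion. Assumes Ollama is running
# on its default port and a small model has been pulled, e.g.
# `ollama pull llama3.2:3b` (the model name here is illustrative).
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.2:3b",
        "prompt": "Summarise the key points of ARM vs x86 in two sentences.",
        "stream": False,  # return one complete JSON object instead of a stream
    },
    timeout=120,  # CPU inference on a free ARM instance can be slow
)
print(response.json()["response"])
```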

3. Free API tiers have become genuinely useful

Google's Gemini API offers up to 1,000 requests per day on the free tier with no credit card required. For a personal assistant handling one person's queries, that's more than enough headroom. When a local model is too slow or too limited for a task, Gemini catches it — for free.
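
For a sense of what "fallback" means in practice, here's a single Gemini request over plain REST. A minimal sketch: the model name and the GEMINI_API_KEY environment variable are my assumptions, so substitute whatever the free tier currently offers:

```python
import os
import requests

# Call the Gemini API directly over REST. Assumes a free-tier key is
# exported as GEMINI_API_KEY; the model name is illustrative.
API_KEY = os.environ["GEMINI_API_KEY"]
URL = (
    "https://generativelanguage.googleapis.com/v1beta/"
    f"models/gemini-1.5-flash:generateContent?key={API_KEY}"
)

payload = {"contents": [{"parts": [{"text": "Explain ARM vs x86 in one paragraph."}]}]}
resp = requests.post(URL, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["candidates"][0]["content"]["parts"][0]["text"])
```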

Put these three things together, and the economics change completely.

The Architecture

Tech Stack

  • Oracle Cloud ARM Instance — your always-on server. 4 CPU cores, 24 GB RAM, permanently free. Hosts everything. Never sleeps, never charges.

  • Ollama — runs open-source language models locally on your server. No API calls, no cost, no data leaving your machine. The primary brain for most tasks.

  • Gemini API (free tier) — Google's fallback for when the local model is too slow or hits a complex task. 1,000 free requests per day—no credit card.

  • OpenClaw — the agent layer that ties everything together. Connects to Telegram, maintains memory across conversations, runs scheduled tasks, and routes requests between local and cloud models intelligently (sketched below).
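
OpenClaw handles that routing internally, but the idea is simple enough to sketch. The heuristic below (prompt length as a crude complexity proxy, plus a timeout-based fallback) is my own illustration, not OpenClaw's actual logic:

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"

def ask(prompt: str, cloud_fn, complexity_threshold: int = 400) -> str:
    """Route a prompt: free local model first, cloud fallback when needed.

    `cloud_fn` is any callable that sends the prompt to a cloud model,
    e.g. the Gemini snippet above wrapped in a function.
    """
    # Crude heuristic: long prompts go straight to the stronger cloud model.
    if len(prompt) > complexity_threshold:
        return cloud_fn(prompt)
    try:
        # Try the local model first, with a hard latency budget.
        resp = requests.post(
            OLLAMA_URL,
            json={"model": "llama3.2:3b", "prompt": prompt, "stream": False},
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()["response"]
    except requests.RequestException:
        # Local model too slow or unavailable: spend one free Gemini request.
        return cloud_fn(prompt)
```

Every request the local model can absorb preserves your 1,000-per-day Gemini budget for the tasks that actually need it.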

What It Can Actually Do

This isn't just a toy setup. On this stack, you get:

  • Telegram access — message your agent from your phone, anywhere, like texting a person

  • Persistent memory — it remembers your preferences, ongoing projects, and past conversations

  • Web search — real-time search via Tavily's free tier integrated directly into responses

  • File operations — read, write, and summarise documents on the server

  • GitHub integration — search issues, review code, summarise pull requests

  • Scheduled tasks — set reminders, recurring summaries, automated workflows (a bare-bones sketch follows this list)

  • Custom agents — define specialised subagents for specific tasks (code review, research, writing)
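
To give a feel for what two of those bullets (Telegram access and scheduled tasks) reduce to under the hood, here's a bare-bones standalone version. OpenClaw manages all of this for you; the bot token, chat ID, model name, and once-a-day schedule are all assumptions for illustration:

```python
import os
import time
import requests

BOT_TOKEN = os.environ["TELEGRAM_BOT_TOKEN"]  # token from @BotFather
CHAT_ID = os.environ["TELEGRAM_CHAT_ID"]      # your chat with the bot

def send_telegram(text: str) -> None:
    """Push a message to your phone via the Telegram Bot API."""
    requests.post(
        f"https://api.telegram.org/bot{BOT_TOKEN}/sendMessage",
        json={"chat_id": CHAT_ID, "text": text},
        timeout=30,
    )

def local_model(prompt: str) -> str:
    """Query the local Ollama model (model name is illustrative)."""
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3.2:3b", "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]

# Naive scheduler: one recurring task. OpenClaw's scheduler is far richer;
# this only shows that a "scheduled task" boils down to a loop plus a model call.
while True:
    send_telegram(local_model("Draft a short morning briefing."))
    time.sleep(24 * 60 * 60)  # once a day
```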

What it can't do as well as a paid service: complex multi-step reasoning at speed, very long document analysis, and tasks that push the limits of a 3B parameter model. For those, the Gemini fallback steps in.

The Honest Tradeoffs

Zero cost doesn't mean zero compromise. Know what you're getting into:

  • Speed — local CPU inference is slower than cloud APIs. A response that takes a few seconds on Claude.ai might take over 30 seconds locally. With Gemini as a fallback, complex tasks stay fast; simple tasks on the local model are slow but free.

  • Quality ceiling — a 3B local model is noticeably less capable than Claude Sonnet or GPT-4. For writing, summarisation, and Q&A, it's fine. For nuanced reasoning or complex code, it shows limitations.

  • Setup effort — this is not a five-minute install. There are VCN configurations, systemd services, API keys, and model downloads involved. It takes an afternoon to set up correctly. Once running, it requires minimal maintenance.

  • Oracle ARM capacity — Oracle's free ARM instances are in high demand. You may need to retry provisioning multiple times or upgrade to Pay As You Go (which still costs $0 for Always Free resources) to get reliable access.

Who This Is For

It makes sense if:

  • You're comfortable with a terminal and basic Linux
  • You want AI infrastructure you actually control
  • You're experimenting and don't want ongoing costs
  • You're comfortable with slower responses in exchange for zero cost

It doesn't make sense if:

  • You need production-grade reliability
  • Response speed is critical
  • You want a turnkey experience with no configuration
  • You'd rather pay $10-20/month for something that just works

For the right person, this is the most interesting AI setup you can build right now. Not because it beats the paid alternatives on any individual metric, but because it's yours: running on your server, with your data, on your terms, for nothing. And, most importantly, your private data stays under your control, far from being accidentally exposed.

What's Next

This article is the first in a five-part series:

  1. The Architecture ← you are here
  2. Setting Up Free Cloud Server — VCN, ARM instances, static IPs, the gotchas
  3. Running Ollama on ARM — model selection, disk management, CPU inference reality
  4. Installing OpenClaw on Linux — avoiding every trap
  5. The Complete Setup — Telegram, Gemini fallback, end-to-end testing

Stay tuned; the links above will be updated as each article is published.

If you have reached this point, I have made a satisfactory effort to keep you reading. Please leave a comment or share any corrections.
