No GPU. No subscription. No kidding.
Here's how to run powerful cloud-hosted AI models through Ollama — completely free — using just one command.
The Secret Nobody's Talking About
Most developers assume that running a model like GLM-4.7, GPT-OSS 120B, or Gemma3 27B requires either expensive hardware or a paid cloud API. But Ollama quietly introduced something called cloud models — models that run on Ollama's infrastructure, not your machine, and many of them are free.
The catch? You need a smart way to use them for coding. Enter OpenCode — an AI-powered coding agent that plugs right into Ollama.
What You'll Need
Before anything else, make sure you have both tools installed:
1. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
Or download the installer from https://ollama.com for Windows/macOS.
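A quick way to confirm the install took (the version number will differ on your machine):

# Check that the Ollama CLI is on your PATH
ollama --version
# A fresh install lists no downloaded models yet
ollama list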
2. Install OpenCode
npm install -g opencode-ai
OpenCode is a terminal-based AI coding agent. Think of it as a free, local alternative to GitHub Copilot Workspace.
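If you want to double-check that npm put the binary on your PATH, a version check is enough (this assumes the package exposes an opencode command with a standard --version flag; if yours differs, just run opencode with no arguments):

# Confirm the OpenCode CLI is installed globally
opencode --version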
That's it. You don't need to ollama pull anything. Cloud models are fetched on-demand — no gigabytes of weights filling up your disk.
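Want proof before launching a full coding session? The same one-line prompt used to test the models in this post (see the footnote at the end) works here; the heavy lifting happens on Ollama's servers, not your disk:

# Send a one-line prompt to a free cloud model; no weights are downloaded locally
ollama run gemma3:27b-cloud "hello"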
One Command to Rule Them All
ollama launch opencode --model glm-4.7:cloud
That's the whole magic. Swap glm-4.7:cloud with any free cloud model below and you're done.
OpenCode will open an interactive coding session powered by the model you chose, running in Ollama's cloud — no local GPU required.
Free Cloud Models You Can Use Right Now
These models have been tested and confirmed to work without a Pro subscription:
| Model | Command | Notes |
|---|---|---|
| GLM-4.7 (Z.AI) | --model glm-4.7:cloud | Strong reasoning, free cloud-only |
| GPT-OSS 20B (OpenAI) | --model gpt-oss:20b-cloud | OpenAI open-source, confirmed ✅ |
| Gemma3 27B (Google) | --model gemma3:27b-cloud | Google's latest, confirmed ✅ |
| Gemma3 4B (Google) | --model gemma3:4b-cloud | Lighter, fast, great for quick tasks |
| Devstral Small 2 (Mistral) | --model devstral-small-2:24b-cloud | Coding-specialized, confirmed ✅ |
| Minimax M2.5 | --model minimax-m2.5:cloud | Top open-source SWE benchmark |
| Qwen3 Coder 480B (Alibaba) | --model qwen3-coder:480b-cloud | Massive coding model, free! |
| Qwen3 Next 80B (Alibaba) | --model qwen3-next:80b-cloud | General purpose powerhouse |
| Qwen3 Coder Next | --model qwen3-coder-next:cloud | Latest Qwen coder variant |
| Nemotron 3 Super (NVIDIA) | --model nemotron-3-super:cloud | NVIDIA's flagship reasoning model |
| Ministral 3 (Mistral) | --model ministral-3:8b-cloud | Efficient, fast, multilingual |
| RNJ-1 (Essential AI) | --model rnj-1:8b-cloud | Lightweight and capable |
Tip: Start with gemma3:27b-cloud or gpt-oss:20b-cloud; both responded instantly in testing.
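Not sure which tags are live on your account? A small shell loop using the same ollama run probe as the footnote will tell you (a rough sketch; trim the list to the models you actually care about):

# Probe a few free cloud models and report which ones answer
for model in glm-4.7:cloud gpt-oss:20b-cloud gemma3:27b-cloud devstral-small-2:24b-cloud; do
  if ollama run "$model" "hello" > /dev/null 2>&1; then
    echo "$model: responded"
  else
    echo "$model: no response (not on the free tier, or a transient error)"
  fi
done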
Example Session
# Launch OpenCode with Google's Gemma3 27B — free, no install needed
ollama launch opencode --model gemma3:27b-cloud
OpenCode v1.x — powered by gemma3:27b-cloud
Type your task or press Ctrl+C to exit.
> Refactor this Python function to be async and add error handling
◆ Reading your codebase...
◆ Generating solution...
[gemma3:27b-cloud] Here's the refactored version:
...
What About the Pro-Only Models?
Some of the most capable frontier models require an Ollama Pro subscription. You'll get a 403 Forbidden if you try them without one:
| Model | Tier |
|---|---|
| DeepSeek V4 Pro (1.6T MoE) | ❌ Pro only |
| Qwen3.5 Cloud | ❌ Pro only |
| Kimi K2.6 (multimodal agentic) | ❌ Pro only |
| GLM-5.1 (SOTA SWE-Bench) | ❌ Pro only |
| Mistral Large 3 (675B) | ❌ Pro only |
| Gemini 3 Flash Preview | ❌ Pro only |
These are genuinely frontier-class models. If you find the free tier useful, it's worth checking out Ollama Pro to unlock them.
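If you are unsure which tier you are on, the failure is easy to spot from the terminal. A hedged sketch (the PRO_MODEL value below is a placeholder, since this post doesn't list the exact Pro-only tags; the error text you see may differ from the 403 described above):

# Substitute any Pro-only tag; on the free tier the run fails instead of starting a chat
PRO_MODEL="<pro-only-tag>"
ollama run "$PRO_MODEL" "hello" || echo "Access denied: $PRO_MODEL needs Ollama Pro"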
Why This Matters
| | Traditional Setup | Ollama Cloud |
|---|---|---|
| Hardware | GPU required | Any machine |
| Disk space | 5–290 GB per model | 0 GB |
| Setup time | Minutes to hours | Seconds |
| Cost | Hardware + electricity | Free |
| Model size | Limited by your VRAM | Up to 480B parameters |
Running Qwen3 Coder 480B locally would require ~290 GB of disk and multiple high-end GPUs. Via Ollama Cloud? One command, zero setup.
Quick Reference
# Coding-focused (recommended for OpenCode)
ollama launch opencode --model qwen3-coder:480b-cloud
ollama launch opencode --model devstral-small-2:24b-cloud
ollama launch opencode --model gpt-oss:20b-cloud
# General purpose powerhouses
ollama launch opencode --model gemma3:27b-cloud
ollama launch opencode --model glm-4.7:cloud
ollama launch opencode --model nemotron-3-super:cloud
# Lightweight and fast
ollama launch opencode --model gemma3:4b-cloud
ollama launch opencode --model ministral-3:8b-cloud
ollama launch opencode --model rnj-1:8b-cloud
Final Thought
The AI infrastructure barrier is quietly disappearing. You don't need a $10,000 GPU cluster or a pricey API subscription to run capable, large-scale models anymore. With Ollama Cloud and OpenCode, a curl command and an npm install is all that stands between you and a 480-billion-parameter coding assistant.
No GPU. No subscription. No excuses.
Models tested directly via ollama run <model> "hello" — May 10, 2026. Free tier availability may change. Check ollama.com/search?c=cloud for the latest.