
Old PC vs New AI: Can a 2015 Desktop Actually Run Gemma 4? (2B vs 4B Benchmark)

Daniel Balcarek on May 14, 2026

Running modern AI models locally on older hardware sounds almost impossible. But with smaller models like Gemma 4 and tools like Ollama, local AI i...
Web Developer Hyper

Good test of local LLMs. 😀 I wanted to use AI for free, so I tried local LLMs last year, but they were quite slow and low quality. My CPU and memory usage hit 100%, so I gave up. But they might be better now.

Daniel Balcarek

Definitely give it another try. I’m pretty sure you have better hardware than my archaic PC, so the E4B model should run fine for you (maybe even 26B 😀).

The reasoning quality also depends on your expectations. E4B and E2B are still relatively small models, so they won’t compete with models like Anthropic Claude Sonnet 4.6 or Google Gemini 3.1 for programming tasks, but they’re definitely usable.

Web Developer Hyper

Yes, I might try simple tasks with local LLMs and avoid comparing them with Claude Code. There should be a good way to make the most of local LLMs. 🤔

Max Quimby

The 2015-desktop angle is the more interesting half of the local-AI story right now. The "1x H100" crowd gets all the airtime, but the actual unlock for hobbyist devs is that a CPU-only or modest-iGPU machine can now run a model that's genuinely useful for code-completion or summarization workloads.

Two things I'd be curious to see in a follow-up: tokens/sec under sustained load rather than just first-token latency (thermal throttling on old desktops is brutal once you get past the first minute), and whether you saw a meaningful quality difference between 2B and 4B on tasks that matter to you, not just benchmark scores. In our testing the 2B-vs-4B gap is small on classification and pretty large on anything requiring two-step reasoning, but it's very task-dependent.

Did you try llamafile or just stick with one runtime? llamafile's been surprising on old AVX2-only CPUs.
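
For the sustained-load check, you don't need anything fancy; looping requests against Ollama's local API and reading the eval counters it returns in the non-streaming response is enough. A minimal sketch in Python (the model tag and prompt are placeholders; it assumes Ollama on its default port):

```python
# Rough sustained-throughput probe against a local Ollama instance.
# Assumes Ollama is running on its default port (11434); the model tag
# below is a placeholder -- use whatever `ollama list` shows for you.
import time
import requests

MODEL = "gemma3n:e2b"
URL = "http://localhost:11434/api/generate"
PROMPT = "Summarize the trade-offs of running LLMs on CPU-only hardware."

for i in range(10):  # repeated runs keep the CPU hot past the first minute
    resp = requests.post(
        URL,
        json={"model": MODEL, "prompt": PROMPT, "stream": False},
        timeout=600,
    ).json()
    # eval_count = generated tokens, eval_duration = generation time in ns
    tps = resp["eval_count"] / (resp["eval_duration"] / 1e9)
    print(f"run {i + 1}: {tps:.1f} tokens/sec")
    time.sleep(1)
```

If the number drops off noticeably between the first run and the tenth, that's the thermal story showing up.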

Daniel Balcarek

Thanks! I completely agree. The fact that older hardware can now run actually useful local models is probably the most exciting part for hobby developers right now.

And jumping to your last question, I stuck with Ollama mainly because it’s much more approachable for tech people in general, not just developers. But I’ll definitely try llamafile, especially once I start integrating models into the app I’m planning to build.

These are actually really good insights, and I’d like to focus more on them in follow-up testing. Right now I’m planning to evolve the measurements in two directions:

  • trying the models directly in VS Code Copilot Chat
  • using them inside an application where the model is part of the core functionality, while orchestration is handled by the backend (a rough sketch of what I mean is below)

If E2B or E4B prove capable enough, I’d also like to experiment with MCP, RAG, and similar integrations to see how far they can be pushed.
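
To make that second bullet concrete, this is roughly the shape I have in mind: the backend owns the prompt, the call, and the post-processing, and the model is just one component. A minimal sketch, with everything hypothetical (the helper name, the system prompt, the model tag), assuming Ollama's /api/chat endpoint on the default port:

```python
# Hypothetical backend helper: the model handles one core task while the
# backend owns the prompting, the call, and the post-processing.
# Assumes a local Ollama instance exposing /api/chat on its default port.
import requests

OLLAMA_CHAT = "http://localhost:11434/api/chat"
MODEL = "gemma3n:e4b"  # placeholder tag; use whatever you've pulled

def summarize_ticket(ticket_text: str) -> str:
    """Ask the local model for a one-sentence summary of a support ticket."""
    resp = requests.post(
        OLLAMA_CHAT,
        json={
            "model": MODEL,
            "messages": [
                {"role": "system",
                 "content": "Summarize the user's ticket in one sentence."},
                {"role": "user", "content": ticket_text},
            ],
            "stream": False,
        },
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["message"]["content"].strip()

if __name__ == "__main__":
    print(summarize_ticket(
        "The app crashes whenever I open settings on Android 14."))
```

Keeping the call isolated behind one function is deliberate: if I later switch from Ollama to llamafile, that should only mean changing the URL and payload shape in one place.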

Sylwia Laskowska

Wow, this is such an amazing breakdown 😄 I also wanted to participate in this contest, but now I’m honestly a bit embarrassed after reading this 😀

Local LLMs have always tempted me too. I experimented a bit with browser-based ones, but on a real computer you can definitely feel the difference in quality/performance.

Also, I find it fascinating that it struggles so much with Czech 😄 Such a beautiful language! 😄

Daniel Balcarek

Thank you! You should definitely go for it. These challenges are a great way to push our knowledge further.

That actually sounds like a challenge article idea now: “Gemma E2B in the browser?” 😄

And to be honest, I’m not surprised it struggles with Czech. It’s my native language and even I struggle with it sometimes 😀

Sylwia Laskowska

I love "szukajmy szczotek" in Czech, which are totally neutral words in Polish 🤣

Daniel Balcarek

Yep, generally “szukaj” just sounds funny to Czech speakers 😄

Ben Halpern

Fascinating

Daniel Balcarek

Glad you found it fascinating!

Hopefully the fascinating part is the article itself, not the fact that I’m still developing side projects on a machine from 2015 😄

Syed Ahmer Shah

It’s rare to see someone probe the threshold between 2015-era hardware and modern LLMs this thoroughly.

Daniel Balcarek

2015 hardware might be a bit too old, but I believe a lot of people are still on older machines (around 2020 or earlier), so this kind of testing can be quite valuable for them.
I’m curious how far current small models can realistically go before hardware becomes the real bottleneck.

Harsh

Wow, amazing breakdown 😄

Daniel Balcarek

Thanks, glad you liked it! 😄

Michael Zhu

Awesome, this is a real turning point.

Daniel Balcarek

Yes, it’s exciting that we can finally run useful models locally. I’m curious how far we can push these edge models.

Michael Zhu

I don't think there's much use in being like that, though.