
Old PC vs New AI: Can a 2015 Desktop Actually Run Gemma 4? (2B vs 4B Benchmark)

Daniel Balcarek on May 14, 2026

Running modern AI models locally on older hardware sounds almost impossible. But with smaller models like Gemma 4 and tools like Ollama, local AI i...
Web Developer Hyper

Good test of local LLMs. 😀 I wanted to use AI for free, so I tried local LLMs last year, but they were quite slow and low quality. My CPU and memory usage hit 100%, so I gave up. But they might be better now.

Daniel Balcarek

Definitely give it another try. I’m pretty sure you have better hardware than my archaic PC, so the E4B model should run fine for you (maybe even 26B 😀).

The reasoning quality also depends on your expectations. E4B and E2B are still relatively small models, so they won’t compete with models like Anthropic Claude Sonnet 4.6 or Google Gemini 3.1 for programming tasks, but they’re definitely usable.

Web Developer Hyper

Yes, I might try simple tasks with local LLMs and avoid comparing them with Claude Code. There should be a good way to make the most of local LLMs. 🤔

Max Quimby

The 2015-desktop angle is the more interesting half of the local-AI story right now. The "1x H100" crowd gets all the airtime, but the actual unlock for hobbyist devs is that a CPU-only or modest-iGPU machine can now run a model that's genuinely useful for code-completion or summarization workloads.

Two things I'd be curious to see in a follow-up: tokens/sec under sustained load rather than just first-token latency (thermal throttling on old desktops is brutal once you get past the first minute), and whether you saw a meaningful quality difference between 2B and 4B on tasks that matter to you, not just benchmark scores. In our testing the 2B-vs-4B gap is small on classification and pretty large on anything requiring two-step reasoning, but it's very task-dependent.

Did you try llamafile or just stick with one runtime? llamafile's been surprising on old AVX2-only CPUs.
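
For the sustained-load check, you don't need anything fancy; looping requests against Ollama's local API and reading the eval counters it returns in the non-streaming response is enough. A minimal sketch in Python (the model tag and prompt are placeholders; it assumes Ollama on its default port):

```python
# Rough sustained-throughput probe against a local Ollama instance.
# Assumes Ollama is running on its default port (11434); the model tag
# below is a placeholder -- use whatever `ollama list` shows for you.
import time
import requests

MODEL = "gemma3n:e2b"
URL = "http://localhost:11434/api/generate"
PROMPT = "Summarize the trade-offs of running LLMs on CPU-only hardware."

for i in range(10):  # repeated runs keep the CPU hot past the first minute
    resp = requests.post(
        URL,
        json={"model": MODEL, "prompt": PROMPT, "stream": False},
        timeout=600,
    ).json()
    # eval_count = generated tokens, eval_duration = generation time in ns
    tps = resp["eval_count"] / (resp["eval_duration"] / 1e9)
    print(f"run {i + 1}: {tps:.1f} tokens/sec")
    time.sleep(1)
```

If the number drops off noticeably between the first run and the tenth, that's the thermal story showing up.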

Daniel Balcarek

Thanks! I completely agree. The fact that older hardware can now run actually useful local models is probably the most exciting part for hobby developers right now.

And jumping to your last question, I stuck with Ollama mainly because it’s much more approachable for tech people in general, not just developers. But I’ll definitely try llamafile, especially once I start integrating models into the app I’m planning to build.

These are actually really good insights, and I’d like to focus more on them in follow-up testing. Right now I’m planning to evolve the measurements in two directions:

  • trying the models directly in VS Code Copilot Chat
  • using them inside an application where the model is part of the core functionality, while orchestration is handled by the backend (a rough sketch of what I mean is below)

If E2B or E4B prove capable enough, I’d also like to experiment with MCP, RAG, and similar integrations to see how far they can be pushed.
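
To make that second bullet concrete, this is roughly the shape I have in mind: the backend owns the prompt, the call, and the post-processing, and the model is just one component. A minimal sketch, with everything hypothetical (the helper name, the system prompt, the model tag), assuming Ollama's /api/chat endpoint on the default port:

```python
# Hypothetical backend helper: the model handles one core task while the
# backend owns the prompting, the call, and the post-processing.
# Assumes a local Ollama instance exposing /api/chat on its default port.
import requests

OLLAMA_CHAT = "http://localhost:11434/api/chat"
MODEL = "gemma3n:e4b"  # placeholder tag; use whatever you've pulled

def summarize_ticket(ticket_text: str) -> str:
    """Ask the local model for a one-sentence summary of a support ticket."""
    resp = requests.post(
        OLLAMA_CHAT,
        json={
            "model": MODEL,
            "messages": [
                {"role": "system",
                 "content": "Summarize the user's ticket in one sentence."},
                {"role": "user", "content": ticket_text},
            ],
            "stream": False,
        },
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["message"]["content"].strip()

if __name__ == "__main__":
    print(summarize_ticket(
        "The app crashes whenever I open settings on Android 14."))
```

Keeping the call isolated behind one function is deliberate: if I later switch from Ollama to llamafile, that should only mean changing the URL and payload shape in one place.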

Sylwia Laskowska

Wow, this is such an amazing breakdown 😄 I also wanted to participate in this contest, but now I’m honestly a bit embarrassed after reading this 😀

Local LLMs have always tempted me too. I experimented a bit with browser-based ones, but on a real computer you can definitely feel the difference in quality/performance.

Also, I find it fascinating that it struggles so much with Czech 😄 Such a beautiful language! 😄

Daniel Balcarek

Thank you! You should definitely go for it. These challenges are a great way to push our knowledge further.

That actually sounds like a challenge article idea now: “Gemma E2B in the browser?” 😄

And to be honest, I’m not surprised it struggles with Czech. It’s my native language and even I struggle with it sometimes 😀

Sylwia Laskowska

I love "szukajmy szczotek" in Czech, which are totally neutral words in Polish 🤣

Daniel Balcarek

Yep, generally “szukaj” just sounds funny to Czech speakers 😄

Ben Halpern

Fascinating

Daniel Balcarek

Glad you found it fascinating!

Hopefully the fascinating part is the article itself, not the fact that I’m still developing side projects on a machine from 2015 😄

Syed Ahmer Shah

It’s rare to see someone probe the threshold between 2015-era hardware and modern LLMs this thoroughly.

Daniel Balcarek

2015 hardware might be a bit too old, but I believe a lot of people are still on older machines (around 2020 or earlier), so this kind of testing can be quite valuable for them.
I’m curious how far current small models can realistically go before hardware becomes the real bottleneck.

Harsh

Wow, amazing breakdown 😄

Daniel Balcarek

Thanks, glad you liked it! 😄

Michael Zhu

Awesome, this is a real turning point.

Daniel Balcarek

Yes, it’s exciting that we can finally run useful models locally. I’m curious how far we can push these edge models.

Michael Zhu

I don't think there's much use in being like that, though.