Running modern AI models locally on older hardware sounds almost impossible. But with smaller models like Gemma 4 and tools like Ollama, local AI i...
Good test of local LLMs. 😀 I wanted to use AI for free, so I tried local LLMs last year, but they were quite slow and low quality. My CPU and memory usage hit 100%, so I gave up. But they might be better now.
Definitely give it another try. I’m pretty sure you have better hardware than my archaic PC, so the E4B model should run fine for you (maybe even 26B 😀).
The reasoning quality also depends on your expectations. E4B and E2B are still relatively small models, so they won’t compete with models like Anthropic Claude Sonnet 4.6 or Google Gemini 3.1 for programming tasks, but they’re definitely usable.
Yes, I might try simple tasks with local LLMs and avoid comparing them with Claude Code. There should be a good way to make the most of local LLMs. 🤔
The 2015-desktop angle is the more interesting half of the local-AI story right now. The "1x H100" crowd gets all the airtime, but the actual unlock for hobbyist devs is that a CPU-only or modest-iGPU machine can now run a model that's genuinely useful for code-completion or summarization workloads.
Two things I'd be curious to see in a follow-up: tokens/sec under sustained load rather than first-token (thermal throttling on old desktops is brutal once you get past the first minute), and whether you saw a meaningful quality difference between 2B and 4B on tasks that matter to you, not just benchmark scores. In our testing the 2B-vs-4B gap is small on classification and pretty large on anything requiring two-step reasoning, but it's very task-dependent.
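A rough way to capture that sustained-load number is to hit Ollama's `/api/generate` endpoint repeatedly and compute tokens/sec from the `eval_count` and `eval_duration` fields it returns (durations are in nanoseconds). This is only a sketch, assuming the default local endpoint and a `gemma3n:e4b` tag; adjust the model name to whatever `ollama list` shows on your machine:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Generated tokens divided by generation time (Ollama reports nanoseconds)."""
    return eval_count / (eval_duration_ns / 1e9)


def sustained_benchmark(model: str, prompt: str, rounds: int = 10) -> list[float]:
    """Run the same prompt repeatedly and record tokens/sec per round,
    so thermal throttling shows up as a downward trend across rounds."""
    rates = []
    for _ in range(rounds):
        body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
        req = urllib.request.Request(
            OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
        )
        with urllib.request.urlopen(req) as resp:
            data = json.load(resp)
        rates.append(tokens_per_second(data["eval_count"], data["eval_duration"]))
    return rates


if __name__ == "__main__":
    # "gemma3n:e4b" is an assumption; swap in your own model tag.
    for i, rate in enumerate(sustained_benchmark("gemma3n:e4b", "Summarize why thermal throttling matters."), 1):
        print(f"round {i}: {rate:.1f} tok/s")
```

If round 10 is meaningfully slower than round 1, that's the throttling showing up.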
Did you try llamafile or just stick with one runtime? llamafile's been surprising on old AVX2-only CPUs.
Thanks! I completely agree. The fact that older hardware can now run actually useful local models is probably the most exciting part for hobby developers right now.
And jumping to your last question, I stuck with Ollama mainly because it’s much more approachable for tech people in general, not just developers. But I’ll definitely try llamafile, especially once I start integrating models into the app I’m planning to build.
These are actually really good insights, and I’d like to focus more on them in follow-up testing. Right now I’m planning to evolve the measurements in two directions: tokens/sec under sustained load rather than just first-token latency, and task-level quality differences between E2B and E4B on the tasks I actually care about.
If E2B or E4B prove capable enough, I’d also like to experiment with MCP, RAG, and similar integrations to see how far they can be pushed.
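For the RAG side, the retrieval step is simple enough to sketch independently of any model. This toy example uses hard-coded 3-dimensional vectors purely for illustration; a real pipeline would fetch embeddings from an embedding model instead (for instance via Ollama's embeddings API):

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vec, docs, k=1):
    """docs: list of (text, embedding) pairs. Return the k most similar texts."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy 3-dim "embeddings", hand-picked for illustration only.
docs = [
    ("Ollama runs models locally.", [0.9, 0.1, 0.0]),
    ("Czech grammar has seven cases.", [0.0, 0.2, 0.9]),
]
print(retrieve([0.8, 0.2, 0.1], docs))  # → ['Ollama runs models locally.']
```

The retrieved texts would then be stuffed into the prompt before handing it to E2B/E4B, which is where the "capable enough" question really gets tested.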
Wow, this is such an amazing breakdown 😄 I also wanted to participate in this contest, but now I’m honestly a bit embarrassed after reading this 😀
Local LLMs have always tempted me too. I experimented a bit with browser-based ones, but on a real computer you can definitely feel the difference in quality/performance.
Also, I find it fascinating that it struggles so much with Czech 😄 Such a beautiful language! 😄
Thank you! You should definitely go for it. These challenges are a great way to push our knowledge further.
That actually sounds like a challenge article idea now: “Gemma E2B in the browser?” 😄
And to be honest, I’m not surprised it struggles with Czech. It’s my native language and even I struggle with it sometimes 😀
I love how "szukajmy szczotek" comes across in Czech, since they're totally neutral words in Polish 🤣
Yep, generally “szukaj” just sounds funny to Czech speakers 😄
Fascinating
Glad you found it fascinating!
Hopefully the fascinating part is the article itself, not the fact that I’m still developing side projects on a machine from 2015 😄
It’s rare to see someone testing the 2015 hardware vs. modern LLM threshold so thoroughly.
2015 hardware might be a bit too old, but I believe a lot of people are still on older machines (around 2020 or earlier), so this kind of testing can be quite valuable for them.
I’m curious how far current small models can realistically go before hardware becomes the real bottleneck.
Wow, amazing breakdown 😄
Thanks, glad you liked it! 😄
Awesome, this is a real turning point.
Yes, it’s exciting that we can finally run useful models locally. I’m curious how far we can push these edge models.
I think it's of no use to be like that.