
Florian Zielasko


Local AI Assistant powered by Gemma 4

Gemma 4 Challenge: Build With Gemma 4 Submission

This is a submission for the Gemma 4 Challenge: Build with Gemma 4

What I Built

Reiseki (霊石) is a local AI assistant. It uses Gemma 4 via Ollama to handle real tasks: reading and writing files, generating Word and PDF documents, lightweight data analysis, managing reminders, and remembering context across sessions.

The goal was to build something usable by people who have never touched a terminal. Reiseki ships as a Windows installer — after installing Ollama and a model, you pick a workspace folder, and the agent is ready. No Python environment, no config files, no command line.

Under the hood it runs a ReAct loop (Reason → Act → Observe) with 15+ tools, persistent conversation history in SQLite, a live tool trace in the UI, and a LAN access toggle for smartphone use via QR code.
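The ReAct loop described above can be sketched roughly like this. This is an illustrative minimal version, not Reiseki's actual code: the `call_model` function stands in for the Ollama call, and `read_file` is just one example tool.

```python
# Minimal ReAct (Reason -> Act -> Observe) loop sketch.
# `call_model` and the tool registry are placeholders, not Reiseki's real code.

def read_file(path: str) -> str:
    """Example tool: return a file's contents."""
    with open(path, encoding="utf-8") as f:
        return f.read()

TOOLS = {"read_file": read_file}

def call_model(history):
    """Placeholder for an Ollama chat call; returns the model's next step."""
    # A real implementation would send `history` to Ollama and parse the reply
    # into either a tool call or a final answer.
    return {"action": None, "answer": "stub answer"}

def react_loop(user_message: str, max_steps: int = 5) -> str:
    history = [{"role": "user", "content": user_message}]
    for _ in range(max_steps):
        step = call_model(history)                 # Reason
        if step["action"] is None:                 # model answers directly
            return step["answer"]
        tool = TOOLS[step["action"]["name"]]
        result = tool(**step["action"]["args"])    # Act
        history.append({"role": "tool", "content": result})  # Observe
    return "Step limit reached."
```

The `max_steps` cap is the important safety detail: a small model can occasionally loop on tool calls, so the agent needs a hard exit.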

Code & Demo

The full source code and a demo video are available on GitHub:
github.com/Flo1632/reiseki

The Windows installer (ReisekiSetup.exe) is available on the Releases page.

How I Used Gemma 4

Reiseki uses Gemma 4:e2b (the 2B edge model) as its default model via Ollama.

The choice was deliberate: Reiseki is built specifically for laptops and low-RAM devices. Most capable local models require significantly more resources or a dedicated GPU. Gemma 4:e2b is the first model I tested that handles multi-step tool calling reliably at this hardware tier — it follows the ReAct loop, uses tools correctly, and produces coherent responses.
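Talking to Gemma through Ollama boils down to posting a chat request to the local server. A minimal sketch of building that request body, assuming the model tag is `gemma4:e2b` as named in the post (adjust to whatever tag you actually pulled):

```python
# Sketch: request body for Ollama's /api/chat endpoint.
# The model tag "gemma4:e2b" is the one named in this post; it is an
# assumption here and may differ from your locally installed tag.
import json

def build_chat_request(messages, model="gemma4:e2b"):
    """Return the JSON body for a non-streaming Ollama chat call."""
    return {"model": model, "messages": messages, "stream": False}

body = build_chat_request([{"role": "user", "content": "Summarize notes.txt"}])
payload = json.dumps(body)  # POST this to http://localhost:11434/api/chat
```

With `stream: False` Ollama returns one complete JSON response, which keeps the agent's parsing logic simple at the cost of no token-by-token display.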

Tool calls of Reiseki, powered by Gemma 4 via Ollama

The combination of small footprint and reliable tool use made Gemma 4:e2b the right fit for an offline-first personal agent.

Top comments (3)

S M Tahosin

Local first is the way to go. I built something similar but focused on vision instead of text, running object detection entirely on a Pi 5 with no cloud dependency. How are you handling the model loading time? On my setup the first inference takes about 15 seconds but after that it's much faster since the model stays cached in RAM.

Florian Zielasko

Hi S M Tahosin, great question! I'm not sure how much data a vision model loads upfront, but in Reiseki I use a warmup function that sends a silent "hi" to Ollama in a background thread when the app starts. By the time the user sends their first real message, the model is already loaded into RAM.
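For anyone curious, the warmup idea can be sketched like this. The `send_to_ollama` function is a stand-in for the real client call; the point is only the background thread:

```python
# Sketch of the warmup trick: fire a throwaway prompt at Ollama on startup
# so the model is loaded into RAM before the user's first real message.
# `send_to_ollama` is a placeholder, not Reiseki's actual client code.
import threading

def send_to_ollama(prompt: str) -> str:
    """Placeholder for the real Ollama request that triggers a model load."""
    return "ok"

def warmup():
    send_to_ollama("hi")  # response is discarded; only the load matters

def start_warmup_in_background() -> threading.Thread:
    # daemon=True so a slow warmup never blocks app shutdown
    t = threading.Thread(target=warmup, daemon=True)
    t.start()
    return t
```

Because the thread is a daemon and the UI never waits on it, a slow first load costs nothing if the user takes a few seconds before typing.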

S M Tahosin

The warmup trick with a silent 'hi' is smart. It keeps the UX smooth without delay.
Really impressed by how user-friendly you've made Reiseki. Local-first agents like this are the future.
Your choice of Gemma 4 for reliable ReAct/tool use makes total sense. Might try something similar.