From Blurry Whiteboard Photo to Structured Action Items with Gemma 4 E4B

Gemma 4 Challenge: Build With Gemma 4 Submission

What I Built

WhiteboardIQ — snap a photo of any whiteboard and get back a clean, structured list of action items, owners, deadlines, and priorities in seconds.

Every team has been there: 45 minutes of productive planning, three whiteboards full of tasks and names, then someone takes a blurry phone photo and that's "the notes." Two days later nobody remembers who owned what.

WhiteboardIQ fixes that. It reads the whiteboard image with Gemma 4's native vision and returns:

  • Action items with owner, deadline, and priority (inferred from visual cues — circles = High, boxes = Medium, plain text = Low)
  • 🏛️ Decisions made during the session
  • Open questions and blockers
  • 📋 Full verbatim transcription of the whiteboard
  • 📝 2–3 sentence executive summary

Export as JSON, Markdown, or CSV — paste straight into Notion, Confluence, or a spreadsheet.
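For the spreadsheet path, here is a minimal sketch of the kind of JSON-to-CSV conversion formatter.py performs. The field names follow the skill schema shown later in this post; the exact implementation in the repo may differ.

```python
import csv
import io

def action_items_to_csv(extraction: dict) -> str:
    """Flatten extracted action items into CSV text (illustrative sketch)."""
    fields = ["task", "owner", "deadline", "priority", "notes"]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for item in extraction.get("action_items", []):
        # Missing fields become empty cells so the CSV pastes cleanly into a sheet.
        writer.writerow({f: item.get(f, "") for f in fields})
    return buf.getvalue()
```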

Three ways to use it:

| Platform | Stack |
| --- | --- |
| Web app | FastAPI + drag-and-drop UI. Gemma 4 via Ollama — no API key, fully offline |
| Edge Gallery skill | Install into Google AI Edge Gallery. Gemma 4 reads and structures whiteboards inline |

Demo

Web app — drop a photo, get action items in ~8 seconds:


```bash
# Prerequisites: Ollama running with Gemma 4
ollama pull gemma4:e4b

cd whiteboardiq/backend
pip install -r requirements.txt
uvicorn main:app --reload
# Open http://127.0.0.1:8000
```
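If you'd rather script the backend than use the drag-and-drop UI, a quick smoke test could look like the sketch below. It assumes `requests` is installed and that `POST /extract` takes a multipart upload field named `file`; check `main.py` for the exact field name.

```python
import requests  # pip install requests

# Assumption: POST /extract accepts a multipart image upload (field name "file").
with open("whiteboard.jpg", "rb") as f:
    resp = requests.post(
        "http://127.0.0.1:8000/extract",
        files={"file": ("whiteboard.jpg", f, "image/jpeg")},
    )
resp.raise_for_status()
print(resp.json())  # action items, decisions, questions, summary
```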
Edge Gallery skill — install in 10 seconds:

Open Edge Gallery → Agent → Skills → +

Gemma reads the board, extracts every task, and renders a live card with priority badges, owners, and deadlines — on-device, no internet required.

Install locally on iPhone (no URL needed):

1. AirDrop the whiteboardiq-skill/ folder to iPhone
2. Unzip in Files app
3. Edge Gallery → Skills → + → Import from file → select the folder


🔗 Web app + backend: github.com/samirzubair/GEMMA4

🔗 Edge Gallery skill: samirzubair.github.io/GEMMA4/SKILL.md

Project structure:

```
whiteboardiq/
├── backend/
│   ├── main.py        # FastAPI — POST /extract, serves frontend
│   ├── model.py       # Gemma 4 via Ollama REST API (no SDK needed)
│   └── formatter.py   # JSON → Markdown / CSV
└── frontend/
    ├── index.html     # Drag-and-drop upload UI
    ├── style.css      # Dark-mode design system
    └── app.js         # Fetch, render, copy, download

whiteboardiq-skill/    # Google AI Edge Gallery skill
├── SKILL.md           # Skill instructions for Gemma 4
├── scripts/
│   └── index.html     # run_js entry point — relays data to webview
└── assets/
    └── webview.html   # Renders action items card in-app
```

The Gemma integration — no SDK, just Ollama REST:

```python
import base64
import json
import urllib.request

# EXTRACTION_PROMPT and parse_json are defined elsewhere in model.py.

def extract_from_image_bytes(image_bytes: bytes, mime_type="image/jpeg") -> dict:
    payload = {
        "model": "gemma4:e4b",
        "prompt": EXTRACTION_PROMPT,
        "images": [base64.b64encode(image_bytes).decode()],
        "stream": False,
        "options": {"temperature": 0.2, "num_predict": 4096},
    }
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return parse_json(json.loads(resp.read())["response"])
```

`temperature: 0.2` keeps extraction grounded — higher values caused the model to hallucinate owners or deadlines not visible on the board.
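On the main.py side, wiring that function to the `POST /extract` endpoint only takes a few lines of FastAPI. This is a sketch rather than the repo's exact code, and the parameter names are illustrative:

```python
from fastapi import FastAPI, File, UploadFile
from model import extract_from_image_bytes

app = FastAPI()

@app.post("/extract")
async def extract(file: UploadFile = File(...)):
    # Read the uploaded whiteboard photo and hand it to the Gemma extractor.
    image_bytes = await file.read()
    result = extract_from_image_bytes(
        image_bytes, mime_type=file.content_type or "image/jpeg"
    )
    return result  # FastAPI serialises the dict straight to JSON
```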

The Edge Gallery skill
The skill uses Gemma 4's agent mode:

1. User sends whiteboard photo in Edge Gallery chat
2. Gemma reads the image with native vision
3. Gemma calls `run_js` with structured JSON (action items, decisions, questions)
4. `scripts/index.html` passes data to `assets/webview.html` via URL params
5. A dark-mode card renders inline with priority badges, owner chips, and deadlines
The instructions Gemma follows, from SKILL.md:

```markdown
## Instructions

Call the `run_js` tool using `index.html` and a JSON string for `data` with:
- action_items: Array with task, owner, deadline, priority, notes
- decisions: Array of strings
- questions: Array of strings
- meeting_context, summary
```
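As a concrete example, the `data` string Gemma passes to `run_js` would look something like this (values invented for illustration):

```json
{
  "meeting_context": "Q3 launch planning",
  "summary": "The team agreed to move the launch to Q3 and split the onboarding work.",
  "action_items": [
    {"task": "Ship onboarding flow", "owner": "Priya", "deadline": "Friday", "priority": "High", "notes": "Circled on the board"},
    {"task": "Draft pricing page copy", "owner": "Omar", "deadline": "", "priority": "Medium", "notes": ""}
  ],
  "decisions": ["Launch moves to Q3"],
  "questions": ["Who owns the security review?"]
}
```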


**The bigger picture: what local Gemma 4 means for enterprise AI**

Most multimodal AI tools have a quiet asterisk: your data goes to our servers.

For consumer apps that's fine. For enterprise — where whiteboards contain roadmaps, hiring decisions, financial forecasts, and unreleased product names — it's often a dealbreaker. Legal reviews it, security blocks it, and the tool never ships internally.

Gemma 4 E4B changes that equation. An 8B parameter multimodal model that runs in real-time on a laptop, fits on a phone, reads handwriting, understands context, and produces structured output — fully offline — is a fundamentally different proposition than a cloud API.

WhiteboardIQ is a small demonstration of that shift. The whiteboard use case is deliberately mundane. That's the point. If Gemma 4 can turn a blurry meeting photo into a structured JIRA-ready action list in 8 seconds on consumer hardware, the question isn't "what else can it do?" — the question is "what's left that it can't?"

![ ](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/2mhxfjm54500n79mie4g.PNG)
![ ](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/skitrnx37osc279e0ss8.PNG)
![ ](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ivcaegghh91j0oq2nmms.PNG)
![ ](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ect6upsfvr64tce7yzdb.PNG)
![ ](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/nyr8f97mbytforwvjyru.PNG)
![ ](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/oqh6puqsm512ye8y9rvr.PNG)
![ ](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/vyy1gbguwqdqtaeybjof.PNG)


YouTube demo link:

  