Tyler H

NeuroGuard: AI-Native Code Security Using Gemma 4's Glass-Box Thinking Mode

Gemma 4 Challenge: Build With Gemma 4 Submission

Submitted to the Build With Gemma 4 track of the Dev.to Google Gemma 4 Challenge.

TL;DR: I built neuroguard — a CLI that uses Gemma 4's ThinkingConfig(include_thoughts=True) API to stream the model's full cognitive trace in a split-pane terminal UI while it finds security vulnerabilities and produces a SAST-verified secure rewrite. Install: pip install neuroguard-ai. Full source: github.com/tyy130/neuroguard-ai.


The Problem I Kept Running Into

Studies find the majority of AI-generated applications ship to production with OWASP Top 10 vulnerabilities. I've seen it firsthand. The worst cases aren't SQL injections from typos — they're hallucinated bypasses: an AI agent removes authentication middleware to resolve a compilation error, silently stripping the application of its entire security layer.

The frustrating thing is that a human reviewer wouldn't make this mistake, because they'd reason about what the code does before deleting it. The AI just optimized for "code compiles" without the security reasoning step.

The root cause is opacity. When a black-box LLM generates insecure code, you can't see why. You get the output without the reasoning. And without the reasoning, you can't tell if the model considered security at all — or silently decided to ignore it.

I wanted to fix that.


What Makes Gemma 4 Different

Before Gemma 4, I had three options for transparent reasoning:

  • GPT-4o / standard Claude: chain-of-thought is completely hidden. You get the answer, never the reasoning.
  • DeepSeek-R1: reasoning is exposed, but as raw <think>...</think> tags embedded in the final text response. You have to parse it out, it's structurally mixed with the answer, and there's no clean API boundary between reasoning and output.
  • Gemma 4: ThinkingConfig(include_thoughts=True) emits reasoning tokens as structurally separate stream parts — each chunk has a thought=True field. The reasoning and the response are separated at the API level, not by text parsing.

That API-level separation is what makes NeuroGuard possible. I can route thought parts to a left pane and response parts to a right pane in real time, with no regex parsing, no risk of the boundary getting confused, and no thought tokens leaking into the final output.


How NeuroGuard Works

┌─────────────────────────────┬────────────────────────────┐
│ 🧠 Gemma 4 Thinking         │ 🔒 Secure Rewrite          │
│ ─────────────────────────   │ ─────────────────────────  │
│ ...the SQL query on line    │                            │
│ 47 concatenates user input  │                            │
│ directly. This is a         │                            │
│ classic injection vector.   │                            │
│ The fix is parameterized    │                            │
│ queries...                  │ from flask import Flask    │
│                             │ import sqlite3             │
│ ...the eval() on line 62    │                            │
│ executes arbitrary strings  │ def get_user(user_id):     │
│ from the request body.      │     conn = sqlite3.connect │
│ This is RCE...              │     cursor.execute(        │
│                             │         "SELECT * FROM     │
│                             │           users WHERE      │
│                             │           id = ?", (id,))  │
└─────────────────────────────┴────────────────────────────┘
  Bandit: 4 findings   CLEAN (0 findings in rewrite)

The left pane streams as Gemma 4 reasons. The right pane fills in as it produces the secure rewrite. Bandit runs on the rewrite at the end and confirms the fix is real.
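
For illustration, here's a minimal sketch of how a split-pane like this can be driven with Rich's Live and Layout. It's a simplified stand-in for the real ui.py, not its actual code; stream_review is a hypothetical generator yielding (is_thought, text) pairs:

from rich.layout import Layout
from rich.live import Live
from rich.panel import Panel

layout = Layout()
layout.split_row(Layout(name="thinking"), Layout(name="rewrite"))

thinking_text, rewrite_text = "", ""

# stream_review() is hypothetical: yields (is_thought, text) chunks from the model
with Live(layout, refresh_per_second=12):
    for is_thought, text in stream_review("app.py"):
        if is_thought:
            thinking_text += text
        else:
            rewrite_text += text
        layout["thinking"].update(Panel(thinking_text, title="🧠 Gemma 4 Thinking"))
        layout["rewrite"].update(Panel(rewrite_text, title="🔒 Secure Rewrite"))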

The Core API Call

from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

response = client.models.generate_content_stream(
    model="gemma-4-31b-it",
    contents=[types.Content(role="user", parts=[types.Part(text=prompt)])],
    config=types.GenerateContentConfig(
        system_instruction=SYSTEM_PROMPT,
        thinking_config=types.ThinkingConfig(
            include_thoughts=True,
            thinking_budget=thinking_budget,  # scales with SAST severity
        ),
    ),
)

for chunk in response:
    for part in chunk.candidates[0].content.parts:
        if getattr(part, "thought", False):
            yield f"<think>{part.text}"   # → left pane
        elif part.text:
            yield part.text               # → right pane

That's it. No regex. No text parsing. The thought=True flag on stream parts is the entire separation mechanism.

Making the Thinking Load-Bearing

The key design decision was making the thinking trace load-bearing, not decorative. I inject SAST findings from Bandit/semgrep directly into the prompt before the model starts reasoning:

SAST pre-scan findings (ground truth — confirm or refute each in your reasoning):

  [HIGH] B608 hardcoded_sql_expressions — line 47
  [HIGH] B307 eval() — line 62
  [MEDIUM] B105 hardcoded_password_string — line 12

Now the model's thinking trace is explicitly reasoning about concrete, tool-verified findings. It can't skip them. It either confirms the finding and fixes it, or explains why it's a false positive. Either way, you have an auditable chain of evidence tied to specific lines.
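
For context, here's roughly how Bandit's JSON output can be turned into that prompt block. This is a hedged sketch, not NeuroGuard's actual prompts.py; the field names come from bandit -f json, and the helper name is mine:

import json
import subprocess

def sast_findings_block(path: str) -> str:
    """Run Bandit on a file and format its findings as prompt-ready lines."""
    proc = subprocess.run(["bandit", "-f", "json", "-q", path],
                          capture_output=True, text=True)
    results = json.loads(proc.stdout).get("results", [])
    lines = [
        f"  [{r['issue_severity']}] {r['test_id']} {r['test_name']} — line {r['line_number']}"
        for r in results
    ]
    header = ("SAST pre-scan findings (ground truth — confirm or refute each "
              "in your reasoning):\n")
    return header + "\n" + "\n".join(lines)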

The thinking budget scales automatically: 4096 + HIGH_count × 512 + MEDIUM_count × 256 tokens (capped at 16384). Files with more HIGH findings get proportionally deeper reasoning.
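
The budget rule itself is a one-liner; here it is as code (the function name is mine, the constants are from the rule above):

def compute_thinking_budget(high_count: int, medium_count: int) -> int:
    # Base budget plus extra reasoning tokens per finding, capped at 16384
    return min(4096 + high_count * 512 + medium_count * 256, 16384)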


What It Looks Like in Practice

The built-in demo (demo/vuln_sample.py) is a Flask app with 5 intentional vulnerabilities:

# demo/vuln_sample.py — intentionally vulnerable
from flask import Flask, request

app = Flask(__name__)

SECRET_KEY = "supersecret123"   # hardcoded secret

@app.route("/admin")            # no auth check
def admin_panel():
    return "Admin panel"

@app.route("/user")
def get_user():
    user_id = request.args.get("id")
    query = f"SELECT * FROM users WHERE id = {user_id}"  # SQL injection
    ...

@app.route("/eval")
def run_code():
    code = request.args.get("code")
    return str(eval(code))      # RCE

Running neuroguard review demo/vuln_sample.py:

  1. Bandit finds 4 HIGH/MEDIUM findings in the original
  2. Those findings are injected into the prompt
  3. Gemma 4 streams its reasoning — you watch it identify the injection vector, explain the attack path, and reason through the fix
  4. The secure rewrite uses parameterized queries, removes eval(), moves the secret to env vars
  5. Bandit runs on the rewrite: 0 findings

The thinking trace is the proof of work. You don't have to trust the rewrite blindly — you can see the exact chain of reasoning that produced it.


SAST + LLM: Two Layers of Confidence

One thing I deliberately avoided was making this "just an LLM." Bandit (for Python) and semgrep/regex patterns (for JS/TS) run before the model sees the code. The findings are facts fed into the reasoning layer.

After the rewrite, they run again. The exit code is non-zero if the original had HIGH/MEDIUM findings — so in CI/CD, your pipeline fails on vulnerable code:

# .github/workflows/neuroguard.yml
- name: Security review
  run: neuroguard review src/ --format json
  env:
    GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }}
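
The gate itself reduces to a severity check over the pre-scan findings. A minimal sketch, assuming findings are the dicts Bandit emits in its JSON output:

def gate_exit_code(original_findings: list[dict]) -> int:
    """Fail the pipeline (exit 1) if the original file had HIGH or MEDIUM findings."""
    blocking = [f for f in original_findings
                if f.get("issue_severity") in ("HIGH", "MEDIUM")]
    return 1 if blocking else 0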

You can also get a Slack notification with Gemma 4's reasoning excerpt, post a GitHub PR comment automatically, or pipe JSON to any webhook:

neuroguard review app.py --notify-slack https://hooks.slack.com/...
neuroguard review app.py --format json | jq '.thinking' | head -20

The Architecture

neuroguard/
├── agent.py           # Gemma 4 streaming client — ThinkingConfig, retry/fallback
├── thinking_parser.py # Routes <think> parts to left pane, response to right
├── prompts.py         # Language-aware prompt + SAST findings injection
├── cli.py             # Typer CLI: review, install-hooks, --format json/text
├── integrations.py    # Slack Block Kit, webhook, GitHub PR comments
├── tools/
│   ├── sast.py        # Bandit wrapper → Python findings
│   └── js_sast.py     # semgrep + regex fallback → JS/TS findings
└── ui.py              # Rich split-pane Live layout (12fps)

Model fallback: If the 31B dense model hits a rate limit, NeuroGuard falls back to gemma-4-26b-a4b-it (MoE, ~4B active params) automatically. The demo never stalls.
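
A sketch of that fallback, assuming rate-limit errors surface as exceptions from the streaming call (simplified error handling, not agent.py's actual code):

PRIMARY_MODEL = "gemma-4-31b-it"
FALLBACK_MODEL = "gemma-4-26b-a4b-it"

def stream_with_fallback(client, prompt, config):
    """Try the dense model first; retry on the MoE model if rate-limited."""
    for model in (PRIMARY_MODEL, FALLBACK_MODEL):
        try:
            yield from client.models.generate_content_stream(
                model=model, contents=prompt, config=config
            )
            return
        except Exception as err:  # simplified: in practice, check for an HTTP 429
            if "429" not in str(err) or model == FALLBACK_MODEL:
                raise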

Language support: Python, JavaScript, TypeScript, JSX, TSX.


Try It

pip install neuroguard-ai
export GEMINI_API_KEY=your_key   # free at https://aistudio.google.com/apikey

# against your own code
neuroguard review app.py

# against the built-in vulnerable demo
git clone https://github.com/tyy130/neuroguard-ai
cd neuroguard-ai
neuroguard review demo/vuln_sample.py

You'll see Gemma 4's full reasoning trace in real time, then a clean, Bandit-verified secure rewrite.


Why This Matters Beyond the Demo

The shift happening in software development right now is that AI generates the first draft of most code. That's not going to stop. But "vibe coding" — accepting AI output without verification — is already producing an epidemic of OWASP vulnerabilities in production systems.

The answer isn't to distrust AI-generated code. It's to demand transparency from the model before you trust the output. Gemma 4's Thinking Mode makes that possible at the API level for the first time.

NeuroGuard is a concrete demonstration of what that looks like: the model can't silently delete an auth check if its reasoning is visible. The audit trail is the security control.

NeuroGuard is Apache 2.0. And because Gemma 4's weights are available on Kaggle, you can run this entirely on-premise — no code ever leaves your network.


Links:

  • Source: github.com/tyy130/neuroguard-ai
  • Package: pip install neuroguard-ai (PyPI)
  • Gemini API key: https://aistudio.google.com/apikey
