Tyler H

NeuroGuard: AI-Native Code Security Using Gemma 4's Glass-Box Thinking Mode

Gemma 4 Challenge: Build With Gemma 4 Submission

Submitted to the Build With Gemma 4 track of the Dev.to Google Gemma 4 Challenge.

TL;DR: I built neuroguard — a CLI that uses Gemma 4's ThinkingConfig(include_thoughts=True) API to stream the model's full cognitive trace in a split-pane terminal UI while it finds security vulnerabilities and produces a SAST-verified secure rewrite. Install: pip install neuroguard-ai. Full source: github.com/tyy130/neuroguard-ai.


The Problem I Kept Running Into

Studies find the majority of AI-generated applications ship to production with OWASP Top 10 vulnerabilities. I've seen it firsthand. The worst cases aren't SQL injections from typos — they're hallucinated bypasses: an AI agent removes authentication middleware to resolve a compilation error, silently stripping the application of its entire security layer.

The frustrating thing is that a human reviewer wouldn't make this mistake, because they'd reason about what the code does before deleting it. The AI just optimized for "code compiles" without the security reasoning step.

The root cause is opacity. When a black-box LLM generates insecure code, you can't see why. You get the output without the reasoning. And without the reasoning, you can't tell if the model considered security at all — or silently decided to ignore it.

I wanted to fix that.


What Makes Gemma 4 Different

Before Gemma 4, I had three options for transparent reasoning:

  • GPT-4o / standard Claude: chain-of-thought is completely hidden. You get the answer, never the reasoning.
  • DeepSeek-R1: reasoning is exposed, but as raw <think>...</think> tags embedded in the final text response. You have to parse it out, it's structurally mixed with the answer, and there's no clean API boundary between reasoning and output.
  • Gemma 4: ThinkingConfig(include_thoughts=True) emits reasoning tokens as structurally separate stream parts — each chunk has a thought=True field. The reasoning and the response are separated at the API level, not by text parsing.

That API-level separation is what makes NeuroGuard possible. I can route thought parts to a left pane and response parts to a right pane in real time, with no regex parsing, no risk of the boundary getting confused, and no thought tokens leaking into the final output.


How NeuroGuard Works

┌─────────────────────────────┬────────────────────────────┐
│ 🧠 Gemma 4 Thinking         │ 🔒 Secure Rewrite          │
│ ─────────────────────────   │ ─────────────────────────  │
│ ...the SQL query on line    │                            │
│ 47 concatenates user input  │                            │
│ directly. This is a         │                            │
│ classic injection vector.   │                            │
│ The fix is parameterized    │                            │
│ queries...                  │ from flask import Flask    │
│                             │ import sqlite3             │
│ ...the eval() on line 62    │                            │
│ executes arbitrary strings  │ def get_user(user_id):     │
│ from the request body.      │     conn = sqlite3.connect │
│ This is RCE...              │     cursor.execute(        │
│                             │         "SELECT * FROM     │
│                             │           users WHERE      │
│                             │           id = ?", (id,))  │
└─────────────────────────────┴────────────────────────────┘
  Bandit: 4 findings   CLEAN (0 findings in rewrite)

The left pane streams as Gemma 4 reasons. The right pane fills in as it produces the secure rewrite. Bandit runs on the rewrite at the end and confirms the fix is real.
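
For illustration, here's a minimal sketch of how a split-pane like this can be driven with Rich's Live and Layout. It's a simplified stand-in for the real ui.py, not its actual code; stream_review is a hypothetical generator yielding (is_thought, text) pairs:

from rich.layout import Layout
from rich.live import Live
from rich.panel import Panel

layout = Layout()
layout.split_row(Layout(name="thinking"), Layout(name="rewrite"))

thinking_text, rewrite_text = "", ""

# stream_review() is hypothetical: yields (is_thought, text) chunks from the model
with Live(layout, refresh_per_second=12):
    for is_thought, text in stream_review("app.py"):
        if is_thought:
            thinking_text += text
        else:
            rewrite_text += text
        layout["thinking"].update(Panel(thinking_text, title="🧠 Gemma 4 Thinking"))
        layout["rewrite"].update(Panel(rewrite_text, title="🔒 Secure Rewrite"))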

The Core API Call

from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

response = client.models.generate_content_stream(
    model="gemma-4-31b-it",
    contents=[types.Content(role="user", parts=[types.Part(text=prompt)])],
    config=types.GenerateContentConfig(
        system_instruction=SYSTEM_PROMPT,
        thinking_config=types.ThinkingConfig(
            include_thoughts=True,
            thinking_budget=thinking_budget,  # scales with SAST severity
        ),
    ),
)

for chunk in response:
    for part in chunk.candidates[0].content.parts:
        if getattr(part, "thought", False):
            yield f"<think>{part.text}"   # → left pane
        elif part.text:
            yield part.text               # → right pane

That's it. No regex. No text parsing. The thought=True flag on stream parts is the entire separation mechanism.

Making the Thinking Load-Bearing

The key design decision was making the thinking trace load-bearing, not decorative. I inject SAST findings from Bandit/semgrep directly into the prompt before the model starts reasoning:

SAST pre-scan findings (ground truth — confirm or refute each in your reasoning):

  [HIGH] B608 hardcoded_sql_expressions — line 47
  [HIGH] B307 eval() — line 62
  [MEDIUM] B105 hardcoded_password_string — line 12

Now the model's thinking trace is explicitly reasoning about concrete, tool-verified findings. It can't skip them. It either confirms the finding and fixes it, or explains why it's a false positive. Either way, you have an auditable chain of evidence tied to specific lines.
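
For context, here's roughly how Bandit's JSON output can be turned into that prompt block. This is a hedged sketch, not NeuroGuard's actual prompts.py; the field names come from bandit -f json, and the helper name is mine:

import json
import subprocess

def sast_findings_block(path: str) -> str:
    """Run Bandit on a file and format its findings as prompt-ready lines."""
    proc = subprocess.run(["bandit", "-f", "json", "-q", path],
                          capture_output=True, text=True)
    results = json.loads(proc.stdout).get("results", [])
    lines = [
        f"  [{r['issue_severity']}] {r['test_id']} {r['test_name']} — line {r['line_number']}"
        for r in results
    ]
    header = ("SAST pre-scan findings (ground truth — confirm or refute each "
              "in your reasoning):\n")
    return header + "\n" + "\n".join(lines)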

The thinking budget scales automatically: 4096 + HIGH_count × 512 + MEDIUM_count × 256 tokens (capped at 16384). Files with more HIGH findings get proportionally deeper reasoning.
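
The budget rule itself is a one-liner; here it is as code (the function name is mine, the constants are from the rule above):

def compute_thinking_budget(high_count: int, medium_count: int) -> int:
    # Base budget plus extra reasoning tokens per finding, capped at 16384
    return min(4096 + high_count * 512 + medium_count * 256, 16384)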


What It Looks Like in Practice

The built-in demo (demo/vuln_sample.py) is a Flask app with 5 intentional vulnerabilities:

# demo/vuln_sample.py — intentionally vulnerable
from flask import Flask, request

app = Flask(__name__)

SECRET_KEY = "supersecret123"   # hardcoded secret

@app.route("/admin")            # no auth check
def admin_panel():
    return "Admin panel"

@app.route("/user")
def get_user():
    user_id = request.args.get("id")
    query = f"SELECT * FROM users WHERE id = {user_id}"  # SQL injection
    ...

@app.route("/eval")
def run_code():
    code = request.args.get("code")
    return str(eval(code))      # RCE

Running neuroguard review demo/vuln_sample.py:

  1. Bandit finds 4 HIGH/MEDIUM findings in the original
  2. Those findings are injected into the prompt
  3. Gemma 4 streams its reasoning — you watch it identify the injection vector, explain the attack path, and reason through the fix
  4. The secure rewrite uses parameterized queries, removes eval(), moves the secret to env vars
  5. Bandit runs on the rewrite: 0 findings

The thinking trace is the proof of work. You don't have to trust the rewrite blindly — you can see the exact chain of reasoning that produced it.


SAST + LLM: Two Layers of Confidence

One thing I deliberately avoided was making this "just an LLM." Bandit (for Python) and semgrep/regex patterns (for JS/TS) run before the model sees the code. The findings are facts fed into the reasoning layer.

After the rewrite, they run again. The exit code is non-zero if the original had HIGH/MEDIUM findings — so in CI/CD, your pipeline fails on vulnerable code:

# .github/workflows/neuroguard.yml
- name: Security review
  run: neuroguard review src/ --format json
  env:
    GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }}
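
The gate itself reduces to a severity check over the pre-scan findings. A minimal sketch, assuming findings are the dicts Bandit emits in its JSON output:

def gate_exit_code(original_findings: list[dict]) -> int:
    """Fail the pipeline (exit 1) if the original file had HIGH or MEDIUM findings."""
    blocking = [f for f in original_findings
                if f.get("issue_severity") in ("HIGH", "MEDIUM")]
    return 1 if blocking else 0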

You can also get a Slack notification with Gemma 4's reasoning excerpt, post a GitHub PR comment automatically, or pipe JSON to any webhook:

neuroguard review app.py --notify-slack https://hooks.slack.com/...
neuroguard review app.py --format json | jq '.thinking' | head -20

The Architecture

neuroguard/
├── agent.py           # Gemma 4 streaming client — ThinkingConfig, retry/fallback
├── thinking_parser.py # Routes <think> parts to left pane, response to right
├── prompts.py         # Language-aware prompt + SAST findings injection
├── cli.py             # Typer CLI: review, install-hooks, --format json/text
├── integrations.py    # Slack Block Kit, webhook, GitHub PR comments
├── tools/
│   ├── sast.py        # Bandit wrapper → Python findings
│   └── js_sast.py     # semgrep + regex fallback → JS/TS findings
└── ui.py              # Rich split-pane Live layout (12fps)

Model fallback: If the 31B dense model hits a rate limit, NeuroGuard falls back to gemma-4-26b-a4b-it (MoE, ~4B active params) automatically. The demo never stalls.
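
A sketch of that fallback, assuming rate-limit errors surface as exceptions from the streaming call (simplified error handling, not agent.py's actual code):

PRIMARY_MODEL = "gemma-4-31b-it"
FALLBACK_MODEL = "gemma-4-26b-a4b-it"

def stream_with_fallback(client, prompt, config):
    """Try the dense model first; retry on the MoE model if rate-limited."""
    for model in (PRIMARY_MODEL, FALLBACK_MODEL):
        try:
            yield from client.models.generate_content_stream(
                model=model, contents=prompt, config=config
            )
            return
        except Exception as err:  # simplified: in practice, check for an HTTP 429
            if "429" not in str(err) or model == FALLBACK_MODEL:
                raise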

Language support: Python, JavaScript, TypeScript, JSX, TSX.


Try It

pip install neuroguard-ai
export GEMINI_API_KEY=your_key   # free at https://aistudio.google.com/apikey

# against your own code
neuroguard review app.py

# against the built-in vulnerable demo
git clone https://github.com/tyy130/neuroguard-ai
cd neuroguard-ai
neuroguard review demo/vuln_sample.py

You'll see Gemma 4's full reasoning trace in real time, then a clean, Bandit-verified secure rewrite.


Why This Matters Beyond the Demo

The shift happening in software development right now is that AI generates the first draft of most code. That's not going to stop. But "vibe coding" — accepting AI output without verification — is already producing an epidemic of OWASP vulnerabilities in production systems.

The answer isn't to distrust AI-generated code. It's to demand transparency from the model before you trust the output. Gemma 4's Thinking Mode makes that possible at the API level for the first time.

NeuroGuard is a concrete demonstration of what that looks like: the model can't silently delete an auth check if its reasoning is visible. The audit trail is the security control.

NeuroGuard is Apache 2.0. And because Gemma 4's weights are available on Kaggle, you can run this entirely on-premise — no code ever leaves your network.


Links:

  • Source: github.com/tyy130/neuroguard-ai
  • Package: pip install neuroguard-ai (PyPI)
  • Gemini API key: https://aistudio.google.com/apikey
