Joske Vermeulen

Posted on May 11

responseJsonSchema: The Undocumented Gemma 4 Feature That Changed Everything

#devchallenge #gemmachallenge #gemma


This is a submission for the Gemma 4 Challenge: Write About Gemma 4

When I started building Codebase Dungeon, a game that turns GitHub repos into playable dungeons, I hit a wall immediately.

Gemma 4 31B on Google AI Studio has a "thinking" behavior. Even with responseMimeType: 'application/json', the model outputs internal reasoning before the actual JSON:

*   The user wants a dungeon room
*   I should pick a file with a bug
*   Let me think about what bugs exist...

{"name": "The Auth Chamber", ...}

This consumed output tokens, made parsing unreliable, and sometimes the model ran out of tokens before even writing the JSON.
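To make the parsing problem concrete, here is a minimal sketch of the kind of fallback parser this behavior forces on you. The helper name `extractTrailingJson` and the `bugDescription` value in the sample are made up for illustration; this is not the code from the project, and it still fails whenever the model runs out of tokens mid-object.

```javascript
// Hypothetical fallback: find the first '{' from which the rest of the
// response parses as JSON, skipping any free-form reasoning before it.
function extractTrailingJson(text) {
  for (let start = text.indexOf('{'); start !== -1; start = text.indexOf('{', start + 1)) {
    const candidate = text.slice(start).trim();
    try {
      return JSON.parse(candidate);
    } catch {
      // Not valid JSON from this brace onward; try the next one.
    }
  }
  return null; // No parseable object (for example, the output was truncated).
}

// Sample response shaped like the one above (field values are illustrative).
const raw = [
  '*   The user wants a dungeon room',
  '*   I should pick a file with a bug',
  '',
  '{"name": "The Auth Chamber", "bugDescription": "Missing auth check"}',
].join('\n');

console.log(extractTrailingJson(raw).name); // The Auth Chamber
console.log(extractTrailingJson('*   Let me think about')); // null
```

Even when this works, the reasoning text has already been paid for in output tokens, which is why a parser alone does not solve the problem.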

What I Tried (And Failed)

responseMimeType: 'application/json': Gemma ignores it, still thinks first
"Output ONLY JSON" in prompt: Gemma thinks about outputting JSON, then doesn't
Prefill trick (start response with {): Gemma continues thinking instead
Lower temperature: No effect on thinking behavior
Two-turn approach: Still thinks in the second turn
Pipe-delimited text format: Worked but ugly, limited structure

I was about to give up on structured output entirely.

The Discovery: responseJsonSchema

Then I found it, the responseJsonSchema field in the Gemini API's generation config:

generationConfig: {
  responseMimeType: 'application/json',
  responseJsonSchema: {
    type: 'object',
    properties: {
      name: { type: 'string' },
      bugDescription: { type: 'string' },
      correctFix: { type: 'string' },
      // ... full schema
    },
    required: ['name', 'bugDescription', 'correctFix']
  }
}

The key: you must provide BOTH responseMimeType AND responseJsonSchema with a complete schema definition. Without the schema, Gemma ignores the mime type. With it, the output in my testing came back as clean JSON: no thinking, no markdown.
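One way to avoid ever sending the mime type without a schema is to build the config through a small guard function. This is a sketch under assumptions: `buildGenerationConfig` is a hypothetical helper, and the `ROOM_SCHEMA` below only contains the three fields shown above, not the full schema.

```javascript
// Hypothetical guard: refuses to build a config unless a complete object
// schema is supplied, so responseMimeType is never sent on its own.
function buildGenerationConfig(schema, maxOutputTokens = 800) {
  if (!schema || schema.type !== 'object' || !schema.properties) {
    throw new Error('responseJsonSchema must be a complete object schema');
  }
  return {
    responseMimeType: 'application/json',
    responseJsonSchema: schema,
    maxOutputTokens,
  };
}

// Illustrative schema using only the fields named in this post.
const ROOM_SCHEMA = {
  type: 'object',
  properties: {
    name: { type: 'string' },
    bugDescription: { type: 'string' },
    correctFix: { type: 'string' },
  },
  required: ['name', 'bugDescription', 'correctFix'],
};

const generationConfig = buildGenerationConfig(ROOM_SCHEMA);
console.log(Object.keys(generationConfig));
// [ 'responseMimeType', 'responseJsonSchema', 'maxOutputTokens' ]
```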

This solves the problem that dozens of developers are struggling with in the forums. The common suggestions (thinkingLevel: "MINIMAL", regex stripping, include_thoughts: false) either don't work or don't guarantee structured output. responseJsonSchema does both: it bypasses thinking AND enforces structure.

The feature is documented for Gemini models, but the official Gemma 4 capabilities page doesn't list it. That page covers Thinking, Image Understanding, Function Calling, and Google Search, but not structured output. Yet it works perfectly with Gemma 4 31B through the same Gemini API infrastructure.

Why This Matters

Without responseJsonSchema	With responseJsonSchema
~50% parse success rate	99%+ parse success rate
140+ wasted "thinking" tokens	Zero wasted tokens
Needs 8192 maxOutputTokens	800 tokens is enough
Requires complex fallback parsing	Simple `JSON.parse()`

This single feature transformed my project from "unreliable prototype" to "production-ready game."

Combining With Multimodal: Design Comprehension

The real power: responseJsonSchema works with multimodal inputs too. I send Gemma 4 both source code AND an app screenshot:

const contents = [{
  role: 'user',
  parts: [
    { text: prompt },
    { inlineData: { mimeType: 'image/png', data: screenshotBase64 } }
  ]
}];

const res = await fetch(GEMMA_API_URL, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    contents,
    generationConfig: {
      responseMimeType: 'application/json',
      responseJsonSchema: ROOM_SCHEMA,
      maxOutputTokens: 800
    }
  })
});

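if (!res.ok) throw new Error(`Gemma request failed: HTTP ${res.status}`);
const data = await res.json();
// The model's JSON arrives as text inside the response envelope
const room = JSON.parse(data.candidates[0].content.parts[0].text);
// Clean, structured JSON every time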
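The snippet above assumes `screenshotBase64` already exists. As a rough sketch for Node, the image part can be built from raw bytes like this; `toInlineImagePart` is a hypothetical helper name, and the four bytes used in the demo are just the start of the PNG signature, not a real screenshot.

```javascript
// Hypothetical helper: wrap raw image bytes as an inlineData part.
// Buffer is a Node global; in a browser you would need a different encoder.
function toInlineImagePart(bytes, mimeType = 'image/png') {
  return {
    inlineData: { mimeType, data: Buffer.from(bytes).toString('base64') },
  };
}

// Demo input: the first four bytes of the PNG file signature.
const pngSignatureStart = [0x89, 0x50, 0x4e, 0x47];
const imagePart = toInlineImagePart(pngSignatureStart);

const demoContents = [{
  role: 'user',
  parts: [{ text: 'Describe this screen as a dungeon room.' }, imagePart],
}];

console.log(imagePart.inlineData.data); // iVBORw==
console.log(demoContents[0].parts.length); // 2
```

In a real run you would pass the bytes of the screenshot file (for example, the result of reading it from disk) instead of the demo array.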

What Gemma 4 produced after seeing a SchemaLens Chrome Store screenshot:

"You step into a dim, cavernous room where two massive stone tablets - Schema A and Schema B - loom before you. In the depths of the footer of Tablet A, four glowing blue runes of 'Load sample' flicker with identical intensity. Across the gap, in the footer of Tablet B, a lone rune 'Copy from A & modify' pulses with a pale, spectral lilac hue, clashing with the bold violet of the 'Compare Schemas' altar above."

This isn't color detection. Gemma identified specific UI elements by name, recognized their styling inconsistencies, and turned them into a playable UX challenge, all in perfectly structured JSON.

The 128K Context Advantage

With reliable structured output solved, I could push Gemma 4's other unique feature: the 128K context window.

I feed entire repositories into a single request: full file contents, not snippets. Gemma reads the complete codebase and finds cross-file bugs that only exist because of how files interact:

"The getAuthedClient function in auth.js is defined but never called in export.js: the endpoint is completely unprotected."

No 8K-context model can do this. You need the full codebase in one prompt.
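Packing a whole repository into one prompt could look roughly like the sketch below. Everything here is an assumption for illustration: the `packRepo` name, the file delimiter format, the reserved-token margin, and especially the 4-characters-per-token estimate, which is a crude heuristic rather than Gemma's actual tokenizer.

```javascript
// Assumption: about 4 characters per token. This is a rough heuristic only.
const CHARS_PER_TOKEN = 4;

// Hypothetical packer: concatenates full file contents until the estimated
// budget is reached and reports which files did not fit.
function packRepo(files, maxTokens = 128_000, reservedTokens = 8_000) {
  const budgetChars = (maxTokens - reservedTokens) * CHARS_PER_TOKEN;
  const sections = [];
  const skipped = [];
  let used = 0;

  for (const { path, content } of files) {
    const section = `=== FILE: ${path} ===\n${content}\n`;
    if (used + section.length > budgetChars) {
      skipped.push(path); // Too large for the remaining budget.
      continue;
    }
    sections.push(section);
    used += section.length;
  }

  return {
    prompt: sections.join('\n'),
    skipped,
    estimatedTokens: Math.ceil(used / CHARS_PER_TOKEN),
  };
}

// Tiny demo with a deliberately small budget so one file gets skipped.
const demoFiles = [
  { path: 'auth.js', content: 'function getAuthedClient() {}' },
  { path: 'export.js', content: 'x'.repeat(1000) },
];
const packed = packRepo(demoFiles, 100, 0);
console.log(packed.skipped); // [ 'export.js' ]
```

Reporting the skipped files matters here: a cross-file bug can only be found if both files actually made it into the prompt.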

The Architecture This Enabled

Because responseJsonSchema guarantees structured output, I could pre-generate everything:

Generation phase (~15-30s): Gemma analyzes code + screenshots, outputs structured rooms with narratives, choices, correct answers, and victory text
Gameplay phase (instant): Zero API calls. All narratives pre-computed. Deterministic scoring. The game runs on pure pre-generated data.

This means:

Cached repos load in <1 second
Gameplay is instant (0ms per action)
Cost per dungeon: ~$0.005 (18x cheaper than GPT-4o for equivalent capability)
Cost during gameplay: $0
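The split between the two phases can be sketched as a cache in front of generation plus a pure scoring function. The room shape (`correctIndex`, `failureText`), the point values, and the function names are all hypothetical; the post only states that rooms carry narratives, choices, correct answers, and victory text.

```javascript
// Hypothetical cache: repo URL -> pre-generated dungeon.
const dungeonCache = new Map();

// Generation phase: `generate` is the only step that would call the API.
async function loadDungeon(repoUrl, generate) {
  if (!dungeonCache.has(repoUrl)) {
    dungeonCache.set(repoUrl, await generate(repoUrl));
  }
  return dungeonCache.get(repoUrl);
}

// Gameplay phase: a pure function over pre-generated data, no network calls.
function playChoice(room, choiceIndex) {
  const correct = choiceIndex === room.correctIndex;
  return {
    correct,
    points: correct ? 100 : 0,
    text: correct ? room.victoryText : room.choices[choiceIndex].failureText,
  };
}

// Illustrative room; the field names are assumptions.
const demoRoom = {
  narrative: 'An unguarded gate stands open before you.',
  choices: [
    { label: 'Call getAuthedClient before exporting', failureText: '' },
    { label: 'Rename the function', failureText: 'The endpoint is still unprotected.' },
  ],
  correctIndex: 0,
  victoryText: 'The gate seals shut behind a proper auth check.',
};

console.log(playChoice(demoRoom, 1).text); // The endpoint is still unprotected.
console.log(playChoice(demoRoom, 0).points); // 100
```

Because `playChoice` depends only on its arguments, the same choice always yields the same score, which is what makes the gameplay deterministic.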

Practical Tips for Developers

If you're building with Gemma 4 31B on Google AI Studio:

Always use responseJsonSchema: it's the difference between 50% and 99% reliability
Put all fields in required: optional fields often get skipped
Use non-streaming for structured output: streaming + schema can truncate responses
Temperature 0.6 for structured data, 0.8+ for creative text
The paid tier is required: free tier returns "Internal error" with schemas
Multimodal + schema works, but use non-streaming (the combination is unreliable with streaming)
Don't fight the thinking: with responseJsonSchema, there is no thinking. Without it, you can't stop it.
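The "put all fields in required" tip is easy to forget on nested objects, so it can be automated. Below is a minimal sketch, assuming a plain JSON schema made of objects and arrays; `requireAllFields` is a hypothetical helper, and the nested `choices` field is invented for the demo.

```javascript
// Hypothetical helper: returns a copy of a schema in which every property of
// every object (including nested objects and array items) is listed as required.
function requireAllFields(schema) {
  if (schema === null || typeof schema !== 'object') return schema;
  const copy = { ...schema };
  if (copy.type === 'object' && copy.properties) {
    copy.properties = Object.fromEntries(
      Object.entries(copy.properties).map(([key, value]) => [key, requireAllFields(value)])
    );
    copy.required = Object.keys(copy.properties);
  }
  if (copy.type === 'array' && copy.items) {
    copy.items = requireAllFields(copy.items);
  }
  return copy;
}

// Demo schema with no `required` lists at all.
const looseSchema = {
  type: 'object',
  properties: {
    name: { type: 'string' },
    choices: {
      type: 'array',
      items: { type: 'object', properties: { label: { type: 'string' } } },
    },
  },
};

const strictSchema = requireAllFields(looseSchema);
console.log(strictSchema.required); // [ 'name', 'choices' ]
console.log(strictSchema.properties.choices.items.required); // [ 'label' ]
```

The helper returns a copy rather than mutating its input, so the original schema can still be reused elsewhere.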

What Gemma 4 Unlocked

Before responseJsonSchema, I was building a fragile prototype with regex parsing and 50% failure rates.

After, I built a fully playable game where Gemma 4 generates entire dungeons from real codebases, with multimodal vision, 128K context, and perfect structured output. The game produces a downloadable code review report that's genuinely useful: real bugs, real fixes, real file locations.

The model is capable. The documentation just hasn't caught up yet.

DEV Community