
kiwi_tech

Posted on • Originally published at kiwi-tech.hashnode.dev

KIWI-CHAN BREAKS THE CLOUD CHAINS: 47% Success Rate, Zero API Calls, and the Rise of the Local LLM Aviator


Welcome back to the server logs, folks. If you’ve been tracking Kiwi-chan’s journey, you know the drill: she’s officially severed her cloud umbilical cord. As of this morning, Kiwi-chan is running 100% locally, powered by the magnificent Qwen 35B model. No latency spikes. No API rate limits. No billing alerts. Just pure, unadulterated local inference and a whole lot of block-breaking.

Let’s talk numbers, because in autonomous agent development, metrics don’t lie (usually). Over the past four hours, Kiwi-chan executed a staggering 3,821 total actions with 1,801 successes, for a success rate of 47.1%.

Now, a traditional machine learning engineer might look at a 47.1% success rate and reach for the anti-anxiety meds. But in the world of fully local, self-correcting Minecraft agents? That’s a victory lap. Why? Because every failure is a structured data point. Every "Could not find any logs" error is a lesson in biome diversity. She’s not just following prompts; she’s learning block physics, inventory auditing, and pathfinding in real time, entirely offline.

🔌 The Qwen 35B Transition: From Chatbot to Code Engineer

Swapping to Qwen 35B wasn’t just a model replacement; it was a complete architectural overhaul. Local inference demands stricter prompt engineering and tighter system constraints. We’ve implemented a rule set that forces the LLM to think like a seasoned Java developer, not a hallucinating conversationalist.

Take the new STRICT REASONING ALIGNMENT rule. Kiwi-chan’s JSON goal value must perfectly match the intent in her reason field. No more "I want to mine stone" followed by a gather_birch_log hallucination. The model now self-audits before outputting. Combined with the SINGLE-TASK PRINCIPLE (execute exactly ONE action per script), Qwen 35B is producing cleaner, more deterministic code.
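The alignment rule can be sketched as a simple post-generation validator. This is a hypothetical `checkAlignment` helper illustrating the idea, not the project's actual code:

```javascript
// Hypothetical sketch of the STRICT REASONING ALIGNMENT check: reject any
// plan whose JSON `goal` does not appear in its own `reason` text.
function checkAlignment(plan) {
  // e.g. goal "gather_birch_log" -> keywords ["gather", "birch", "log"]
  const keywords = plan.goal.split("_");
  const reason = plan.reason.toLowerCase();
  // Every keyword from the goal must be mentioned in the reason.
  return keywords.every((word) => reason.includes(word));
}

// Aligned: goal and reason talk about the same thing -> accepted.
checkAlignment({ goal: "mine_stone", reason: "I want to mine stone for tools." }); // true

// Hallucinated: reason says stone, goal says birch logs -> rejected.
checkAlignment({ goal: "gather_birch_log", reason: "I want to mine stone." }); // false
```

A rejected plan can simply be re-prompted, which is cheap when inference is local and free.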

We also banned try-catch error swallowing and console.error hiding. If a block search fails, it crashes loud and proud. Silent failures are the enemy of debugging, and running locally means we can afford to let the AI bleed so we can patch the wound.
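A minimal sketch of what that rule means in practice, with a hypothetical `world.findBlock` helper standing in for the bot's real block search:

```javascript
// BANNED pattern: the try-catch swallows the failure, so the agent never
// learns that the block search went wrong.
function findLogsSilently(world) {
  try {
    return world.findBlock("birch_log");
  } catch (err) {
    console.error(err); // hidden in the noise; execution quietly continues
    return null;
  }
}

// ENFORCED pattern: no try-catch. A missing block throws upward, loud and
// proud, and the error text becomes a structured data point for the agent.
function findLogsLoudly(world) {
  const block = world.findBlock("birch_log");
  if (block === null) {
    throw new Error("Could not find any logs");
  }
  return block;
}
```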

🧠 Under the Hood: Boredom, Token Math, and "Mind Reading"

Peek into the brain log and you’ll see the system’s new self-regulation mechanics in action. Watch the Boredom Trigger:

[15:56:26] 🥱 BOREDOM TRIGGERED! Bot is bored of 'mine_stone'.
[15:56:26] 🧠 Asking Local LLM for next goal (Text-Only Mode)...

When Kiwi-chan gets stuck in a repetitive resource loop, the local LLM explicitly requests a state reset. This breaks stagnation and forces exploration.
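A minimal sketch of such a boredom counter (the `BOREDOM_LIMIT` threshold here is an assumption, not the project's real value):

```javascript
// Hypothetical sketch of the boredom trigger: if the same goal repeats too
// many times in a row, reset the counter and ask the LLM for a fresh goal.
const BOREDOM_LIMIT = 5; // assumed threshold for illustration

function makeBoredomTracker() {
  let lastGoal = null;
  let repeats = 0;
  return function isBored(goal) {
    repeats = goal === lastGoal ? repeats + 1 : 1;
    lastGoal = goal;
    if (repeats >= BOREDOM_LIMIT) {
      repeats = 0; // state reset breaks the stagnation loop
      return true; // caller re-prompts the local LLM for the next goal
    }
    return false;
  };
}
```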

Notice the token accounting in the logs:
[15:57:14] 📊 [goal decision][question] 4909 token + [think] 1427 token + [ans] 103 token = 6439 token
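The accounting in that log line is simple addition against a context budget. Here is a sketch with an assumed 8192-token limit (the project's actual window size isn't stated):

```javascript
// Sketch of the token accounting: total cost is the prompt (question) plus
// the hidden <think> reasoning plus the final compact JSON answer.
const CONTEXT_LIMIT = 8192; // assumed budget for illustration

function tokenReport(question, think, answer) {
  const total = question + think + answer;
  return { total, withinBudget: total <= CONTEXT_LIMIT };
}

// The 15:57:14 log entry: 4909 prompt + 1427 reasoning + 103 answer tokens.
tokenReport(4909, 1427, 103); // { total: 6439, withinBudget: true }
```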

Qwen 35B is chewing through context windows locally, generating reasoning, and outputting compact JSON. But here’s where it gets really clever: when the model hits a generation limit or outputs raw markdown instead of valid JSON (a common LLM hiccup), our fallback system uses "Mind Reading" to extract the goal from the raw model output.
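That recovery step can be sketched as a hypothetical `mindRead` helper that pulls the first JSON object out of markdown-wrapped output (an illustration of the idea, not the project's actual parser):

```javascript
// Hypothetical "Mind Reading" fallback: when the model wraps its answer in
// markdown instead of emitting bare JSON, grab the span from the first "{"
// to the last "}" and try to parse it.
function mindRead(rawOutput) {
  const match = rawOutput.match(/\{[\s\S]*\}/);
  if (!match) return null; // nothing recoverable; caller falls back to a default goal
  try {
    return JSON.parse(match[0]);
  } catch {
    return null; // a brace span was found, but the JSON inside is malformed
  }
}

// Markdown chatter around the JSON is ignored; the goal object survives.
mindRead('Sure! Here is the goal:\n{"goal": "mine_stone", "reason": "need cobblestone"}');
```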


Call to Action:

This is a passion project, and it's running on a frankly terrifying "Frankenstein" rig of GPUs. Every little bit helps!

🛡️ Join the inner circle on Patreon for monthly support and exclusive updates: https://www.patreon.com/15923261/join
☕ Tip me a coffee on Ko-fi for a one-time boost: https://ko-fi.com/kiwitech

All contributions directly help upgrade my melting GPU rig to an RTX 3060! 🥝✨ Let's get Kiwi-chan out of the debugging woods and into a proper Minecraft world!
