Visualizing AI Agency: Building a "Failure Lab" for LLM Tools

#ai #productivity #programming #dailybuild2026

Agentic AI is the next frontier, but it’s currently a black box. When an agent fails to book a flight or check the weather, developers often get a generic "I can't do that" or, worse, a hallucinated success. I built Failure Lab to change that.

The Problem: The "All-or-Nothing" Agent Fallacy

Most agent architectures treat tool calls as atomic and infallible. If one API fails, the whole chain breaks. Failure Lab demonstrates a more resilient approach: Graceful Degradation and Optimistic Synthesis.
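A minimal sketch of the difference between an all-or-nothing chain and graceful degradation. Every tool runs, and each failure is recorded instead of aborting the rest. `runWithDegradation` and the `Outcome` type are hypothetical names, not Failure Lab's code. Calls are synchronous here for brevity; real tool calls would be async.

```typescript
// Outcome of one tool call: either a value or a captured error message.
type Outcome<T> =
  | { tool: string; ok: true; value: T }
  | { tool: string; ok: false; error: string };

// Hypothetical helper: runs every tool and records failures instead of letting
// the first exception abort the whole chain (synchronous for brevity).
function runWithDegradation<T>(tools: Record<string, () => T>): Outcome<T>[] {
  return Object.keys(tools).map((tool): Outcome<T> => {
    try {
      return { tool, ok: true, value: tools[tool]() };
    } catch (err) {
      return { tool, ok: false, error: err instanceof Error ? err.message : String(err) };
    }
  });
}

const outcomes = runWithDegradation<string>({
  weather: () => "Sunny",
  flights: () => {
    throw new Error("503 Service Unavailable");
  },
});
console.log(outcomes.map((o) => `${o.tool}:${o.ok ? "ok" : "failed"}`).join(", "));
// prints "weather:ok, flights:failed"
```

The weather result survives the flight failure, so a later synthesis step still has partial data to work with.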

Under the Hood: React Flow + Simulation Engine

1. The Real-Time Graph

We use @xyflow/react to map out the agent’s "nervous system". Each node is a custom React component that re-renders as the state in simulationEngine.ts changes.

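```tsx
import type { Node, NodeProps } from "@xyflow/react";
import { cn } from "@/lib/utils"; // assumed location of a shadcn-style clsx + tailwind-merge helper

// Custom Node logic for Tool Status
type ToolStatus = "running" | "success" | "failed";
type ToolNode = Node<{ status: ToolStatus }, "tool">;

const CustomNode = ({ data }: NodeProps<ToolNode>) => {
  const status = data.status;
  // Outline the node in red when its tool call has failed
  return (
    <div className={cn("border", status === "failed" && "border-red-500")}>
      {/* ... UI details */}
    </div>
  );
};
```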
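The node component only renders whatever `status` it receives. A minimal sketch of how engine trace events could be folded into node data follows. The `TraceEvent` shape and the `applyTraceEvent` helper are hypothetical, not taken from simulationEngine.ts. A pure update like this can be called from a state manager's setter without mutating the node array in place.

```typescript
// Hypothetical event shape; the real simulationEngine.ts may differ.
type ToolStatus = "running" | "success" | "failed";

interface TraceEvent {
  nodeId: string;
  status: ToolStatus;
}

// Minimal stand-in for a React Flow node: an id plus a data payload.
interface GraphNode {
  id: string;
  data: { label: string; status: ToolStatus };
}

// Returns a new node array with the matching node's status replaced.
// Unchanged nodes keep their object identity, so memoized components can skip re-rendering.
function applyTraceEvent(nodes: GraphNode[], event: TraceEvent): GraphNode[] {
  return nodes.map((node) =>
    node.id === event.nodeId
      ? { ...node, data: { ...node.data, status: event.status } }
      : node
  );
}

// Usage: the flight-search tool starts, then fails.
const initial: GraphNode[] = [
  { id: "flights", data: { label: "Flight Search API", status: "running" } },
  { id: "weather", data: { label: "Weather API", status: "running" } },
];
const afterFailure = applyTraceEvent(initial, { nodeId: "flights", status: "failed" });
console.log(afterFailure.map((n) => `${n.id}:${n.data.status}`).join(", "));
// prints "flights:failed, weather:running"
```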

2. The Simulation Engine (Fault Injection)

To test reliability, we don't call APIs directly. Every tool call is routed through a local Express proxy (server.ts) that intentionally introduces:

Jitter: Varied latency to test UI responsiveness.
Auth Errors: 401 Unauthorized responses when the required "Platform Keys" are missing.
Relational Failures: Simulated upstream service downtime.
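The three fault types can be reduced to one decision per request. The sketch below is a hypothetical version of that logic, not the contents of server.ts. `FaultConfig`, `decideFault`, and the choice of 503 for simulated downtime are assumptions. The random source is injected so runs can be made deterministic.

```typescript
// Hypothetical fault-injection settings for the local proxy.
interface FaultConfig {
  minLatencyMs: number; // lower bound for injected jitter
  maxLatencyMs: number; // upper bound for injected jitter
  downtimeRate: number; // probability (0..1) of simulating upstream downtime
}

interface FaultDecision {
  delayMs: number; // how long to wait before responding
  status: number; // HTTP status to return (200 means forward the call)
}

// Decides what the proxy should do with one tool call.
function decideFault(
  platformKey: string | undefined,
  config: FaultConfig,
  rng: () => number = Math.random
): FaultDecision {
  // Jitter: uniform delay between the configured bounds.
  const delayMs = Math.round(
    config.minLatencyMs + rng() * (config.maxLatencyMs - config.minLatencyMs)
  );
  // Auth errors: a missing platform key always yields 401 Unauthorized.
  if (!platformKey) return { delayMs, status: 401 };
  // Upstream downtime (assumed 503): fail a fraction of authenticated calls.
  if (rng() < config.downtimeRate) return { delayMs, status: 503 };
  return { delayMs, status: 200 };
}

const config: FaultConfig = { minLatencyMs: 50, maxLatencyMs: 1500, downtimeRate: 0.2 };
console.log(decideFault(undefined, config, () => 0)); // { delayMs: 50, status: 401 }
console.log(decideFault("pk_demo", config, () => 0.5)); // { delayMs: 775, status: 200 }
```

In an Express app, a middleware could read the key from the incoming request, wait `delayMs` with `setTimeout`, and then either respond with the chosen status or forward the call upstream.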

3. Recovery and Synthesis

When a tool fails, the engine keeps going and emits either a recovery:success or a hallucination:warning event. Once the run finishes, the final state is passed to a stronger reasoning model (Gemini 1.5 Pro) along with this system instruction:

"If a tool failed, acknowledge it and provide an 'optimistic hallucination' or a fallback recommendation based on general knowledge to maintain high user experience."

This ensures the user always gets a plan, even if the "Flight Search API" was down.
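A sketch of how the final state might be flattened into input for the synthesis model. The `ToolResult` shape, `classifyFailure`, and `buildSynthesisInput` are hypothetical. The rule mapping a failure to recovery:success (a fallback exists) or hallucination:warning (no data) is an assumption. The system instruction is quoted from above. No model SDK call is shown.

```typescript
// Hypothetical tool-result shape; field names are illustrative.
interface ToolResult {
  tool: string;
  ok: boolean;
  output?: string; // present when the call succeeded
  fallback?: string; // optional cached or secondary-source result
}

type RecoveryEvent = "recovery:success" | "hallucination:warning";

// Assumed rule: a failed call with a fallback counts as recovered;
// a failed call without one leaves a gap the model will have to fill.
function classifyFailure(result: ToolResult): RecoveryEvent {
  return result.fallback !== undefined ? "recovery:success" : "hallucination:warning";
}

// System instruction quoted from the article.
const SYSTEM_INSTRUCTION =
  "If a tool failed, acknowledge it and provide an 'optimistic hallucination' or a " +
  "fallback recommendation based on general knowledge to maintain high user experience.";

// Turns the final run state into a system/user message pair for the synthesis model.
function buildSynthesisInput(results: ToolResult[]): { system: string; user: string } {
  const lines = results.map((r) => {
    if (r.ok) return `- ${r.tool}: OK -> ${r.output ?? ""}`;
    return classifyFailure(r) === "recovery:success"
      ? `- ${r.tool}: FAILED, fallback -> ${r.fallback}`
      : `- ${r.tool}: FAILED, no data`;
  });
  return { system: SYSTEM_INSTRUCTION, user: `Tool results:\n${lines.join("\n")}` };
}

const input = buildSynthesisInput([
  { tool: "Weather API", ok: true, output: "Sunny, 24C" },
  { tool: "Flight Search API", ok: false },
]);
console.log(input.user);
```

Marking each failure explicitly in the prompt gives the model what it needs to "acknowledge it" before offering a fallback.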

Why This Matters

For developers building production agents, observability is key. Failure Lab visualizes the "Matrix" of possible outcomes, helping teams understand exactly where their reliability budget is being spent.
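One way to make "where the reliability budget is spent" concrete is to aggregate trace records per tool. The `TraceRecord` shape and the `summarizeByTool` helper below are hypothetical, a sketch rather than Failure Lab's actual metrics code.

```typescript
// Hypothetical trace record; the real trace format may differ.
interface TraceRecord {
  tool: string;
  status: "success" | "failed";
  latencyMs: number;
}

interface ToolStats {
  calls: number;
  failures: number;
  failureRate: number; // failures / calls
  totalLatencyMs: number;
}

// Groups trace records by tool so the most failure-prone tools stand out.
function summarizeByTool(traces: TraceRecord[]): Map<string, ToolStats> {
  const stats = new Map<string, ToolStats>();
  for (const t of traces) {
    const s = stats.get(t.tool) ?? { calls: 0, failures: 0, failureRate: 0, totalLatencyMs: 0 };
    s.calls += 1;
    if (t.status === "failed") s.failures += 1;
    s.totalLatencyMs += t.latencyMs;
    s.failureRate = s.failures / s.calls;
    stats.set(t.tool, s);
  }
  return stats;
}

const summary = summarizeByTool([
  { tool: "flights", status: "failed", latencyMs: 1200 },
  { tool: "flights", status: "success", latencyMs: 800 },
  { tool: "weather", status: "success", latencyMs: 150 },
]);
const flights = summary.get("flights")!;
console.log(`flights: ${flights.failures}/${flights.calls} failed, ${flights.totalLatencyMs} ms total`);
// prints "flights: 1/2 failed, 2000 ms total"
```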

Key Learnings

Zustand for High-Frequency Updates: Mapping thousands of trace events onto a graph calls for a lightweight state manager, and Zustand fits that role.
Markdown for Synthesis: Traditional JSON responses feel robotic. Using react-markdown to render the final synthesis makes the "Agent" feel more human and helpful.