Everyone is building AI agents.
Almost nobody can audit them.
You give an LLM a prompt.
It calls tools.
Generates outputs.
Changes workflows.
Makes decisions.
And two days later?
Nobody knows:
- what prompt was used
- which agent triggered it
- what actually changed
- which version of the workflow produced the result
- why the output suddenly shifted
That felt insane to me.
So I built:
AI Audit Shelf
A lightweight, open‑source system that brings Git‑like versioning to AI workflows.
Every AI action becomes an immutable chapter.
Chapters bundle into versioned books.
Books live on a shelf grouped by feature.
Library
└── [Feature: HR Automation]
    ├── b_001 v1 Employee Onboarding
    └── b_002 v2 Employee Onboarding
        ↑ full audit trail preserved
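To make the shelf metaphor concrete, here's a minimal sketch of that data model in plain Python. The class and field names are my own guesses, not necessarily the project's actual schema:

```python
from dataclasses import dataclass, field

# Hypothetical model of the chapter / book / shelf hierarchy described above.
@dataclass(frozen=True)  # frozen: a chapter can never be edited after creation
class Chapter:
    id: str
    prompt: str
    result: str
    actor: str

@dataclass
class Book:
    id: str
    title: str
    version: int
    chapter_ids: list = field(default_factory=list)

@dataclass
class Shelf:
    feature: str
    books: list = field(default_factory=list)

shelf = Shelf("HR Automation")
v1 = Book("b_001", "Employee Onboarding", 1, ["c_001"])
v2 = Book("b_002", "Employee Onboarding", 2, ["c_001", "c_002"])
shelf.books += [v1, v2]
```

Marking chapters `frozen` is one simple way to get immutability at the object level: both editions can point at the same unchanged chapter without risk.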
Why I Built This
Right now, most AI workflows are:
Prompt in → Magic happens → Output out
That’s fine—until you need:
- compliance
- debugging
- observability
- reproducibility
- enterprise‑grade audit logs
- workflow history
- team collaboration
Traditional software has:
- Git
- commits
- diffs
- version history
AI workflows have…
screenshots, Slack messages, and vibes.
What It Does
1. Immutable AI Audit Logs
Every action is stored as an immutable record:
- prompt
- result
- actor
- timestamp
- source
Example:
python cli.py add-chapter \
"Analyze customer churn" \
"Churn decreased by 3%" \
--actor analytics-agent
Now every AI decision is traceable—not “I think that happened.”
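One common way to make records like this tamper-evident (a sketch of the general technique, not necessarily what AI Audit Shelf does internally) is to hash-chain them, so editing any past record breaks every hash after it:

```python
import hashlib
import json
import time

def add_chapter(log, prompt, result, actor):
    """Append an immutable, hash-chained record to an append-only log.
    Rewriting any earlier record invalidates the chain, so tampering is detectable."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    record = {
        "prompt": prompt,
        "result": result,
        "actor": actor,
        "timestamp": time.time(),
        "prev": prev_hash,
    }
    # Hash the record contents (including the previous hash) to link the chain.
    record["hash"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()
    ).hexdigest()
    log.append(record)
    return record

log = []
add_chapter(log, "Analyze customer churn", "Churn decreased by 3%", "analytics-agent")
```

Each record carries the hash of the one before it, which is the same trick Git commits use to make history immutable.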
2. Git‑Like Workflow Versioning
Update workflows without losing history:
python cli.py new-edition b_001 \
--chapter-ids c_001 c_002 c_003
Old versions stay forever.
You can roll back, compare, or inspect anytime.
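A rough sketch of what `new-edition` might do under the hood, assuming books are keyed by ID (the function and field names are illustrative, not the project's internals):

```python
def new_edition(books, book_id, chapter_ids):
    """Create a new edition of a book; the old edition is never mutated."""
    old = books[book_id]
    new_id = f"b_{len(books) + 1:03d}"  # illustrative ID scheme
    books[new_id] = {
        "title": old["title"],
        "version": old["version"] + 1,
        "chapters": list(chapter_ids),
    }
    return new_id

books = {
    "b_001": {
        "title": "Employee Onboarding",
        "version": 1,
        "chapters": ["c_001", "c_002"],
    }
}
new_id = new_edition(books, "b_001", ["c_001", "c_002", "c_003"])
```

The key property is that nothing is ever overwritten: a new edition is a new row, and v1 keeps existing exactly as it was.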
3. Workflow Diffs
Compare workflow versions like Git commits:
python cli.py diff-books b_001 b_002
You instantly see:
- what steps were added
- what steps were removed
- which actions stayed the same
No more “I don’t know what changed.”
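The diff itself can be as simple as comparing the two editions' chapter lists. A sketch, with hypothetical names:

```python
def diff_books(chapters_a, chapters_b):
    """Compare two editions' chapter lists, git-style:
    what was added, what was removed, what stayed the same."""
    a, b = set(chapters_a), set(chapters_b)
    return {
        "added":     [c for c in chapters_b if c not in a],
        "removed":   [c for c in chapters_a if c not in b],
        "unchanged": [c for c in chapters_a if c in b],
    }

d = diff_books(["c_001", "c_002"], ["c_002", "c_003"])
```

Because chapters are immutable, comparing IDs is enough; there is no "modified" state to worry about, only additions and removals.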
4. Human + Machine‑Friendly Exports
Export workflows as:
- Markdown
- JSON
Perfect for:
- auditors checking compliance
- compliance teams drafting reports
- engineers debugging regressions
- product managers documenting behavior
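A sketch of what a dual-format exporter might look like (the real export layout may differ):

```python
import json

def export_book(book, fmt="json"):
    """Render one book for machines (JSON) or humans (Markdown)."""
    if fmt == "json":
        return json.dumps(book, indent=2)
    lines = [f"# {book['title']} (v{book['version']})"]
    for ch in book["chapters"]:
        lines.append(f"## {ch['id']}")
        lines.append(f"- prompt: {ch['prompt']}")
        lines.append(f"- result: {ch['result']}")
    return "\n".join(lines)

book = {
    "title": "Employee Onboarding",
    "version": 2,
    "chapters": [{
        "id": "c_001",
        "prompt": "Analyze customer churn",
        "result": "Churn decreased by 3%",
    }],
}
md = export_book(book, fmt="markdown")
```

Same data, two audiences: auditors read the Markdown, tooling consumes the JSON.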
5. Built‑In Dashboard (Zero Overhead)
No React.
No build tools.
No dependencies.
Just:
open dashboard.html
And browse:
- shelves
- books
- chapters
- diffs
- searches
Lightweight, fast, and ready to run anywhere.
The Architecture
The whole system is intentionally simple:
FastAPI + SQLite + Vanilla JS
That’s it.
No Kubernetes.
No vector DB.
No 500‑MB framework.
I wanted:
- local‑first development
- hackable internals
- understandable code
- zero vendor lock‑in
You can read the whole stack in an afternoon.
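For illustration, here's a guess at the kind of SQLite schema such a stack could use, written with only the standard library. Table and column names are mine, not the project's actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE chapters (
    id      TEXT PRIMARY KEY,
    prompt  TEXT NOT NULL,
    result  TEXT NOT NULL,
    actor   TEXT NOT NULL,
    created TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE books (
    id      TEXT PRIMARY KEY,
    title   TEXT NOT NULL,
    version INTEGER NOT NULL
);
-- many-to-many: two editions of a book can share unchanged chapters
CREATE TABLE book_chapters (
    book_id    TEXT REFERENCES books(id),
    chapter_id TEXT REFERENCES chapters(id)
);
""")
conn.execute(
    "INSERT INTO chapters (id, prompt, result, actor) VALUES (?, ?, ?, ?)",
    ("c_001", "Analyze customer churn", "Churn decreased by 3%", "analytics-agent"),
)
```

Three tables and an `INSERT`: that's roughly the scale of complexity a local-first audit log needs.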
Example: Auditing an AI Support Agent
An AI support agent might:
- Fetch the customer ticket
- Search internal docs
- Generate a response
- Send an email
Normally:
- impossible to trace what went wrong
- no way to replay or compare runs
With AI Audit Shelf:
- every step becomes a chapter
- the entire workflow becomes a versioned book
Now you can:
- replay past workflows
- audit outputs line‑by‑line
- compare versions (v1 vs v2)
- debug regressions instantly
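Here's a toy illustration of those four steps landing as chapters of one book (the data structures are hypothetical, not the project's API):

```python
# Each (prompt, result) pair below is one step of the support agent's run.
steps = [
    ("Fetch the customer ticket", "ticket #4821 loaded"),
    ("Search internal docs", "3 relevant articles found"),
    ("Generate a response", "draft reply produced"),
    ("Send an email", "email sent to customer"),
]

book = {"id": "b_001", "title": "Support Agent Run", "version": 1, "chapters": []}
for i, (prompt, result) in enumerate(steps, start=1):
    book["chapters"].append({
        "id": f"c_{i:03d}",
        "prompt": prompt,
        "result": result,
        "actor": "support-agent",
    })
```

When the run misbehaves, you read the chapters in order and see exactly which step produced the bad output.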
Integrations Included
I’ve shipped example integrations for:
- OpenAI
- LangChain
- shell scripts
- generic Python apps
You can plug AI Audit Shelf into existing workflows today, not in six months.
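A generic pattern for integrations like these is a small decorator that records every prompt/result pair. The recorder is injected as a callable because the shelf's exact API endpoint is an assumption on my part; in practice it could POST to the FastAPI server or shell out to the CLI:

```python
import functools

def audited(record, actor):
    """Wrap any prompt-taking callable so each call is logged via `record`.
    `record` receives a dict with prompt, result, and actor."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(prompt, *args, **kwargs):
            result = fn(prompt, *args, **kwargs)
            record({"prompt": prompt, "result": result, "actor": actor})
            return result
        return inner
    return wrap

# Works with any callable: an OpenAI client call, a LangChain chain,
# or a shell wrapper. Here, a stand-in LLM and an in-memory recorder.
log = []

@audited(log.append, actor="openai-agent")
def fake_llm(prompt):
    return f"echo: {prompt}"

fake_llm("Summarize ticket #4821")
```

The agent code doesn't change; the audit trail is a wrapper around it.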
One Thing I Realized While Building This
AI tooling is repeating the early software era.
Right now, most AI systems are:
- opaque
- unversioned
- non‑reproducible
Eventually:
- observability
- auditability
- workflow versioning
…will become standard infrastructure—just like Git.
I strongly believe “Git for AI workflows” is a real category, and it’s coming fast.
Open Source
Check out the repo and try it yourself:
👉 https://github.com/ATHARVA262005/ai-audit-shelf
I’d love your feedback:
- feature ideas
- architecture suggestions
- new integrations
- brutal criticism
If you think AI workflows should be auditable, versioned, and reproducible,
star the repo and help turn this into a new standard.