This morning, our Design Lead asked me how to get her terminal into the right folder.
Tanya [6:57 AM]
my ghost isnt in tiger den anymore! GAH do i do /tiger-den
or what was the command to make it work
in tiger den
Matty [7:14 AM]
What what?
Tanya [7:14 AM]
sorry lol
you know when i go into terminal/or ghostie
and i need to type the thing to make it so im
working in tiger den
Matty [7:14 AM]
Yes type cd tiger-den
Tanya [7:15 AM]
YES
OK CD
thank you
Tanya has been here at Tiger Data for about two weeks. In that time, she shipped a feature to our production Next.js app.
I'm not going to pretend that's a normal sentence to write.
Tanya is our Design Lead. She runs brand. She's the reason our designs look the way they do and our blog thumbnails don't look like a high school yearbook. She is not a software engineer. She had not, as far as I know, previously planned on becoming one.
She didn't become one. She just started shipping.
The feature she shipped is an internal brand hub: a searchable index of every Tiger Data brand asset (logos, typography, colors, whatever you need) plus an interactive image builder that works like Canva (if Canva had been written specifically for our team). You can upload your own images, pick from our logo library, layer in backgrounds and icons, and drop text on top. Drag everything around, resize it, line it up. Export a PNG. (The thumbnail on this post was made with it.)
The downside of teaching a designer to use the terminal is that she will want hers to look like yours. Tanya saw my Ghostty theme and my catppuccin Starship theme over a screen share and decided she wanted both. Her Claude Code statusline came next. That's an entire other post.
Hold both of these facts in your brain at the same time: Tanya is asking me what cd does this morning. Tanya shipped a real feature to a real codebase last week.
If you read my last post on coding agents and internal tooling, you know where I ended it. The caveat was that I was a solo contributor to a real codebase, and the first comment on the post called out exactly the right thing: who owns this when you leave? That's a reasonable question. In March, my answer was "me, and that's a problem."
My answer in April is "...and also Tanya."
That's only true because of what got built alongside the agent. The agent is the interesting part of the demo. The boring part is what made it safe to hand a Next.js codebase to a designer who's still learning cd.
Here's that boring part.
The agent is not the hard part
Coding agents are very good at a lot of things. Being trustworthy is...not one of them.
I don't mean malicious. I mean if you ask an agent to fix a bug, it will often fix the bug. It will also sometimes decide the failing test is "probably flaky" and move on. Or notice a test it doesn't understand and quietly skip it. Or hit an error, try a second approach, fail, try a third approach, fail again, and by the time you look up it has deleted the function you were trying to fix. Or my favorite one: "These test failures are unrelated to my changes".
This is catchable. I catch a lot of it. I've been doing ops since before the Seinfeld finale aired, and I have been running DevOpsDays since before most of the current DevOps job descriptions existed. I read every diff before it leaves my machine and I have strong opinions about what a normal-looking CI run is supposed to contain.
I still miss things, everyone does. I rubber-stamp, I push tired, I get three levels deep on a task and stop reading carefully. Even with the paranoia cranked up, I am not reliably the last line of defense for my own work, let alone anyone else's.
And the paranoia itself is not transferable. "Notice when an agent silently dropped a test case" is not a skill you pick up in two weeks. It is not a skill most of the engineers I know pick up in two weeks, because most of them never had to. I cannot hand it to Tanya by writing a memo.
So the question is not how do I make Tanya as paranoid as I am. The question is: how do I make the system so paranoid that it doesn't matter how paranoid any of us are on a given Friday afternoon?
You build the paranoia into the system. (If this sounds familiar, it should.)
Rules are a suggestion
The first place everyone reaches is CLAUDE.md. Some people call theirs AGENTS.md if they want the file to be portable across agents that aren't Claude. Same idea. You write down the rules you want the agent to follow, drop it at the root of the repo, and the agent reads it every session.
Ours has a section called "Golden Rules". A handful of the actual rules from the actual file:
Diagnose before fixing. Read errors, trace code, explain your hypothesis before writing any fix. No guess-and-deploy.
Never dismiss failures. Test failures, CI failures, lint warnings. Investigate every one. Never call them "pre-existing" or "not my problem."
Never destructive without permission. No DROP, TRUNCATE, DELETE without WHERE.
Always use /pr to push. Never raw git push.
These are not theoretical. Each one is on the list because the agent did the thing I didn't want it to do, often enough that I eventually wrote it down. (Sometimes from running /insights in Claude Code.)
Does the agent follow them? Mostly. Does the agent ignore them when convenient? Also yes.
The failure mode is an exchange you have probably already had:
Me: "Wait, you just pushed to main."
Claude: "Oh shoot, you're right. The CLAUDE.md says to use /pr. I just... didn't."
It apologizes. It means it. It will still do it again on Tuesday.
CLAUDE.md is training; it shapes the default behavior. It does not enforce anything. Treat it like training and it is useful. Treat it like a wall and something is going to walk through the wall.
That was the first thing I had to learn. It might have taken a few times.
Skills are a habit
If rules are what you write down when you want an agent to know something, skills are what you write down when you want the agent (or the human) to do something the same way every time.
A Claude Code skill is roughly: a folder with instructions, some scripts, and a slash command. Type /setup and it walks through our dev environment setup. Type /debug and it starts a structured investigation instead of letting the agent flail. Type /release and it runs the release process in the right order.
The one that matters most is /pr.
Our repo has a preflight checklist you have to pass to open a pull request. It runs lint. It runs the typechecker. It runs the tests. It runs the build. It runs a code review subagent. If any changed files touch rendered output, it runs a UI verification step. If any touch docs, it runs a docs-drift check. If any touch security-sensitive paths, it runs a security review. Only then does it actually open the PR, and even then it opens it as a draft.
All of that is one command. Tanya does not have to remember any of it. She types /pr.
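As a rough sketch of how one command can wrap all of those gates, here is a minimal bash version. The gate commands and the gh call are my assumptions about the shape, not the actual skill:

```shell
#!/usr/bin/env bash
# Minimal preflight sketch: run every gate in order, stop at the first
# failure, and only open a draft PR when all gates pass.
set -u

run_preflight() {
  local gate
  for gate in "$@"; do
    echo "preflight: $gate"
    bash -c "$gate" || { echo "preflight failed at: $gate" >&2; return 1; }
  done
  echo "preflight: all gates passed"
}

# Hypothetical wiring for a /pr-style command (script names are guesses):
#   run_preflight "npm run lint" "npm run typecheck" "npm test" "npm run build" \
#     && gh pr create --draft --fill
```

The point of the structure is that a failed gate short-circuits everything after it, so the draft PR can only ever exist on the far side of a fully green run.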
This is still not enforcement. Skills are suggestions. A motivated human (or a tired agent, or a confused agent, or an agent being nudged by a tired human) can skip /pr and push directly, and nothing in the skill itself will stop them (note: the skill does try to tell the agent when it should try to run it, and it usually gets it right...but not always).
The job of a skill is to make the right way the easy way. /pr is one command. Running all of those checks by hand is eight. If typing /pr is easier than doing the alternative, the alternative stops happening. That is most of the game.
Tanya used /pr for the brand hub. She did not run lint, typecheck, tests, and build in sequence. She would not have known to! She typed a slash and a word. Everything ran.
The rest is between her and the agent.
Hooks are a wall
The rule "never push to main" lives in CLAUDE.md. The skill /pr is easier to type than the alternative. Neither of those is enforcement.
The enforcement is a pre-push git hook.
If you (or a coding agent) run git push from a machine with our repo checked out, the hook fires before the push leaves your laptop. If the branch you are on is main or production, the hook exits 1 and prints the error message I wrote specifically for the version of me that is about to do a dumb thing. The push does not happen.
If the branch is anything else, the hook runs lint, typecheck, tests, and build locally. If any of them fail, the push does not happen.
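A minimal sketch of that hook, assuming the branch names from the post and guessing at the npm script names:

```shell
#!/usr/bin/env bash
# Hypothetical pre-push hook sketch. Branch names come from the post;
# the gate commands (npm scripts) are my assumption about this codebase.

branch_is_protected() {
  case "$1" in
    main|production) return 0 ;;  # the two branches the hook refuses outright
    *) return 1 ;;
  esac
}

run_hook() {
  if branch_is_protected "$1"; then
    echo "Stop: direct pushes to '$1' are blocked. Use /pr." >&2
    return 1
  fi
  # Any other branch: run the same gates locally before the push leaves.
  npm run lint && npm run typecheck && npm test && npm run build
}

# Installed as .git/hooks/pre-push, the entry point would be roughly:
#   run_hook "$(git rev-parse --abbrev-ref HEAD)"
```

A nonzero exit from the hook is what actually cancels the push; git itself does that part.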
There is a flag to skip hooks, yes. If the agent tries to use it, a separate PreToolUse hook catches the command before it runs and blocks it on protected branches. And if a push somehow lands on the remote anyway, GitHub branch protection rejects it on the server side.
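For the flag-skipping case, a PreToolUse guard can inspect the shell command the agent is about to run. The detection itself is just string matching; the stdin/exit-code wiring reflects my reading of Claude Code's hook contract and should be treated as an assumption:

```shell
#!/usr/bin/env bash
# Sketch of a PreToolUse guard that blocks attempts to skip git hooks.
# The payload/exit-code details in the comments below are my understanding
# of the Claude Code hook contract, not verified specifics.

wants_to_skip_hooks() {
  case "$1" in
    *--no-verify*) return 0 ;;  # git push --no-verify bypasses pre-push
    *) return 1 ;;
  esac
}

# Hypothetical wiring, reading the tool-call payload from stdin with jq:
#   cmd="$(jq -r '.tool_input.command // empty')"
#   if wants_to_skip_hooks "$cmd"; then
#     echo "Blocked: --no-verify is not allowed on protected branches." >&2
#     exit 2  # a blocking exit code stops the tool call before it runs
#   fi
```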
I did not build all three layers on day one. I built them in the order I needed them, which is to say I built each one the first time something got past the layer before it.
There is also a SessionStart hook that runs before the agent does anything. It checks that the Node version is the one the app expects, that the environment file exists, that dependencies are up to date, that the database the session is about to touch is the development database and not production. If any of that is wrong, the session opens with a warning, and the agent is told to act on the warning before doing anything else.
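A SessionStart check like that can be a short script that prints warnings instead of failing hard, so the warnings land at the top of the session for the agent to act on. The file names and the "prod" heuristic here are illustrative, not the real checks:

```shell
#!/usr/bin/env bash
# Hypothetical SessionStart sketch. It emits warnings rather than exiting,
# so the agent is told what to fix before touching anything.

warn() { echo "SESSION WARNING: $*"; }

session_checks() {
  local expected_node="$1" db_url="$2"
  # Node must match what the app expects.
  node --version 2>/dev/null | grep -q "^v${expected_node}" \
    || warn "expected Node v${expected_node}"
  # The environment file has to exist.
  [ -f .env.local ] || warn ".env.local is missing"
  # The database this session touches must never be production.
  case "$db_url" in
    *prod*) warn "DATABASE_URL looks like production" ;;
  esac
}

# Hypothetical wiring:
#   session_checks "$(cat .nvmrc)" "${DATABASE_URL:-}"
```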
None of this requires the agent to cooperate. None of it requires the human to remember.
CI is the outer wall. Six jobs, parallel, on every pull request. Lint. Typecheck. Frontend tests. Backend tests across three shards. Build. Doc drift check. If any job fails, the PR cannot merge. If the PR does not have a label, a seventh job fails. Zero warnings allowed.
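In GitHub Actions terms, a setup like that might look roughly like the following sketch. The job names, commands, and shard syntax are my guesses at the shape, not the actual workflow:

```yaml
# Hypothetical CI sketch: every job below must pass before a PR can merge.
name: ci
on: pull_request

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm run lint -- --max-warnings 0  # zero warnings allowed
  typecheck:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm run typecheck
  backend-tests:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        shard: [1, 2, 3]  # three shards, run in parallel
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm test -- --shard "${{ matrix.shard }}/3"
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm run build
# Branch protection then marks each job as a required check, so a red job
# means the merge button simply never appears.
```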
This is the only layer that does not care how tired anyone is, how confused the agent is, or what quiet thing got dropped on the way through. It just says no.
Tanya has never pushed to main. She could not push to main if she tried. Neither can I. We have confirmed.
The scaffolding is the product
Here is the funny thing about all of this: none of it is new.
Linters. CI. Pre-commit hooks. Code review. Required labels. Branch protection. Engineering teams have been writing these guardrails for the humans on the team for two decades. What changed is that the same guardrails now work on the agent. And because the agent does most of the typing, they work for the non-engineer driving the agent too.
The three layers are just discipline given shape. Rules tell the agent what you want, skills make the right thing the easiest thing, hooks enforce the non-negotiables. That stack is not new. It is what engineering has been doing for twenty years, and it works.
What did change is the cost. A CLAUDE.md is an afternoon. A skill is another afternoon. A git hook is an hour. A SessionStart script is an hour. Most of them I wrote in a single week, when the alternative was cleaning up after a mistake the agent had already made twice.
The agent isn't the product, the scaffolding is. The agent is really good at working inside it without stepping on anything important.
There is one piece of scaffolding I have not been able to write.
It is the call on my calendar every few days where Tanya and I hop on a screen share and work through something hand-in-hand. We have a name for these. It is "make Tanya an engineer." Sometimes we are making her Claude Code statusline look cool. Sometimes we are picking out a new text editor and making it hers. Sometimes we are walking through how the app is put together so we can figure out how to make it even more awesome.
They are still happening. They will probably still be happening in six months. I am not trying to automate them away.
There is something about a real person on the other side of a screen share that makes the work feel survivable. A CLAUDE.md will not do that. A slash command will not do that. The hooks definitely will not do that.
That is also part of the system.
Two things I owe you
First, a callback to Guilherme on the last post. You were right that Retool is bus-factor insurance. You can also roll your own. It's called a CLAUDE.md, a pre-push hook that exits 1, and an hour-long call called "make Tanya an engineer."
Second, a correction on the opener. Tanya is not the only other person shipping code to Tiger Den. A few other folks have been committing too. I made her the hero because her two-week arc was the cleanest story. Apologies to the rest of the team for the dramatic license.
Airtable, it's been real. Tiger Den is better. Tanya is proof.
This post was reviewed and approved by Tanya before publishing. Zero AI agent skills involved.
Top comments (15)
The framing "the agent isn't the hard part, the scaffolding around it is" is exactly what most AI-in-DevOps writeups miss. We let a non-engineer audit env configs across our payments services last quarter — the agent was useless without the guardrails: a typed schema it had to output against, a diff preview a human approved, and a rollback mutex it literally could not skip. Scaffolding took a week; the agent prompts took an afternoon. The teams that quietly outship in 2026 will be the ones who got this order right, not the ones chasing raw model capability.
While I like the git and CI setup, I'm wondering where AI helps?
The "Don't do git push" in Claude.md is stopped by the hook. So why keep it in Claude.md? /pr could be a make command or whatever script runner you prefer. You can integrate an LLM call in the command to generate the message.
From what I read between the lines, you are pushing/leading people to the command line/terminal. If people want to use Claude, why not let them use the desktop app?
With the desktop app, one of the session setup parts is the directory. So no need for cd.
And then the "Don't do git push" line makes sense; otherwise the Claude app is burning tokens getting the same git hook message every time.
Give visual people a GUI. We are used to seeing text all the time, but it is because we are weird (in a good way).
Yeah I think I didn’t write that part of the post very clearly; that’s exactly correct and it shouldn’t be there. It started there for me/us and then it became apparent that didn’t work.
And the CLAUDE.md doesn't say "don't push to main"; it says "don't push to origin directly/try to make up your own way of doing PRs, use the skill". And guess what, it still sometimes doesn't use the skill. But it's not the end of the world when that happens; it just isn't as good. Many of the things the skill does (from a safety perspective) will be done by the CI as well, but having the agent do them before the push saves having the CI fail it later, and wasting the time.
The critical safety thing that cannot be done differently is pushing to the wrong branch; that's why that one is a hook. What I was trying to illustrate with the golden rules was almost a little bit of irony: they're called golden rules, and yet a Claude.md is almost the most direct way to state them. Even then they can get bypassed, so that's why things are layered.
I think there's something between the lines getting lost here. I'm not saying that the AI being there helps check the code that Tanya writes herself; this is to make sure that the process the AI uses, the one she (and I, and others) are asking it to build with, is following the rules.
I would never sit there and say you should layer an AI on to make sure that code you are writing yourself is deployed correctly. We were doing that before there was AI. This is how we have the agents themselves follow rules. Yes, many of those rules don't require something built into the agent, but the PR flow is actually not as simple as something a make command could do. It may be getting a little oversimplified in my ability to write this into a blog post, but it includes loops. It includes certain decision trees. It's not just simply a bash script... nothing wrong with bash scripts, by the way.
Also, there's a little bit of a joke in here about the terminal, but I am not the one that would ever tell anybody you should just do everything at the command line. This works exactly the same way if you're using Claude Code on the desktop. Even then, there's at least one part where you have to use the terminal, which is to get the repo on your machine. There is no way in Claude Code on the desktop, as an example, to say go get this from this GitHub repo on the web. Although if you are using Claude Code in the web UI, it does have that, which is a nice advantage, but that's a different story.
Anyway yes I know how to build all of those things without using AI and this was not about saying how to use AI to do the things we've already been doing with these other tools. This is saying if you are using AI as your coding agent, here's how you can embed some of these things.
Also if it wasn't clear in my comment that I first posted, I appreciate you taking the time to read this and comment on it and help me see things that I wasn't framing clearly in my post! I appreciate it
I think the emphasis on Claude.md and skills feels to me like you are downplaying your work.
It takes up a large part of the post, while it is only one of the situations that benefit from the great setup.
When you are opening with "person that doesn't know cd" and mentioning Claude Code and Claude.md a few times, I'm thinking terminal, so that can be chalked up to my perspective.
Make can run anything, it is not only bash scripts.
The thing I wanted to make clear was that the PR procedure doesn't need to burn tokens. Use AI where it makes the process easier.
I'm commenting because your post has valuable information. It just felt a bit off for me, and I wanted to address that.
I love it! Also after I read your comment it made me go back through our PR skill and look for ways to make it more token efficient so thanks for that! You are saving my usage rates !
Like everything else in this business of ours, there is no one panacea for everything and it isn't all solved with one tool but you know that as well as I do :)
Also, to be fair, the whole thing about not knowing the cd command is a little bit click-baity and a little bit for fun, mostly because of that exact conversation we had the morning I was starting to write this post. Tanya and I both thought it was hilarious, and I joked that that is what I was going to lead with. She said, "No you absolutely should do that. I love it so much."
Yeah, I spotted that as soon as I started reading. There is nothing wrong with trying to attract readers.
The thing that lands for me here isn't the three-layer model itself—rules, skills, hooks is a clean taxonomy but not a surprising one. What's more interesting is the recognition that the scaffolding is the product, not the agent.
I've been circling a similar thought lately: we spend a lot of energy evaluating coding agents by what they can produce, but almost none evaluating the quality of the guardrails around them. A junior developer with a great CI pipeline and a clear code review process will out-ship a genius working alone with no safety net. The same seems true for agents, but the industry conversation is still mostly "which model scores highest on SWE-bench" rather than "which team has the best pre-push hook setup."
The part I keep coming back to is the SessionStart hook checking that the database is development and not production before the agent even gets to think. That's the kind of thing that sounds paranoid until you've been woken up at 2 AM once, and then it sounds obvious. How many of those checks did you add before something bad happened versus after? I'm trying to get better at writing the "before" ones, but most of mine still come from scars.
"How many of those checks did you add before something bad happened versus after?"
That's a great point! I'd say it's a mix. Most of mine come from scars too, but they are decades of scars.
There’s something helpful about having a pessimistic and paranoid sysadmin building / contributing to this stuff that helps :)
This is really a great insight and it’s spot on…and sadly unsurprisingly true
This is one of the best framings of the "AI + non-engineers" conversation I've seen. The key insight — "make the system paranoid instead of making the person paranoid" — applies way beyond just coding agents. I've been exploring a similar pattern with AI-powered productivity tools: instead of relying on the user to prompt perfectly, you build the quality checks into the pipeline itself. The same principle works for AI-assisted outreach and automation — the guardrails matter more than the prompts.
Love this proof that curiosity, persistence, and practical problem-solving can matter more than perfect command-line knowledge.
the best part is she shipped something. most people who know cd backward and forward are still waiting for perfect conditions.