Chalom Ellezam

Posted on May 12

The Claude Code production checklist: 15 things that aren't obvious until they bite you

#ai #beginners #webdev #claudecode

Disclosure: I'm a senior backend tech lead and I run HostingGuru. This list applies to any platform; HostingGuru happens to handle a few of these for you automatically, which I'll flag honestly when relevant.

I've helped about a dozen non-technical founders take their first Claude Code MVP from localhost:3000 to a real production URL in the last six months. They are much better at shipping than the same founders would have been two years ago.

But the same 15 things keep biting them in the first two weeks after launch. Almost none of these are about the code being wrong. They are about production being a different environment with its own rules, rules nobody warned you about because everyone assumed you already knew them.

This is the checklist I now send to every founder before they go live. If you went live in the last 30 days and didn't go through it, you almost certainly have at least four of these issues right now.

1. Your `.env` file is in your git history

Even if your current code has .env in .gitignore, check git history:

git log --all --full-history -- .env
git log --all --full-history -- .env.local

If anything shows up, your API keys were exposed at some point. They are still exposed, because git history is forever. Rotate every key in that file. Today. Yes, even if the repo is private. Future you who makes the repo public for an open-source moment will thank present you.

2. You have one set of API keys for dev and prod

Same OPENAI_API_KEY, same STRIPE_SECRET_KEY, same SENDGRID_API_KEY on your laptop and on the production server. The first time you accidentally run a test script that fires 500 emails or charges 200 cards, you'll wish you had a *_DEV and a *_PROD. Make separate keys per environment. Today.

3. Your Stripe webhook is unsigned

When Stripe POSTs to /api/webhooks/stripe, you should verify the signature header before trusting the payload. If your code just reads req.body.amount and credits the user's account, anyone on the internet can hit that URL with fake events and give themselves credits.

The fix is three lines:

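// The signature covers the raw request bytes, so pass the unparsed body here.
// req.rawBody stands for however your framework exposes it (in Express, e.g. via express.raw()).
const sig = req.headers['stripe-signature'];
// constructEvent throws if the signature does not match; catch it and respond with 400.
const event = stripe.webhooks.constructEvent(req.rawBody, sig, process.env.STRIPE_WEBHOOK_SECRET);
// Now use event.type and event.data: they are verified.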

Required reading: Stripe's webhook signature docs.
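
If you're curious what that constructEvent call is checking, here is a rough, stdlib-only sketch of the scheme Stripe documents: the header carries a timestamp (t=) and one or more v1= signatures, each an HMAC-SHA256 of `<timestamp>.<raw body>` keyed with your webhook secret. verifyStripeSignature is a hypothetical helper I made up for illustration, and the 300-second tolerance is an assumption mirroring the library's default. Use the official library in production; this is only to show why the raw body and the secret both matter.

```javascript
const crypto = require('node:crypto');

// Illustrative only: verifyStripeSignature is NOT part of the Stripe SDK.
// Header format: "t=<unix seconds>,v1=<hex HMAC-SHA256 of `${t}.${rawBody}`>"
function verifyStripeSignature(rawBody, header, secret, toleranceSec = 300, nowSec = Math.floor(Date.now() / 1000)) {
  const parts = String(header).split(',').map((kv) => kv.split('='));
  const timestamp = Number(parts.find(([k]) => k === 't')?.[1]);
  const candidates = parts.filter(([k]) => k === 'v1').map(([, v]) => v);
  if (!Number.isFinite(timestamp) || candidates.length === 0) return false;

  // Reject stale timestamps so a captured request cannot be replayed later.
  if (Math.abs(nowSec - timestamp) > toleranceSec) return false;

  const expected = Buffer.from(
    crypto.createHmac('sha256', secret).update(`${timestamp}.${rawBody}`, 'utf8').digest('hex'),
    'utf8'
  );

  // Constant-time comparison; timingSafeEqual requires equal-length buffers.
  return candidates.some((sig) => {
    const given = Buffer.from(String(sig), 'utf8');
    return given.length === expected.length && crypto.timingSafeEqual(given, expected);
  });
}
```

Note that re-serializing parsed JSON will usually not reproduce the exact bytes Stripe signed, which is why the check has to run on the raw body.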

4. You're using Stripe test keys in production

Your STRIPE_SECRET_KEY starts with sk_test_... instead of sk_live_.... Checkout runs against Stripe's test environment, which... doesn't charge anybody. You launch, you celebrate the first sale, three days later you realize Stripe has collected $0 for you.

Same with STRIPE_PUBLISHABLE_KEY and the frontend pk_test_* vs pk_live_*. Match them. Double-check after every deploy in the first week.
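
Instead of double-checking by hand, you can make the app refuse to boot when the key prefix doesn't match the environment. A minimal sketch, assuming you set NODE_ENV=production in prod and use standard keys (restricted rk_* keys would need their own rule). checkStripeKeys is a hypothetical helper name, not something Stripe ships.

```javascript
// Hypothetical startup guard based only on Stripe's key prefixes
// (sk_test_/sk_live_ for secret keys, pk_test_/pk_live_ for publishable keys).
function checkStripeKeys(env) {
  const wanted = env.NODE_ENV === 'production' ? 'live' : 'test';
  const problems = [];
  for (const [name, prefix] of [
    ['STRIPE_SECRET_KEY', 'sk'],
    ['STRIPE_PUBLISHABLE_KEY', 'pk'],
  ]) {
    const value = env[name] || '';
    if (!value.startsWith(`${prefix}_${wanted}_`)) {
      problems.push(`${name} should start with ${prefix}_${wanted}_ when NODE_ENV=${env.NODE_ENV}`);
    }
  }
  return problems; // empty array means the keys match the environment
}

// Usage, before the server starts listening:
// const problems = checkStripeKeys(process.env);
// if (problems.length) { console.error(problems.join('\n')); process.exit(1); }
```

The same guard also catches the item 2 mistake in the other direction: live keys on a dev machine.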

5. Your database has no backup strategy

"I'll set up backups later" is a sentence I have heard about 100 times. Approximately zero of those people set up backups later. Then someone (often Claude) runs a migration that drops a table, and the conversation is over.

Most managed databases (Supabase, Neon, managed Postgres on any PaaS) have automatic daily backups built in, but only if you turn them on. Click around your database dashboard now. If you don't see "Backups enabled," fix it before reading item 6.

For self-hosted: pg_dump to S3 / Cloudflare R2 nightly via a cron. Test the restore. Once. The 5 minutes you spend testing is the difference between "we recovered" and "we lost everything."

6. You have no rate limiting on AI endpoints

You have a /api/chat route that calls OpenAI. Someone (a scraper, a bored teen, your competitor) discovers it and hits it in a for loop. By the time you notice, your OpenAI bill is up by $400 and the abuser has stopped.

Even a stupid rate limit is much better than no rate limit:

// Crude but works
const ipHits = new Map();
app.post('/api/chat', (req, res) => {
  const ip = req.ip;
  const now = Date.now();
  const hits = ipHits.get(ip) || [];
  const recentHits = hits.filter(t => now - t < 60_000);
  if (recentHits.length >= 10) return res.status(429).send('slow down');
  recentHits.push(now);
  ipHits.set(ip, recentHits);
  // ... your real handler
});

10 calls/minute per IP. Most legitimate users won't hit it. Most abusers will. For real production, use a library (express-rate-limit, slowapi, etc.) with Redis-backed counters.
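
One thing the crude version never does is forget old IPs, so the Map grows for as long as the process lives. Here is a slightly less crude sketch of the same sliding-window idea with pruning and an injectable clock. createRateLimiter and its option names are hypothetical, and it is still single-process and in-memory, so it is a stopgap rather than a replacement for a Redis-backed library.

```javascript
// Hypothetical helper: same 10-per-minute sliding window, plus cleanup.
function createRateLimiter({ limit = 10, windowMs = 60_000, now = Date.now } = {}) {
  const hitsByKey = new Map();

  // Returns true if this call is allowed, false if the key is over the limit.
  function allow(key) {
    const t = now();
    const recent = (hitsByKey.get(key) || []).filter((ts) => t - ts < windowMs);
    const allowed = recent.length < limit;
    if (allowed) recent.push(t);
    hitsByKey.set(key, recent);
    return allowed;
  }

  // Drops keys with no hits inside the window; returns how many keys remain.
  function prune() {
    const t = now();
    for (const [key, hits] of hitsByKey) {
      if (!hits.some((ts) => t - ts < windowMs)) hitsByKey.delete(key);
    }
    return hitsByKey.size;
  }

  return { allow, prune };
}

// Usage inside the /api/chat handler (Express assumed, as in the snippet above):
// const limiter = createRateLimiter();
// setInterval(limiter.prune, 60_000).unref();
// if (!limiter.allow(req.ip)) return res.status(429).send('slow down');
```

If your app sits behind a proxy or load balancer, check that req.ip is the client's address and not the proxy's, otherwise every user shares one bucket.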

7. CORS is wide open

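You have Access-Control-Allow-Origin: * in your headers or cors({ origin: '*' }) in your Express setup. For a public read-only API, fine. For anything with auth, it's a problem: any random website can call your API from a visitor's browser and read the response. Browsers do refuse to send cookies when the allowed origin is a literal *, but the usual workaround (reflecting whatever Origin header arrives and enabling credentials) removes that protection, and then any site can make authenticated requests as your logged-in users.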

Set origin to your specific frontend domain. If you need both https://yourapp.com and https://www.yourapp.com, use an allowlist:

const allowed = ['https://yourapp.com', 'https://www.yourapp.com'];
app.use(cors({ origin: (origin, cb) => cb(null, !origin || allowed.includes(origin)) }));

8. You haven't set up error tracking

Sentry takes 25 minutes to install. Until you have it:

Your users find bugs before you do
You spend hours guessing what broke from screenshots
You miss the bugs that don't generate user complaints (and there are a lot of those)

Install it tonight. Free tier covers 5K errors/month, plenty for any startup under 10K users.

npm install --save @sentry/nextjs   # or @sentry/node, @sentry/python, etc.
npx @sentry/wizard -i nextjs        # runs a guided setup

9. Your app ships source maps to production

Source maps make stack traces readable, but if they're served publicly, anyone can open Chrome DevTools and read your original (TypeScript / unminified) code. This includes your API logic, your prompts to OpenAI, your business rules.

For Next.js:

// next.config.js
module.exports = {
  productionBrowserSourceMaps: false,
};

Upload source maps to Sentry instead (so YOU can debug stack traces) and exclude them from the public bundle.

10. You have no `/healthz` endpoint

Most hosting platforms periodically ping a health endpoint to know if your app is alive. If you don't have one, the platform pings your homepage, which loads your full app stack including AI calls, which is slow and expensive.

Add one line:

app.get('/healthz', (_req, res) => res.status(200).send('ok'));

Configure your hosting platform's health check to point at /healthz. Cheap, fast, useful.

11. Your DNS TTL is too high

Most domain registrars default to a 24-hour or 4-hour TTL (Time To Live) on DNS records. This means when you change your domain to point at a new host, resolvers at ISPs and on users' machines keep serving the old record until the TTL runs out, which can be up to 24 hours.

Before any DNS migration, log into your registrar (Namecheap, OVH, Cloudflare) and set TTL to 300 seconds (5 minutes). Then wait for the old TTL to expire (up to 24 hours with the default), so every cache has picked up the short value. Then make your changes. Propagation will be minutes, not hours.

Do this before you need to migrate, not the day of.

12. Your runtime version isn't pinned

Claude Code generates code targeting whatever Node / Python / Ruby version it's currently aware of (often the latest). Your hosting platform might run an older default. Result: subtle bugs that work on your laptop and break in prod.

For Node, in your package.json:

{
  "engines": {
    "node": "20.x"
  }
}

For Python, create runtime.txt (on platforms that read it):

python-3.11.7

For Ruby, .ruby-version:

3.2.2

Pin once. Forget about it. Your future self never debugs "works on my laptop" again.
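
One caveat: by default npm only warns when the engines field doesn't match, and not every host reads it. If you want a hard stop, a tiny boot-time check does the job. A minimal sketch, where REQUIRED_NODE_MAJOR and nodeMajorMatches are hypothetical names and 20 simply mirrors the "20.x" pin above.

```javascript
// Hypothetical boot-time guard; keep this number in sync with package.json "engines".
const REQUIRED_NODE_MAJOR = 20;

// process.versions.node looks like "20.11.1" (no leading "v").
function nodeMajorMatches(requiredMajor, version = process.versions.node) {
  const major = Number.parseInt(String(version).split('.')[0], 10);
  return major === requiredMajor;
}

// Usage at the very top of your entry file:
// if (!nodeMajorMatches(REQUIRED_NODE_MAJOR)) {
//   console.error(`Expected Node ${REQUIRED_NODE_MAJOR}.x, got ${process.versions.node}`);
//   process.exit(1);
// }
```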

13. Server secrets are in your frontend bundle

In Next.js, environment variables prefixed with NEXT_PUBLIC_ are sent to the browser. If you accidentally name your server secret NEXT_PUBLIC_STRIPE_SECRET_KEY, you have just published it to every user's Chrome.

Rule:

NEXT_PUBLIC_* → safe for the browser (analytics IDs, Stripe publishable keys, feature flags)
Any actual secret → no prefix. Server-only.

Same idea in Vite (VITE_*), Create React App (REACT_APP_*), Astro, etc. Audit your .env for any "PUBLIC" variable that shouldn't be.
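
That audit is easy to automate. The sketch below flags any browser-exposed variable whose name or value looks like a secret. findExposedSecrets, the prefix list, and both patterns are my own assumptions and deliberately incomplete; treat a hit as "go look at this", not as proof.

```javascript
// Hypothetical audit helper. The prefixes and patterns are heuristics, not a complete list.
const PUBLIC_PREFIXES = ['NEXT_PUBLIC_', 'VITE_', 'REACT_APP_'];
const SECRET_NAME = /(SECRET|PRIVATE|PASSWORD)/i;
const SECRET_VALUE = /^(sk_(live|test)_|whsec_)/; // Stripe secret-key and webhook-secret prefixes

// Returns the sorted names of public-prefixed variables that look like secrets.
function findExposedSecrets(env) {
  return Object.entries(env)
    .filter(([name]) => PUBLIC_PREFIXES.some((prefix) => name.startsWith(prefix)))
    .filter(([name, value]) => SECRET_NAME.test(name) || SECRET_VALUE.test(String(value)))
    .map(([name]) => name)
    .sort();
}

// Usage in CI or a prebuild script:
// const exposed = findExposedSecrets(process.env);
// if (exposed.length) { console.error('Exposed to the browser:', exposed); process.exit(1); }
```

Checking values as well as names catches the case where a secret was pasted into an innocently named public variable.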

14. You have no monitoring on cron jobs

You set up a nightly job at 0 3 * * *. It worked for the first three days. It hasn't run in two weeks because of a node_modules issue you didn't notice. You only realize when you check the database and see no new data.

Two ways to know your crons are running:

Easy way: every cron pings a service like healthchecks.io (free for solo use). If the ping doesn't arrive within the expected window, it emails you.

0 3 * * * /opt/myapp/nightly.sh && curl -fsS -m 10 https://hc-ping.com/<uuid>

Harder way: hosted platforms with built-in cron observability. HostingGuru does this via its AI monitoring layer (more on that at the end). Render and Railway expose cron logs but you have to remember to look.

15. Your app crashes silently on startup, restarts forever

A crash on boot looks like this: your platform starts your container, your app throws an uncaught exception within 2 seconds, platform restarts, repeat. Externally, your domain just returns 503s.

If you don't have a process.on('uncaughtException') handler that logs to your error tracker AND alerts you on Telegram/Slack/email, this can go on for hours before you notice.

Minimum viable setup:

process.on('uncaughtException', (err) => {
  console.error('FATAL:', err);
  // Optionally: send to Sentry, post to Telegram, etc.
  process.exit(1); // let the platform restart cleanly
});

Then make sure your hosting platform's health check is configured (item 10), so a crash loop shows up as a failing check you get told about instead of an endless, silent restart.

The faster path through this list

Some of these are handled for you on managed PaaS platforms:

HostingGuru (full disclosure: I build it): encrypted env vars (so item 13's blast radius is smaller if you misname), AI-monitored crash loops (so item 15 pings you on Telegram automatically), built-in cron job monitoring with the same Telegram alert (item 14). The list of what you still need to do yourself remains long though: items 1–4 (key hygiene), 5 (backups), 6–7 (rate limiting + CORS), 9 (source maps), 11 (DNS), 12 (runtime pinning). The platform can't fix code-level decisions.
Render / Railway / Fly.io: similar pattern. Some items (env vars in dashboard, basic process restart logic) are handled. The code-level items remain yours.
VPS + Coolify or Dokku: you handle all 15 yourself. That's fine if you have the time and discipline. Most solo founders don't.

The right move is: pick a platform that handles a few of these so you can focus on the rest, then actually go through the rest.

What to do tonight, in order

Run git log --all --full-history -- .env (item 1): 30 seconds. If anything appears, rotate keys immediately.
Check your Stripe key prefix (item 4): 10 seconds. echo "$STRIPE_SECRET_KEY" | head -c 8. Should print sk_live_ in prod.
Add /healthz endpoint (item 10): 2 minutes.
Add an uncaughtException handler (item 15): 5 minutes.
Verify your database has backups enabled (item 5): 2 minutes in your DB dashboard.

That's 10 minutes for the five most critical items. The other 10 you can do over the week.

If you build with Claude Code, this list is also a useful prompt: "Audit my repo against the following 15 items..." Claude will go through each one and tell you which apply, which don't, and give you a fix for the ones that do. It catches most of them. The audit takes maybe 15 minutes total.

What's the most embarrassing thing you've shipped to production that this list would have caught? I'm collecting horror stories for v2.

Previous posts in this series:

1. Heroku just went into "sustaining engineering mode." Here are 5 alternatives whose free tier actually doesn't sleep

2. I built my MVP with Claude Code. Now I need to deploy it. Here's what nobody tells you.

3. Your AI app is silently burning $2,000/month and you don't know it.

4. Telegram alerts for any production app — a 5-minute setup.

5. How I built a Discord 'ship-tracker' bot in a weekend.

6. I migrated 12 client projects off Heroku. Here's the playbook.