I've interviewed maybe 300 SRE candidates over the years. Most interviews are bad. Leetcode for people who don't write production code. System design questions ripped off from a blog.
Here are the questions I actually ask, and why.
'Walk me through your last bad incident'
The best SREs can tell you about incidents with specificity. They remember the graphs, the timeline, the moment they realized what was wrong, and what they'd do differently. Vague answers mean they weren't really there.
'Your on-call phone rings at 3 AM. Walk me through the first 10 minutes'
Reveals process discipline. Good answers: acknowledge → assess impact → communicate → start diagnosis. Bad answers: jump straight to SSH-ing into a box.
'Tell me about a time you killed an alert'
Not 'added an alert.' Killed one. Shows they understand alert hygiene as a discipline, not a chore.
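One way to find alerts worth killing is to look at paging history and flag alerts that fire often but never lead to action. The sketch below is illustrative only: the `Page` record, the `action_taken` field, and the `min_fires` threshold are hypothetical names, not the export format of any real paging tool.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Page(NamedTuple):
    """One page sent to on-call (hypothetical record shape)."""
    alert_name: str
    action_taken: bool  # did the responder actually have to do anything?


def kill_candidates(pages: Iterable[Page], min_fires: int = 5) -> list[str]:
    """Return alerts that fired at least `min_fires` times and were never acted on."""
    fires: Counter[str] = Counter()
    actioned: Counter[str] = Counter()
    for page in pages:
        fires[page.alert_name] += 1
        if page.action_taken:
            actioned[page.alert_name] += 1
    # An alert nobody ever acts on is noise, so it is a candidate for deletion.
    return sorted(
        name
        for name, count in fires.items()
        if count >= min_fires and actioned[name] == 0
    )


if __name__ == "__main__":
    history = (
        [Page("disk_70pct", False)] * 6
        + [Page("api_5xx", True)] * 5
        + [Page("cert_expiry", False)] * 2
    )
    print(kill_candidates(history))
```

The list is a starting point for a conversation, not an automatic delete: a candidate with a story like this behind it is the kind of answer the question is looking for.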
'What's the difference between an SLI and an SLO?'
Tests fundamentals without being a trick question. You'd be surprised how many senior candidates conflate these.
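For reference, the distinction fits in a few lines: an SLI is something you measure, and an SLO is the target you hold that measurement to. Here is a minimal sketch, assuming a request-based availability SLI; the names `WindowCounts`, `availability_sli`, and `meets_slo`, and the 99.9% target, are illustrative choices rather than anything standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowCounts:
    """Request counts over some measurement window (hypothetical shape)."""
    good_requests: int
    total_requests: int


def availability_sli(counts: WindowCounts) -> float:
    """SLI: the measured fraction of requests that were good."""
    if counts.total_requests == 0:
        # Assumption: a window with no traffic is treated as fully available.
        return 1.0
    return counts.good_requests / counts.total_requests


def meets_slo(sli: float, slo_target: float) -> bool:
    """SLO: the target the SLI is compared against."""
    return sli >= slo_target


if __name__ == "__main__":
    window = WindowCounts(good_requests=999_500, total_requests=1_000_000)
    sli = availability_sli(window)       # what actually happened
    slo_target = 0.999                   # what we promised ourselves
    print(f"SLI={sli:.4%} target={slo_target:.1%} met={meets_slo(sli, slo_target)}")
```

A candidate who can say "the SLI is the number, the SLO is the line on the graph" has the fundamentals.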
'What's the most useful tool you've built for yourself?'
Good SREs are tool builders. The answer should be small, specific, and boring: a script that saves 5 minutes a day, not a grand platform.
What I don't ask
'Reverse a linked list on a whiteboard.' SRE is not a coding contest. I care whether you can debug a broken system at 3 AM, not whether you memorized CLRS.
Written by Dr. Samson Tanimawo
BSc · MSc · MBA · PhD
Founder & CEO, Nova AI Ops. https://novaaiops.com