Samson Tanimawo

The Hidden Cost of Flaky Tests

Flaky tests feel like a QA problem. They're actually a reliability problem.

The direct cost

If your CI has a 10% flaky failure rate and you deploy 20 times a day, that's an average of 2 rerun-required failures per day. If each rerun takes 15 minutes, you burn 30 engineering minutes a day on flakiness.
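
The arithmetic above can be sketched as a small function so you can plug in your own numbers. The function name and parameters are illustrative, and the result is an expected value that assumes every flaky failure costs exactly one rerun:

```python
def daily_rerun_minutes(flaky_rate: float, deploys_per_day: int, rerun_minutes: float) -> float:
    """Expected minutes per day spent re-running pipelines that failed only because of flakiness."""
    # Assumption: each flaky failure triggers exactly one rerun of fixed length.
    expected_flaky_failures = flaky_rate * deploys_per_day
    return expected_flaky_failures * rerun_minutes


minutes = daily_rerun_minutes(flaky_rate=0.10, deploys_per_day=20, rerun_minutes=15)
print(f"{minutes:.0f} minutes per day")  # 30 minutes per day
```

This only counts machine-and-waiting time. It says nothing about the cost described in the next section.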

The real cost

Engineers learn that failures don't mean anything. 'Oh, it's just flaky, hit rerun.' That learned behavior kills your entire test suite as a signal.

The next time a real bug breaks the build, the first reaction is still 'just hit rerun.' Three reruns later, the bug lands in prod.

The fix

  1. Quarantine flaky tests. Move them to a separate suite that runs but doesn't block merges. Now your main suite is trusted again.
  2. Track flakiness as a metric. Record a per-test failure rate and tag each test with an owner.
  3. Give flaky tests a TTL. 30 days to fix or delete. No exceptions. If the team that owns it can't fix it, the test isn't needed.
  4. Reward fixing flaky tests. Give public credit. Most teams don't, and the work goes unnoticed.
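
Steps 1 to 3 can be sketched as one small triage routine. This is a minimal illustration, not a specific CI tool's API: `TrackedTest`, `triage`, and the 2% threshold are hypothetical, and it assumes `failures` only counts failures on code that was otherwise known to be good. The 30-day TTL comes from the list above.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

FLAKY_THRESHOLD = 0.02      # assumed cutoff: quarantine above a 2% failure rate
TTL = timedelta(days=30)    # from the text: 30 days to fix or delete


@dataclass
class TrackedTest:
    name: str
    owner: str                              # step 2: every test has an owner
    runs: int = 0
    failures: int = 0
    quarantined_on: Optional[date] = None

    @property
    def failure_rate(self) -> float:
        # Step 2: per-test failure rate; 0.0 when the test has never run.
        return self.failures / self.runs if self.runs else 0.0


def triage(test: TrackedTest, today: date) -> str:
    """Return 'blocking', 'quarantined', or 'expired' for one test."""
    if test.quarantined_on is None:
        if test.failure_rate > FLAKY_THRESHOLD:
            test.quarantined_on = today     # step 1: move out of the blocking suite
            return "quarantined"
        return "blocking"
    if today - test.quarantined_on > TTL:
        return "expired"                    # step 3: TTL passed, fix it or delete it
    return "quarantined"


test = TrackedTest(name="test_checkout_total", owner="payments", runs=200, failures=9)
print(triage(test, date(2024, 1, 1)))    # quarantined
print(triage(test, date(2024, 1, 20)))   # quarantined
print(triage(test, date(2024, 2, 15)))   # expired
```

In this sketch a quarantined test stays quarantined until someone clears `quarantined_on` by hand, which keeps the decision to trust a test again with its owner.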

The uncomfortable truth

Many flaky tests reveal real concurrency bugs in the code. You just aren't seeing them in prod yet. A 'flaky' test that fails 5% of the time in CI often corresponds to an edge case that bites users at roughly the same rate.
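
As an illustration of the kind of bug that hides behind a flaky test, here is a minimal sketch of a lost update. The `Counter` class and the `pause` hook are hypothetical. The barrier forces the unlucky interleaving that would otherwise happen only occasionally:

```python
import threading


class Counter:
    """Deliberately unsafe: increment is a read followed by a separate write."""

    def __init__(self) -> None:
        self.value = 0

    def increment(self, pause=None) -> None:
        current = self.value          # read
        if pause is not None:
            pause()                   # hypothetical hook: wait until the other thread has also read
        self.value = current + 1      # write based on a possibly stale read


def run_two_increments(force_race: bool) -> int:
    counter = Counter()
    barrier = threading.Barrier(2)
    pause = barrier.wait if force_race else None
    threads = [threading.Thread(target=counter.increment, args=(pause,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


print(run_two_increments(force_race=True))  # 1: both threads read 0, so one update is lost
```

Without the barrier, the same code usually returns 2, so a test asserting on it passes most of the time and fails only when the scheduler happens to interleave the two threads. Guarding the read and write with a `threading.Lock` removes the race for tests and users alike.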

Take them seriously. They're trying to tell you something.


Written by Dr. Samson Tanimawo
BSc · MSc · MBA · PhD
Founder & CEO, Nova AI Ops. https://novaaiops.com
