I joined a project with an existing Playwright E2E test suite: 38 spec files, ~165 tests, around 14,000 lines of test infrastructure. My first step...
This hits hard — I’ve seen “green CI, broken locally” way too often. The tracer bullet approach + fixtures cleanup is a solid fix… and using AI for analysis (not blind coding) is the real takeaway here.
Hey Debbie, thanks for sharing this! It's always interesting to read about the process, not just the final result.
You might also find it useful to visualize the original and updated test runs to reveal how projects and fixtures are arranged on the timeline. I'll leave a link to my project here.
Great article. That's pretty much what I do as well: plan, then approve, then fix, and ask the AI follow-up questions. Human in the loop is critical.
One question: any reason you didn't use Playwright agents here? I usually run it through the healer agent and, once we agree and the changes are made, ask Claude to review the changes as well. So an LLM-as-a-judge kind of checkpoint.
The healer agent is great for when I know I have broken tests, but some of these tests weren't broken as such; they were more architectural problems.
This is a really thoughtful breakdown of using AI as a thinking partner rather than just a code generator. The tracer bullet approach resonates deeply — it's essentially using AI to build a mental model first, then letting that model guide the refactor.
What strikes me is the 18 analysis documents phase. Most people skip straight to "fix it," but you used AI to map the unknown territory before moving. That's the difference between AI-assisted debugging and AI-assisted understanding.
One thing I've noticed in similar situations: the quality of AI analysis scales with how well you can articulate what you don't know. Your "asking questions I didn't know the answers to" framing is exactly that — it's a skill worth naming explicitly.
Did the AI catch any anti-patterns that surprised you, things you wouldn't have flagged yourself even with full domain knowledge?
Some small things, but nothing major, to be honest.
The E2E layer is where AI pulls its weight better than anywhere else in the stack — flaky tests are loud, instantly visible, and self-correcting. The compounded danger we hit: AI is good at writing tests that pass, bad at writing tests that mean something. We started running every AI-generated test through a mutation check — break the function under test in 3 trivial ways and see if any of the new tests catch it. Roughly half didn't. Now mutation-survival is part of the merge gate for any AI-suggested test. Speed gain is still real; just had to add a rung to the ladder.
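For anyone wanting to try this, here's a minimal sketch of the idea in TypeScript. The function, the mutants, and the vacuous test are all hypothetical examples (a real setup would more likely lean on a mutation-testing tool like StrykerJS), but it shows why a test that passes can still mean nothing:

```ts
// mutation-check.ts — a minimal sketch of the mutation gate described above.
// All names here are hypothetical illustrations, not a real project's code.

// Function under test.
function applyDiscount(total: number, pct: number): number {
  return total - total * (pct / 100);
}

// Three trivial mutants: each breaks the logic in one obvious way.
const mutants: Array<typeof applyDiscount> = [
  (total, pct) => total + total * (pct / 100), // flipped operator
  (total, _pct) => total,                      // discount ignored entirely
  (total, pct) => total - total * pct,         // missing the /100
];

// A weak AI-generated "test": it only checks the return type,
// so every mutant passes it just as well as the real function.
function aiGeneratedTest(fn: typeof applyDiscount): boolean {
  return typeof fn(100, 10) === "number";
}

const survivors = mutants.filter((m) => aiGeneratedTest(m)).length;
console.log(`${survivors}/${mutants.length} mutants survived`);
// A real merge gate would fail the build whenever survivors > 0.
```

If the test instead asserted the actual value (`fn(100, 10) === 90`), all three mutants would be caught, which is exactly the property the merge gate enforces.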
Your methodical AI-driven root cause analysis for a sprawling test suite is impressive. Gaining domain knowledge quickly, even with AI, is a common challenge in complex systems.

This parallels our work in health, where AI could analyze traditional knowledge like 'desi ilaaj' (local remedies). Yet, most US/EU health AI platforms are structurally constrained from engaging with such culturally embedded systems. That cultural depth is a real architectural moat.

Building AI that truly adapts to diverse, nuanced contexts is what we're focused on (I'm building GoDavaii).
Ran into a 4% local pass rate with green CI once. Nobody had run the suite locally in months; everyone trusted the pipeline. workers: 1 hiding in the CI config is nasty.
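For reference, the classic shape of that trap is a config like this (an assumed example, not the actual file from the article or this commenter's project):

```ts
// playwright.config.ts — a minimal sketch of the "workers: 1 in CI" trap.
import { defineConfig } from "@playwright/test";

export default defineConfig({
  // CI runs every test serially, so order-dependent tests stay green;
  // locally, the default worker pool runs specs in parallel and the
  // hidden shared-state assumptions fall apart.
  workers: process.env.CI ? 1 : undefined,
});
```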
This is one of the few AI + testing posts that doesn't feel like "just prompt harder."
The tracer bullet + skill combo is doing most of the work here: you basically turned a messy system into something deterministic enough for AI to operate on.
Feels like that's the actual pattern people are missing.
How does it behave long term, though? Once new tests start getting added by different people, does it hold or start drifting again?
Only time will tell, but since things keep changing all the time, it might be worth running continuous analysis or building some sort of skill for it. For sure something worth thinking about.
Good article!