I joined a project with an existing Playwright E2E test suite: 38 spec files, ~165 tests, around 14,000 lines of test infrastructure. My first step...
This hits hard — I’ve seen “green CI, broken locally” way too often. The tracer bullet approach + fixtures cleanup is a solid fix… and using AI for analysis (not blind coding) is the real takeaway here.
Hey Debbie, thanks for sharing this! It's always interesting to read about the process, not just the final result.
You might also find it useful to visualize the original and updated test runs to reveal how projects and fixtures are arranged on the timeline. I'll leave a link to my project here.
Great article. That's pretty much what I do as well: plan, then approve, then fix, and ask the AI follow-up questions. Human in the loop is critical.
One question: any reason you didn't use Playwright agents here? I usually run it through the healer agent and, once we agree and the changes are made, ask Claude to review the changes as well. So an LLM-as-a-judge kind of checkpoint.
The healer agent is great for when I know I have broken tests, but some of these tests weren't broken as such; they were more architectural problems.
This is a really thoughtful breakdown of using AI as a thinking partner rather than just a code generator. The tracer bullet approach resonates deeply — it's essentially using AI to build a mental model first, then letting that model guide the refactor.
What strikes me is the 18 analysis documents phase. Most people skip straight to "fix it," but you used AI to map the unknown territory before moving. That's the difference between AI-assisted debugging and AI-assisted understanding.
One thing I've noticed in similar situations: the quality of AI analysis scales with how well you can articulate what you don't know. Your "asking questions I didn't know the answers to" framing is exactly that — it's a skill worth naming explicitly.
Did the AI catch any anti-patterns that surprised you, things you wouldn't have flagged yourself even with full domain knowledge?
Some small things, but nothing major, to be honest.
The E2E layer is where AI pulls its weight better than anywhere else in the stack — flaky tests are loud, instantly visible, and self-correcting. The compounded danger we hit: AI is good at writing tests that pass, bad at writing tests that mean something. We started running every AI-generated test through a mutation check — break the function under test in 3 trivial ways and see if any of the new tests catch it. Roughly half didn't. Now mutation-survival is part of the merge gate for any AI-suggested test. Speed gain is still real; just had to add a rung to the ladder.
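For anyone wanting to try this, here's a minimal sketch of the idea in TypeScript. The function, the mutants, and the vacuous test are all hypothetical examples (a real setup would more likely lean on a mutation-testing tool like StrykerJS), but it shows why a test that passes can still mean nothing:

```ts
// mutation-check.ts — a minimal sketch of the mutation gate described above.
// All names here are hypothetical illustrations, not a real project's code.

// Function under test.
function applyDiscount(total: number, pct: number): number {
  return total - total * (pct / 100);
}

// Three trivial mutants: each breaks the logic in one obvious way.
const mutants: Array<typeof applyDiscount> = [
  (total, pct) => total + total * (pct / 100), // flipped operator
  (total, _pct) => total,                      // discount ignored entirely
  (total, pct) => total - total * pct,         // missing the /100
];

// A weak AI-generated "test": it only checks the return type,
// so every mutant passes it just as well as the real function.
function aiGeneratedTest(fn: typeof applyDiscount): boolean {
  return typeof fn(100, 10) === "number";
}

const survivors = mutants.filter((m) => aiGeneratedTest(m)).length;
console.log(`${survivors}/${mutants.length} mutants survived`);
// A real merge gate would fail the build whenever survivors > 0.
```

If the test instead asserted the actual value (`fn(100, 10) === 90`), all three mutants would be caught, which is exactly the property the merge gate enforces.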
Your methodical AI-driven root cause analysis for a sprawling test suite is impressive. Gaining domain knowledge quickly, even with AI, is a common challenge in complex systems.

This parallels our work in health, where AI could analyze traditional knowledge like 'desi ilaaj' (local remedies). Yet, most US/EU health AI platforms are structurally constrained from engaging with such culturally embedded systems. That cultural depth is a real architectural moat.

Building AI that truly adapts to diverse, nuanced contexts is what we're focused on (I'm building GoDavaii).
Ran into a 4% local pass rate with green CI once. Nobody had run the suite locally in months; everyone trusted the pipeline. workers: 1 hiding in the CI config is nasty.
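For reference, the classic shape of that trap is a config like this (an assumed example, not the actual file from the article or this commenter's project):

```ts
// playwright.config.ts — a minimal sketch of the "workers: 1 in CI" trap.
import { defineConfig } from "@playwright/test";

export default defineConfig({
  // CI runs every test serially, so order-dependent tests stay green;
  // locally, the default worker pool runs specs in parallel and the
  // hidden shared-state assumptions fall apart.
  workers: process.env.CI ? 1 : undefined,
});
```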
This is one of the few AI + testing posts that doesn't feel like "just prompt harder."
The tracer bullet + skill combo is doing most of the work here: you basically turned a messy system into something deterministic enough for AI to operate on.
Feels like that's the actual pattern people are missing.
How does it behave long term, though? Once new tests start getting added by different people, does it hold or start drifting again?
Only time will tell, but since things keep changing all the time, it might be worth running continuous analysis or building some sort of skill for it. For sure something worth thinking about.
Good article!