I have been thinking about a small gap in how we talk about regression testing:
A UI test can pass while the API contract behind that UI quietly changes.
Recently I tried a small experiment around UI-driven API regression checks. The idea is simple:
- Run the same UI scenario against two backend versions.
- Record the API traffic produced by the browser.
- Compare the JSON responses.
This is not formal contract testing. It is closer to asking: "For this real user flow, did the wire behavior change between versions?"
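The comparison step can be sketched in a few lines: collect the key paths of each recorded JSON body and diff the two sets. Everything below is illustrative; the helper names are mine, not part of any tool:

```ts
// Recursively collect dotted key paths from a JSON value.
// Arrays are sampled at index 0, which is enough for shape diffing.
function keyPaths(value: unknown, prefix = ""): Set<string> {
  const paths = new Set<string>();
  if (Array.isArray(value)) {
    if (value.length > 0) {
      for (const p of keyPaths(value[0], `${prefix}[]`)) paths.add(p);
    }
  } else if (value !== null && typeof value === "object") {
    for (const [k, v] of Object.entries(value as Record<string, unknown>)) {
      const path = prefix ? `${prefix}.${k}` : k;
      paths.add(path);
      for (const p of keyPaths(v, path)) paths.add(p);
    }
  }
  return paths;
}

// Diff two recorded responses: which paths appeared or disappeared?
function diffShape(oldBody: unknown, newBody: unknown) {
  const before = keyPaths(oldBody);
  const after = keyPaths(newBody);
  return {
    added: [...after].filter((p) => !before.has(p)),
    removed: [...before].filter((p) => !after.has(p)),
  };
}

// Example of the kind of drift described below:
const v1 = { id: "order_1", total: 100 };
const v2 = { id: "order_1", total: 100, email: "a@b.c" };
console.log(diffShape(v1, v2)); // added: ["email"], removed: []
```

A real harness also has to normalize volatile values (ids, timestamps) before diffing, but the shape comparison itself stays this simple.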
## The Upgrade Looked Safe
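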
I tried this on a Medusa upgrade from v2.13.6 to v2.14.0.
It looked like a normal minor version bump:
- UI tests were green
- Integration tests were green
- Nothing obvious stood out in the changelog
But the recorded API traffic showed a response shape change.
GET /admin/orders/{id}/preview started returning an email field in v2.14.0 that was not present in v2.13.6.
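Sketched as a diff (field names taken from the select array in the source below; this is illustrative, not a captured payload, and other fields are elided):

```diff
 {
   "id": "order_...",
   "version": 2,
   "summary": {},
-  "total": 100
+  "total": 100,
+  "email": "customer@example.com"
 }
```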
I traced it back to the Medusa source. The previewOrderChange method's select array gained one entry.
v2.13.6:

```ts
const order = await this.retrieveOrder(orderId, {
  select: ["id", "version", "items.detail", "summary", "total"],
  relations: ["transactions", "credit_lines"],
}, sharedContext)
```

v2.14.0:

```ts
const order = await this.retrieveOrder(orderId, {
  select: ["id", "version", "items.detail", "summary", "total", "email"],
  relations: ["transactions", "credit_lines"],
}, sharedContext)
```
One token changed.
The field already existed on the order entity. What changed was whether this endpoint hydrated and returned it.
That change was not mentioned in the release notes, changelog, or migration guide as far as I could tell.
## Why The Existing Tests Missed It
In this stack, the usual tests did not notice the change.
The UI did not display email on that page, so the UI test had no reason to fail.
Some lower-level tests mocked the API, so they were not observing the real wire response.
The integration tests asserted that expected fields existed and had correct values. They did not assert that no additional fields existed.
That last point is important. Most API tests are written like this:
- status is 200
- response has id
- response has total
- response has items
Far fewer tests say:
- response has exactly this schema
- no unexpected field was added
- this field did not change nullability
- this field did not change type
And that is usually reasonable. Strict schema checks everywhere can become noisy and expensive to maintain.
But it also means that "all tests passed" does not necessarily mean "the API contract stayed the same."
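The gap between the two styles is easy to show. A minimal sketch, not tied to any particular test framework, using made-up payloads shaped like the ones above:

```ts
// Loose check: the expected fields exist. Extra fields go unnoticed.
function hasFields(body: Record<string, unknown>, fields: string[]): boolean {
  return fields.every((f) => f in body);
}

// Strict check: exactly this key set, nothing more.
function hasExactFields(body: Record<string, unknown>, fields: string[]): boolean {
  return Object.keys(body).length === fields.length && fields.every((f) => f in body);
}

const expected = ["id", "version", "summary", "total"];
const v2_13 = { id: "order_1", version: 1, summary: {}, total: 100 };
const v2_14 = { ...v2_13, email: "customer@example.com" };

console.log(hasFields(v2_13, expected));      // true
console.log(hasFields(v2_14, expected));      // true  -- green, drift invisible
console.log(hasExactFields(v2_13, expected)); // true
console.log(hasExactFields(v2_14, expected)); // false -- drift caught
```

The loose assertion stays green across both versions; only the strict one surfaces the added field.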
## Is An Added Field A Breaking Change?
Sometimes no.
For many JSON consumers, adding a field is harmless. They ignore unknown properties and move on.
But not every consumer behaves that way.
Generated clients, strict decoders, mobile apps, partner integrations, analytics jobs, and internal services may reject unknown fields or depend on a closed schema. In those systems, even an additive response change can become a real regression.
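As a sketch of why: a consumer with a closed schema (a hand-rolled stand-in here for a generated client or strict decoder) treats any unknown property as a decode error:

```ts
// A strict decoder in the style of generated clients: unknown keys are an error.
// KNOWN_KEYS and decodeOrderPreview are illustrative names, not real Medusa API.
const KNOWN_KEYS = new Set(["id", "version", "summary", "total"]);

function decodeOrderPreview(json: string): Record<string, unknown> {
  const body = JSON.parse(json) as Record<string, unknown>;
  const unknown = Object.keys(body).filter((k) => !KNOWN_KEYS.has(k));
  if (unknown.length > 0) {
    throw new Error(`unexpected fields in response: ${unknown.join(", ")}`);
  }
  return body;
}

// A v2.13.6-shaped payload decodes; the v2.14.0 shape throws.
decodeOrderPreview('{"id":"order_1","version":1,"summary":{},"total":100}');
try {
  decodeOrderPreview('{"id":"order_1","version":1,"summary":{},"total":100,"email":"a@b.c"}');
} catch (e) {
  console.log((e as Error).message); // unexpected fields in response: email
}
```

For a consumer like this, the "harmless" additive change is a runtime failure on the first response it sees.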
The problem is not that every response shape change is bad.
The problem is that these changes can be invisible.
## The Question I Am Trying To Answer
I am less interested in whether this specific technique is the "right" tool.
Contract tests, schema validation, OpenAPI diffing, consumer-driven contracts, snapshot tests, and traffic diffing can all be part of the answer.
The bigger question is operational:
Who is expected to notice silent API contract drift?
- The backend team?
- The QA or test automation team?
- The platform team?
- The owners of consumer-driven contract tests?
- The downstream consumers, after something breaks?
Or is it usually nobody's explicit responsibility?
My current view is that "the UI still works" and "the API contract did not change" are different claims. A green E2E suite can prove the first without proving the second.
I am curious how other teams handle this.
Do you actively regression-check API response shape across upgrades, or do you only find out when a consumer breaks?