There's a discussion that surfaces every time someone posts an ORM benchmark: "of course JDBC is faster, you're measuring the abstraction". They're right, but only halfway. What nobody says is that the abstraction isn't the only culprit — sometimes the culprit is you, because you let an N+1 slip through without noticing.
I built prismavsjdbc to test this in a controlled way. It's not a benchmark about who wins. It's a lab where the same PostgreSQL 16, the same 50k-task dataset, and the same business scenarios run against two stacks: Node.js 24 LTS + TypeScript + Prisma 5 on one side, and Spring Boot 3 + Java 21 LTS + JdbcTemplate on the other. The analyzed commit is 2cd33e32bd29a1d4b46a26af0b56d6a912f5e4f5, tag best-effort-editorial-final.
The thesis I'm defending is this: query shape, SQL/request, and N+1 explain more than the slogan "ORM vs raw SQL". When you optimize the shape, both stacks improve. When you don't, both stacks charge you.
The problem that almost made me draw the wrong conclusion
The first version of the lab had an obvious trap, even though I didn't see it at first. It compared the most comfortable Prisma implementation — using include to fetch relations — against a manual join in JDBC. The result was predictable: JDBC measured 1 SQL/request, idiomatic Prisma measured 4 SQL/request on read-by-id, and latency reflected that.
Incorrect conclusion I almost published: "Prisma is slower because it emits more queries".
Correct conclusion: I was comparing different shapes. Prisma's include fires separate queries per relation — that's not a bug, it's the documented contract of the API. JDBC did a join because I wrote it that way. It's not fair to compare them without acknowledging that.
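The arithmetic behind that 4-vs-1 gap is worth making explicit. Here is a toy function (not the repo's code; the relation names are illustrative) that mirrors Prisma's documented per-relation strategy for include: one root query plus one additional query per included relation.

```typescript
// Toy model of SQL/request for a read-by-id shape.
// Mirrors the include contract described above: Prisma fires the root
// query, then one extra query per included relation.
type Shape = { include?: string[] };

function sqlPerRequest(shape: Shape): number {
  const relationQueries = shape.include?.length ?? 0;
  return 1 + relationQueries; // root query + one query per relation
}

// idiomatic read-by-id: task + project + organization + assignee
console.log(sqlPerRequest({ include: ["project", "organization", "assignee"] })); // 4
// best-effort: a single hand-written join, no include
console.log(sqlPerRequest({})); // 1
```

The JDBC side lands at 1 for the same reason the best-effort Prisma variant does: the developer collapsed the relations into one join by hand.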
That friction changed the entire lab design: I needed three levels within each stack.
Three levels: naive, idiomatic, best-effort
Adding the level column to results/comparison.csv was the most important decision in the project. Without it, any results table is a trap for the reader.
- naive: the most direct implementation possible, with no thought given to performance. In both stacks, this includes deliberate N+1 — per-task queries inside a loop.
- idiomatic: the normal, maintainable way to write code in each stack. Prisma with include and _count, JDBC with the join any Java dev would write without obsessing over micro-optimizations.
- best-effort: the tightest code the team would accept without it becoming a hack. For Prisma, this means dropping to $queryRaw when the shape is aggregational.
The read-by-id scenario with idiomatic Prisma measured 4 SQL/request due to include. The read-by-id-best-effort variant with $queryRaw dropped to 1 SQL/request — the same join JDBC uses. The PostgreSQL plan for that query is clean:
```sql
-- read-by-id-best-effort: same SQL in Prisma $queryRaw and in JdbcTemplate
select t.id, t.title, t.status, t.created_at as "createdAt",
       p.id as "projectId", p.name as "projectName",
       o.id as "organizationId", o.name as "organizationName",
       u.id as "assigneeId", u.display_name as "assigneeName"
from tasks t
join projects p on p.id = t.project_id
join organizations o on o.id = p.organization_id
join users u on u.id = t.assignee_id
where t.id = '00000000-0000-4000-0100-000000000001'::uuid
limit 1;
-- Execution Time: 0.242 ms, Buffers: shared hit=9
```
When Prisma and JDBC emit the same SQL, the PostgreSQL plan is identical. That closes the runtime debate: the bottleneck was the shape, not the client.
N+1 is the usual villain, but the lab shows it with numbers
The n-plus-one-trap scenario exists to make explicit something every developer knows in theory but underestimates in practice. The naive level in both stacks fires individual queries per task — on a 50k-task dataset with concurrency 16, that scales brutally.
The biggest jump in the lab wasn't between Prisma and JDBC. It was between naive and idiomatic within Prisma. When you go from N+1 to include/_count, the reduction in SQL/request is immediate and visible in latency. After that, if you want to squeeze more, $queryRaw gives you another jump — but smaller than the first.
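A minimal in-memory simulation of that gap (the names and data are illustrative, not the repo's runner) shows why the first jump dwarfs the second: the naive loop's query count grows with the number of tasks, while the batched lookup stays at one.

```typescript
// Simulated SQL/request for the n-plus-one-trap shape.
// `queries` plays the role of the lab's per-request statement counter.
let queries = 0;

const projectsById = new Map<number, string>([[1, "alpha"], [2, "beta"]]);

function selectProject(id: number): string | undefined {
  queries++; // one statement per call
  return projectsById.get(id);
}

function selectProjects(ids: number[]): Map<number, string> {
  queries++; // a single statement: where id in (...)
  return new Map(ids.map((id) => [id, projectsById.get(id)!]));
}

const tasks = [{ projectId: 1 }, { projectId: 2 }, { projectId: 1 }];

// naive: N+1 — one lookup per task
queries = 0;
for (const t of tasks) selectProject(t.projectId);
console.log(queries); // 3 — grows linearly with the task count

// idiomatic: one batched lookup over the distinct project ids
queries = 0;
selectProjects([...new Set(tasks.map((t) => t.projectId))]);
console.log(queries); // 1 — constant regardless of the task count
```

At 3 tasks the difference is a rounding error; at 50k tasks under concurrency 16 it is the dominant cost.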
The interesting part on the Java side is that CountingJdbc — the wrapper over JdbcTemplate in apps/jdbc-service/src/main/java/com/example/jdbclab/CountingJdbc.java — uses an AtomicLong to count queries. That allows an objective SQL/request comparison without relying on logs or pg_stat_statements as the primary source:
```java
// CountingJdbc.java — instrumentation with no magic, easy to audit
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.jdbc.core.RowMapper;
import org.springframework.stereotype.Component;

@Component
public class CountingJdbc {

    private final JdbcTemplate jdbc;
    private final AtomicLong queryCount = new AtomicLong();

    public CountingJdbc(JdbcTemplate jdbc) {
        this.jdbc = jdbc;
    }

    public <T> List<T> query(String sql, RowMapper<T> mapper, Object... args) {
        // each call to the wrapper adds 1 to the counter
        queryCount.incrementAndGet();
        return jdbc.query(sql, mapper, args);
    }

    public long count() {
        return queryCount.get();
    }
}
```
On the Prisma side, the equivalent lives in apps/prisma-client/src/db.ts: it hooks into the client's query event to count. That symmetry in instrumentation is what makes the SQL/request numbers comparable across stacks.
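A sketch of that pattern, assuming the repo hooks Prisma's query event (available when the client is constructed with log: [{ emit: 'event', level: 'query' }] and subscribed via $on('query', ...)). The Prisma client is stood in for by a plain Node EventEmitter so the example is self-contained.

```typescript
import { EventEmitter } from "node:events";

// Sketch of the counting pattern in apps/prisma-client/src/db.ts.
// The real code would subscribe to the Prisma client's 'query' event;
// a bare EventEmitter simulates that client here.
class QueryCounter {
  private count = 0;

  constructor(client: EventEmitter) {
    // every emitted query event bumps the counter, mirroring
    // CountingJdbc's AtomicLong on the Java side
    client.on("query", () => {
      this.count++;
    });
  }

  value(): number {
    return this.count;
  }
}

const fakeClient = new EventEmitter();
const counter = new QueryCounter(fakeClient);
fakeClient.emit("query", { query: "select 1" });
fakeClient.emit("query", { query: "select 2" });
console.log(counter.value()); // 2
```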
When $queryRaw makes sense and when it's a surrender
This is the part where a lot of Prisma posts aren't honest. $queryRaw exists and is valid, but using it for everything is admitting you don't want to use Prisma — you're using PostgreSQL with a fancy TypeScript client.
The decision in the lab was clear: best-effort with $queryRaw makes sense in relation-summary and report-aggregation because the shape is genuinely aggregational. Prisma groupBy doesn't cleanly express date_trunc + join by organization, and forcing it would be worse than writing SQL.
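As a hedged sketch of what that aggregational shape looks like: the table and column names follow the SQL shown earlier, and Prisma's real $queryRaw tagged template is mocked so the example runs standalone. The property worth noticing is that every ${...} interpolation travels as a numbered bind parameter, never as string concatenation.

```typescript
// Mock of the $queryRaw tagged-template contract: interpolations become
// $1, $2, ... placeholders and the values ship separately as parameters.
// (Illustrative only — the real client sends this to the query engine.)
function queryRawMock(strings: TemplateStringsArray, ...values: unknown[]) {
  const sql = strings.reduce((acc, s, i) => acc + `$${i}` + s);
  return { sql, params: values };
}

const since = new Date("2024-01-01");
const q = queryRawMock`
  select date_trunc('day', t.created_at) as day,
         o.name as organization,
         count(*) as tasks
  from tasks t
  join projects p on p.id = t.project_id
  join organizations o on o.id = p.organization_id
  where t.created_at >= ${since}
  group by 1, 2
  order by 1`;

console.log(q.params.length); // 1 — the date travels as a bind parameter
```

Prisma's groupBy has no clean way to express the date_trunc plus the join by organization, which is exactly the trigger for the best-effort variant.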
By contrast, paginated-list has no best-effort variant because idiomatic Prisma already emits 1 SQL/request with findMany and filters. Adding $queryRaw there wouldn't change anything meaningful — it would be complexity with no benefit.
The table in docs/brief-post.md models this well: the level column isn't a scale of "how much effort you put in" but of "how much the SQL shape changes when you apply the variant".
What the lab can't guarantee
The HTTP runner is homegrown — not k6 or wrk. The hardware is local. Docker Desktop, GC, plan cache, and indexes can shift absolute latencies between runs. The editorial run used 3 runs, 300 requests per run, 30 warmup requests, concurrency 16, and a 50k-task dataset — but the same configuration on different hardware can produce different absolute numbers.
The version matrix (docs/java-version-matrix.md) shows Java 21 vs Java 25: there are differences, but the main argument — that N+1 and SQL/request dominate — holds on both JVMs. Java 25 improved read-by-id by ~20% over Java 21 in the local run, but that doesn't change the fact that the problem in relation-summary-naive was the shape, not the JVM.
I wouldn't publish those absolute numbers as universal truth. I publish them as evidence of a pattern: when you change the shape, the delta is orders of magnitude larger than when you change the runtime.
The position I landed on
Prisma is not slow. Prisma with include emitting 4 queries where you could emit 1 is an ergonomics trade-off with an observable cost — and that cost is worth it for most endpoints in an API that isn't under extreme pressure. When shape genuinely matters, $queryRaw exists and works well.
JDBC with JdbcTemplate is not superior just because it's raw SQL. It's predictable because the developer controls the shape from the start. The risk is on the other side: that nobody checks whether those Java loops are also doing N+1 without an ORM to blame.
The lab is reproducible. If you have Docker, Node 24 LTS, and Java 21 or 25, you can run it:
```shell
# full editorial run — Bash
bash scripts/run-lab.sh --mode editorial --size editorial --runs 3 --requests 300 --warmup 30 --concurrency 16
```
And if you just want to verify the scenarios run without errors before committing time:
```shell
# quick smoke test to validate the setup
bash scripts/run-lab.sh --mode smoke --size small
```
The code is at github.com/JuanTorchia/prismavsjdbc. Editorial results are in results/comparison.csv and results/comparison.md.
What I'd like to know: in the stack you're using right now, do you have real visibility into the SQL/request count for each endpoint? Or do you assume the ORM handles it on its own?
This article was originally published on juanchi.dev.