Aditya Agarwal

AI made devs feel 20% faster but measured 19% slower. Nobody's ready for that conversation.

Hidden review and debugging overhead

AI tools made developers feel 20% faster. Then researchers measured them at 19% slower. Read that again.

That's not a rounding error. That's a 39-percentage-point gap between perception and reality. And nobody in this industry wants to talk about it.

The Study That Should Make You Uncomfortable

The METR study used a randomized controlled trial. Not a survey. Not a blog post. Not a vendor-sponsored benchmark. A proper experiment.

They took experienced open-source contributors and had them work on their own repos. Their own code. Projects they already knew inside and out.

Half the tasks were done with AI coding assistants. Half without. The developers predicted AI would make them 24% faster before the study, and still believed AI had made them about 20% faster afterward. The stopwatch said they were 19% slower.
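
To make that concrete, here's a toy version of the comparison an RCT like this runs. Every number below is invented for illustration, not METR's data:

```python
# Toy RCT-style comparison (all task times invented).
# Tasks are randomly assigned to "AI allowed" or "AI not allowed",
# and you compare measured completion times, not self-reports.

ai_allowed_minutes = [74, 102, 58, 91, 66]    # hypothetical
ai_disallowed_minutes = [60, 88, 49, 75, 57]  # hypothetical

avg_ai = sum(ai_allowed_minutes) / len(ai_allowed_minutes)
avg_no_ai = sum(ai_disallowed_minutes) / len(ai_disallowed_minutes)

# Positive = AI-allowed tasks took longer, i.e. a measured slowdown.
slowdown = (avg_ai - avg_no_ai) / avg_no_ai
print(f"Measured change in completion time: {slowdown:+.1%}")  # +18.8%
```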

Vibes Are Not a Metric

Here's what bothers me. If you asked most developers right now whether AI tools make them more productive, they'd say yes. I'd probably say yes too.

That's the trap. The tools feel productive. Autocomplete fires, code appears, you're moving. But "moving" and "making progress" are different things.

Think about it:

→ Time spent reviewing and fixing AI-generated code counts
→ Time spent re-prompting when the output is wrong counts
→ Time spent debugging subtle issues you didn't write counts
→ The context-switching tax of evaluating suggestions counts

None of that registers as "slowdown" in the moment. It feels like collaboration. The data says otherwise.
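
A back-of-the-envelope tally makes the mechanism obvious. These numbers are made up, but the shape is the point: the visible win is small and the invisible costs stack.

```python
# Hypothetical time budget for one task (every number made up).
baseline = 60                # minutes to write it yourself

typing_saved = 20            # the visible win: code appears instantly
review_and_fix = 12          # reading and correcting generated code
reprompting = 8              # retrying when the output is wrong
debugging = 15               # chasing subtle bugs you didn't write
context_switching = 6        # evaluating suggestions mid-flow

with_ai = (baseline - typing_saved + review_and_fix
           + reprompting + debugging + context_switching)

change = (with_ai - baseline) / baseline
print(f"Net change: {change:+.0%}")  # +35%: slower overall, felt faster
```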

The Industry Is Running on Self-Reported Data

This is the part that should scare you. Billions of dollars in AI tooling investment are justified by... developer sentiment surveys. 🙃

"87% of game developers are using AI agents in their workflows" or "87% of developers use AI coding tools daily." Cool. Developers also report that meetings are useful when the boss is watching. Self-reported productivity is barely one step above astrology.

The METR study is one of the few attempts to actually measure the thing everyone claims to already know. And the result is the opposite of the narrative.

I'm not saying AI coding tools are useless. I use them. But I've stopped assuming they're saving me time just because they're generating text quickly.

What This Actually Means for You

This isn't an anti-AI argument. It's a pro-honesty one.

If AI tools genuinely help you, great. But you should verify that with something harder than a gut feeling. Track your actual completion times on similar tasks. Notice when you're spending 30 minutes wrestling with generated code you could've written in 10.
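
If you want a starting point, a logger as dumb as this one is enough. The file name and fields here are just one possible scheme, not a recommendation of any particular tool:

```python
# Tiny task timer: logs duration and whether AI was used, so you can
# compare like-for-like tasks later. LOG_FILE name is arbitrary.
import csv
import time
from datetime import datetime

LOG_FILE = "task_times.csv"

def timed_task(description: str, used_ai: bool) -> None:
    start = time.monotonic()
    input(f"Working on {description!r} -- press Enter when done ")
    minutes = (time.monotonic() - start) / 60
    with open(LOG_FILE, "a", newline="") as f:
        csv.writer(f).writerow([
            datetime.now().isoformat(timespec="seconds"),
            description,
            used_ai,
            round(minutes, 1),
        ])
    print(f"Logged {minutes:.1f} min (AI={used_ai})")

timed_task("fix pagination bug", used_ai=True)
```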

The uncomfortable truth is that speed and the feeling of speed are completely different things. A tool that generates code fast but generates the wrong code fast is not a productivity tool. It's a very convincing distraction. 🎭

The real concern is not that AI tools are inefficient. It's that we'll make fundamental changes to engineering organizations based on the incorrect assumption that they are efficient, and never stop to question it.

The Real Question

We are hiring, planning headcount, and making sprint commitments based on assumed productivity gains. One of the only rigorous academic studies to directly address the question finds, quite explicitly, that the gain does not exist.

This fact should trouble people on both sides of the AI excitement gap. The data doesn’t care about your feelings. And apparently, your feelings don’t care about the data.

So here's what I want to know: have you ever actually measured whether AI tools save you time, or are you running on vibes too?

Top comments (12)

Syed Ahmer Shah

This is the reality check the industry desperately needs. The "perception vs. reality" gap is a classic example of the Dunning-Kruger effect meeting high-speed autocomplete; we feel like we're flying because the cursor is moving, but we're often just creating more "dark matter" code that requires future cleanup.

It’s a sobering reminder that velocity (speed with direction) is not the same as speed (how fast you're moving). If the direction involves subtle bugs and architectural drift, we're just moving toward a legacy nightmare 19% faster.

marcosomma

I think the underlying issue is that people misuse the tool. Honestly, SDD is not vibe coding. You can't expect an agent to perform the task you ask for in your prompt exactly the way you'd like. This is why you need more detailed and deeper context, able to survive across sessions, able somewhat to "learn" from existing and produced code. This is a looong discussion. The study you pointed to didn't take into consideration some parameters that are pillars of SDD, like scope definition and tool management. As poor as your specs are, so poor will the output be. From there comes all the iterative work that slows down most of the "developers". I explain more here (dev.to/marcosomma/how-i-accidental...). Happy to get your feedback.

Cris Mihalache

Good point, but it’s context-dependent: in familiar codebases, AI often adds review and debugging overhead, making experienced devs slower, while in greenfield or unfamiliar work it can still help. The real problem is teams relying on perceived productivity instead of measuring actual outcomes before making decisions.

Elmar Chavez

All I can say is that AI errors add up faster than we think. That's one of the weaknesses every AI model faces right now.

Kevin Tewouda

It doesn't matter to those who never look at the code the AI generates. A thinly veiled nod to the creators of OpenClaw, Hermes Agent, and Claude Code, who clearly embrace this. 🙃
Please don't do that on a serious project. A production bug can, depending on its impact, ruin a company.

toshihiro shishido

The felt-speed vs. measured-speed gap is the more interesting story. My guess: AI removes the friction of starting (no more blank page), which is what creates the felt-faster sensation. But the actual lag is in the correction loops on AI output. Total wall-clock time loses to felt relief.

Mykola Kondratiuk

The study measured experienced devs on their own familiar repos - arguably the worst scenario for AI. Show me numbers for junior devs on greenfield tasks and the story might look totally different.

Anndrey Macedo

Can you provide the source? I searched here, but I couldn't find it.

Craig Cecil

Here's an interesting exercise--replace "AI" with "Agile" throughout this post and see what you think.

Laura Ashaley

Interesting gap: perceived gains don't always match real productivity, especially when AI adds hidden overhead like review and correction.
