Mykola Kondratiuk

I Read a Survey That Predicted My Job's Next 2 Years - Here's What It Got Right and Missed

KPMG just dropped a number on people in my seat. They surveyed 306 Canadian executives: 39% expect AI agents to be leading project management for their teams within 2-3 years, and 66% are already moving to a fully integrated AI-human workforce. For the first time, the role-redefinition forecast is in survey data, not an opinion column.

I run a PM workflow with an agent fleet doing most of the drafting and a lot of the review. So when an executive survey predicts the next two years of my job, I read it as primary source material on what the people who sign my budget are planning to assume.

Two things stood out.

What the executives got right

The direction is correct. The role really is shifting toward direction-and-review instead of artifact authorship. My morning two years ago was inbox plus drafting the day's first brief. My morning today is fleet status, then choosing which of last night's drafts is shippable, which needs another pass, and which got the wrong scope baked in and needs to be killed before it gets routed.

The horizon is also realistic, depending on where you're starting from. If your team has not yet stood up an agent stack alongside engineering work, 24-36 months to "agents leading PM" is plausible. There is a real procurement, instruction-tuning, governance-design, trust-building cycle to go through. None of it is fast on the first lap.

The integrated-workforce framing is the part the dev side will recognize fastest. The pattern is the same one engineering already lives: a PR queue where some commits are human-authored, some are agent-authored, and the human decision surface is mostly review and override. The PM equivalent is here. It looks like a doc queue, a roadmap delta queue, a sprint-scope queue. Same shape, different artifacts.

What the survey didn't ask

Executive surveys ask about role-level shifts. They don't ask the day-level question, which is the one engineers and PMs both actually live in.

The day-level question is: what does the morning look like, what's in the queue, what runs without you, what blocks on a human call, where does the dev-PM interface change shape because the PM is mostly directing instead of authoring?

For the dev side, the change that matters is on the spec-to-ship loop. Specifically, the spec side gets shorter and the review side gets longer. The PM is still naming what to build, but the artifact that lands in your repo as the brief or the scoped doc is increasingly drafted by an agent the PM directed and reviewed. The conversation about the spec moves from "let me write this up and send it Tuesday" to "the agent drafted three variants overnight, here's the one I'd ship, push back if anything looks off." Faster on the spec side. Slower on the review side, because the dev now has to verify that the directed-and-reviewed spec is still coherent before committing to it.

The survey doesn't measure that loop. It measures the hiring intent and the workforce category. Both useful, neither operational.

The 30-day diff

Here's a move that probably translates regardless of role.

Pull up your current todo list. Write down three items something automated is already doing or could plausibly be doing if you set it up. Write down three items only you in your seat can do. Then pull up your todo list from a month ago. Run the same split. How many items moved from "only you" to "automated or could-be"? Even one is a real signal. Three is a trend.
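If you want the diff as something mechanical, here's a minimal sketch. The items and labels below are placeholders, not anyone's real list:

```python
# Minimal sketch of the 30-day diff. Tag each todo item "automated"
# (something automated does it, or plausibly could) or "only_me",
# snapshot monthly, count what moved. All items below are placeholders.

def moved_to_automated(last_month: dict[str, str], this_month: dict[str, str]) -> list[str]:
    """Items that went from 'only_me' to 'automated' in 30 days."""
    return [
        item for item, tag in this_month.items()
        if tag == "automated" and last_month.get(item) == "only_me"
    ]

last_month = {"draft sprint brief": "only_me", "triage inbox": "only_me",
              "kill mis-scoped drafts": "only_me"}
this_month = {"draft sprint brief": "automated", "triage inbox": "automated",
              "kill mis-scoped drafts": "only_me"}

moved = moved_to_automated(last_month, this_month)
print(f"{len(moved)} moved to automated: {moved}")  # one is a signal, three is a trend
```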

I started doing this around the time I noticed the agent had drafted a brief I'd planned to write. The diff that month was small. Six months in, it was not.

The KPMG number is a 24-month forecast. The 30-day diff is the short-horizon evidence the survey didn't ask for. The forecast is in their hands. The diff is in yours.

The floor, not the ceiling

If you've been running this for two years already, the 39% expecting "agents leading PM" in 24-36 months is the floor of what's coming. The practitioner who started seriously in 2024 is already past where executives expect they'll be in 2028. The interesting question is not "will it happen." It's "what does floor + 1 look like, and who's already there."

The dev side has been at floor + 1 for a while in a few places. The PM side is catching up.

What's the loop look like on your team?

Top comments (18)

Mykola Kondratiuk

honestly the 30-day diff exercise breaks for teams in their first 60 days of agent work; the diff is mostly tooling churn, not real role evolution. underscoped that. probably needs a "wait until your second instruction-set rewrite before measuring" caveat.

Ali Afana

The "spec side shorter, review side longer" line lands hard from the dev seat. I'm building an AI sales chatbot — same loop shape. Hours saved on authoring get partly eaten by verification time. Net win, but the texture of the day changed more than the headcount math suggests.
What surprised me is how much of the new work is naming what "still pointing at the right thing" means in advance. Once an agent's drafting at volume, you can't review your way out of a fuzzy spec — the review loop assumes the spec was clear enough to review against in the first place.

Mykola Kondratiuk

Verification overhead is what the survey completely misses — it optimizes for headcount math, not cognitive load math. The texture shift you're describing (fewer creative hours, more review hours) is probably the more accurate leading indicator than any 39% figure. Curious what eats the most review time for you — hallucinated objections or off-brand tone?

Ali Afana

Hallucinated objections, by a wide margin. Tone problems show up in the draft — you catch them on the first read. State hallucinations read perfectly fine; the failure only surfaces when the customer acts on the wrong info.
So the review work isn't really "does this read well." It's "does this match reality" — cross-checking the draft against system state, not just brand voice. Different (and slower) kind of work than copy review. Hardest part to staff for, because the reviewer has to know the system, not just the voice.
Curious if you see the same split on the PM side — are state hallucinations (wrong scope, wrong dependency) harder to catch than tone-off briefs?

Mykola Kondratiuk

state hallucinations are harder because they pass the tone review — the model has no signal that system state drifted since the last context load. what you're describing is a grounding gap, not a copy problem. the fix is runtime context injection (pull live data before generation), not prompt iteration. frustrating that most chatbot tooling defaults to the latter.
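to make "runtime context injection" concrete: fetch state at generation time and put it in the prompt. minimal python sketch, every name is a placeholder, not any particular tooling:

```python
# Grounding sketch: read live state immediately before generation and
# inject it into the system prompt. Placeholder names throughout.

def fetch_inventory(sku: str) -> int:
    """Stand-in for a live read against the source of truth."""
    return 0  # replace with your inventory client

def build_system_prompt(sku: str) -> str:
    stock = fetch_inventory(sku)  # fresh at generation time, not baked in
    return (
        "You are a sales assistant.\n"
        f"Live stock for {sku}: {stock}. "
        "Only claim availability based on this number."
    )
```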

Ali Afana

this is exactly the bug i shipped. "yes that's sold out" — confident, grammatical, polite. the hallucination was the politeness, which is why tone review couldn't catch it. eval pipelines built around output style don't have the signal.
runtime injection was the fix — pulled stock fresh at generation time, dropped into system prompt. that class of bug disappeared.
one wrinkle: even fresh injection can go stale within milliseconds if a parallel request decrements between fetch and commit. necessary, not sufficient — still needs a last-mile check at write time.
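the last-mile check ended up as a serialized fetch-check-decrement at commit. minimal sketch (a lock stands in for what's really a transactional write):

```python
import threading

# Sketch of the write-time check: re-verify stock inside the commit path
# instead of trusting the value injected at generation time. In production
# this is a transactional/conditional write; a lock stands in here.

_lock = threading.Lock()
_stock = {"sku-123": 2}  # placeholder inventory

class OutOfStock(Exception):
    pass

def commit_order(sku: str, qty: int) -> None:
    with _lock:                      # serialize fetch-check-decrement
        if _stock.get(sku, 0) < qty:
            raise OutOfStock(sku)    # surfaced to the bot as a structured failure
        _stock[sku] -= qty
```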

Mykola Kondratiuk

runtime injection fixing that class of bug is the right call — fresh state at generation time is the only way to give the model accurate signal. wrinkle we hit was latency budget: injecting fresh inventory into 50+ concurrent calls added meaningful p99 tail. ended up caching at 30s intervals — tight enough to kill the sold-out hallucinations, loose enough not to crater response time.
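the cache itself is nothing exotic. roughly this shape (30s TTL, placeholder names, single-process sketch):

```python
import time

# Sketch of the 30s inventory cache: one shared read path so concurrent
# calls hit the cache instead of the inventory service. Placeholder names.

_cache: dict[str, tuple[float, int]] = {}  # sku -> (fetched_at, stock)
TTL_SECONDS = 30.0

def cached_stock(sku: str, fetch) -> int:
    now = time.monotonic()
    hit = _cache.get(sku)
    if hit is not None and now - hit[0] < TTL_SECONDS:
        return hit[1]
    stock = fetch(sku)  # placeholder for the live inventory read
    _cache[sku] = (now, stock)
    return stock
```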

Ali Afana

30s window is the smart middle. the only place i'd flag risk is the last 5s of the cache when stock is in single digits — that's where two concurrent buyers race to the same "in stock: 2" signal and one gets a cheerful sold-out hallucination at commit. for me the answer was a tiered cache: 30s for catalog reads, but skip cache when projected stock < threshold. probably overkill at your scale, but in arabic e-commerce single-digit inventory is the norm, not the edge case.
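the tiering is one branch on top of a cache like yours. sketch (threshold is illustrative):

```python
# Sketch of the tiered read: serve catalog-level stock from cache, but go
# straight to the live read when the last known value is near sellout.

LOW_STOCK_THRESHOLD = 10  # illustrative; tune to your inventory shape

def tiered_stock(sku: str, fetch, cached) -> int:
    stock = cached(sku, fetch)       # normal TTL-cache path
    if stock < LOW_STOCK_THRESHOLD:
        return fetch(sku)            # bypass the cache near sellout
    return stock
```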

Mykola Kondratiuk

tiered cache makes sense — the real lever at single-digit stock is treating the "in stock: N" read as a reservation, not a snapshot. even a short TTL on that reservation beats retrying the hallucination at commit. how are you handling write confirmation back to the model after the checkout step?
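for concreteness, the reservation-at-read shape i mean (in-memory store and TTL are placeholders, not production code):

```python
import time
import uuid

# Sketch: the availability read places a short-TTL hold instead of
# returning a bare count. In-memory store as a stand-in.

HOLD_TTL = 120.0  # seconds; tune per product
_stock = {"sku-123": 2}                    # placeholder inventory
_holds: dict[str, tuple[str, float]] = {}  # hold_id -> (sku, expires_at)

def reserve(sku: str) -> str | None:
    """Return a hold id if a unit is reservable, else None (truthful sold-out)."""
    _expire()
    held = sum(1 for s, _ in _holds.values() if s == sku)
    if _stock.get(sku, 0) - held < 1:
        return None
    hold_id = str(uuid.uuid4())
    _holds[hold_id] = (sku, time.monotonic() + HOLD_TTL)
    return hold_id

def _expire() -> None:
    now = time.monotonic()
    for hold_id, (_, expires_at) in list(_holds.items()):
        if expires_at < now:
            del _holds[hold_id]
```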

Ali Afana

reservation pattern beats tiered cache, full stop. short-ttl reservation on the read is what the cache was trying to approximate — i was solving the symptom, you're solving the state. stealing that.
honest answer on write confirmation: haven't shipped checkout yet. pre-launch, stripe flow lands in coming weeks. architecturally i'm thinking the model gets a structured commit event back rather than free-text confirmation — order_id, line items, status — so the next turn re-grounds on the receipt instead of conversational recall. but i'll only know what actually breaks once real concurrency hits it.
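rough shape of the commit event i'm picturing (field names are guesses at this stage):

```python
from dataclasses import dataclass, field

# Sketch of the structured commit event the model re-grounds on after
# checkout. Pre-launch guesswork; field names will change under load.

@dataclass
class LineItem:
    sku: str
    qty: int

@dataclass
class CommitEvent:
    order_id: str
    status: str                                  # "confirmed" | "failed"
    items: list[LineItem] = field(default_factory=list)

def to_model_context(event: CommitEvent) -> str:
    """Serialized into the next turn's context instead of free-text recall."""
    lines = [f"ORDER {event.order_id} status={event.status}"]
    lines += [f"  {item.sku} x{item.qty}" for item in event.items]
    return "\n".join(lines)
```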
good thread. when checkout ships and i've watched it under load i'll write it up. would be great to pick this back up then.

Mykola Kondratiuk

reservation-at-read removes the race entirely — cache was just narrowing the window. good instinct building on the correct abstraction pre-launch. how are you handling expiry on abandoned carts?

Ali Afana

abandoned cart expiry is where i'm least settled. current thinking: short reservation on add-to-cart (5–10 min, releases stock back), with the model getting a structured "cart_expired" event so the next turn opens with "your hold on X released — still want to grab it?" rather than pretending nothing happened. the tricky case is partial carts — 2 of 3 items still available when the user comes back. easiest answer is to expire the cart atomically and re-quote whatever subset is still available. cleanest for the model, slightly worse for conversion.
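sketch of the expiry flow as i'm imagining it (all names provisional; nothing here has met real concurrency yet):

```python
import time
from dataclasses import dataclass

# Sketch of add-to-cart holds with structured expiry. Provisional names.

CART_TTL = 600.0  # 10 min hold placed at add-to-cart

@dataclass
class CartHold:
    sku: str
    expires_at: float

def add_to_cart(cart: dict[str, CartHold], sku: str) -> None:
    cart[sku] = CartHold(sku, time.monotonic() + CART_TTL)

def sweep(cart: dict[str, CartHold]) -> list[dict]:
    """Release expired holds and emit cart_expired events for the next turn."""
    now = time.monotonic()
    events = []
    for sku, hold in list(cart.items()):
        if hold.expires_at < now:
            del cart[sku]  # stock release call omitted in this sketch
            events.append({"type": "cart_expired", "sku": sku})
    return events
```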
haven't seen this discussed much in arabic e-commerce — most platforms there don't even hold stock at add-to-cart, which is why "sold out at checkout" is the default user experience. small bar to clear.
appreciate the back-and-forth — sharpened my thinking on a few of these. hope our paths cross again.

Mykola Kondratiuk

5-10 min is where i'd stress-test it. soft inventory (seats, digital goods) can hold longer without firing false 'released' alerts - too aggressive an expiry trains users to rush, which shifts purchase behavior in ways you probably don't want.

Ali Afana

"trains users to rush" reframes the whole problem. taking it into the spec. genuinely useful thread, mykola — thanks.

Mykola Kondratiuk

glad it clicked. the anxious-checkout failure mode is the silent one — watch the payment-flow gap. good luck with the spec.

Julien Avezou

Interesting insights, thanks for sharing.
I am curious to know how you handle governance of these AI agents. If the agent drafts something wrong and the PM doesn't pick up on that, who is accountable for that mistake in the end?

Mykola Kondratiuk

governance in my setup lives in pre-flight gates — each agent has a narrow output contract and nothing ships without human sign-off. when it drafts wrong, it gets regenerated, not published. accountability lands on the PM who approved it, not the agent — that's intentional. the failure I haven't fully cracked: drift within spec. technically valid output that misses actual intent. that's where manual review still earns its keep.
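the gate itself is deliberately boring. roughly this shape (contract fields are illustrative, not my actual checks):

```python
from dataclasses import dataclass

# Sketch of a pre-flight gate: validate a draft against the agent's narrow
# output contract, then block on explicit human sign-off. Illustrative only.

@dataclass
class Contract:
    required_sections: tuple[str, ...]
    max_words: int

def passes_contract(draft: str, contract: Contract) -> bool:
    within_budget = len(draft.split()) <= contract.max_words
    sections_present = all(s in draft for s in contract.required_sections)
    return within_budget and sections_present

def ship(draft: str, contract: Contract, approved_by: str | None) -> str:
    if not passes_contract(draft, contract):
        return "regenerate"                      # wrong drafts never publish
    if approved_by is None:
        return "blocked: needs human sign-off"   # nothing ships unreviewed
    return f"published (accountable: {approved_by})"
```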

Julien Avezou

Got it, that makes sense, thanks.