KPMG just dropped a number on people in my seat. They surveyed 306 Canadian executives. 39% of them expect AI agents to be leading ...
honestly the 30-day diff exercise breaks for teams in their first 60 days of agent work: the diff is mostly tooling churn, not real role evolution. i underscoped that. probably needs a "wait until your second instruction-set rewrite before measuring" caveat.
The "spec side shorter, review side longer" line lands hard from the dev seat. I'm building an AI sales chatbot — same loop shape. Hours saved on authoring get partly eaten by verification time. Net win, but the texture of the day changed more than the headcount math suggests.
What surprised me is how much of the new work is naming what "still pointing at the right thing" means in advance. Once an agent's drafting at volume, you can't review your way out of a fuzzy spec — the review loop assumes the spec was clear enough to review against in the first place.
Verification overhead is what the survey completely misses: it optimizes for headcount math, not cognitive-load math. The texture shift you're describing (fewer creative hours, more review hours) is probably a more accurate leading indicator than any 39% figure. Curious what eats the most review time for you: hallucinated objections or off-brand tone?
Hallucinated objections, by a wide margin. Tone problems show up in the draft — you catch them on the first read. State hallucinations read perfectly fine; the failure only surfaces when the customer acts on the wrong info.
So the review work isn't really "does this read well." It's "does this match reality" — cross-checking the draft against system state, not just brand voice. Different (and slower) kind of work than copy review. Hardest part to staff for, because the reviewer has to know the system, not just the voice.
Curious if you see the same split on the PM side — are state hallucinations (wrong scope, wrong dependency) harder to catch than tone-off briefs?
state hallucinations are harder because they pass the tone review — the model has no signal that system state drifted since the last context load. what you're describing is a grounding gap, not a copy problem. the fix is runtime context injection (pull live data before generation), not prompt iteration. frustrating that most chatbot tooling defaults to the latter.
this is exactly the bug i shipped. "yes that's sold out": confident, grammatical, polite. the hallucination hid inside the politeness, which is why tone review couldn't catch it. eval pipelines built around output style don't have the signal.
runtime injection was the fix — pulled stock fresh at generation time, dropped into system prompt. that class of bug disappeared.
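for anyone hitting the same thing, the shape of the fix is roughly this. `fetch_inventory` and `llm_complete` are placeholders for whatever store lookup and model client you're on, not any particular SDK:

```python
import json
from datetime import datetime, timezone

# stand-in inventory source; swap for your real store/DB call
FAKE_DB = {"HOODIE-M": {"qty": 3, "price": "$49"}}

def fetch_inventory(sku: str) -> dict:
    return FAKE_DB[sku]  # live lookup goes here

def build_system_prompt(sku: str) -> str:
    # the whole fix: pull state inside the request path, at generation
    # time, instead of trusting whatever the session loaded minutes ago
    item = fetch_inventory(sku)
    stamp = datetime.now(timezone.utc).isoformat()
    return (
        f"You are a sales assistant. Inventory as of {stamp}:\n"
        f"{json.dumps({sku: item})}\n"
        "Only make availability claims grounded in the state above."
    )

def answer(sku: str, user_msg: str, llm_complete) -> str:
    # llm_complete is whatever chat client you use; the injection is
    # model-agnostic as long as the system prompt is rebuilt per call
    return llm_complete(system=build_system_prompt(sku), user=user_msg)
```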
one wrinkle: even fresh injection can stale within milliseconds if a parallel request decrements between fetch and commit. necessary, not sufficient — still needs a last-mile check at write time.
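the write-time check can be as small as a conditional update. sketched against sqlite here, but any store with conditional writes does the same job:

```python
import sqlite3

def reserve_unit(conn: sqlite3.Connection, sku: str) -> bool:
    # conditional decrement: the WHERE clause re-checks stock at write
    # time, so a parallel request that drained it between our fetch and
    # this commit makes the update match zero rows instead of overselling
    cur = conn.execute(
        "UPDATE inventory SET qty = qty - 1 WHERE sku = ? AND qty > 0",
        (sku,),
    )
    conn.commit()
    return cur.rowcount == 1  # False: tell the customer it's gone
```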
runtime injection fixing that class of bug is the right call — fresh state at generation time is the only way to give the model accurate signal. wrinkle we hit was latency budget: injecting fresh inventory into 50+ concurrent calls added meaningful p99 tail. ended up caching at 30s intervals — tight enough to kill the sold-out hallucinations, loose enough not to crater response time.
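the 30s window is basically this shape; `loader` is a placeholder for the live inventory fetch:

```python
import time
from threading import Lock

class TTLCache:
    """read-through cache: refetch upstream at most once per ttl window."""

    def __init__(self, loader, ttl_s: float = 30.0):
        self._loader = loader      # the live fetch, e.g. inventory API call
        self._ttl = ttl_s
        self._lock = Lock()
        self._value = None
        self._loaded_at = None     # None = never loaded

    def get(self):
        with self._lock:
            expired = (
                self._loaded_at is None
                or time.monotonic() - self._loaded_at > self._ttl
            )
            if expired:
                # one upstream fetch per window, no matter how many
                # concurrent prompt builds land in between
                self._value = self._loader()
                self._loaded_at = time.monotonic()
            return self._value

# usage: inventory = TTLCache(fetch_all_inventory, ttl_s=30.0)
# prompt builders call inventory.get() instead of hitting the store directly
```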
Interesting insights, thanks for sharing.
I'm curious how you handle governance of these AI agents. If the agent drafts something wrong and the PM doesn't catch it, who is accountable for the mistake in the end?
governance in my setup lives in pre-flight gates — each agent has a narrow output contract and nothing ships without human sign-off. when it drafts wrong, it gets regenerated, not published. accountability lands on the PM who approved it, not the agent — that's intentional. the failure I haven't fully cracked: drift within spec. technically valid output that misses actual intent. that's where manual review still earns its keep.
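concretely, a gate is about this much code. the field names here are illustrative, not my actual contract:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AgentDraft:
    # narrow output contract: the agent may only emit these fields
    title: str
    scope: str
    owner: str

def preflight_gate(draft: AgentDraft, approved_by: Optional[str]) -> AgentDraft:
    # structural checks catch contract violations and trigger a regenerate;
    # they can't catch drift-within-spec, so human sign-off stays mandatory
    for field_name in ("title", "scope", "owner"):
        if not getattr(draft, field_name).strip():
            raise ValueError(f"regenerate: empty field '{field_name}'")
    if approved_by is None:
        raise PermissionError("nothing ships without human sign-off")
    return draft
```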
Got it, that makes sense, thanks.
yeah. it gets more interesting when coordinating 4-5 agents sharing context: the gates hold, but the coordination overhead compounds fast.