Jaskaran Singh — Senior Software Engineer, AI Trainer
A few weeks ago I watched an agent open a GitHub issue, write the fix, run the tests, and open a pull request. No human typed a line of code. The PR passed review.
I didn't find this inspiring. I found it genuinely disorienting. I say that as someone who trains AI models for a living and is currently building an agent of my own.
If you're a software engineer in 2026 and you haven't had that moment yet, you will. Agentic AI is being called the third seismic shift in software engineering this century, after open source and DevOps. That framing might be overblown. It might not be. Either way, something real is happening and it's worth thinking clearly about instead of panicking or dismissing it.
The Numbers Stopped Being Theoretical
Source: Unsplash — Luke Chesser
A survey of nearly 1,000 engineers published in early 2026 found that 95% use AI tools at least weekly, 75% use AI for half or more of their engineering work, and 55% regularly use AI agents. That last number is the one that matters. Copilots have been mainstream for two years. Agents are different.
A copilot suggests. An agent acts. It reads your codebase, decides what to do, does it, checks whether it worked, and tries again if it didn't. The feedback loop is closed without you in it.
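Reduced to its skeleton, that closed loop is just act, verify, retry. Here's a toy sketch (the names and the fake bug are invented for illustration; in a real agent `propose_fix` is a model call and `verify` shells out to the test suite):

```python
def agent_loop(propose_fix, verify, max_attempts=3):
    """Closed loop: act, check, retry. No human until it gives up."""
    for attempt in range(1, max_attempts + 1):
        propose_fix(attempt)   # the agent edits code
        if verify():           # the agent checks its own work
            return attempt     # success on this attempt
    return None                # repeated failure: escalate to a human

# Toy stand-ins: a "bug" that the second proposed fix resolves.
state = {"fixed": False}

def propose_fix(attempt):
    if attempt >= 2:           # pretend the model gets it right on try two
        state["fixed"] = True

def verify():
    return state["fixed"]      # in practice: run the tests, check the exit code
```

Here `agent_loop(propose_fix, verify)` returns 2: the first attempt fails verification, the second passes, and you were never in the loop.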
In 2025, coding agents moved from experimental tools to production systems shipping real features to real customers. In 2026, single agents are giving way to coordinated teams of agents.
I've been watching this from an unusual angle. My job involves evaluating AI-generated code for quality: finding the failure modes, writing the rubrics, doing the multi-turn reviews. At the same time I'm building a Python agent that monitors the OINP immigration portal and pushes Telegram alerts whenever a new Masters Graduate stream draw drops. Two different relationships with the same technology, and both have given me a clearer picture than I'd have from either side alone.
What Agents Are Actually Good At
Agents handle implementation tasks well when the problem is well-scoped and verifiable. "Add pagination to this endpoint." "Write tests for this module." "Refactor this class to use dependency injection." Tasks with clear success criteria: the code runs, the tests pass, the interface contract is unchanged. The agent can verify its own work.
Quality still varies. My evaluation work confirms what engineers describe: intuition for delegation develops over time, and people learn to hand off tasks that are easily verifiable or low-stakes. That intuition is real. Knowing what to delegate is itself a skill now.
Where agents fall apart is anything requiring judgment about what the right problem even is. An agent given an ambiguous brief will confidently solve the wrong version of it. I've seen this pattern repeatedly, not as an occasional edge case but as a consistent failure mode when the task specification has gaps. The agent doesn't ask for clarification. It infers, fills in, and proceeds. Sometimes the inference is right. When it's wrong, it's wrong in ways that are coherent and hard to catch. That's the part that should make you nervous.
The Shift That's Actually Happening to Engineering Teams
Gartner predicts 80% of organizations will evolve large software engineering teams into smaller, AI-augmented teams by 2030. The trajectory is already visible. Teams that used to need eight engineers to maintain a product are running it with four. Not because the other four got fired, but because agent-assisted output per engineer went up enough that the headcount math changed.
The pattern emerging in 2026: software development is moving toward human expertise focused on defining problems worth solving while AI handles the tactical implementation work.
That framing is mostly right but it undersells something. "Defining problems worth solving" sounds clean and strategic. In practice it means writing a spec precise enough that an agent doesn't go off the rails, reviewing agent output at a level that catches subtle correctness issues, and making architecture decisions that hold up when the agent starts filling in implementations you didn't anticipate.
Those are all hard skills. They're also different from the skills that got most of us into engineering. We learned by writing the implementation ourselves. The feedback loop of "I wrote this, it broke, I understand why" is how you build the mental models that make good judgment possible. Whether that judgment transfers cleanly to directing agents at tasks you've never done yourself is an open question. I don't think anyone knows yet.
What This Means If You're Mid-Career
I'm five years in. I've shipped production Android apps, done fintech work, and I'm now working at the AI training layer. The people who seem least threatened by this shift share one thing: they understand systems, not just syntax.
A developer who knows Kotlin and can write Jetpack Compose components is in a different position than one who understands why coroutine cancellation works the way it does, when a ViewModel scope is the wrong choice, and what the architectural consequences of a particular state management approach are three features down the road. The first kind of knowledge is increasingly delegatable. The second is what you need to review what the agent produces.
This is not a comfortable message. It basically says the work that builds deep knowledge is being automated before you've had a chance to accumulate it through repetition. That's a real problem for junior developers and I don't have a clean answer to it. Engineers who actively seek out the "why" behind every pattern they use, even when an agent handed them that pattern, will pull ahead of those who treat agent output as a black box. That's my best guess.
The Security Problem Nobody Talks About
Source: Unsplash — Lewis Kang'ethe Ngugi
Agentic coding is changing security in two directions at once: as models get more capable, building security into products gets easier, but the same capabilities that help defenders help attackers too.
There's a third direction worth adding from my evaluation work: agents introduce security risks through confident implementation of insecure patterns. An agent writing a data pipeline reaches for the most direct path to working code. Input sanitization, parameterized queries, credential management, error handling that doesn't leak internals: these require deliberate thought. Agents do them inconsistently.
The more autonomous the coding pipeline, the more critical it is to have security review that isn't the same agent that wrote the code. I've flagged SQL injection vulnerabilities in agent-generated Python and credential handling issues in agent-generated Kotlin. The code was functionally correct. It would have passed a cursory review. It shouldn't have shipped.
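To make the pattern concrete, here's a minimal illustration of the kind of thing I flag. This is not the actual code I reviewed; the `users` table and both functions are invented for the example:

```python
import sqlite3

def find_user_unsafe(conn, username):
    # The direct path to working code: f-string SQL. Functionally
    # correct for normal input, injectable for hostile input.
    return conn.execute(
        f"SELECT id FROM users WHERE name = '{username}'"
    ).fetchall()

def find_user_safe(conn, username):
    # Parameterized query: the driver binds the value, so input
    # like "x' OR '1'='1" is treated as data, not as SQL.
    return conn.execute(
        "SELECT id FROM users WHERE name = ?", (username,)
    ).fetchall()
```

Both versions pass a happy-path test with `"alice"`. Feed the unsafe one `"x' OR '1'='1"` and it returns every row in the table; the safe one returns nothing. That's exactly the "functionally correct, passes cursory review, shouldn't ship" shape.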
Why I'm Still Building Agents
None of this made me stop building the OINP monitoring bot. It made me more deliberate about it.
The thing I'm building isn't trying to do something clever. It checks a government webpage on a schedule, parses the draw results, compares against the last known state, and fires a Telegram message if something changed. The agent part is the parsing logic: handling inconsistencies in how the page is structured, dealing with cases where the data format shifts slightly. That's a good fit for what these tools are actually good at.
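Stripped down, one tick of the monitor looks something like this. It's a simplified sketch, not the production code: the state-file name is illustrative, `fetch_draw` stands in for the scraping and parsing, and `send_alert` stands in for the Telegram API call:

```python
import json
from pathlib import Path

STATE_FILE = Path("oinp_last_draw.json")  # illustrative location for last-seen state

def load_last_state():
    """Return the last draw we alerted on, or None on first run."""
    if STATE_FILE.exists():
        return json.loads(STATE_FILE.read_text())
    return None

def check_for_new_draw(fetch_draw, send_alert):
    """One tick of the monitor: fetch, compare against last known state,
    alert only when something changed."""
    latest = fetch_draw()
    if latest != load_last_state():
        send_alert(f"New OINP draw detected: {latest}")
        STATE_FILE.write_text(json.dumps(latest))  # remember what we alerted on
        return True
    return False
```

Running this on a schedule is the whole product. The first tick with a new draw fires one alert and persists the state; every subsequent tick with the same data stays silent.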
The immigration system in Canada is opaque in ways that are genuinely stressful for people on it. If a monitoring tool reduces that stress even slightly, it's worth the weekend. The judgment about what's worth building and why is still entirely mine.
That's probably the honest answer to "now what." The judgment work is still yours. The implementation is increasingly negotiable.
Jaskaran Singh is a Senior Software Engineer working in AI training and evaluation, with production experience in Android development using Kotlin and Flutter. Currently building a Python-based OINP immigration monitoring agent.
Top comments
The line that landed for me was about the agent being wrong in ways that are coherent and hard to catch. That's the specific failure mode that keeps me up. Not the obvious errors—those you spot immediately and fix. It's the plausible, internally consistent, slightly-off solution that slides through review because nothing looks broken.
What's quietly terrifying is that reviewing agent output might actually be harder than writing the code yourself. When you write it, you carry the mental model of why each decision was made. You know where the sharp edges are because you put them there. Reviewing someone else's work—even an agent's—means reconstructing intent from artifact. You're reverse-engineering the reasoning from the output. And if the output is coherent but wrong, the mismatch between "looks right" and "is right" is almost invisible.
Your point about junior developers missing the repetition that builds judgment is the long-term version of the same problem. If you never write the insecure SQL query yourself, never see it get exploited, never feel that specific sting, do you develop the instinct to spot it in a code review? Or do you just learn to trust that the agent "usually" handles it?
The OINP bot is a good counterweight. Small scope, clear success criteria, low stakes if it gets the parsing slightly wrong. That's the sweet spot. What I'm wondering is whether you've found yourself writing more tests for agent-generated code than you would for your own. Feels like that might be the unspoken tradeoff—delegating implementation but doubling down on verification.
The reverse-engineering-intent-from-artifact framing is exactly right and I hadn't put words to it that cleanly. When you write it yourself, the mental model and the code exist together. When you review agent output, you only have the code. You're inferring the model, and if the model was slightly off to begin with, you're reconstructing a faulty map from terrain that looks completely normal.
On the tests question: yes, noticeably. I write more, and I write different tests. When I write the code myself I tend to know the edge cases because I thought through them while writing. With agent output I don't have that context, so I end up writing tests that are almost adversarial. Trying to break it rather than confirm it works. It's slower but I think it's the right instinct.
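A toy example of the difference (the parser here is invented for illustration, not from the bot):

```python
def parse_cutoff(text):
    # Hypothetical agent-generated helper: pull the cutoff score
    # out of a line like "Cutoff score: 470".
    return int(text.split(":")[1].strip())

# Confirmation test: the happy path I'd write for my own code.
assert parse_cutoff("Cutoff score: 470") == 470

# Adversarial tests: inputs chosen to break it, because I don't
# share the agent's mental model of what it considered.
for bad in ["", "Cutoff score:", "470", "Cutoff: score: 470"]:
    try:
        parse_cutoff(bad)
    except (ValueError, IndexError):
        pass  # failing loudly here is acceptable; a silent wrong answer is not
```

The second block is the part I never used to write for myself, because I already knew which edges I'd handled.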
The SQL exploit point is the one I keep coming back to. There's a version of the next generation of developers who are extremely good at prompting and reviewing but have never felt the specific dread of finding an injection vulnerability in their own code. Whether that's fine or a real gap, I genuinely don't know. My gut says it matters but I can't fully defend that yet.
Interesting take, the disorientation is real. Curious how you think about the gap between agents that ship features and agents that own outcomes.