Posts

2026.APR.25

The Zechner-Lopopolo Continuum

Alex Volkov:

The Zechner-Lopopolo Continuum

This is a recap of the AI Engineer Europe conference that took place in London a couple of weeks ago. But the more interesting part is the debate that the title points to.

Mario Zechner (creator of the Pi coding agent, my preferred coding agent) talked about

  • why & how he built Pi (this summarises why I'm in love with Pi)
  • the strain that people wielding agents put on OSS maintainers, and how he is tackling it with innovative solutions like OSS Vacations/Weekends
  • (crucially) advocating for reading critical code thoroughly and generally slowing down to ensure we don't drown in AI slop code

Ryan Lopopolo (from OpenAI) talked about some vague things like code being a liability and how he is a "token billionaire"; and how he has mandated his team to not look at the code. Maybe he talked about more things, I just couldn't sit through the entire talk.

If it's not obvious, I'm firmly at the Zechner end of the continuum.

Maybe this will change in a couple of years or even in just a few months, but in April 2026, anyone who is too far out on the Lopopolo end is taking on a lot of technical debt that they may not really be able to pay off.

And no: no amount of tests or specs is going to prevent that technical debt from building up, because the debt is not about correctness. The things that lead to this debt from agents are the same things that lead to debt buildup from humans: poor design choices, code duplication, needlessly defensive code, and many other such sins that agents can add at a pace hitherto unimaginable for humans.

The only way to prevent or tame this is for humans to read the code. Or to break the problem down into chunks small enough that agents actually follow "don't duplicate code" and the other commandments in our AGENTS.mds. In other words: "human in the loop."

"But that will slow us down," I can hear some people say. Yes, slow the fuck down1.

Footnotes

  1. We'll still be way faster than we were a year ago, so don't despair.

2026.APR.25

Why Isn't Everything Different Yet?

Dave Griffith:

So: where are we? The technology exists and is impressive. The infrastructure buildout is underway and massive. Workflows are being redesigned in early-adopter organizations, often via guesswork. We've got one (1) product area (software development agents) where we're past "early adopter" and moving onto mass-market. Legal frameworks are being written badly by people who have never used the technology, which is traditional. Business models are being discovered by trial and error, also traditional. Fortunes are being made and lost, another time-honored tradition.

The critics who say nothing has changed are measuring at the wrong resolution. The critics who say change should have been instantaneous have a broken model of how change works. The honest answer is: this is going extremely fast, it will often feel slow until suddenly it doesn't, and the people who have built understanding now will not be scrambling in three years.

Amen. Good, entertaining read.

I'm going to refer people to this whenever someone says either that things will not change dramatically or that the dramatic change has already happened (there is so much more to come).

2026.APR.25

Coding Models Are Doing Too Much

nrehiew:

If you have used any of these tools in the past year, you have probably experienced something like this: you ask the model to fix a simple bug (perhaps a single off-by-one error, or maybe a wrong operator). The model fixes the bug but half the function has been rewritten. An extra helper function has appeared. A perfectly reasonable variable name has been renamed. New input validation has been added. And the diff is enormous.

I refer to this as the Over-Editing problem where models have the tendency to rewrite code that didn't need rewriting.

Yes! A thousand times, yes.

GPT models are especially prone to this over-editing problem. Part of it comes from writing code that is way too defensive1, but it's not just that: they are eager to "fix" your code even when there is no need to.

Thankfully, GPT models are also very good at following instructions. So I have had instructions to circumvent this problem in my global AGENTS.md for a while and it helps quite a bit.
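For illustration, instructions of this kind might look like the following. This is my paraphrase of the idea, not the actual contents of anyone's AGENTS.md:

```markdown
## Editing discipline

- Make the smallest change that fixes the problem; do not refactor,
  rename, or reformat code you were not asked to touch.
- Do not add helper functions, input validation, or error handling
  unless the task requires it.
- Keep the diff minimal: if a line is unrelated to the request,
  leave it exactly as it is.
```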

This is what the linked post found as well: over-editing decreases across models when they are prompted against it.

This is a good post. It's not an opinion piece, but takes a scientific approach by setting up experiments and providing evidence in the form of results.

Footnotes

  1. I've seen a couple of comments saying that GPT-5.5 has gotten better in this regard and doesn't write such defensive code anymore. I've yet to verify this.

2026.APR.25

Multi-Agents: What's Actually Working

I've largely sat out the hype around multi-agent orchestration or agent swarms because it felt too gimmicky. Heck, I've only recently started using subagents in a limited way (mostly explicitly invoked when I feel like something is parallelizable).

This blog post is not trying to hype these up. It is a measured take on how Cognition has been able to use some limited forms of this in production for Devin (background/cloud agent) and what they had to do to make it work well.

Walden Yan (Cognition):

1) The Code-Review-Loop that's so stupid it shouldn't work

You would think that making a model review its own code would not result in any useful findings. But even on PRs written by Devin, Devin Review catches an average of 2 bugs per PR, of which roughly 58% are severe (logic errors, missing edge cases, security vulnerabilities). Often the system will loop through multiple code-review cycles, finding new bugs each time (which isn't always great since it can take a while). Today, we make Devin and Devin Review natively iterate against one another, so that most bugs are already resolved by the time a human opens the PR.

This is effectively my (manual) workflow in almost every coding agent I have used for several months now. Of course, Cognition has automated this as a workflow, which makes sense in a background agent like Devin.

I wouldn't want to automate it in my manual workflow though, as I tend not to accept all the review comments from the review agent. That's why I don't use extensions such as pi-review-loop, which exist to do just that.
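Automated, the generate/review/fix loop Cognition describes might look roughly like this sketch. The function names and shapes are my assumptions, not Devin's actual interface:

```python
def review_loop(task, write_code, review_code, max_rounds=3):
    """Hypothetical sketch of a coder/reviewer loop (not Devin's real API).

    write_code(task, feedback) returns code; review_code(code) returns a
    list of findings, empty when the reviewer is satisfied.
    """
    code = write_code(task, [])
    for _ in range(max_rounds):
        findings = review_code(code)
        if not findings:
            # Reviewer found nothing; the PR is ready for a human.
            break
        # Feed the findings back to the coder and try again. Capping the
        # rounds matters, since the loop can keep finding new bugs for a
        # while and each cycle costs time and tokens.
        code = write_code(task, findings)
    return code
```

The interesting design decision is the cap: without `max_rounds`, the "finds new bugs each time" behavior the post mentions could keep the loop spinning well past the point of diminishing returns.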


2) Large, expensive models are back - introducing "Smart Friend"

The actual architecture we used to achieve this was by offering the smarter/expensive model as a "smart friend" tool that the primary/smaller model could make a call out to. Basically, let the primary/smaller model decide when a situation was tricky enough to be worth consulting the smarter/expensive model.

This is basically akin to Amp Code's /oracle1 but invoked automatically (by exposing it as a tool). Seems obviously beneficial if the primary model is not smart enough to tackle the problem at hand.
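As a sketch of the shape of this, under common function-calling conventions. The tool schema, model names, and `call_model` stub below are all my assumptions, not Cognition's or Amp's actual API:

```python
# Hypothetical sketch: expose a stronger model as a "smart friend" tool
# that a smaller primary model can invoke when it judges a problem tricky.

SMART_FRIEND_TOOL = {
    "name": "consult_smart_friend",
    "description": (
        "Consult a slower, more capable model. Use ONLY when you are stuck "
        "on a tricky design or debugging question; include full context."
    ),
    "parameters": {
        "type": "object",
        "properties": {"question": {"type": "string"}},
        "required": ["question"],
    },
}

def call_model(model: str, prompt: str) -> str:
    """Stub for an LLM API call; a real agent would hit its provider here."""
    return f"[{model}] answer to: {prompt}"

def handle_tool_call(name: str, args: dict) -> str:
    # The primary (cheap) model decided escalation was worth it; route the
    # question to the expensive model and return its answer as a tool result.
    if name == "consult_smart_friend":
        return call_model("expensive-frontier-model", args["question"])
    raise ValueError(f"unknown tool: {name}")
```

The escalation decision lives entirely in the tool description: the cheap model is told when consulting is worth it, which is exactly the "learn when to escalate" problem the post flags as open.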


What about unstructured swarms? We think the unstructured-swarm approach, arbitrary networks of agents negotiating with each other, is mostly a distraction. The practical shape is map-reduce-and-manage: a manager splits work, children execute, the manager synthesizes and reports back. Making this type of system feel as coherent as a single agent working on a single task is at the center of some of our upcoming work in 2026.


There's a shared through-line with all of these experiments: multi-agent systems work best today when writes stay single-threaded and the additional agents contribute intelligence rather than actions. A clean-context reviewer catches bugs the coder can't see. A frontier-level smart friend catches subtleties a weaker primary misses. A manager coordinates scope across child agents without fragmenting decisions.

The open problems are all communication problems. How does a weaker model learn when to escalate? How does a child agent surface a discovery that should change its siblings' work? How do you transfer context between agents without drowning the receiver? You can get decently far with prompting, but we also expect the next generation of models, including the ones we train ourselves, to start closing these gaps.
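The map-reduce-and-manage shape described above can be sketched like this. The function names are mine; the point is the structure, in which only the manager splits and synthesizes, so writes stay single-threaded while children contribute results in parallel:

```python
from concurrent.futures import ThreadPoolExecutor

def map_reduce_manage(task, split, run_child, synthesize):
    """Hypothetical sketch of the map-reduce-and-manage pattern.

    The manager splits the task, child agents execute subtasks in
    parallel (contributing intelligence, not writes), and the manager
    alone synthesizes the results into one coherent output.
    """
    subtasks = split(task)
    with ThreadPoolExecutor() as pool:
        # Children run concurrently but never write; they only return results.
        child_results = list(pool.map(run_child, subtasks))
    # Synthesis (and any write) happens in a single place: the manager.
    return synthesize(task, child_results)
```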

Footnotes

  1. Peter Steinberger has an /oracle prompt template to use in any agent for consulting GPT Pro models for such situations.

2026.APR.19

The Peril of Laziness Lost

Bryan Cantrill:

The problem is that LLMs inherently lack the virtue of laziness. Work costs nothing to an LLM. LLMs do not feel a need to optimize for their own (or anyone's) future time, and will happily dump more and more onto a layercake of garbage. Left unchecked, LLMs will make systems larger, not better—appealing to perverse vanity metrics, perhaps, but at the cost of everything that matters. As such, LLMs highlight how essential our human laziness is: our finite time forces us to develop crisp abstractions in part because we don't want to waste our (human!) time on the consequences of clunky ones. The best engineering is always borne of constraints, and the constraint of our time places limits on the cognitive load of the system that we're willing to accept. This is what drives us to make the system simpler, despite its essential complexity. As I expanded on in my talk The Complexity of Simplicity, this is a significant undertaking—and we cannot expect LLMs that do not operate under constraints of time or load to undertake it of their own volition.

So well put. Recommended reading.

2026.APR.19

Mechanical Sympathy

Vicki Boykis:

What makes good engineers good at product design is the same thing that makes them good at engineering. They feel for the boundaries of what the code and the product allows them to do and stop at those boundaries.

Another name for being able to understand and plan for affordances, either through good product intuition, or experience, or both, in the real world is mechanical sympathy.

I agree with the assertion that agentic coding tools don't have mechanical sympathy. At least as of now; maybe future models will overcome this (but maybe not).

2026.APR.04

Slop Is Not Necessarily the Future

Soohoon Choi:

I want to argue that AI models will write good code because of economic incentives. Good code is cheaper to generate and maintain. Competition is high between the AI models right now, and the ones that win will help developers ship reliable features fastest, which requires simple, maintainable code. Good code will prevail, not only because we want it to (though we do!), but because economic forces demand it. Markets will not reward slop in coding, in the long-term.

We're still early in the AI coding adoption curve. As the technology and competition matures, economic forces will drive AI models toward generating good, simpler, code because it will be cheaper overall.

Good food for thought. But the argument feels like a bit of wishful thinking, given that no reasonable or even plausible “why” has been offered.

I’m not saying that the models are not going to get good enough and we’re going to have slop forever — if you just trace the slope of improvement over the past year, we would in fact expect the opposite. But nobody knows, or has offered any plausible path for this.

via Simon Willison

2026.MAR.14

No More Code Reviews

Philip Su:

And — you heard it here first — we’ll one day be scared, positively petrified, to use any mission-critical software known to have allowed human interference in its codebase.

Very provocative. Put this way, it does evoke the feeling that we could very well be heading into this future.

2026.MAR.10

The Deal Is So Good

Mo Bitar:

What we do is because the deal is so damn good, we change ourselves to make that deal acceptable.

And what I've figured out now is that I'm unwilling to change myself to make that deal acceptable.

I could feel the emotions as I watched the video. Well worth the time.

2026.MAR.08

End of Productivity Theater

Murat Demirbas:

I remember the early 2010s as the golden age of productivity hacking. Lifehacker, 37signals, and their ilk were everywhere, and it felt like everyone was working on jury-rigging color-coded Moleskine task-trackers and web apps into the perfect Getting Things Done system.

So recently I found myself wondering: what happened to all that excitement? Did I just outgrow the productivity movement, or did the movement itself lose steam?

I was very much in the audience for the productivity theatre. I still am to an extent, even if the stage has lost most of its oomph. A good, short read.