Posts

2026.MAR.14

No More Code Reviews

Philip Su:

And — you heard it here first — we’ll one day be scared, positively petrified, to use any mission-critical software known to have allowed human interference in its codebase.

Very provocative. Put this way, it does evoke the feeling that we could very well be heading into this future.

2026.MAR.09

The Deal Is So Good

Mo Bitar:

What we do is because the deal is so damn good, we change ourselves to make that deal acceptable.

And what I've figured out now is that I'm unwilling to change myself to make that deal acceptable.

I could feel the emotions as I watched the video. Well worth the time.

2026.MAR.08

End of Productivity Theater

Murat Demirbas:

I remember the early 2010s as the golden age of productivity hacking. Lifehacker, 37signals, and their ilk were everywhere, and it felt like everyone was working on jury-rigging color-coded Moleskine task-trackers and web apps into the perfect Getting Things Done system.

So recently I found myself wondering: what happened to all that excitement? Did I just outgrow the productivity movement, or did the movement itself lose steam?

I was very much in the audience for the productivity theatre. I still am to an extent, even if the stage has lost most of its oomph. A good, short read.

2026.FEB.28

The Third Era of AI Software Development

Michael Truell, CEO of Cursor:

When we started building Cursor a few years ago, most code was written one keystroke at a time. Tab autocomplete changed that and opened the first era of AI-assisted coding. Then agents arrived, and developers shifted to directing agents through synchronous prompt-and-response loops. That was the second era.

Now a third era is arriving. It is defined by agents that can tackle larger tasks independently, over longer timescales, with less human direction. As a result, Cursor is no longer primarily about writing code. It is about helping developers build the factory that creates their software.

Thirty-five percent of the PRs we merge internally at Cursor are now created by agents operating autonomously in cloud VMs.

Agent Orchestration and Agent Swarms are a couple of ways folks are referring to this idea. Steve Yegge had predicted several weeks ago (a very long time horizon in the world of AI) that this would be the next frontier in agentic engineering.

I remain sceptical, though. I'm not saying I don't trust that 35% of the PRs at Cursor are being opened this way; it is very believable, given how good the frontier models are now. But not all PRs are equal, and I wager this 35% skews towards relatively simple bugs/features, or that a lot more work is being done to iterate on the PRs once they are raised.

My argument isn't that this isn't useful work. On the contrary, background agents taking such issues out of engineers' hands is invaluable, as the engineers can now focus on work that provides higher leverage. But I don't believe the logical extension of this is that "the vast majority of development work" will be done this way in a year.

2026.FEB.27

Two Beliefs About Coding Agents

Drew Breunig:

I'm lucky enough to talk to a range of developers and teams, spanning a variety of company sizes and a broad array of skill sets. From these conversations, two beliefs have emerged and solidified about coding agents and their (current) impact on coding.

Drew makes two very astute observations, both of which I endorse. The first one in particular is under-appreciated:

Most talented developers do not appreciate the impact of the intuitive knowledge they bring to their coding agent.

Coding agents amplify the skills of the engineers who wield them; they are not magic beans that'll let an amateur cook up a compiler.

The second observation should be obvious to anyone who has built software products, but somehow the current mania is making people ignore it:

Most work people are sharing are incredible personal tools, but they are not capital-P products.

2026.FEB.18

Codex CLI vs Claude Code on Autonomy

nilenso:

I spent some time studying the system prompts of coding agent harnesses like Codex CLI and Claude Code. These prompts reveal the priorities, values, and scars of their products. They're only a few pages each and worth reading in full, especially if you use them every day. This approach to understanding such products is more grounded than the vibe-based takes you often see in feeds.

While there are many similarities and differences between them, one of the most commonly perceived differences between Claude Code and Codex CLI is autonomy, and in this post I'll share what I observed. We tend to perceive autonomous behaviour as long-running, independent, or requiring less supervision and guidance. Reading the system prompts, it becomes apparent that the products make very different, and very intentional choices.

Very interesting comparison. But I don't believe the difference in the behaviour is primarily, or even likely, driven by the system prompts. The difference is far more ingrained, most likely RL'd during post-training.

Why do I say this? I've been using both models in the Pi coding agent with its default system prompt [1], which is both really small and the same for all models. And even in Pi, this difference in behaviour comes across clearly [2].

Footnotes

  1. Pi allows us to replace the entire system prompt by placing a markdown file at ~/.pi/agent/SYSTEM.md.

  2. I feel that the models both behave better in Pi than in their respective canonical harnesses; but this is a very subjective opinion.
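The override mechanism footnote 1 describes amounts to writing a file at a documented path. A minimal sketch (the prompt text below is a placeholder I made up, not Pi's actual default):

```python
from pathlib import Path

# Path Pi reads the replacement system prompt from, per footnote 1.
prompt_path = Path.home() / ".pi" / "agent" / "SYSTEM.md"
prompt_path.parent.mkdir(parents=True, exist_ok=True)

# Hypothetical prompt text, purely illustrative.
prompt_path.write_text("You are a concise, careful coding agent.\n")
```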

2026.FEB.16

SaaS Isn't Dead. It's Worse Than That.

Michael Bloch:

I'm more bullish on AI than I've ever been. And that's exactly why I'm bearish on most software companies. Not because their customers will leave, but because their next thirty competitors just got a lot easier to build.

I've seen/heard a bunch of different people quip exactly this. This is one of the crispest articulations. Rings ominous to me.

2026.FEB.14

The Final Bottleneck

Armin Ronacher:

I too am the bottleneck now. But you know what? Two years ago, I too was the bottleneck. I was the bottleneck all along. The machine did not really change that. And for as long as I carry responsibilities and am accountable, this will remain true. If we manage to push accountability upwards, it might change, but so far, how that would happen is not clear.

I too am the bottleneck. And I'm glad I am. When I stop being the bottleneck, I'm no longer involved at all. And if I'm not involved, it doesn't matter to me.

A very good and thought-provoking read.

2026.FEB.11

Showboat and Rodney — Agents Demoing Their Work

Simon Willison:

A key challenge working with coding agents is having them both test what they've built and demonstrate that software to you, their overseer. This goes beyond automated tests—we need artifacts that show their progress and help us see exactly what the agent-produced software is able to do.

Simon's response to this challenge is two CLIs, Showboat & Rodney.

Showboat:

It's a CLI tool (a Go binary, optionally wrapped in Python to make it easier to install) that helps an agent construct a Markdown document demonstrating exactly what their newly developed code can do.

This might be a very useful artefact to include in PRs (assuming they are supposed to be reviewed by humans, of course!)

Rodney:

Rodney is a CLI tool for browser automation designed to work with Showboat. It can navigate to URLs, take screenshots, click on elements and fill in forms.

Rodney is quite interesting too. There are a few such CLIs/skills out there that let agents control browsers for testing: Vercel's agent-browser seems very popular, but there are others as well on skills.sh.

I'm currently using the web-browser skill from mitsuhiko on GitHub, which has a set of TypeScript scripts that control a Chrome browser using CDP (similar to Rodney); it has no npm dependencies save for one websockets lib. This works well, but I'm going to give Rodney a try: being able to run via uvx means it should work well in environments like Codex for web (which has uv and Chrome) without additional setup.