Posts

2026.FEB.18

Codex CLI vs Claude Code on Autonomy

nilenso:

I spent some time studying the system prompts of coding agent harnesses like Codex CLI and Claude Code. These prompts reveal the priorities, values, and scars of their products. They're only a few pages each and worth reading in full, especially if you use them every day. This approach to understanding such products is more grounded than the vibe-based takes you often see in feeds.

While there are many similarities and differences between them, one of the most commonly perceived differences between Claude Code and Codex CLI is autonomy, and in this post I'll share what I observed. We tend to perceive autonomous behaviour as long-running, independent, or requiring less supervision and guidance. Reading the system prompts, it becomes apparent that the products make very different, and very intentional choices.

Very interesting comparison. But I don't believe the difference in behaviour is primarily, or even largely, driven by the system prompts. The difference is far more ingrained; it was most likely RL'd into the models during post-training.

Why do I say this? I've been using both models in the Pi coding agent with its default system prompt[1], which is both really small and the same for all models. Even in Pi, this difference in behaviour comes across clearly.[2]

Footnotes

  1. Pi allows us to replace the entire system prompt by placing a markdown file at ~/.pi/agent/SYSTEM.md

  2. I feel that the models both behave better in Pi than in their respective canonical harnesses; but this is a very subjective opinion.
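The override described in footnote 1 can be sketched in a couple of shell commands. Only the path comes from the footnote; the prompt text below is an invented placeholder.

```shell
# Replace Pi's entire system prompt with your own.
# The prompt wording here is a made-up placeholder.
mkdir -p ~/.pi/agent
cat > ~/.pi/agent/SYSTEM.md <<'EOF'
You are a coding agent. Keep changes minimal and explain your reasoning.
EOF
```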

2026.FEB.16

SaaS Isn't Dead. It's Worse Than That.

Michael Bloch:

I'm more bullish on AI than I've ever been. And that's exactly why I'm bearish on most software companies. Not because their customers will leave, but because their next thirty competitors just got a lot easier to build.

I've seen/heard a bunch of different people quip exactly this. This is one of the crispest articulations. Rings ominous to me.

2026.FEB.15

Cognitive Debt

From How Generative and Agentic AI Shift Concern from Technical Debt to Cognitive Debt (via Simon Willison):

Cognitive debt, a term gaining traction recently, instead communicates the notion that the debt compounded from going fast lives in the brains of the developers and affects their lived experiences and abilities to "go fast" or to make changes. Even if AI agents produce code that could be easy to understand, the humans involved may have simply lost the plot and may not understand what the program is supposed to do, how their intentions were implemented, or how to possibly change it.

I hadn't come across this term before. It is a useful one to add to our collective vocabulary. I suppose that in just a couple of years we'll all be talking about this phenomenon like we talk about technical debt now.

I haven't personally felt this way yet; maybe that means I'm not fully embracing and giving in to the agents. But I can feel the urge to go there.

I bet that one of the best ways to avoid getting into cognitive debt is to continue to be the bottleneck.

2026.FEB.14

The Final Bottleneck

Armin Ronacher:

I too am the bottleneck now. But you know what? Two years ago, I too was the bottleneck. I was the bottleneck all along. The machine did not really change that. And for as long as I carry responsibilities and am accountable, this will remain true. If we manage to push accountability upwards, it might change, but so far, how that would happen is not clear.

I too am the bottleneck. And I'm glad I am. When I stop being the bottleneck, I'm no longer involved at all. And if I'm not involved, it doesn't matter to me.

A very good and thought-provoking read.

2026.FEB.11

Showboat and Rodney — Agents Demoing Their Work

Simon Willison:

A key challenge working with coding agents is having them both test what they've built and demonstrate that software to you, their overseer. This goes beyond automated tests—we need artifacts that show their progress and help us see exactly what the agent-produced software is able to do.

Simon's response to this challenge is two CLIs, Showboat & Rodney.

Showboat:

It's a CLI tool (a Go binary, optionally wrapped in Python to make it easier to install) that helps an agent construct a Markdown document demonstrating exactly what their newly developed code can do.

This might be a very useful artefact to include in PRs (assuming they are supposed to be reviewed by humans of course!)

Rodney:

Rodney is a CLI tool for browser automation designed to work with Showboat. It can navigate to URLs, take screenshots, click on elements and fill in forms.

Rodney is quite interesting too. There are several such CLIs/skills out there for letting agents control browsers for testing: Vercel's agent-browser seems very popular, and skills.sh lists a few others as well.

I'm currently using mitsuhiko's web-browser skill from GitHub, a set of TypeScript scripts that control a Chrome browser using CDP (similar to Rodney); it has no npm dependencies save for one websockets lib. This works well, but I'm going to give Rodney a try: being runnable via uvx means it should work in environments like Codex for web (which has uv and Chrome) without additional setup.

2026.FEB.10

Agentic Engineering

Andrej Karpathy, on the one-year anniversary of coining "vibe coding" (emphasis mine):

The one thing I'd add is that at the time, LLM capability was low enough that you'd mostly use vibe coding for fun throwaway projects, demos and explorations. It was good fun and it almost worked. Today (1 year later), programming via LLM agents is increasingly becoming a default workflow for professionals, except with more oversight and scrutiny. The goal is to claim the leverage from the use of agents but without any compromise on the quality of the software.

Many people have tried to come up with a better name for this to differentiate it from vibe coding; personally, my current favorite is "agentic engineering":

  • "agentic" because the new default is that you are not writing the code directly 99% of the time, you are orchestrating agents who do and acting as oversight.
  • "engineering" to emphasize that there is an art & science and expertise to it. It's something you can learn and become better at, with its own depth of a different kind.

I like "agentic engineering" and I'm seeing more people use this term. I'm going to adopt this going forward. Changing the tag on my blog from "AI Coding" to "Agentic Engineering".

2026.FEB.09

Feedback Loopable

New term added to my vocab that I'm going to use a lot: "feedback loopable".

Lewis Metcalf:

Agents are most powerful when they can validate their work against reality. When they have feedback loops. The problem is, some work is hard to organize in a way that an agent can easily get feedback. The software we build and the tools we use are built for humans. Humans with eyeballs and hands and fingers.

This article is about how to make those problems easier for agents. It's a way to create an environment for your agents so that they can solve problems on their own, and so that you (the human) can intuitively guide them without getting in the way.

This process of building something for humans using methods built for agents is what I call: making it feedback loopable.
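The idea can be made concrete with a tiny sketch (all names here are invented for illustration): instead of asking a human to eyeball the result, expose the success criterion as a command the agent can rerun after every change.

```python
import subprocess

def check(cmd: str) -> tuple[bool, str]:
    """Run a validation command; return (passed, combined output).
    This is the feedback loop an agent can invoke after each edit."""
    proc = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr

# The agent edits code, then validates against reality:
ok, output = check("echo hello")
```

Anything expressible this way — a test suite, a linter, a screenshot diff — becomes "feedback loopable".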

2026.FEB.09

Mitchell Hashimoto's AI Adoption Journey

This is a good post to share with anyone who is still sceptical about agentic engineering. Mitchell Hashimoto (creator of Vagrant, co-founder of HashiCorp, now building Ghostty) goes through his journey from "this isn't really helpful at all" to "it consistently adds value."

Instead of giving up, I forced myself to reproduce all my manual commits with agentic ones. I literally did the work twice. I'd do the work manually, and then I'd fight an agent to produce identical results in terms of quality and function (without it being able to see my manual solution, of course).

This was excruciating, because it got in the way of simply getting things done. But I've been around the block with non-AI tools enough to know that friction is natural, and I can't come to a firm, defensible conclusion without exhausting my efforts.

What's noteworthy is that it didn't happen naturally for Mitchell — he had to put explicit effort into making this work, and he didn't give up when it wasn't working all that well. That may be the big difference between those who are excited about agentic engineering and those who aren't.

2026.JAN.07

You can make up HTML tags

Browsers handle unrecognized tags by treating them as a generic element, with no effect beyond what’s specified in the CSS. This isn’t just a weird quirk, but is standardized behavior. If you include hyphens in the name, you can guarantee that your tag won’t appear in any future versions of HTML.

This is so cool, and I had never heard of this before. I wonder why this is not more popular — really semantic HTML!
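A minimal illustration of the quoted behaviour (the tag name is invented; since the browser treats an unrecognized element as a generic, unstyled element, the CSS supplies all of its presentation):

```html
<style>
  /* <pull-quote> is a made-up tag; browsers apply no default
     styling to it, so this rule is the only styling it gets. */
  pull-quote {
    display: block;
    border-left: 3px solid #888;
    padding-left: 1em;
    font-style: italic;
  }
</style>
<pull-quote>The hyphen in the name guarantees it will never clash
with a future standard HTML element.</pull-quote>
```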