Posts

2026.FEB.11

Showboat and Rodney — Agents Demoing Their Work

Simon Willison:

A key challenge working with coding agents is having them both test what they've built and demonstrate that software to you, their overseer. This goes beyond automated tests—we need artifacts that show their progress and help us see exactly what the agent-produced software is able to do.

Simon's response to this challenge is two CLIs, Showboat & Rodney.

Showboat:

It's a CLI tool (a Go binary, optionally wrapped in Python to make it easier to install) that helps an agent construct a Markdown document demonstrating exactly what their newly developed code can do.

This could be a very useful artefact to include in PRs (assuming they are meant to be reviewed by humans, of course!).

Rodney:

Rodney is a CLI tool for browser automation designed to work with Showboat. It can navigate to URLs, take screenshots, click on elements and fill in forms.

Rodney is interesting too. Several CLIs and skills already let agents control a browser for testing: Vercel's agent-browser seems very popular, and there are a few others on skills.sh.

I'm currently using the web-browser skill from mitsuhiko on GitHub, a set of TypeScript scripts that control a Chrome browser over CDP (similar to Rodney); it has no npm dependencies other than one WebSocket library. This works well, but I'm going to give Rodney a try: because it can be run with uvx, it should work in environments like Codex for web (which has uv and Chrome) without additional setup.

2026.FEB.09

Mitchell Hashimoto's AI Adoption Journey

This is a good post to share with anyone who is still sceptical about agentic engineering. Mitchell Hashimoto (creator of Vagrant, co-founder of HashiCorp, now building Ghostty) goes through his journey from "this isn't really helpful at all" to "it consistently adds value."

Instead of giving up, I forced myself to reproduce all my manual commits with agentic ones. I literally did the work twice. I'd do the work manually, and then I'd fight an agent to produce identical results in terms of quality and function (without it being able to see my manual solution, of course).

This was excruciating, because it got in the way of simply getting things done. But I've been around the block with non-AI tools enough to know that friction is natural, and I can't come to a firm, defensible conclusion without exhausting my efforts.

What's noteworthy is that this didn't come naturally to Mitchell: he had to put explicit effort into making it work, and he didn't give up when it wasn't working well. That may be the big difference between those who are excited about agentic engineering and those who are not.

2026.FEB.09

Feedback Loopable

New term added to my vocab that I'm going to use a lot: "feedback loopable".

Lewis Metcalf:

Agents are most powerful when they can validate their work against reality. When they have feedback loops. The problem is, some work is hard to organize in a way that an agent can easily get feedback. The software we build and the tools we use are built for humans. Humans with eyeballs and hands and fingers.

This article is about how to make those problems easier for agents. It's a way to create an environment for your agents so that they can solve problems on their own, and so that you (the human) can intuitively guide them without getting in the way.

This process of building something for humans using methods built for agents is what I call: making it feedback loopable.

2026.JAN.07

You can make up HTML tags

Browsers handle unrecognized tags by treating them as a generic element, with no effect beyond what’s specified in the CSS. This isn’t just a weird quirk, but is standardized behavior. If you include hyphens in the name, you can guarantee that your tag won’t appear in any future versions of HTML.

This is so cool, and I had never heard of this before. I wonder why this is not more popular — really semantic HTML!
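
To make the hyphen rule concrete, here is a rough Python sketch of a name check. It is a simplification of the custom element naming rules in the HTML standard, not a full implementation: it accepts only ASCII names, and the function name `is_future_proof_tag` is made up for illustration.

```python
import re

# Hyphenated names the HTML standard still reserves
# (they already exist in SVG and MathML).
RESERVED_NAMES = {
    "annotation-xml", "color-profile", "font-face", "font-face-src",
    "font-face-uri", "font-face-format", "font-face-name", "missing-glyph",
}

# Simplified pattern: a lowercase ASCII letter first, then lowercase letters,
# digits, '.' or '_', then a hyphen, then any mix of those plus more hyphens.
# The real standard also allows many non-ASCII characters; this sketch does not.
_NAME_PATTERN = re.compile(r"[a-z][a-z0-9._]*-[a-z0-9._-]*")


def is_future_proof_tag(name: str) -> bool:
    """Return True if `name` looks like a made-up tag that HTML will never claim."""
    return bool(_NAME_PATTERN.fullmatch(name)) and name not in RESERVED_NAMES


for candidate in ("pull-quote", "aside", "Pull-Quote", "font-face"):
    print(candidate, is_future_proof_tag(candidate))
```

Only `pull-quote` passes: `aside` has no hyphen, `Pull-Quote` uses uppercase letters, and `font-face` is one of the reserved names.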

2025.AUG.02

The Bitter Lesson versus The Garbage Can

A thought-provoking article that, on the surface, explores which modality of AI agent deployment is more likely to succeed in a large organisation — agents carefully designed around organisational processes, or general-purpose agents trained to seek successful outcomes (RL, for example).

But dig a little deeper, and it raises a more fundamental question: what shape will successful AI-powered products take?

Ethan Mollick:

For many people, this may not be a surprise. One thing you learn studying (or working in) organizations is that they are all actually a bit of a mess. In fact, one classic organizational theory is actually called the Garbage Can Model. This views organizations as chaotic “garbage cans” where problems, solutions, and decision-makers are dumped in together, and decisions often happen when these elements collide randomly, rather than through a fully rational process.

Computer scientist Richard Sutton introduced the concept of the Bitter Lesson in an influential 2019 essay where he pointed out a pattern in AI research. Time and again, AI researchers trying to solve a difficult problem, like beating humans in chess, turned to elegant solutions, studying opening moves, positional evaluations, tactical patterns, and endgame databases. Programmers encoded centuries of chess wisdom in hand-crafted software: control the center, develop pieces early, king safety matters, passed pawns are valuable, and so on. Deep Blue, the first chess computer to beat the world’s best human, used some chess knowledge, but combined that with the brute force of being able to search 200 million positions a second. In 2017, Google released AlphaZero, which could beat humans not just in chess but also in shogi and go, and it did it with no prior knowledge of these games at all. Instead, the AI model trained against itself, playing the games until it learned them. All of the elegant knowledge of chess was irrelevant, pure brute force computing combined with generalized approaches to machine learning, was enough to beat them. And that is the Bitter Lesson — encoding human understanding into an AI tends to be worse than just letting the AI figure out how to solve the problem, and adding enough computing power until it can do it better than any human.

ai
2025.APR.25

Ship Software That Does Nothing

Kerrick Long:

Many people will tell you to ship a minimum viable product. Others say to ship a prototype to get feedback. Not me. I think you should ship a blank page to your production servers on day one.

Notice how much is involved in shipping software that does nothing. This work will come around eventually. The later you do it, the riskier it is.

Not satire. Good food for thought.

2025.APR.13

AI adoption is a UX problem

Nan Yu:

These tools are casually dismissed as “GPT wrappers” by some industry commentators — after all, ChatGPT (or Sonnet or Gemini or Llama or Deepseek) is doing all the “real work”, right?

People who take this perspective seem to be throwing away all the lessons we’ve learned about software distribution. It’s like they saw Instagram and waived it off as an “ImageMagick wrapper”… or Dropbox as an “rsync wrapper”.

Those products won because they made powerful, highly technical tools accessible through thoughtful design. The biggest barrier to mass AI adoption is not capability or intelligence; we have those in spades. It’s UX.

Amen.

ai
2025.APR.06

Building Python tools with a one-shot prompt using uv run and Claude Projects

A neat and clever use of uv run’s inline dependency management and Claude Projects custom instructions to create Python scripts that are easy to run without any setup, even when they depend on Python’s rich set of libraries.

I’ve used this workflow for a few scripts in the last couple of weeks, and it works remarkably well.

You can then go a step further — add uv into the shebang line for a Python script to make it a self-contained executable.
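
As a minimal sketch of what such a script can look like: the shebang hands the file to `uv run --script`, and the comment block underneath is inline script metadata that uv reads to install dependencies before running. The file name `greet.py`, the choice of `rich` as the dependency, and the function names are my own assumptions for illustration.

```python
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.9"
# dependencies = ["rich"]
# ///
"""Print a greeting; uv reads the comment block above and installs rich."""

import sys


def build_greeting(name: str) -> str:
    """Return the greeting text (plain Python, no third-party code needed)."""
    return f"Hello, {name}!"


def main(argv: list[str]) -> None:
    name = argv[1] if len(argv) > 1 else "world"
    try:
        # Third-party import: resolved by uv from the metadata block above.
        from rich import print as show
    except ImportError:
        # Fallback so the file still runs under a bare `python greet.py`.
        show = print
    show(build_greeting(name))


if __name__ == "__main__":
    main(sys.argv)
```

After `chmod +x greet.py`, running `./greet.py Ada` should be enough on a machine that has uv installed. The `-S` flag is needed so that `env` splits `uv run --script` into separate arguments; older versions of `env` may not support it.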

2025.MAR.30

The End of Programming as We Know It

It is not the end of programming. It is the end of programming as we know it today. That is not new. The first programmers connected physical circuits to perform each calculation. They were succeeded by programmers writing machine instructions as binary code to be input one bit at a time by flipping switches on the front of a computer. Assembly language programming then put an end to that. It lets a programmer use a human-like language to tell the computer to move data to locations in memory and perform calculations on it. Then, development of even higher-level compiled languages like Fortran, COBOL, and their successors C, C++, and Java meant that most programmers no longer wrote assembly code. Instead, they could express their wishes to the computer using higher level abstractions.

Eventually, interpreted languages, which are much easier to debug, became the norm.

BASIC, one of the first of these to hit the big time, was at first seen as a toy, but soon proved to be the wave of the future. Programming became accessible to kids and garage entrepreneurs, not just the back office priesthood at large companies and government agencies.

The above is the central thesis of the article, and I strongly agree with it. The article also explores many more angles on the impact of AI coding, mostly through quoting and referencing others; I agree with some of these perspectives, whilst not so much with others. Nevertheless, it’s quite thought-provoking.

2025.MAR.30

How NAT Traversal Works

A Stratechery interview with the CEO of Tailscale dropped a few weeks ago. Tailscale is one of my favourite kinds of companies — focused on a single product that’s deeply technical and yet simple and delightful to use. I’m a longtime user and love the product. The interview is fun to listen to.

It reminded me of this old article that Tailscale published: How NAT traversal works. It’s an in-depth treatise on a topic that most of us never think about, but one that is critical for anyone designing peer-to-peer networking software.

It’s a very long read, but a captivating one. There are all kinds of interesting technical details, and some aha moments, such as how the ideas behind the birthday paradox are used to devise an algorithm for NAT-busting (in a section delightfully titled “NAT notes for nerds”).
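
To get a feel for why the birthday paradox helps, here is a back-of-the-envelope calculation in Python. The numbers are my own assumptions for illustration, not taken from the article: one peer holds 256 UDP ports open, and the other sends probes to independently chosen random ports out of 65,535.

```python
TOTAL_PORTS = 65535  # usable UDP port numbers (1-65535)
OPEN_PORTS = 256     # assumed number of ports one peer holds open


def hit_probability(
    probes: int,
    open_ports: int = OPEN_PORTS,
    total_ports: int = TOTAL_PORTS,
) -> float:
    """Chance that at least one of `probes` random probes lands on an open port.

    Each probe is treated as an independent draw (sampling with replacement),
    which slightly underestimates the odds of a prober that never repeats a port.
    """
    miss_one = 1 - open_ports / total_ports
    return 1 - miss_one ** probes


for probes in (128, 256, 1024, 2048):
    print(f"{probes:>5} probes -> {hit_probability(probes):.1%}")
```

Under these assumptions, about a thousand probes already give roughly a 98% chance of a hit, even though that covers under 2% of the port space. That is the counterintuitive part.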