Posts

2026.FEB.11

Showboat and Rodney — Agents Demoing Their Work

Simon Willison:

A key challenge working with coding agents is having them both test what they've built and demonstrate that software to you, their overseer. This goes beyond automated tests—we need artifacts that show their progress and help us see exactly what the agent-produced software is able to do.

Simon's response to this challenge is two CLIs, Showboat & Rodney.

Showboat:

It's a CLI tool (a Go binary, optionally wrapped in Python to make it easier to install) that helps an agent construct a Markdown document demonstrating exactly what their newly developed code can do.

This might be a very useful artefact to include in PRs (assuming they are supposed to be reviewed by humans of course!)

Rodney:

Rodney is a CLI tool for browser automation designed to work with Showboat. It can navigate to URLs, take screenshots, click on elements and fill in forms.

Rodney is quite interesting too. There are already a few CLIs and skills out there that let agents control a browser for testing: Vercel's agent-browser seems very popular, and there are a few others on skills.sh.

I'm currently using the web-browser skill from mitsuhiko on GitHub, which has a set of TypeScript scripts that control a Chrome browser using CDP (similar to Rodney); it has no npm dependencies save for one websockets lib. This works well, but I'm going to give Rodney a try because being able to run it using uvx means that it should work well in environments like Codex for web (which has uv and Chrome) without additional setup.

2026.FEB.10

Agentic Engineering

Andrej Karpathy, on the one-year anniversary of coining "vibe coding" (emphasis mine):

The one thing I'd add is that at the time, LLM capability was low enough that you'd mostly use vibe coding for fun throwaway projects, demos and explorations. It was good fun and it almost worked. Today (1 year later), programming via LLM agents is increasingly becoming a default workflow for professionals, except with more oversight and scrutiny. The goal is to claim the leverage from the use of agents but without any compromise on the quality of the software.

Many people have tried to come up with a better name for this to differentiate it from vibe coding, personally my current favorite "agentic engineering":

  • "agentic" because the new default is that you are not writing the code directly 99% of the time, you are orchestrating agents who do and acting as oversight.
  • "engineering" to emphasize that there is an art & science and expertise to it. It's something you can learn and become better at, with its own depth of a different kind.

I like "agentic engineering" and I'm seeing more people use the term. I'm adopting it going forward and changing the tag on my blog from "AI Coding" to "Agentic Engineering".

2026.FEB.09

Feedback Loopable

New term added to my vocab that I'm going to use a lot: "feedback loopable".

Lewis Metcalf:

Agents are most powerful when they can validate their work against reality. When they have feedback loops. The problem is, some work is hard to organize in a way that an agent can easily get feedback. The software we build and the tools we use are built for humans. Humans with eyeballs and hands and fingers.

This article is about how to make those problems easier for agents. It's a way to create an environment for your agents so that they can solve problems on their own, and so that you (the human) can intuitively guide them without getting in the way.

This process of building something for humans using methods built for agents is what I call: making it feedback loopable.

2026.FEB.09

Mitchell Hashimoto's AI Adoption Journey

This is a good post to share with anyone who is still sceptical about agentic engineering. Mitchell Hashimoto (creator of Vagrant, co-founder of HashiCorp, now building Ghostty) goes through his journey from "this isn't really helpful at all" to "it consistently adds value."

Instead of giving up, I forced myself to reproduce all my manual commits with agentic ones. I literally did the work twice. I'd do the work manually, and then I'd fight an agent to produce identical results in terms of quality and function (without it being able to see my manual solution, of course).

This was excruciating, because it got in the way of simply getting things done. But I've been around the block with non-AI tools enough to know that friction is natural, and I can't come to a firm, defensible conclusion without exhausting my efforts.

What's noteworthy is that this didn't come naturally to Mitchell: he had to put explicit effort into making it work, and he didn't give up when it wasn't working all that well. That may be the big difference between those who are excited about agentic engineering and those who are not.

2026.JAN.07

You can make up HTML tags

Browsers handle unrecognized tags by treating them as a generic element, with no effect beyond what’s specified in the CSS. This isn’t just a weird quirk, but is standardized behavior. If you include hyphens in the name, you can guarantee that your tag won’t appear in any future versions of HTML.

This is so cool, and I had never heard of this before. I wonder why this is not more popular — really semantic HTML!
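
A minimal sketch of what this could look like, wrapped in Python only so that the snippet is runnable. The `pull-quote` tag name and its styling are made up for illustration. Python's `html.parser` is not a browser, so the check below only shows that a hyphenated, unrecognized tag passes through a parser like any other element; the rendering itself is the browser's job.

```python
from html.parser import HTMLParser

# A tiny page using a made-up, hyphenated tag (hypothetical name).
PAGE = """<!doctype html>
<style>
  /* Unknown elements have no default styling and render inline,
     so any block layout has to be set explicitly. */
  pull-quote { display: block; border-left: 4px solid; padding-left: 1em; }
</style>
<pull-quote>Made-up tags are styled like any other element.</pull-quote>
"""


class TagCollector(HTMLParser):
    """Record the name of every start tag the parser encounters."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def start_tags(html: str) -> list[str]:
    """Return the start tags found in `html`, in document order."""
    collector = TagCollector()
    collector.feed(html)
    return collector.tags
```

Saving `PAGE` to a file and opening it in a browser should show the quote as an indented block, with the CSS selector targeting the made-up tag by name.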

2025.AUG.02

The Bitter Lesson versus The Garbage Can

A thought-provoking article that, on the surface, explores which style of AI agent deployment is more likely to succeed in a large organisation: agents carefully designed around organisational processes, or general-purpose agents trained (with RL, for example) to seek successful outcomes.

But dig a little deeper, and it raises a more fundamental question: what shape will successful AI-powered products take?

Ethan Mollick:

For many people, this may not be a surprise. One thing you learn studying (or working in) organizations is that they are all actually a bit of a mess. In fact, one classic organizational theory is actually called the Garbage Can Model. This views organizations as chaotic “garbage cans” where problems, solutions, and decision-makers are dumped in together, and decisions often happen when these elements collide randomly, rather than through a fully rational process.

Computer scientist Richard Sutton introduced the concept of the Bitter Lesson in an influential 2019 essay where he pointed out a pattern in AI research. Time and again, AI researchers trying to solve a difficult problem, like beating humans in chess, turned to elegant solutions, studying opening moves, positional evaluations, tactical patterns, and endgame databases. Programmers encoded centuries of chess wisdom in hand-crafted software: control the center, develop pieces early, king safety matters, passed pawns are valuable, and so on. Deep Blue, the first chess computer to beat the world’s best human, used some chess knowledge, but combined that with the brute force of being able to search 200 million positions a second. In 2017, Google released AlphaZero, which could beat humans not just in chess but also in shogi and go, and it did it with no prior knowledge of these games at all. Instead, the AI model trained against itself, playing the games until it learned them. All of the elegant knowledge of chess was irrelevant, pure brute force computing combined with generalized approaches to machine learning, was enough to beat them. And that is the Bitter Lesson — encoding human understanding into an AI tends to be worse than just letting the AI figure out how to solve the problem, and adding enough computing power until it can do it better than any human.

2025.APR.25

Ship Software That Does Nothing

Kerrick Long:

Many people will tell you to ship a minimum viable product. Others say to ship a prototype to get feedback. Not me. I think you should ship a blank page to your production servers on day one.

Notice how much is involved in shipping software that does nothing. This work will come around eventually. The later you do it, the riskier it is.

Not satire. Good food for thought.

2025.APR.13

AI adoption is a UX problem

Nan Yu:

These tools are casually dismissed as “GPT wrappers” by some industry commentators — after all, ChatGPT (or Sonnet or Gemini or Llama or Deepseek) is doing all the “real work”, right?

People who take this perspective seem to be throwing away all the lessons we’ve learned about software distribution. It’s like they saw Instagram and waived it off as an “ImageMagick wrapper”… or Dropbox as an “rsync wrapper”.

Those products won because they made powerful, highly technical tools accessible through thoughtful design. The biggest barrier to mass AI adoption is not capability or intelligence; we have those in spades. It’s UX.

Amen.

2025.APR.06

Building Python tools with a one-shot prompt using uv run and Claude Projects

A neat, clever use of uv run’s inline dependency management together with Claude Projects custom instructions to create Python scripts that run without any setup, even when they depend on Python’s rich set of third-party libraries.

I’ve used this workflow for a few scripts in the last couple of weeks, and it works remarkably well.

You can then go a step further — add uv into the shebang line for a Python script to make it a self-contained executable.
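
A rough sketch of such a script, assuming uv is installed and on the PATH. The file name `status.py`, the `describe` helper and the choice of `httpx` as the dependency are hypothetical, picked only for illustration. The comment block at the top is the inline script metadata that uv reads to install dependencies before running the file.

```python
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.11"
# dependencies = ["httpx"]
# ///
"""Print the HTTP status code for each URL given on the command line."""
import sys


def describe(url: str, status: int) -> str:
    """Format one result line; kept pure so it can be checked in isolation."""
    return f"{status} {url}"


def main(urls: list[str]) -> None:
    if not urls:
        print("usage: ./status.py URL [URL ...]")
        return
    # Third-party import: uv resolves it from the metadata block above.
    # It is imported here (not at module top) only so that `describe`
    # can be used without the dependency installed.
    import httpx

    for url in urls:
        response = httpx.get(url, follow_redirects=True)
        print(describe(url, response.status_code))


if __name__ == "__main__":
    main(sys.argv[1:])
```

After `chmod +x status.py`, running `./status.py https://example.com` should let uv set up an environment with `httpx` and then run the script, with no manual virtualenv or `pip install` step.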