Tagged: ai

Feed

2026.MAR.14

No More Code Reviews

And — you heard it here first — we’ll one day be scared, positively petrified, to use any mission-critical software known to have allowed human interference in its codebase.

Very provocative. Put this way, it does evoke the feeling that we could very well be heading into this future.

ai programming agentic-engineering

2026.MAR.09

The Deal Is So Good

Mo Bitar:

What we do is because the deal is so damn good, we change ourselves to make that deal acceptable.

And what I've figured out now is that I'm unwilling to change myself to make that deal acceptable.

I could feel the emotions as I watched the video. Well worth the time.

ai programming work-culture

2026.FEB.18

Codex CLI vs Claude Code on Autonomy

nilenso:

I spent some time studying the system prompts of coding agent harnesses like Codex CLI and Claude Code. These prompts reveal the priorities, values, and scars of their products. They're only a few pages each and worth reading in full, especially if you use them every day. This approach to understanding such products is more grounded than the vibe-based takes you often see in feeds.

While there are many similarities and differences between them, one of the most commonly perceived differences between Claude Code and Codex CLI is autonomy, and in this post I'll share what I observed. We tend to perceive autonomous behaviour as long-running, independent, or requiring less supervision and guidance. Reading the system prompts, it becomes apparent that the products make very different, and very intentional choices.

Very interesting comparison. But I don't believe the difference in the behaviour is primarily, or even likely, driven by the system prompts. The difference is far more ingrained, most likely RL'd during post-training.

Why do I say this? I've been using both the models in Pi coding agent with its default system prompt¹, which is both really small and the same for all models. And even in Pi, this difference in behaviour comes across clearly.²

Pi allows us to replace the entire system prompt by placing a markdown file at ~/.pi/agent/SYSTEM.md ↩
I feel that the models both behave better in Pi than in their respective canonical harnesses; but this is a very subjective opinion. ↩

ai agentic-engineering

2026.FEB.16

SaaS Isn't Dead. It's Worse Than That.

Michael Bloch:

I'm more bullish on AI than I've ever been. And that's exactly why I'm bearish on most software companies. Not because their customers will leave, but because their next thirty competitors just got a lot easier to build.

I've seen/heard a bunch of different people quip exactly this. This is one of the crispest articulations. Rings ominous to me.

ai product-building

2026.FEB.16

Deep Blue

Simon Willison:

We coined a new term on the Oxide and Friends podcast last month (primary credit to Adam Leventhal) covering the sense of psychological ennui leading into existential dread that many software developers are feeling thanks to the encroachment of generative AI into their field of work.

We're calling it Deep Blue.

ai programming agentic-engineering

2025.AUG.02

The Bitter Lesson versus The Garbage Can

A thought-provoking article that, on the surface, explores which modality of AI agent deployment is more likely to succeed in a large organisation — agents carefully designed around organisational processes, or general-purpose agents trained to seek successful outcomes (RL, for example).

But dig a little deeper, and it raises a more fundamental question: what shape will successful AI-powered products take?

Ethan Mollick:

For many people, this may not be a surprise. One thing you learn studying (or working in) organizations is that they are all actually a bit of a mess. In fact, one classic organizational theory is actually called the Garbage Can Model. This views organizations as chaotic “garbage cans” where problems, solutions, and decision-makers are dumped in together, and decisions often happen when these elements collide randomly, rather than through a fully rational process.

Computer scientist Richard Sutton introduced the concept of the Bitter Lesson in an influential 2019 essay where he pointed out a pattern in AI research. Time and again, AI researchers trying to solve a difficult problem, like beating humans in chess, turned to elegant solutions, studying opening moves, positional evaluations, tactical patterns, and endgame databases. Programmers encoded centuries of chess wisdom in hand-crafted software: control the center, develop pieces early, king safety matters, passed pawns are valuable, and so on. Deep Blue, the first chess computer to beat the world’s best human, used some chess knowledge, but combined that with the brute force of being able to search 200 million positions a second. In 2017, Google released AlphaZero, which could beat humans not just in chess but also in shogi and go, and it did it with no prior knowledge of these games at all. Instead, the AI model trained against itself, playing the games until it learned them. All of the elegant knowledge of chess was irrelevant, pure brute force computing combined with generalized approaches to machine learning, was enough to beat them. And that is the Bitter Lesson — encoding human understanding into an AI tends to be worse than just letting the AI figure out how to solve the problem, and adding enough computing power until it can do it better than any human.

2025.APR.13

AI adoption is a UX problem

Nan Yu:

These tools are casually dismissed as “GPT wrappers” by some industry commentators — after all, ChatGPT (or Sonnet or Gemini or Llama or Deepseek) is doing all the “real work”, right?

People who take this perspective seem to be throwing away all the lessons we’ve learned about software distribution. It’s like they saw Instagram and waived it off as an “ImageMagick wrapper”… or Dropbox as an “rsync wrapper”.

Those products won because they made powerful, highly technical tools accessible through thoughtful design. The biggest barrier to mass AI adoption is not capability or intelligence; we have those in spades. It’s UX.

Amen.

2025.APR.06

Building Python tools with a one-shot prompt using uv run and Claude Projects

Nice and clever use of uv’s run inline dependency management and Claude Project Custom Instructions to create Python scripts that are easy to run without any setup, even while depending on Python’s rich set of libraries.

I’ve used this workflow for a few scripts in the last couple of weeks, and it works remarkably well.

You can then go a step further — add uv into the shebang line for a Python script to make it a self-contained executable.

ai agentic-engineering

2025.MAR.02

How to Stop Worrying and Learn to Love the Internet

Douglas Adams wrote about technology adoption in 1999, and his observations hold up remarkably well when applied to how people react to AI today.

Read Now →

ai quote

2025.JAN.26

Humanity's Last Exam

Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam, a multi-modal benchmark at the frontier of human knowledge, designed to be the final closed-ended academic benchmark of its kind with broad subject coverage. The dataset consists of 3,000challenging questions across over a hundred subjects. We publicly release these questions, while maintaining a private test set of held out questions to assess model overfitting.

The sample questions are fun to go through, as a way of understanding the level of expertise these models are going to end up at, eventually. Eventually is the keyword there — even the best frontier models do very poorly on this benchmark right now.

Via: Installer newsletter by The Verve

Tagged: ai

↗ No More Code Reviews

↗ The Deal Is So Good

↗ Codex CLI vs Claude Code on Autonomy

Footnotes

↗ SaaS Isn't Dead. It's Worse Than That.

↗ Deep Blue

↗ The Bitter Lesson versus The Garbage Can

↗ AI adoption is a UX problem

↗ Building Python tools with a one-shot prompt using uv run and Claude Projects

💬How to Stop Worrying and Learn to Love the Internet

↗ Humanity's Last Exam