PhilipMat

A reverse signal for judging competency

Speaking of the output-competence decoupling, this is a signal that can be used in reverse to figure out what you can improve:

Planktonne: A general pattern for LLMs is that they look really good at things you are bad at. What that means is that if you find yourself thinking of its output as significantly better than yours in a particular domain, there’s a high chance that you are not equipped to judge that quality effectively.

This HN thread on Various LLM Smells has a few other choice thoughts:

pydry: Companies that don’t care about code quality always care about the side effects of poor code quality. They just can’t connect the dots.

TIL: "output-competence decoupling"

I learned of this term from a thoughtful essay from No One’s Happy that examines the flood of AI in the workplace from within, and also without.

Generative AI can produce work that looks expert without being expert, and the failure arrives in two shapes. The first is when novices in a field are able to produce work that resembles what their seniors produce, faster or more advanced than their judgment. The second is when people generate artifacts in disciplines they were never trained in. The two failures look similar from a distance and are not the same. Research has mostly measured the first. The second is what it is missing, and in my experience it is the riskier of the two.

The term for this new challenge, “output-competence decoupling”, comes from a paper by Christopher Koch called Beyond the Steeper Curve: AI-Mediated Metacognitive Decoupling and the Limits of the Dunning-Kruger Metaphor:

AI assistance introduces a third variable between competence and self-assessment: observable output. In classical task settings without tools, output closely tracks competence. An essay written by a novice typically reads like a novice essay. Under AI assistance, a novice can produce output indistinguishable from an expert’s—not because their competence has risen to that level, but because the competence partially resides in the system. This output-competence decoupling is the starting point of the model developed in this paper.

The essay contains veiled anecdotes that we, in the industry, have either already encountered or will likely run up against:

  • Thinking that tool=solution=success:

[…] account directors and go-to-market leads who arrive with AI-generated projects and argue them. What they are proposing, in most cases, is a dashboard or website that displays the status of a process that is not ready to be automated, built to track a workflow that does not yet warrant tracking. The tool has not solved a problem; it has driven its user to identify a problem worth solving, outlined an architecture for the solution, and produced enough material — diagrams, schemas, interface mockups — that the user arrives in the room convinced the work is real.

  • Driving beyond competence

People who cannot write code are building software. People who have never designed a data system are designing data systems.

These can seem like positives in the absence of introspection. And maybe in a lot of cases that is good enough. When it’s not:

He could not, when asked, explain how any of it actually worked. The work was wrong from the first day. The schemas, and more importantly the objectives, were wrong in a way that would have been obvious to anyone with two years in the field.

The person, in the transaction, becomes a kind of conduit, capable of routing the output to a recipient and incapable of evaluating it on the way through.

  • Performance is not learning. Doing the work is learning. This echoes Thomas Edison’s observation that learning what not to build is as important as learning what to build, and the adage that “success teaches you little, failures teach a lot more”.

The skills of producing work and judging it were deliberately distinct, but accomplishing the work itself used to teach the judgment.

The architectural critique that used to come from someone who was taught, or who had built and broken three of these before now comes from a model with no embodied memory of building or breaking anything. The slowness was not a tax on the real work; the slowness was the real work. It was how the work got good, and how the people producing the work got good.

  • Volume wins - a variation of frequency bias, with hints of Goodhart’s Law. The paper linked above calls it “Verbosity as False Epistemic Authority”.

Requirements documents that were once a page are now twelve. Status updates that were once three sentences are now bulleted summaries of bulleted summaries. Retrospective notes, post-incident reports, design memos, kickoff decks: every artifact that can be elongated is, by people who do not read what they produce, for readers who do not read what they receive.

In a world where we track metrics obsessively, and where volume is easier to measure and judge than quality, volume wins:

The cost of producing a document has fallen to nearly zero; the cost of reading one has not, and is in fact rising, because the reader must now sift the synthetic context for whatever the document was originally about.

I’m betting quite a few of us have encountered a situation where some person up- or side-the-chain drops a load of AI slop and walks away satisfied that they have solved the problem and done most of the work too.


I like one of his recommendations at the end because it’s simple and simple rules are easy to follow.

Use the tool where you can verify precisely what it produces. Never ask a model for confirmation; the tool agrees with everyone, and an agreement that costs the agreer nothing is worth nothing.

TIL: git sparse-checkout

This is useful for keeping only a small part of a repo on disk while retaining the ability to update it.

I use this to keep a library of AI skills up to date, without having to copy them manually:

$ cd ~/.agents/
$ git clone --filter=blob:none --sparse https://github.com/anthropics/skills anthropics-skills
$ cd anthropics-skills
$ git sparse-checkout set skills/frontend-design
$ ln -s ~/.agents/anthropics-skills/skills/frontend-design ~/.agents/skills/frontend-design
  • --filter=blob:none
    Tells Git to clone without downloading any file contents (blobs) upfront. You get the full commit history and directory tree metadata, but actual file data is fetched lazily — only when you check out or access a file. This makes the initial clone much faster for large repos.
  • --sparse
    Enables sparse checkout mode, which means Git will only populate your working directory with a subset of files instead of everything in the repo.
  • git sparse-checkout set skills/frontend-design
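    Configures which paths to materialize in the working directory. After this command, the skills/frontend-design/ directory appears on disk; in Git’s default cone mode, so do the files that sit directly in the repo root and in skills/. Everything else remains in the commit history and tree metadata but is not written to your filesystem, and because of --filter=blob:none its file contents are not downloaded either.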

Later on, git pull behaves like a normal pull, but only the materialized paths are updated on disk and only the file contents they need are downloaded.
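To see the sparse paths in action without touching the real repo, here is a minimal sketch that runs entirely against a scratch repo in a temp directory. The names upstream, mirror, alpha and beta are made up for the demo, and --filter=blob:none is left out because a local throwaway repo has nothing worth deferring. It also shows git sparse-checkout add, which should be the way to materialize a second folder later without dropping the first.

```shell
#!/bin/sh
set -eu

# Throwaway workspace; "upstream", "mirror", "alpha" and "beta" are made-up names.
demo=$(mktemp -d)
cd "$demo"

# A tiny stand-in for the remote repo: two skill folders plus a root file.
git init -q upstream
mkdir -p upstream/skills/alpha upstream/skills/beta
echo "alpha v1" > upstream/skills/alpha/SKILL.md
echo "beta v1" > upstream/skills/beta/SKILL.md
echo "readme" > upstream/README.md
git -C upstream add .
git -C upstream -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "initial"

# Sparse clone, then materialize a single skill folder.
git clone -q --sparse upstream mirror
cd mirror
git sparse-checkout set skills/alpha
before_add=$(ls skills)   # skill folders on disk right now

# Later: materialize a second skill folder alongside the first.
git sparse-checkout add skills/beta
after_add=$(ls skills)

# Upstream changes; a plain fast-forward pull updates the materialized files.
echo "alpha v2" > "$demo/upstream/skills/alpha/SKILL.md"
git -C "$demo/upstream" -c user.name=demo -c user.email=demo@example.com \
    commit -q -am "update alpha"
git pull -q --ff-only
pulled=$(cat skills/alpha/SKILL.md)

git sparse-checkout list  # prints the paths currently configured
echo "before add: $before_add"
echo "after add:  $after_add"
echo "after pull: $pulled"
```

On the real clone from above, the equivalent of the middle step would be running git sparse-checkout add with another folder under skills/, followed by one more ln -s for the new folder.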

TechCrunch: Tech CEOs are apparently suffering from AI psychosis

There have been multiple reports of similar … “enthusiasm”, and this TechCrunch article nails it:

CEOs are uniquely prone to AI psychosis because they’re sufficiently distant from the last mile of work that still has to happen to generate most value with AI.

Per Aaron Levie of Box.com:

CEOs “play with AI,” develop a prototype, or generate a contract, to use Levie’s examples, and then make the leap to believing agents can do the work.

But these top-level executives aren’t the people who have to review code, discover bugs, and identify calls to hallucinated libraries before software is deployed. They aren’t responsible for training AI models on a company’s idiosyncratic contract terms, nor do they have to spend days combing through contracts to find sneaky terms, as Levie indicates.

In other words, Levie’s theory posits, CEOs don’t really understand processes well enough to know what really can and can’t be automated. But that lack of knowledge doesn’t stop them from acting on their beliefs.

Derek Sivers: Geography is four-dimensional

Thoughtful post from Derek Sivers arguing that time is a component of a place:

When someone speaks of a place, you have to ask, “When?” Geography is four-dimensional. You can’t know a place - only a place as it was at a time. Where is bound to when. Unless you are in a place right now, you can only speak of it in past-tense.

Any 3D place that changes with time needs that fourth dimension, time, to express it. And a place humans inhabit is always changing.