Distillation Attacks on the Universe

June 28, 2026

Erik Bethke

AI AGI intelligence neuroscience philosophy

Why a 20-watt brain out-learns a megawatt model, and what it stole from a billion-year pretraining run. The third of four essays on eyeballs, leylines, and cheap intelligence.

In the first of these essays I argued that a state-of-the-art model plus eyeballs and tools is practical AGI — general execution, bounded only by what you can instrument. In the second I chased what imagination actually is: a search across a high-dimensional space along ridges I called leylines, steered by a taste function we don't yet know how to build.

This one is about the most embarrassing benchmark in artificial intelligence. It runs on about twenty watts — roughly a dim lightbulb — it fits in a skull, it never saw the open internet, and it still learns some things from two or three examples that a megawatt training run needs millions to approximate. It's you. And once you understand why it's so cheap, you can see where machine intelligence has to go next.

Humans are not data-poor. We are prior-rich.

The romantic claim — "humans learn from almost nothing" — is false, and the first sharp reader will say so. A child needs to see only two giraffes to recognize every giraffe afterward. True. But the child is not learning giraffes from two examples. The child is fine-tuning a visual cortex that a billion years of evolution already shaped against the structure of this universe — edges, surfaces, animals, agents, intent.

So here is the reframe that makes the whole thing click: evolution is the pretraining run. Astronomical data. Astronomical energy. Billions of years of trial and death, distilling the regularities of reality into priors, instincts, and a cortex pre-wired to the shape of the world. The individual human is then a wildly sample-efficient fine-tuner — few-shot precisely because the many-shot bill was already paid, upstream, in the genome. "Data-poor" is an illusion of scope. Zoom out to the species and we are data-rich beyond anything in a datacenter.

Which means we didn't invent a strange new kind of intelligence with the machines. We rebuilt the only architecture ever known to work — a vast expensive pretraining that distills the world into priors, followed by cheap few-shot adaptation at the edge — and we compressed it from geologic time into a GPU-month. Pretraining is evolution. In-context learning is the lifetime. The two stories rhyme so hard it should make you sit up.

The distillation attack

Here is the verb I can't stop using: we are constantly running distillation attacks on the universe.

In machine learning, distillation is when a small "student" model learns to reproduce the behavior of a giant "teacher" by matching its outputs, often ending up far smaller than the thing it imitates. That is exactly what a mind does to reality. Reality is the teacher model: astronomically high-dimensional, more than any brain could ever store. Perception and experiment are how we query it. And we walk away with a student model so small it's almost insulting: F = ma, three symbols standing in for uncountable observations of every falling, sliding, colliding thing that has ever existed.
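
For readers who want the machine-learning sense made concrete, here is a minimal sketch of output matching in Python. Treat it as an illustration under loud assumptions, not anyone's production recipe: the teacher logits, the temperature of 2.0, the learning rate, and the function names (`softmax`, `kl_divergence`, `distill`) are all invented for this example. The "student" is only a vector of logits for a single input, so it shows the matching objective and not the size reduction.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn logits into probabilities, softened by a temperature."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of the same length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distill(teacher_logits, steps=2000, lr=2.0, temperature=2.0):
    """Fit student logits so their softened outputs match the teacher's."""
    targets = softmax(teacher_logits, temperature)
    student = [0.0] * len(teacher_logits)  # the student starts knowing nothing
    for _ in range(steps):
        preds = softmax(student, temperature)
        # Gradient of KL(targets || preds) w.r.t. each student logit
        # is (pred - target) / temperature.
        student = [z - lr * (p - t) / temperature
                   for z, p, t in zip(student, preds, targets)]
    return student

# Hypothetical teacher outputs for one input with three classes.
teacher_logits = [4.0, 1.0, -2.0]
student_logits = distill(teacher_logits)
gap = kl_divergence(softmax(teacher_logits, 2.0), softmax(student_logits, 2.0))
print("student logits:", student_logits)
print("remaining KL gap:", gap)
```

Because softmax ignores a constant shift, the student recovers the gaps between the teacher's logits rather than the raw values. That is already a small compression: it keeps only what the outputs can reveal.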

A master craftsman's "feel" is a distilled controller running in the nerves of his arm. An expert's intuition is a compression of ten thousand cases into a single fast verdict he can't fully explain. And science is just institutionalized, multi-generational distillation: a civilization-scale effort to compress the firehose into laws short enough to teach. Language and culture are the codec and the inherited weights: each generation is handed the student model and skips re-deriving fire from scratch.

There's real theory under this if you want it. Friston's free-energy principle says the brain is fundamentally a machine for minimizing surprise — which is to say, for compressing the world well enough to predict it. Minimum-description-length and the old Occam instinct say intelligence is, at bottom, the search for the shortest program that explains the data. The leylines from the last essay are the low-complexity structure hiding in the noise. Attunement is a hardware bias toward short descriptions — a nose for the compression that was always available.
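
A rough way to feel the minimum-description-length idea is to hand two equally long byte strings to a general-purpose compressor. The sketch below assumes `zlib` as a crude stand-in for "shortest program" (it is not Kolmogorov complexity), and both the rule `(i * i) % 7` and the seeded random bytes are arbitrary choices made for this illustration.

```python
import random
import zlib

def compressed_size(data: bytes) -> int:
    """Bytes needed after zlib compression at the highest level."""
    return len(zlib.compress(data, 9))

n = 10_000

# Data generated by a short rule: a low-complexity "ridge".
structured = bytes((i * i) % 7 for i in range(n))

# Data with no short rule available: seeded pseudo-random bytes.
rng = random.Random(0)
noise = bytes(rng.randrange(256) for _ in range(n))

structured_size = compressed_size(structured)
noise_size = compressed_size(noise)
print("structured:", structured_size, "bytes")
print("noise:", noise_size, "bytes")
```

The structured string shrinks to a small fraction of its length while the noise barely shrinks at all. That gap is the sense in which structure is "short description" waiting to be found.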

Perception is already the attack

Start with your eyes, right now. They are pouring something like ten megabits a second up the optic nerve — and that is after the retina has already thrown away most of the photons that hit it, running edge- and motion-detection before the signal ever leaves the eyeball. By the time anything reaches conscious awareness, you are keeping perhaps a few tens of bits a second. Somewhere between the light and the thought, by some estimates, seven orders of magnitude are discarded.
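
The arithmetic behind those orders of magnitude is worth a quick check. In this sketch the ten megabits a second comes from the paragraph above; the 30 bits a second is my assumed stand-in for "a few tens of bits", and the variable names are made up for the example.

```python
import math

optic_nerve_bps = 10e6  # "something like ten megabits a second" (from the text)
conscious_bps = 30.0    # assumed value for "a few tens of bits a second"

def orders_of_magnitude(high: float, low: float) -> float:
    """How many powers of ten separate two rates."""
    return math.log10(high / low)

# Discard between the optic nerve and conscious awareness.
nerve_to_thought = orders_of_magnitude(optic_nerve_bps, conscious_bps)

# If the total discard from light to thought is seven orders,
# this is the raw rate implied at the light end.
implied_input_bps = conscious_bps * 10**7

# The share of the discard that would have to happen inside the eye.
retina_factor = implied_input_bps / optic_nerve_bps

print(round(nerve_to_thought, 2), implied_input_bps, retina_factor)
```

Under these assumptions the optic nerve to awareness step accounts for about five and a half orders. The seven-orders figure then implies the retina has already cut the raw signal by a further factor of about thirty before anything leaves the eyeball.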

Sit with what that means. Seeing is not capturing. Seeing is destroying — keeping the one ridge that matters and annihilating the rest. The eyeball from the first essay was never a camera; it is a distillation engine whose genius is precisely what it refuses to keep. Attention is a delete key. The first and most violent distillation attack happens before you are even aware there was data to attack.

We sing to keep the ridges

So what do we do with the little we keep? We rehearse it, we prune it, and we pass it on — and almost every distinctly human thing is one of those three.

We sleep, and the brain runs its nightly compression pass: downscaling the synapses that fired on noise, replaying and consolidating the ones that fired on signal. Dreaming is the offline distillation run. We sing and dance and tell stories, because rhythm and rhyme and narrative are compression codecs with error-correction baked in — how an oral species kept its student-model alive across generations with no hard drive but each other. And we joke. A joke is a compression handshake: it only lands if two minds already share the ridge and can leap the gap unaided — laughter is the receipt that the manifolds matched. Culture is just distributed distillation with social error-correction running on top.

Why it's cheap: you walk the ridge, you don't search the volume

Now the part that answers the twenty watts.

A high-dimensional space is almost entirely empty. If intelligence meant searching that volume densely, the energy bill would be astronomical — and that is roughly what a frontier model pays, with its oceans of dense multiplication across the whole space. But if you are pre-tuned to the ridges — if your priors already know where the meaningful structure lives — you never compute the void. You walk the thin manifold where meaning actually is, and you skip the rest.
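
One standard way to see the emptiness is to ask how much of a unit hypercube sits inside its inscribed ball as the dimension grows, and how many points a dense grid would need. This is a textbook curse-of-dimensionality illustration rather than a model of leylines; the function names and the ten-points-per-axis grid are assumptions for the sketch.

```python
import math

def inscribed_ball_fraction(dim: int) -> float:
    """Fraction of the unit hypercube's volume inside its inscribed ball.

    Uses the d-ball volume pi^(d/2) * r^d / Gamma(d/2 + 1) with r = 0.5,
    computed in log space so large dimensions do not overflow.
    """
    log_volume = ((dim / 2) * math.log(math.pi)
                  + dim * math.log(0.5)
                  - math.lgamma(dim / 2 + 1))
    return math.exp(log_volume)

def dense_grid_points(dim: int, per_axis: int = 10) -> int:
    """Number of points in a grid with per_axis samples on every axis."""
    return per_axis ** dim

for dim in (2, 3, 10, 100):
    print(dim, inscribed_ball_fraction(dim), dense_grid_points(dim))
```

In two dimensions the ball covers most of the square. By a hundred dimensions its share is vanishingly small and the grid has more points than any machine could visit, which is the case against searching the volume.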

Leyline-attunement isn't a side effect of efficient intelligence. It is the efficiency. The reason a brain runs on a lightbulb's worth of power is that it almost never does brute-force search; it follows ridges it was born already knowing. Biology has been voting on this for a billion years — analog, sparse, event-driven, spending energy only where the signal is. The lesson for machine intelligence is not subtle: the road to low-energy AI is not a bigger dense model. It is ridge-following — sparse, structured computation that spends flops only along the manifold, the way the meat does.

It is expensive to eat

And if you ask why evolution turned fanatical about walking ridges instead of searching volumes, you hit the prime mover under everything else: it is expensive to eat. Your brain is about two percent of your body mass and burns a fifth of your fuel. Every spike is paid in glucose; glucose is paid in foraging; foraging is paid in time and risk and the occasional predator. A mind that tried to store and compute everything would have starved its owner before it ever got clever.
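
The twenty watts falls straight out of that one-fifth figure. In this sketch the 2,000 kilocalories a day is an assumed typical adult intake, not a number from this essay; the one-fifth share is the one quoted above.

```python
KCAL_TO_JOULES = 4184.0   # one food calorie (kcal) in joules
SECONDS_PER_DAY = 86_400

daily_intake_kcal = 2000.0  # assumed typical adult intake
brain_share = 0.20          # "a fifth of your fuel" (from the text)

# Average power of the whole body, in watts (joules per second).
body_watts = daily_intake_kcal * KCAL_TO_JOULES / SECONDS_PER_DAY

# The brain's slice of that budget.
brain_watts = brain_share * body_watts

print(round(body_watts, 1), round(brain_watts, 1))
```

On these assumptions the whole body averages a little under a hundred watts, and the brain's fifth lands just under twenty.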

So the ruthless discard isn't tidiness — it is metabolism. The twenty-watt cap is not a fun fact about the brain; it is the optimization constraint that produced the brain. Leyline-attunement, the violence of perception, the nightly pruning, the offloading into song — every one of them is downstream of a single ancient fact: calories were scarce, and thinking was not free.

The same gift is the same curse

Keep this honest, because the honesty is the most interesting part: we are only brilliant on the ancestral manifold.

Faces, social dynamics, the arc of a story, the physics of a thrown rock, the music of a sentence — here we are miraculous, near-instant, almost free. Step off that manifold — high-dimensional statistics, exponential growth, quantum mechanics, base rates, anything our ancestors never had to survive — and we are slow, clumsy, and wrong in patterned, predictable ways.

And the kicker: it is the same mechanism. Cognitive biases are not a separate bug list bolted onto an otherwise clean reasoner. They are leyline-priors firing with total confidence on a manifold they were never tuned for. The base-rate fallacy, the gambler's fallacy, our hopelessness with compound interest — these are ridge-followers confidently walking a ridge that isn't there. Human genius and human bias are one phenomenon seen from two angles: a sensor exquisitely matched to one manifold, hallucinating structure when you point it at another.
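
Two of those off-manifold failures are easy to put numbers on. The figures below are hypothetical, chosen only for illustration: a condition with a prevalence of 1 in 1,000, a test with 99% sensitivity and a 5% false-positive rate, and money growing at 7% a year for 30 years.

```python
def posterior(prior: float, sensitivity: float, false_positive_rate: float) -> float:
    """Bayes' rule: probability of the condition given a positive test."""
    true_positives = prior * sensitivity
    false_positives = (1.0 - prior) * false_positive_rate
    return true_positives / (true_positives + false_positives)

# Base-rate fallacy: intuition says "about 99%", the arithmetic disagrees.
p_condition = posterior(prior=0.001, sensitivity=0.99, false_positive_rate=0.05)

# Compound growth: the linear intuition against the actual exponential.
rate, years = 0.07, 30
linear_guess = 1.0 + rate * years       # what a straight-line ridge predicts
compound_actual = (1.0 + rate) ** years

print(round(p_condition, 4), round(linear_guess, 2), round(compound_actual, 2))
```

With these inputs a positive test leaves the chance of having the condition at roughly two percent, and the compounded total is more than double the straight-line guess. Both gaps are the confident ridge-follower at work.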

The pairing is the only general intelligence in the room

Which closes the loop on the machinery.

A machine will burn megawatts to brute-force a space until it finds a ridge no human can sense — AlphaGo's move 37, a protein fold, a material that shouldn't exist. A human will walk a familiar ridge for twenty watts and a glance, and go utterly blind the instant she steps off it. Neither one is general. One is cheap and blinkered; the other is expensive and unmoored. The generality everyone keeps reaching for doesn't live in either party — it lives in the pairing: cheap human priors covering the machine's blind spots, expensive machine search covering ours.

And here is the part that should make every datacenter nervous. The machines have lived, so far, under the opposite constraint: energy cheap, energy abundant — so they hoard, they brute-force, they scale, profligate because they are rich. That era is ending. Power is becoming the binding constraint on frontier AI; the bottleneck is turning from data to watts. And the moment energy is the cap, the machines will be forced toward the same elegance the meat found under a billion years of metabolic poverty: walk the ridge, discard the volume, sleep to prune, share to error-correct. The twenty-watt brain isn't a relic to surpass. It's a preview of what energy-constrained intelligence has to become.

That is the arc of the machinery. Give intelligence eyeballs and it can act. Point it at the leylines and it can imagine. And the cheapest, oldest, most ruthless distillation engine we know of — the one reading these words on twenty watts — already proved the trick is real. We've spent a billion years prosecuting distillation attacks on the universe and writing the loot into our children. The machines just started running the same attack, much louder, on the ridges we were never built to see.

The interesting future isn't the meat or the megawatts. It's what they distill together.

And yet I have dodged the strangest question of the whole series — not how a mind distills the universe, but why it would ever want to. That is the last essay.

A four-part series on intelligence:

1. Give the Model Eyeballs
2. Leylines
3. Distillation Attacks on the Universe (you are here)
4. Beauty Is the Reward
