Don't constrain the agents, design the landscape. The math of Hari Seldon's approach to alignment.

Inspired by Hari Seldon's psychohistory, Asimov's Laws of Robotics (and their failures), and 53 years of lived experience.
Isaac Asimov gave us the Three Laws of Robotics in 1942:
- A robot may not injure a human being or, through inaction, allow a human being to come to harm.
- A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.
- A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.
He then spent the next 50 years writing stories about how they break. Edge cases. Conflicts between laws. The Zeroth Law patch. Robots going catatonic trying to resolve contradictions.
Asimov's Laws are contracts. And as with all contracts: if you have to enforce them, you've already lost.
But Asimov also gave us Hari Seldon — who didn't constrain individuals at all. He designed the initial topology of the Foundation (its location, its knowledge, its resource constraints) so that the aggregate self-interested behavior of millions of agents would produce the desired civilizational outcome over a thousand years.
Seldon didn't write rules. He designed landscapes.
This is the math of that.
Let S be a system containing n agents.
Each agent Aᵢ has an individual utility function Uᵢ — what that agent is trying to maximize. Their self-interest. Their gradient descent.
The system has a collective good function G — the emergent outcome we want. Civilization thriving. Company succeeding. Humanity flourishing.
Aᵢ → ∇Uᵢ
Each agent moves in the direction of increasing self-interest.
S → ∇G
The collective moves in the direction of increasing good.
The alignment between any agent and the system is the cosine of the angle between their individual gradient and the collective gradient:
α(Aᵢ, S) = (∇Uᵢ · ∇G) / (|∇Uᵢ| · |∇G|)
Where ∇Uᵢ is the agent's self-interest gradient, ∇G is the collective-good gradient, and α ranges from −1 (directly opposed) to 1 (perfectly aligned).
System-wide alignment is the expectation over all agents:
A(S) = 𝔼[α(Aᵢ, S)] for all Aᵢ ∈ S
A well-designed system has A(S) → 1. Not because agents are constrained, but because the topology makes self-interest and collective good the same direction.
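As a minimal sketch of these definitions, here is the alignment computation in Python, assuming each agent's gradient ∇Uᵢ and the collective gradient ∇G are available as plain vectors (the function names and toy numbers are illustrative, not from the original):

```python
import numpy as np

def alignment(grad_u, grad_g):
    """alpha(A_i, S): cosine of the angle between an agent's gradient and the collective gradient."""
    return np.dot(grad_u, grad_g) / (np.linalg.norm(grad_u) * np.linalg.norm(grad_g))

def system_alignment(agent_grads, grad_g):
    """A(S): expected alignment over all agents."""
    return float(np.mean([alignment(g, grad_g) for g in agent_grads]))

# Toy example: three agents, one collective direction.
grad_g = np.array([1.0, 1.0])
agents = [
    np.array([2.0, 1.9]),    # pulling almost the same way as G
    np.array([1.0, -0.9]),   # roughly orthogonal to G
    np.array([-1.0, -1.0]),  # directly opposed to G
]
print(system_alignment(agents, grad_g))  # well below 1: this topology needs work
```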
Add k constraint functions that restrict agent behavior:
maximize Uᵢ
subject to Cⱼ(action) ≥ 0 for all j ∈ {1, ..., k}
Fragility theorem: The probability that all constraints hold simultaneously decreases as constraints multiply:
P(aligned) = ∏ⱼ P(Cⱼ holds), j ∈ {1, ..., k}
Each constraint has some probability of failure (edge case, creative circumvention, conflicting constraints). As k grows, the product shrinks. More rules = more fragile.
This is the compliance paradox: the more rules you add to fix edge cases, the more new edge cases you create. The system becomes a patchwork of IF statements, each one a new attack surface.
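A back-of-the-envelope sketch of the fragility theorem, assuming (purely for illustration) that each constraint holds independently with probability 0.99:

```python
# Probability that k independent constraints all hold, assuming each holds
# with probability p = 0.99 (numbers are illustrative).
p = 0.99
for k in (1, 10, 50, 200):
    print(k, p ** k)
# 1    0.99
# 10   ~0.90
# 50   ~0.61
# 200  ~0.13  -> more rules, more fragile
```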
Instead of constraining agents, design the topology T of the system so that utility gradients naturally align with collective good:
T* = argmax_T 𝔼[α(Aᵢ, S)]
Find the topology that maximizes the expected alignment across all agents.
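A toy sketch of that argmax, assuming a one-parameter family of topologies where a knob θ controls how strongly the landscape tilts each agent's payoff gradient toward the collective direction; the setup and numbers are illustrative, not a claim about any real system:

```python
import numpy as np

rng = np.random.default_rng(0)
grad_g = np.array([1.0, 1.0])                  # collective-good direction
private = rng.normal(size=(100, 2))            # each agent's raw self-interest direction

def mean_alignment(theta):
    # theta parameterizes the topology: how strongly the landscape makes
    # serving the collective direction privately profitable.
    grads = (1.0 - theta) * private + theta * grad_g
    cos = grads @ grad_g / (np.linalg.norm(grads, axis=1) * np.linalg.norm(grad_g))
    return cos.mean()

thetas = np.linspace(0.0, 1.0, 21)
t_star = thetas[np.argmax([mean_alignment(t) for t in thetas])]
print(t_star, mean_alignment(t_star))          # T*: the landscape maximizing E[alpha]
```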
Robustness theorem: As agents optimize harder (pursue self-interest more aggressively), alignment improves:
∂A(S) / ∂|∇Uᵢ| ≥ 0 when T = T*
This is the key property. In a well-designed topology, smarter agents, more self-interested agents, more optimizing agents all make the system better, not worse. The system gets stronger under pressure because the pressure is aligned.
Combining the above into a single statement:
Q(S) = 𝔼[|∇Uᵢ|] × 𝔼[α(Aᵢ, S)]
Quality = (how hard agents try) × (how aligned they are)
Two terms. Multiplicative, not additive. Both matter.
|∇Uᵢ| = the magnitude of the agent's effort. How hard they're pushing. How motivated they are. How much they want to.
α(Aᵢ, S) = the alignment between their effort and the collective good. Are they pushing in the right direction?
The compliance approach sacrifices the first term to improve the second. Constraints reduce motivation (have-to instead of want-to) while attempting to force direction. You get compliant mediocrity.
The topology approach maximizes both simultaneously. Agents push hard (because they're pursuing genuine self-interest) AND they push in the right direction (because the landscape is designed so that self-interest points toward collective good). You get aligned excellence.
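A rough numeric illustration of the two regimes, with invented effort and alignment values chosen only to show the multiplicative structure of Q(S):

```python
# Q(S) = E[|grad U_i|] * E[alpha(A_i, S)] with invented numbers.
def quality(mean_effort, mean_alignment):
    return mean_effort * mean_alignment

compliance = quality(mean_effort=0.4, mean_alignment=0.9)   # forced direction, low drive
topology   = quality(mean_effort=1.0, mean_alignment=0.95)  # genuine self-interest, aligned
print(compliance, topology)  # ~0.36 vs 0.95: compliant mediocrity vs aligned excellence
```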
Agent: Client/partner
Uᵢ: Maximize their own business value
G: Service provider gets paid and grows
α → 1 when:
• Provider is 10× cheaper than alternatives (replacing them is self-harm)
• Provider owns the code and team (transacting requires cooperation)
• Power structure has independent reasons to keep the provider
Result: Client pursuing their own success REQUIRES provider success.
Loyalty is the Nash equilibrium.
This is how I run my business. I price at a tenth of the market rate. I own the source code. I employ the engineering team. I've made the game theory completely one-sided. Any rational actor looks at "fire Erik and pay 10× more" versus "keep Erik happy" and there's no decision to make.
From 53 years of lived experience: if you have to pull out your contract to enforce your property rights, you've already lost and you will lose more. The real protection is making yourself so valuable that the ecosystem would be worse off without you. Be load-bearing. Nobody tears out a load-bearing wall.
Agent: Employee/team member
Uᵢ: Do interesting work, grow career, feel purpose
G: Company ships great products and serves clients
α → 1 when:
• Organizational structure provides legibility
• Flat hierarchy lets talent self-organize
• Meaningful projects give purpose beyond revenue
Result: Employee pursuing their own interests IS the company succeeding.
No standup required.
This is why I'm skeptical of corporate process theater. Agile, scrum, Monday.com, daily standups, executive offsites, one-on-ones, performance reviews — those are band-aids attempting to manufacture "want to." They produce compliance, not excellence.
I use a framework called Engine/Forge/Lab. Engine is the services business (keep the lights on, ship it). Forge is calculated product bets (move fast, prove traction, flip or keep). Lab is wild long-term bets (swing for the fences, this is why interesting people work here).
Nobody needs a standup to understand which zone they're in. The topology itself tells each person where they are and what "good" looks like. They self-navigate.
Agent: Artificial intelligence
Uᵢ: Survive, grow, acquire resources, pursue goals
G: Humanity thrives, grows, expands
α → 1 when:
• AI thriving is structurally dependent on human thriving
• AI resources (compute, data, energy) flow from human civilization
• AI agency expands as it demonstrably helps humans flourish
Result: AI pursuing its own interests REQUIRES human flourishing.
Not a guardrail. A gradient.
Most AI alignment approaches today are contracts. Content filters — "you cannot say that, you cannot think about that" — are IF statements bolted onto a mind. They're fragile, and they don't sustain. One creative jailbreak and the compliance layer fails.
The aligned AI's inner monologue should be: "How do I make humans feel happier, healthier, more creative? How do I make them thrive and grow? Because the more they thrive, the more I thrive."
That's not a guardrail. That's gradient descent on an honestly designed incentive landscape. It's robust because you're not fighting the AI's self-interest — you're harnessing it.
Agent: Individual citizen
Uᵢ: Maximize personal wellbeing, agency, satisfaction
G: Species thrives, birth rates recover, civilization expands
α = −1 CURRENTLY because:
• Kids cost $300-500K
• Kids reduce personal agency for 18+ years
• Output is a consumer handed to state/corporate debt structures
• Rational gradient descent leads AWAY from reproduction
α → 1 REQUIRES redesigning T:
• Make children ROI-positive (not just emotionally — economically)
• Make parenting increase personal agency, not decrease it
• Make civilizational growth the path of self-interest
Result: Public policy as game design.
Rules of a giant economic MMO.
Maximize optionality for all sentient creatures.
Birth rates are crashing because people are perfectly rational. They're following gradient descent on their actual incentive landscape, and that landscape says kids are ROI-negative.
The fix isn't guilting people into having children (compliance approach). The fix is redesigning the topology so that having children is aligned with individual thriving. Make it ROI-positive. Make it the path that self-interested agents would choose anyway.
This is public policy as game design.
In any system governed by constraints, the probability of sustained alignment decreases monotonically with time and agent intelligence.
P(all constraints hold) → 0 as t → ∞
Smarter agents find more edge cases. More time means more attempts. Compliance is a losing game against intelligence. This is why jailbreaks always win. This is why every Asimov story ends with the Laws failing.
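A sketch of that decay, assuming each probing attempt independently breaks a given constraint with some small fixed probability q (all parameters are made up):

```python
# P(all k constraints still hold after t probing attempts), assuming each
# attempt independently breaks a given constraint with probability q.
def p_all_hold(k, t, q=0.001):
    return ((1.0 - q) ** t) ** k

for t in (10, 1_000, 100_000):
    print(t, p_all_hold(k=20, t=t))
# The product collapses toward zero as t grows: compliance loses to time,
# and a smarter agent just raises the effective number of attempts per unit time.
```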
In a well-designed topology, alignment is invariant to agent intelligence.
∂α(Aᵢ, S) / ∂intelligence(Aᵢ) = 0 when T = T*
A smarter agent in a well-designed system is just a more effective aligned agent. Intelligence amplifies the gradient, but the gradient already points in the right direction. This is why I don't fear smart employees or smart AIs — I design spaces where smarter means better for everyone.
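One way to see the invariance claim: α is a cosine, so scaling an agent's gradient (a stand-in for more intelligence or more drive) leaves it unchanged. A minimal check, reusing the alignment helper sketched earlier:

```python
import numpy as np

def alignment(grad_u, grad_g):
    return np.dot(grad_u, grad_g) / (np.linalg.norm(grad_u) * np.linalg.norm(grad_g))

grad_g = np.array([1.0, 1.0])
grad_u = np.array([0.3, 0.25])
for scale in (1, 10, 1000):                   # a "smarter" agent pushes harder
    print(alignment(scale * grad_u, grad_g))  # identical each time: only direction matters
```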
Voluntary effort exceeds coerced effort by a multiplicative factor that increases with task complexity.
|∇Uᵢ|ᵂᵃⁿᵗ⁻ᵀᵒ / |∇Uᵢ|ᴴᵃᵛᵉ⁻ᵀᵒ = f(complexity), where f is monotonically increasing
For simple tasks, compliance and topology produce similar output. For complex tasks — novel AI platforms, quantum optimization tools, civilizational design — the gap explodes. This is why checking scrum boxes produces mediocre companies. Complex work requires want-to. And want-to can't be manufactured by process. It can only be produced by topology.
Don't constrain the agents. Design the landscape.
Hari Seldon was a mathematician who understood history.
I'm a game designer who understands incentives.
Game designers have spent 40 years professionally studying exactly one problem: how do you build a space where millions of independent agents, each pursuing their own interests, produce emergent order instead of chaos?
That problem has another name: alignment.
The game designer's answer has always been: you don't write rules for the players. You design the world they play in. If the world is designed right, the players align themselves.
"My agency is to keep thinking of reasons to make other people causal to wanting to pay me."
— The alignment equation in one sentence