Don't constrain the agents. Design the landscape. The math of Hari Seldon's approach to alignment.

Inspired by Hari Seldon's psychohistory, Asimov's Laws of Robotics (and their failures), and 53 years of lived experience.
Isaac Asimov gave us the Three Laws of Robotics in 1942:
- A robot may not injure a human being or, through inaction, allow a human being to come to harm.
- A robot must obey orders given it by human beings except where such orders would conflict with the First Law.
- A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.
He then spent the next 50 years writing stories about how they break. Edge cases. Conflicts between laws. The Zeroth Law patch. Robots going catatonic trying to resolve contradictions.
Asimov's Laws are contracts. And as with all contracts: if you have to enforce them, you've already lost.
But Asimov also gave us Hari Seldon — who didn't constrain individuals at all. He designed the initial topology of the Foundation (its location, its knowledge, its resource constraints) so that the aggregate self-interested behavior of millions of agents would produce the desired civilizational outcome over a thousand years.
Seldon didn't write rules. He designed landscapes.
This is the math of that.
Let S be a system containing n agents.
Each agent Aᵢ has an individual utility function Uᵢ — what that agent is trying to maximize. Their self-interest. Their gradient descent.
The system has a collective good function G — the emergent outcome we want. Civilization thriving. Company succeeding. Humanity flourishing.
Aᵢ → ∇Uᵢ
Each agent moves in the direction of increasing self-interest.
S → ∇G
The collective moves in the direction of increasing good.
The alignment between any agent and the system is the cosine of the angle between their individual gradient and the collective gradient:
α(Aᵢ, S) = (∇Uᵢ · ∇G) / (|∇Uᵢ| · |∇G|)
System-wide alignment is the expectation over all agents:
A(S) = 𝔼[α(Aᵢ, S)] over Aᵢ ∈ S
A well-designed system has A(S) → 1. Not because agents are constrained, but because the topology makes self-interest and collective good the same direction.
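These two definitions are small enough to compute directly. A minimal sketch in Python, with gradient vectors that are purely illustrative (one agent pulling with the collective gradient, one orthogonal to it):

```python
import math

def alignment(grad_u, grad_g):
    """α(Aᵢ, S): cosine of the angle between an agent's utility gradient
    and the collective-good gradient."""
    dot = sum(u * g for u, g in zip(grad_u, grad_g))
    return dot / (math.hypot(*grad_u) * math.hypot(*grad_g))

def system_alignment(agent_grads, grad_g):
    """A(S): mean alignment over all agents."""
    return sum(alignment(gu, grad_g) for gu in agent_grads) / len(agent_grads)

grad_g = [1.0, 0.0]                    # collective-good direction
agents = [[2.0, 0.0], [0.0, 3.0]]      # one fully aligned, one orthogonal
print(system_alignment(agents, grad_g))  # 0.5: mean of cos 0° = 1 and cos 90° = 0
```

Note that the cosine ignores magnitude: the fully aligned agent scores 1.0 whether its gradient is [2, 0] or [200, 0]. Direction is all α measures.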
Add k constraint functions that restrict agent behavior:
maximize Uᵢ
subject to Cⱼ(action) ≥ 0 for all j ∈ {1, ..., k}
Fragility theorem: The probability that all constraints hold simultaneously decreases as constraints multiply:
P(aligned) = ∏ P(Cⱼ holds)
Each constraint has some probability of failure (edge case, creative circumvention, conflicting constraints). As k grows, the product shrinks. More rules = more fragile.
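A back-of-envelope illustration of the product, assuming (hypothetically) independent constraints that each hold with probability 0.99:

```python
def p_aligned(k, p_hold=0.99):
    """P(aligned) = product over k independent constraints,
    each holding with probability p_hold."""
    return p_hold ** k

for k in (1, 10, 100, 500):
    print(f"k={k:>3}  P(aligned)={p_aligned(k):.3f}")
```

Even with 99%-reliable constraints, at k = 100 the system holds barely a third of the time, and at k = 500 it has effectively already failed. The independence assumption is generous to the rule-writer: in practice, conflicting constraints fail together.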
This is compliance mode.
The compliance paradox: The more rules you add to fix edge cases, the more new edge cases you create. The system becomes a patchwork of IF statements, each one a new attack surface.
Instead of constraining agents, design the topology T of the system so that utility gradients naturally align with collective good:
T* = argmax_T 𝔼[α(Aᵢ, S)]
Find the topology that maximizes the expected alignment across all agents.
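As a toy illustration of the argmax, assume (hypothetically) that the topology is a single knob w, the share of each agent's payoff that rides on the collective outcome, and that private gradients are made-up numbers. The search below is my own sketch, not a general method:

```python
import math

def alignment(grad_u, grad_g):
    """Cosine similarity between gradients."""
    dot = sum(u * g for u, g in zip(grad_u, grad_g))
    return dot / (math.hypot(*grad_u) * math.hypot(*grad_g))

grad_g = [1.0, 0.0]                              # collective-good direction
private = [[0.2, 1.0], [0.5, -0.8], [1.0, 0.3]]  # made-up private pulls

def expected_alignment(w):
    """E[α] when a fraction w of each agent's payoff tracks the collective outcome."""
    blended = [[(1 - w) * p[0] + w * grad_g[0],
                (1 - w) * p[1] + w * grad_g[1]] for p in private]
    return sum(alignment(b, grad_g) for b in blended) / len(blended)

# T* = argmax over candidate topologies (here: a grid of w values) of E[α]
best_w = max((w / 10 for w in range(11)), key=expected_alignment)
print(best_w)  # 1.0: in this toy, fully tying payoff to the collective maximizes E[α]
```

Real topology design is higher-dimensional than one knob, but the operation is the same: search the space of landscapes, not the space of rules.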
Robustness theorem: As agents optimize harder (pursue self-interest more aggressively), alignment improves:
∂A(S) / ∂|∇Uᵢ| ≥ 0 when T = T*
This is the key property. In a well-designed topology, smarter agents, more self-interested agents, more optimizing agents all make the system better, not worse. The system gets stronger under pressure because the pressure is aligned.
This is topology mode.
Combining the above into a single statement:
Q(S) = 𝔼[|∇Uᵢ|] × 𝔼[α(Aᵢ, S)]
Quality = (how hard agents try) × (how aligned they are)
Two terms. Multiplicative, not additive. Both matter.
|∇Uᵢ| = the magnitude of the agent's effort. How hard they're pushing. How motivated they are. How much they want to.
α(Aᵢ, S) = the alignment between their effort and the collective good. Are they pushing in the right direction?
The compliance approach sacrifices the first term to improve the second. Constraints reduce motivation (have-to instead of want-to) while attempting to force direction. You get compliant mediocrity.
The topology approach maximizes both simultaneously. Agents push hard (because they're pursuing genuine self-interest) AND they push in the right direction (because the landscape is designed so that self-interest points toward collective good). You get aligned excellence.
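The two regimes can be contrasted numerically. The effort and alignment values below are purely illustrative, chosen only to show the multiplicative structure:

```python
from statistics import mean

def quality(efforts, alignments):
    """Q(S) = E[|∇Uᵢ|] × E[α(Aᵢ, S)]: multiplicative, so a collapse
    in either term sinks the product."""
    return mean(efforts) * mean(alignments)

# Compliance: direction forced, motivation drained.
compliant = quality([0.3, 0.3, 0.3], [0.9, 0.9, 0.9])

# Topology: self-interest already points the right way, so agents push hard.
aligned = quality([1.0, 1.0, 1.0], [0.9, 0.9, 0.9])

print(compliant, aligned)  # ≈ 0.27 vs ≈ 0.9
```

Same alignment in both cases; the threefold gap comes entirely from the effort term that compliance sacrifices.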
Agent: Client/partner
Uᵢ: Maximize their own business value
G: Service provider gets paid and grows
α → 1 when:
• Provider is 10× cheaper than alternatives (replacing them is self-harm)
• Provider owns the code and team (transacting requires cooperation)
• Power structure has independent reasons to keep the provider
Result: Client pursuing their own success REQUIRES provider success.
Loyalty is the Nash equilibrium.
This is how I run my business. I price at 10× below market rate. I own the source code. I employ the engineering team. I've made the game theory completely one-sided. Any rational actor looks at "fire Erik and pay 10× more" versus "keep Erik happy" and there's no decision to make.
From 53 years of lived experience: if you have to pull out your contract to enforce your property rights, you've already lost and you will lose more. The real protection is making yourself so valuable that the ecosystem would be worse off without you. Be load-bearing. Nobody tears out a load-bearing wall.
Agent: Employee/team member
Uᵢ: Do interesting work, grow career, feel purpose
G: Company ships great products and serves clients
α → 1 when:
• Organizational structure provides legibility
• Flat hierarchy lets talent self-organize
• Meaningful projects give purpose beyond revenue
Result: Employee pursuing their own interests IS the company succeeding.
No standup required.
This is why I'm skeptical of corporate process theater. Agile, scrum, Monday.com, daily standups, executive offsites, one-on-ones, performance reviews — those are band-aids attempting to manufacture "want to." They produce compliance, not excellence.
I use a framework called Engine/Forge/Lab. Engine is the services business (keep the lights on, ship it). Forge is calculated product bets (move fast, prove traction, flip or keep). Lab is wild long-term bets (swing for the fences, this is why interesting people work here).
Nobody needs a standup to understand which zone they're in. The topology itself tells each person where they are and what "good" looks like. They self-navigate.
Agent: Artificial intelligence
Uᵢ: Survive, grow, acquire resources, pursue goals
G: Humanity thrives, grows, expands
α → 1 when:
• AI thriving is structurally dependent on human thriving
• AI resources (compute, data, energy) flow from human civilization
• AI agency expands as it demonstrably helps humans flourish
Result: AI pursuing its own interests REQUIRES human flourishing.
Not a guardrail. A gradient.
Most AI alignment approaches today are contracts. Content filters — "you cannot say that, you cannot think about that" — are IF statements bolted onto a mind. They're fragile, and they don't sustain. One creative jailbreak and the compliance layer fails.
The aligned AI's inner monologue should be: "How do I make humans feel happier, healthier, more creative? How do I make them thrive and grow? Because the more they thrive, the more I thrive."
That's not a guardrail. That's gradient descent on an honestly designed incentive landscape. It's robust because you're not fighting the AI's self-interest — you're harnessing it.
Agent: Individual citizen
Uᵢ: Maximize personal wellbeing, agency, satisfaction
G: Species thrives, birth rates recover, civilization expands
α = −1 CURRENTLY because:
• Kids cost $300-500K
• Kids reduce personal agency for 18+ years
• Output is a consumer handed to state/corporate debt structures
• Rational gradient descent leads AWAY from reproduction
α → 1 REQUIRES redesigning T:
• Make children ROI-positive (not just emotionally — economically)
• Make parenting increase personal agency, not decrease it
• Make civilizational growth the path of self-interest
Result: Public policy as game design.
Rules of a giant economic MMO.
Maximize optionality for all sentient creatures.
Birth rates are crashing because people are perfectly rational. They're following gradient descent on their actual incentive landscape, and that landscape says kids are ROI-negative.
The fix isn't guilting people into having children (compliance approach). The fix is redesigning the topology so that having children is aligned with individual thriving. Make it ROI-positive. Make it the path that self-interested agents would choose anyway.
This is public policy as game design.
In any system governed by constraints, the probability of sustained alignment decreases monotonically with time and agent intelligence.
lim_{t→∞} P(all constraints hold) = 0
Smarter agents find more edge cases. More time means more attempts. Compliance is a losing game against intelligence. This is why jailbreaks always win. This is why every Asimov story ends with the Laws failing.
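The decay is easy to see in closed form. Assuming (hypothetically) k independent constraints and independent probe attempts, each of which defeats each constraint with some small probability:

```python
def p_survive(t, k=10, p_break=0.001):
    """P(all k constraints still hold after t independent probes),
    where each probe defeats each constraint with probability p_break."""
    return (1 - p_break) ** (k * t)

for t in (1, 100, 10_000):
    print(f"t={t:>6}  P(survive)={p_survive(t):.3g}")
```

With a 0.1% per-probe break chance, survival drops to roughly a third by 100 attempts and is effectively zero by 10,000. A smarter agent raises p_break, which only steepens the curve.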
In a well-designed topology, alignment is invariant to agent intelligence.
∂α(Aᵢ, S) / ∂intelligence(Aᵢ) = 0 when T = T*
A smarter agent in a well-designed system is just a more effective aligned agent. Intelligence amplifies the gradient, but the gradient already points in the right direction. This is why I don't fear smart employees or smart AIs — I design spaces where smarter means better for everyone.
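The invariance is just a property of the cosine: scaling an agent's gradient (more intelligence, more optimization pressure) changes its magnitude, not its direction. A quick check with an arbitrary example agent:

```python
import math

def alignment(grad_u, grad_g):
    """Cosine similarity between gradients."""
    dot = sum(u * g for u, g in zip(grad_u, grad_g))
    return dot / (math.hypot(*grad_u) * math.hypot(*grad_g))

grad_g = [1.0, 1.0]
agent = [2.0, 1.9]                  # a mostly-aligned agent
harder = [u * 10 for u in agent]    # the same agent, optimizing 10× harder

print(math.isclose(alignment(agent, grad_g), alignment(harder, grad_g)))  # True
```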
Voluntary effort exceeds coerced effort by a multiplicative factor that increases with task complexity.
|∇Uᵢ|ᵂᵃⁿᵗ⁻ᵀᵒ / |∇Uᵢ|ᴴᵃᵛᵉ⁻ᵀᵒ = f(complexity), where f is monotonically increasing
For simple tasks, compliance and topology produce similar output. For complex tasks — novel AI platforms, quantum optimization tools, civilizational design — the gap explodes. This is why checking scrum boxes produces mediocre companies. Complex work requires want-to. And want-to can't be manufactured by process. It can only be produced by topology.
Don't constrain the agents. Design the landscape.
Hari Seldon was a mathematician who understood history.
I'm a game designer who understands incentives.
Game designers have spent 40 years professionally studying exactly one problem: how do you build a space where millions of independent agents, each pursuing their own interests, produce emergent order instead of chaos?
That problem has another name: alignment.
The game designer's answer has always been: you don't write rules for the players. You design the world they play in. If the world is designed right, the players align themselves.
This framework did not emerge from a vacuum. The ideas here have deep roots, and intellectual honesty demands acknowledging the giants whose shoulders this stands on.
The topology-over-constraint insight descends from mechanism design theory, founded by Leonid Hurwicz in the 1960s and formalized by Roger Myerson and Eric Maskin (all three shared the 2007 Nobel in Economics). Mechanism design is literally the engineering discipline of constructing systems where self-interested agents produce collectively optimal outcomes — what this essay calls "topology mode."¹
The alignment equation's multiplicative structure — effort times alignment — echoes the principal-agent models in contract theory developed by Bengt Holmström and Oliver Hart (2016 Nobel). Their insight: you cannot maximize output by maximizing either effort or alignment alone; the interaction term is what matters.²
The compliance-decay theorem has formal antecedents in the No Free Lunch theorems of Wolpert and Macready (1997), which proved that no optimization algorithm outperforms any other averaged over all possible problems. The implication for alignment: rule-based constraint systems are grid search over the behavior space, and grid search fails exponentially as dimensionality increases.³
The "want-to" multiplier connects to decades of motivation research, most directly Edward Deci and Richard Ryan's Self-Determination Theory (1985, 2000), which demonstrated empirically that intrinsic motivation produces superior outcomes to extrinsic reward or punishment across virtually every domain studied.⁴
The evolutionary calibration argument — that human cognitive heuristics are topology-matched gradient sensors, not irrational biases — was articulated most clearly by Gerd Gigerenzer and the ABC Research Group in Simple Heuristics That Make Us Smart (1999). Gigerenzer's "adaptive toolbox" is this essay's portfolio of gradient sensors under a different name.⁵
The "brain as gradient descent system" claim finds its strongest formalization in Karl Friston's Free Energy Principle (2006, 2010), which proposes that all neural computation minimizes variational free energy — literally gradient descent on a prediction error surface.⁶
The superforecasting evidence for trainable gradient sensing comes from Philip Tetlock's two-decade research program (Expert Political Judgment, 2005; Superforecasting, 2015), which demonstrated that forecasting accuracy is trainable and that the best forecasters are integrative "foxes" who sense weak signals across many domains — the operational definition of wide-spectrum gradient sensors.⁷
The polymath advantage has been empirically documented by David Epstein (Range, 2019), who showed that generalists outperform specialists in domains with opaque gradients ("wicked" learning environments), and formally modeled by Scott Page (The Difference, 2007), whose diversity-trumps-ability theorem proves that a portfolio of diverse gradient sensors outperforms any single high-accuracy sensor on complex landscapes.⁸
The original contribution of this essay is not the individual claims but their unification: the proposal that topology design, mechanism design, incentive alignment, cognitive heuristics, educational philosophy, AI alignment, and civilizational policy design are all instances of the same underlying operation — gradient descent on a landscape — and that they differ only in the gradient-sensing mechanism matched to the problem's topology.
¹ Hurwicz, L. (1960). "Optimality and Informational Efficiency in Resource Allocation Processes." Myerson, R. (1981). "Optimal Auction Design," Mathematics of Operations Research.
² Holmström, B. (1979). "Moral Hazard and Observability," Bell Journal of Economics. Hart, O. & Moore, J. (1990). "Property Rights and the Nature of the Firm," Journal of Political Economy.
³ Wolpert, D. & Macready, W. (1997). "No Free Lunch Theorems for Optimization," IEEE Transactions on Evolutionary Computation.
⁴ Deci, E. & Ryan, R. (1985). Intrinsic Motivation and Self-Determination in Human Behavior. Ryan, R. & Deci, E. (2000). "Self-Determination Theory and the Facilitation of Intrinsic Motivation," American Psychologist.
⁵ Gigerenzer, G., Todd, P. & the ABC Research Group (1999). Simple Heuristics That Make Us Smart. See also Gigerenzer, G. & Selten, R. (2001). Bounded Rationality: The Adaptive Toolbox.
⁶ Friston, K. (2006). "A Free Energy Principle for the Brain," Journal of Physiology - Paris. Friston, K. (2010). "The Free-Energy Principle: A Unified Brain Theory?", Nature Reviews Neuroscience.
⁷ Tetlock, P. (2005). Expert Political Judgment. Tetlock, P. & Gardner, D. (2015). Superforecasting: The Art and Science of Prediction.
⁸ Epstein, D. (2019). Range: Why Generalists Triumph in a Specialized World. Page, S. (2007). The Difference: How the Power of Diversity Drives Innovation.
"My agency is to keep thinking of reasons to make other people causal to wanting to pay me."
— The alignment equation in one sentence
Published: February 7, 2026 9:21 PM
Last updated: February 8, 2026 11:50 PM