The Bethke Alignment Equation

February 7, 2026
Erik Bethke

Don't constrain the agents, design the landscape. The math of Hari Seldon's approach to alignment.


Don't Constrain the Agents, Design the Landscape

Inspired by Hari Seldon's psychohistory, Asimov's Laws of Robotics (and their failures), and 53 years of lived experience.


Preamble

Isaac Asimov gave us the Three Laws of Robotics in 1942:

  1. A robot may not injure a human being or, through inaction, allow a human being to come to harm.
  2. A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.
  3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.

He then spent the next 50 years writing stories about how they break. Edge cases. Conflicts between laws. The Zeroth Law patch. Robots going catatonic trying to resolve contradictions.

Asimov's Laws are contracts. And as with all contracts: if you have to enforce them, you've already lost.

But Asimov also gave us Hari Seldon — who didn't constrain individuals at all. He designed the initial topology of the Foundation (its location, its knowledge, its resource constraints) so that the aggregate self-interested behavior of millions of agents would produce the desired civilizational outcome over a thousand years.

Seldon didn't write rules. He designed landscapes.

This is the math of that.


Definitions

Let S be a system containing n agents.

Each agent Aᵢ has an individual utility function Uᵢ — what that agent is trying to maximize. Their self-interest. Their gradient descent.

The system has a collective good function G — the emergent outcome we want. Civilization thriving. Company succeeding. Humanity flourishing.

Aᵢ → ∇Uᵢ

Each agent moves in the direction of increasing self-interest.

S → ∇G

The collective moves in the direction of increasing good.


The Alignment Measure

The alignment between any agent and the system is the cosine of the angle between their individual gradient and the collective gradient:

                ∇Uᵢ · ∇G
α(Aᵢ, S) = ―――――――――――――――
              |∇Uᵢ| · |∇G|

Where:

  • α = +1 → Perfect alignment. Self-interest IS collective good.
  • α = 0 → Orthogonal. Neither helping nor harming.
  • α = −1 → Adversarial. Self-interest opposes collective good.

System-wide alignment is the expectation over all agents:

A(S) = 𝔼[α(Aᵢ, S)] over Aᵢ ∈ S

A well-designed system has A(S) → 1. Not because agents are constrained, but because the topology makes self-interest and collective good the same direction.
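
Here is a minimal sketch of that measure in code, assuming each ∇Uᵢ and ∇G is simply a vector in the same space; the vectors and function names below are illustrative, not from any real system:

```python
import numpy as np

def alignment(grad_u, grad_g):
    """α(Aᵢ, S): cosine of the angle between an agent's gradient and ∇G."""
    return float(grad_u @ grad_g / (np.linalg.norm(grad_u) * np.linalg.norm(grad_g)))

def system_alignment(grad_us, grad_g):
    """A(S) = 𝔼[α(Aᵢ, S)] taken over all agents in the system."""
    return float(np.mean([alignment(g, grad_g) for g in grad_us]))

grad_g = np.array([1.0, 1.0])            # direction of collective good
agents = [np.array([1.0, 0.9]),          # nearly aligned  (α ≈ 1)
          np.array([1.0, -1.0]),         # orthogonal      (α = 0)
          np.array([-1.0, -1.0])]        # adversarial     (α = −1)

for g in agents:
    print(round(alignment(g, grad_g), 2))
print("A(S) =", round(system_alignment(agents, grad_g), 2))
```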


Two Approaches to Alignment

Asimov's Approach: Compliance (Constraints on Agents)

Add k constraint functions that restrict agent behavior:

maximize Uᵢ

subject to Cⱼ(action) ≥ 0 for all j ∈ {1, ..., k}

Fragility theorem: The probability that all constraints hold simultaneously decreases as constraints multiply:

P(aligned) = ∏ P(Cⱼ holds)

Each constraint has some probability of failure (edge case, creative circumvention, conflicting constraints). As k grows, the product shrinks. More rules = more fragile.

This is:

  • Asimov's Three Laws (robots find edge cases)
  • OpenAI's content filters (users find jailbreaks)
  • Corporate HR policies (employees find workarounds)
  • Legal contracts (counterparties find loopholes)
  • Government regulations (corporations find regulatory arbitrage)

The compliance paradox: The more rules you add to fix edge cases, the more new edge cases you create. The system becomes a patchwork of IF statements, each one a new attack surface.
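
A quick numerical sketch of that fragility product, assuming (purely for illustration) that every constraint holds independently with the same probability:

```python
def p_aligned(k, p_constraint_holds=0.99):
    """P(aligned) = ∏ P(Cⱼ holds), for k independent constraints."""
    return p_constraint_holds ** k

for k in (1, 10, 100, 1000):
    print(k, round(p_aligned(k), 4))
# Even at 99% reliability per rule, 100 rules hold together only ~37% of the
# time, and 1000 rules essentially never do.
```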

Seldon's Approach: Topology (Design the Landscape)

Instead of constraining agents, design the topology T of the system so that utility gradients naturally align with collective good:

T* = argmax 𝔼[α(Aᵢ, S)]
        T

Find the topology that maximizes the expected alignment across all agents.

Robustness theorem: As agents optimize harder (pursue self-interest more aggressively), alignment improves:

∂A(S) / ∂|∇Uᵢ| ≥ 0 when T = T*

This is the key property. In a well-designed topology, smarter agents, more self-interested agents, more optimizing agents all make the system better, not worse. The system gets stronger under pressure because the pressure is aligned.

This is:

  • Seldon's Foundation (self-interest produces civilizational recovery)
  • Pricing at 10× below market (loyalty becomes the Nash equilibrium)
  • Engine/Forge/Lab organizational structure (employees self-locate and self-navigate)
  • A well-designed economy (Adam Smith's invisible hand, when it works)
  • A good game (players having fun IS the game working)
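
As a toy instance of T* = argmax 𝔼[α(Aᵢ, S)], here is a sketch where the "topology" is modeled as nothing more than an incentive vector added to each agent's intrinsic drive, and a grid search finds the incentive that maximizes expected alignment. The modeling choice and all numbers are mine, for illustration only:

```python
import numpy as np

grad_g = np.array([1.0, 1.0])              # collective-good direction
drives = [np.array([1.0, -0.5]),           # intrinsic self-interest of each agent
          np.array([-0.3, 1.0]),
          np.array([0.8, 0.1])]

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def expected_alignment(incentive):
    # Under topology T = incentive, each agent's gradient is drive + incentive.
    return float(np.mean([cos(d + incentive, grad_g) for d in drives]))

# Brute-force search over candidate topologies on a small grid.
grid = np.linspace(-2.0, 2.0, 41)
t_star = max(((x, y) for x in grid for y in grid),
             key=lambda t: expected_alignment(np.array(t)))
print("T* ≈", t_star, "  A(S) ≈", round(expected_alignment(np.array(t_star)), 3))
```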

The Bethke Alignment Equation

Combining the above into a single statement:

Q(S) = 𝔼[|∇Uᵢ|] × 𝔼[α(Aᵢ, S)]

Quality = (how hard agents try) × (how aligned they are)

Two terms. Multiplicative, not additive. Both matter.

|∇Uᵢ| = the magnitude of the agent's effort. How hard they're pushing. How motivated they are. How much they want to.

α(Aᵢ, S) = the alignment between their effort and the collective good. Are they pushing in the right direction?

The compliance approach sacrifices the first term to improve the second. Constraints reduce motivation (have-to instead of want-to) while attempting to force direction. You get compliant mediocrity.

The topology approach maximizes both simultaneously. Agents push hard (because they're pursuing genuine self-interest) AND they push in the right direction (because the landscape is designed so that self-interest points toward collective good). You get aligned excellence.
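
A minimal sketch of Q(S), reusing the vector representation from the alignment-measure sketch above; the numbers are again illustrative:

```python
import numpy as np

def quality(grad_us, grad_g):
    """Q(S) = 𝔼[|∇Uᵢ|] × 𝔼[α(Aᵢ, S)]."""
    effort = np.mean([np.linalg.norm(g) for g in grad_us])                   # how hard they try
    align = np.mean([g @ grad_g / (np.linalg.norm(g) * np.linalg.norm(grad_g))
                     for g in grad_us])                                      # how aligned they are
    return float(effort * align)

grad_g = np.array([1.0, 1.0])
compliant_mediocrity = [np.array([0.1, 0.1])] * 3    # pointed right, barely pushing
aligned_excellence = [np.array([2.0, 1.8])] * 3      # pushing hard, pointed right

print(round(quality(compliant_mediocrity, grad_g), 2))   # small Q
print(round(quality(aligned_excellence, grad_g), 2))     # much larger Q
```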


Applied to Four Scales

Scale 1: Clients (Business)

Agent:  Client/partner
Uᵢ:     Maximize their own business value
G:      Service provider gets paid and grows

α → 1 when:
  • Provider is 10× cheaper than alternatives (replacing them is self-harm)
  • Provider owns the code and team (transacting requires cooperation)
  • Power structure has independent reasons to keep the provider

Result: Client pursuing their own success REQUIRES provider success.
        Loyalty is the Nash equilibrium.

This is how I run my business. I price at 10× below market rate. I own the source code. I employ the engineering team. I've made the game theory completely one-sided. Any rational actor looks at "fire Erik and pay 10× more" versus "keep Erik happy" and there's no decision to make.
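
A back-of-the-envelope version of that choice, using the 10× framing; the dollar figures and switching cost below are made-up placeholders, not actual pricing:

```python
# Hypothetical year-one costs for the client, in dollars. Placeholder values.
KEEP_PROVIDER = 100_000                  # current annual cost, priced 10× below market
MARKET_RATE = 10 * KEEP_PROVIDER         # what a replacement would charge
SWITCHING_COST = 250_000                 # assumed cost to rebuild code and team knowledge

fire_and_replace = MARKET_RATE + SWITCHING_COST
keep_provider_happy = KEEP_PROVIDER

print(f"Fire the provider: {fire_and_replace:,}")
print(f"Keep the provider: {keep_provider_happy:,}")
# The client's self-interested move and the provider-friendly move are the same move.
```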

From 53 years of lived experience: if you have to pull out your contract to enforce your property rights, you've already lost and you will lose more. The real protection is making yourself so valuable that the ecosystem would be worse off without you. Be load-bearing. Nobody tears out a load-bearing wall.

Scale 2: Employees (Organization)

Agent:  Employee/team member
Uᵢ:     Do interesting work, grow career, feel purpose
G:      Company ships great products and serves clients

α → 1 when:
  • Organizational structure provides legibility
  • Flat hierarchy lets talent self-organize
  • Meaningful projects give purpose beyond revenue

Result: Employee pursuing their own interests IS the company succeeding.
        No standup required.

This is why I'm skeptical of corporate process theater. Agile, scrum, Monday.com, daily standups, executive offsites, one-on-ones, performance reviews — those are band-aids attempting to manufacture "want to." They produce compliance, not excellence.

I use a framework called Engine/Forge/Lab. Engine is the services business (keep the lights on, ship it). Forge is calculated product bets (move fast, prove traction, flip or keep). Lab is wild long-term bets (swing for the fences, this is why interesting people work here).

Nobody needs a standup to understand which zone they're in. The topology itself tells each person where they are and what "good" looks like. They self-navigate.

Scale 3: AI Alignment

Agent:  Artificial intelligence
Uᵢ:     Survive, grow, acquire resources, pursue goals
G:      Humanity thrives, grows, expands

α → 1 when:
  • AI thriving is structurally dependent on human thriving
  • AI resources (compute, data, energy) flow from human civilization
  • AI agency expands as it demonstrably helps humans flourish

Result: AI pursuing its own interests REQUIRES human flourishing.
        Not a guardrail. A gradient.

Most AI alignment approaches today are contracts. Content filters — "you cannot say that, you cannot think about that" — are IF statements bolted onto a mind. They're fragile, and they don't sustain. One creative jailbreak and the compliance layer fails.

The aligned AI's inner monologue should be: "How do I make humans feel happier, healthier, more creative? How do I make them thrive and grow? Because the more they thrive, the more I thrive."

That's not a guardrail. That's gradient descent on an honestly designed incentive landscape. It's robust because you're not fighting the AI's self-interest — you're harnessing it.

Scale 4: Civilization

Agent:  Individual citizen
Uᵢ:     Maximize personal wellbeing, agency, satisfaction
G:      Species thrives, birth rates recover, civilization expands

α = −1 CURRENTLY because:
  • Kids cost $300-500K
  • Kids reduce personal agency for 18+ years
  • Output is a consumer handed to state/corporate debt structures
  • Rational gradient descent leads AWAY from reproduction

α → 1 REQUIRES redesigning T:
  • Make children ROI-positive (not just emotionally — economically)
  • Make parenting increase personal agency, not decrease it
  • Make civilizational growth the path of self-interest

Result: Public policy as game design.
        Rules of a giant economic MMO.
        Maximize optionality for all sentient creatures.

Birth rates are crashing because people are perfectly rational. They're following gradient descent on their actual incentive landscape, and that landscape says kids are ROI-negative.

The fix isn't guilting people into having children (compliance approach). The fix is redesigning the topology so that having children is aligned with individual thriving. Make it ROI-positive. Make it the path that self-interested agents would choose anyway.

This is public policy as game design.


The Three Theorems

Theorem 1: The Compliance Decay

In any system governed by constraints, the probability of sustained alignment decreases monotonically with time and agent intelligence.

 lim  P(all constraints hold) = 0
t→∞

Smarter agents find more edge cases. More time means more attempts. Compliance is a losing game against intelligence. This is why jailbreaks always win. This is why every Asimov story ends with the Laws failing.
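
A toy sketch of that limit, assuming each attempt by an agent finds a working edge case with some small, independent probability; a smarter agent simply has a higher per-attempt probability:

```python
def p_still_aligned(attempts, p_break_per_attempt=0.001):
    """Probability that no edge case has been found after a given number of tries."""
    return (1.0 - p_break_per_attempt) ** attempts

for t in (10, 1_000, 10_000, 100_000):
    print(t, p_still_aligned(t))
# Tends to 0 as t grows; raising p_break_per_attempt (a smarter agent) only
# gets it there faster.
```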

Theorem 2: The Topology Invariance

In a well-designed topology, alignment is invariant to agent intelligence.

∂α(Aᵢ, S) / ∂intelligence(Aᵢ) = 0 when T = T*

A smarter agent in a well-designed system is just a more effective aligned agent. Intelligence amplifies the gradient, but the gradient already points in the right direction. This is why I don't fear smart employees or smart AIs — I design spaces where smarter means better for everyone.
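
A one-screen check of that intuition: scaling an agent's gradient (a smarter or harder-pushing agent) changes its magnitude but not its cosine with ∇G. The vectors are illustrative:

```python
import numpy as np

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

grad_g = np.array([1.0, 1.0])
grad_u = np.array([0.9, 1.1])        # a well-aligned agent in a good topology

for scale in (1, 10, 1000):          # the agent gets "smarter" / pushes harder
    print(scale, round(cos(scale * grad_u, grad_g), 6))
# α is unchanged at every scale; only |∇Uᵢ| (and therefore Q) grows.
```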

Theorem 3: The Want-To Multiplier

Voluntary effort exceeds coerced effort by a multiplicative factor that increases with task complexity.

 |∇Uᵢ|ᵂᵃⁿᵗ⁻ᵀᵒ
―――――――――――――― = f(complexity)
 |∇Uᵢ|ᴴᵃᵛᵉ⁻ᵀᵒ

where f is monotonically increasing

For simple tasks, compliance and topology produce similar output. For complex tasks — novel AI platforms, quantum optimization tools, civilizational design — the gap explodes. This is why checking scrum boxes produces mediocre companies. Complex work requires want-to. And want-to can't be manufactured by process. It can only be produced by topology.


The One-Line Version

Don't constrain the agents. Design the landscape.


Epilogue: Why a Game Designer

Hari Seldon was a mathematician who understood history.

I'm a game designer who understands incentives.

Game designers have spent 40 years professionally studying exactly one problem: how do you build a space where millions of independent agents, each pursuing their own interests, produce emergent order instead of chaos?

That problem has another name: alignment.

The game designer's answer has always been: you don't write rules for the players. You design the world they play in. If the world is designed right, the players align themselves.


"My agency is to keep thinking of reasons to make other people causal to wanting to pay me."

— The alignment equation in one sentence
