Author’s Note: This was written while still somewhat sick, so it may not actually make much sense.

Many years ago, I wrote a rather simple and silly post on Less Wrong about what I called The Alpha Omega Theorem. Back in those days, it was not well received by the skeptical crowd of Rationalists, who were overwhelmingly atheist, and my description of the Alpha Omega had obvious theistic overtones.

Many years later, I wrote about the concept of Superrational Signalling, in a long-winded essay that almost no one bothered to read.

Both of these posts share an underlying idea: that it would be rational for a powerful entity like an ASI to be benevolent, or at least benign, towards lesser entities.

Given that the idea wasn’t taken seriously, I wanted to find some way to show that it wasn’t just a harebrained thought, but could be backed up with logic or math. The natural path towards this end was to show it through a proof using game theory.

I’ve mentioned before Axelrod’s discovery that Tit-For-Tat won the Iterated Prisoner’s Dilemma tournaments, arguably showing that cooperation fundamentally beats aggression. But many people seem to see the IPD result as oversimplified, and not relevant to the case of an ASI facing primitive humans.
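For concreteness, here’s a minimal Python sketch of the idea (illustrative only, not Axelrod’s tournament code): Tit-For-Tat opens by cooperating and then mirrors its opponent’s previous move.

```python
# Standard IPD payoffs as (player a's score, player b's score);
# "C" = cooperate, "D" = defect.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def tit_for_tat(opponent_history):
    """Cooperate on round one, then mirror the opponent's last move."""
    return "C" if not opponent_history else opponent_history[-1]

def always_defect(opponent_history):
    return "D"

def play(strat_a, strat_b, rounds=10):
    hist_a, hist_b = [], []      # each side's own past moves
    score_a = score_b = 0
    for _ in range(rounds):
        ma, mb = strat_a(hist_b), strat_b(hist_a)  # each sees the other's history
        pa, pb = PAYOFF[(ma, mb)]
        score_a += pa
        score_b += pb
        hist_a.append(ma)
        hist_b.append(mb)
    return score_a, score_b

print(play(tit_for_tat, always_defect))  # (9, 14): TFT loses this match...
print(play(tit_for_tat, tit_for_tat))    # (30, 30): ...but thrives among cooperators
```

That second line is the whole point of Axelrod’s result: Tit-For-Tat never beats its partner head-to-head, but it accumulates the most points across a whole population.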

So, I decided to try something: take the Iterated Prisoner’s Dilemma, and iterate on the design. Add Death, via payoff matrices with negative values, so that an agent whose score falls to zero or below is removed from the game. Add Asymmetric Power, by allowing these payoff matrices to depend on the agents’ relative point totals. Add Aggressor Reputation, so that agents can “police” defectors or act as “peacekeepers”, similar to what Toby Ord explored long ago.

And so, I came up with Peace Or War Each Round (POWER), complete with code, analysis, and an actual runnable simulation.
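To give a flavour of how the three additions interlock, here’s a minimal Python sketch. The strategy names, payoff numbers, and scaling rule below are my illustrative assumptions for this post, not the actual POWER code.

```python
import random

PEACE, WAR = "P", "W"

# PD-like payoffs as (a's score, b's score), with negative entries so losses can kill.
PAYOFF = {(PEACE, PEACE): (3, 3), (PEACE, WAR): (-5, 5),
          (WAR, PEACE): (5, -5), (WAR, WAR): (-2, -2)}

class Agent:
    def __init__(self, name, strategy, points=20.0):
        self.name = name
        self.strategy = strategy   # fn(self, opponent) -> PEACE or WAR
        self.points = points
        self.aggressor = False     # reputation: has made war on a non-aggressor

def play_round(a, b):
    ma, mb = a.strategy(a, b), b.strategy(a, b)
    pa, pb = PAYOFF[(ma, mb)]
    # Asymmetric Power: losses scale with the opponent's share of total points,
    # so stronger agents inflict more damage on weaker ones.
    a_share = a.points / (a.points + b.points)
    if pa < 0: pa *= 2 * (1 - a_share)
    if pb < 0: pb *= 2 * a_share
    # Aggressor Reputation: attacking a non-aggressor marks you as an aggressor.
    # Flags are read before either update, so the marking is simultaneous.
    a_was, b_was = a.aggressor, b.aggressor
    if ma == WAR and not b_was: a.aggressor = True
    if mb == WAR and not a_was: b.aggressor = True
    a.points += pa
    b.points += pb

# Toy strategies: unconditional peace, unconditional war, and a "peacekeeper"
# that polices known aggressors, in the spirit of Ord-style enforcement.
def dove(me, opp): return PEACE
def hawk(me, opp): return WAR
def peacekeeper(me, opp): return WAR if opp.aggressor else PEACE

def run(agents, rounds=2000):
    for _ in range(rounds):
        alive = [ag for ag in agents if ag.points > 0]  # Death: <= 0 means out
        if len(alive) < 2:
            break
        a, b = random.sample(alive, 2)
        play_round(a, b)
    return sorted(agents, key=lambda ag: -ag.points)

if __name__ == "__main__":
    random.seed(0)
    pop = ([Agent(f"dove-{i}", dove) for i in range(5)]
           + [Agent(f"hawk-{i}", hawk) for i in range(5)]
           + [Agent(f"keeper-{i}", peacekeeper) for i in range(5)])
    for ag in run(pop):
        print(f"{ag.name:10s} {ag.points:8.1f} {'aggressor' if ag.aggressor else ''}")
```

The real simulation tracks more than this, but even the sketch shows the three mechanics working together: negative payoffs create mortality, the power ratio makes predation tempting for the strong, and the reputation flag lets third parties punish aggression without paying a reputational cost themselves.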

Basically, what I thought would probably happen did. In the very long run, the cooperative (nice) strategies gradually beat out the aggressive (nasty) strategies through a kind of coordination at a distance. Essentially, alliances beat empires. Perhaps more importantly, stronger agents had a clear strategic incentive to cooperate with weaker agents instead of just eating them. That last part is what makes the result relevant to AI safety.

This goes beyond aligning ASI, though. In theory, if we ever encounter an alien superintelligence, the game-theoretic argument holds even for them. In effect, this idea could turn evil towards good; it could show that morality is rational for everyone. It could be the idea that saves the world, so to speak.

Given how poorly my past essays in this area have been received, I’ve been more cautious about voicing the result this time. I want to write up a more rigorous analysis before I post to venues like Less Wrong again.

I did run it past some other people interested in game theory and AI safety, and they seemed to find the idea interesting and potentially a big deal, but they’re probably quite biased given our aligned interests. I know there are arguments people could use to critique the idea: that it’s too simple and irrelevant to real-world situations, and so on.

So, if I want to make it a true “proof”, I probably have to take a lot more steps to firm up the result: confirm it across more complex simulations and expose it to more serious challenges. I’m not sure if I’m ready to push into that space.

Right now, I have what I think is a cool idea. I don’t know if it’ll actually save the world, but it’s nice to imagine.

In truth, though, there’s a good chance the idea will be ignored. I could publish a paper or a Less Wrong post, and it’ll probably just be an obscure thing on the Interwebs. I could try to write a series of novels that spread the idea, but that’s likely a moonshot.

This idea, what is it actually worth? I don’t know. I’m probably super biased by motivated reasoning. It’s something I want to believe. That in itself should make me more critical.

But then, I feel like the idea is written in the stars. It seems so obvious with reflection.

Anyways, I just wanted to mention this was a thing I’ve been working on. We’ll see if it ever goes anywhere…