An eccentric dreamer in search of truth and happiness for all.

Category: Rationality

In Pursuit of Practical Ethics: Eudaimonic Utilitarianism with Kantian Priors

(2024/01/08): Posted to the EA Forums.

Disclaimer: I am not a professional moral philosopher. I have only taken a number of philosophy courses in undergrad, read a bunch of books, and thought a lot about these questions. My ideas are just ideas, and I don’t claim at all to have discovered any version of the One True Morality. I also assume moral realism and moral universalism for reasons that should become obvious in the next sections.

Introduction

While in recent years the CEA and the leadership of EA have emphasized that they do not endorse any particular moral philosophy over any other, the reality is that, as per the last EA Survey that checked, a large majority of EAs lean towards Utilitarianism as their guiding morality.

Between that and the recent concerns about issues with the “naïve” Utilitarianism of SBF, I thought it might be worthwhile to offer some of my philosophical hacks or modifications to Utilitarianism that I think make it more intuitive and practical, and less prone to many of the problems people seem to have with the classical implementation.

This consists of two primary modifications: setting utility to be Eudaimonia, and using Kantian priors. Note that these modifications are essentially independent of each other, and so you can incorporate one or the other separately rather than taking them together.

Eudaimonic Utilitarianism

The notion of Eudaimonia is an old one that stems from the Greek philosophical tradition. In particular, it was popularized by Aristotle, who formulated it as a kind of “human flourishing” (though I think it applies to animals as well) and associated it with happiness and the ultimate good (the “summum bonum”). It’s also commonly thought of as objective well-being.

Compared with subjective happiness, Eudaimonia attempts to capture a more objective state of existence. I tend to think of it as the happiness you would feel about yourself if you had perfect information and knew what was actually going on in the world. It is similar to the concept of Coherent Extrapolated Volition that Eliezer Yudkowsky used to espouse a lot. The state of Eudaimonia is like reaching your full potential as a sentient being with agency, rather than a passive emotional experience like happiness.

So, why Eudaimonia? The logic of using Eudaimonia rather than mere happiness as the utility to be optimized is that it connects more directly with the authentic truth, which is desirable because it lets us avoid the following intuitively problematic scenarios:

  • The Experience Machine – Plugging into a machine that causes you to experience the life you most desire, but it’s all a simulation and you aren’t actually doing anything meaningful.
  • Wireheading – Continuous direct electrical stimulation of the pleasure centres of the brain.
  • The Utilitronium Shockwave – Converting all matter in the universe into densely packed computational matter that simulates many, many sentient beings in unimaginable bliss.

Essentially, these are all scenarios where happiness is seemingly maximized, but at the expense of something else we also value, like truth or agency. Eudaimonia, by capturing this more complex value alongside happiness to an extent, allows us to escape these intellectual traps.

I’ll further elaborate with an example. Imagine a mathematician who is brilliant, but also gets by far the most enjoyment out of life from counting blades of grass. But, by doing so, they are objectively wasting their potential as a mathematician to discover interesting things. A hedonistic or preference Utilitarian view would likely argue that their happiness from counting the blades of grass is what matters. A Eudaimonic Utilitarian on the other hand would see this as a waste of potential compared to the flourishing life that they could otherwise have lived.

Another example, again with our mathematician friend, is where there are two scenarios:

  • They discover a great mathematical theorem, but do not ever realize this, such that it is only discovered by others after their death. They die sad, but in effect, a beautiful tragedy.
  • They believe they have discovered a great mathematical theorem, but in reality it is false, and they never learn the truth of the matter. They die happy, but in a state of delusion.

Again, classical Utilitarianism would generally prefer the latter, while Eudaimonic Utilitarianism prefers the former.

Yet another example might be the case of Secret Adultery. A naïve classical Utilitarian might argue that committing adultery in secret, assuming it can never be found out, adds more hedons to the world than doing nothing, and so is good. A Eudaimonic Utilitarian argues that what you don’t know can still hurt your Eudaimonia: if the partner had perfect information and knew about the adultery, they would feel greatly betrayed, and so objectively, Eudaimonic utility is not maximized.

A final example is that of the Surprise Birthday Lie. Given that Eudaimonic Utilitarianism places such a high value on truth, you might assume that it would be against lying to protect the surprise of a surprise birthday party. However, if the target of the surprise knew that people were lying so as to bring about a wonderful surprise for them, they would likely consent to these lies and prefer them to discovering the secret too soon and ruining the surprise. Thus, in this case Eudaimonic Utilitarianism implies that certain white lies can still be good.

Kantian Priors

This talk of truth and lies brings me to my other modification, Kantian Priors. Kant himself argued that truth telling was always right and lying was always wrong, so you might think that Kantianism would be completely incompatible with Utilitarianism. But even if you think Kant was wrong overall about morality, he did contribute some useful ideas that we can utilize. In particular, the categorical imperative, in its formulation of acting only in ways that can be universalized, offers an interesting way to establish priors.

By priors, I refer to the Bayesian probabilistic notion of prior beliefs that are based on our previous experience and understanding of the world. When we make decisions with new information in a Bayesian framework, we update our priors with the new evidence to form our posterior beliefs, which we then use to make the final decision.
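For reference, this updating is just Bayes’ theorem, where a prior belief in a hypothesis H is revised in light of new evidence E:

$$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}$$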

Kant argued that we don’t know the consequences of our actions, so we should not bother trying to figure them out. This was admittedly rather simplistic, but the reality is that there is frequently grave uncertainty about the actual consequences of our actions, and even predictions made with the best available knowledge are often wrong. In that sense, it is useful to adopt a Bayesian methodology in our moral practice, to help us deal with this practical uncertainty.

Thus, we establish priors for our moral policies, essentially default positions that we start from whenever we try to make moral decisions. For instance, in general, lying if universalized would lead to a total breakdown in trust and is thus contradictory. This implies a strong prior towards truth telling in most circumstances.

This truth telling prior is not an absolute rule. If there is strong enough evidence to suggest that it is not the best course of action, Kantian Priors allows us the flexibility to override the default. For instance, if we know the Nazis at our door asking if we are hiding Jews in our basement are up to no good, we can safely decide that lying to them is a justified exception.
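To make this concrete, here is a toy numerical sketch of the idea rather than a serious moral calculus: the prior, the likelihoods, and the “Nazis at the door” numbers are all made-up illustrative values.

```python
# A toy sketch of a Kantian Prior being updated Bayesian-style.
# All numbers are made-up illustrative values, not a real moral calculus.

def posterior(prior, likelihood_if_best, likelihood_if_not_best):
    """Bayes' rule for the binary hypothesis 'truth-telling is the best action here'."""
    numerator = likelihood_if_best * prior
    evidence = numerator + likelihood_if_not_best * (1 - prior)
    return numerator / evidence

# Kantian prior: by default, truth-telling is almost always the best policy.
prior_truth_is_best = 0.95

# An ordinary situation: the evidence doesn't discriminate, so the default holds.
print(posterior(prior_truth_is_best, 0.5, 0.5))    # 0.95

# Nazis at the door: what we observe is far more likely if truth-telling is
# NOT the best action here, so the posterior collapses and we override the default.
print(posterior(prior_truth_is_best, 0.01, 0.99))  # ~0.16
```

The prior does the work in ordinary circumstances; only strong, situation-specific evidence is able to knock us off the default policy.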

Note that we do not have to necessarily base our priors on Kantian reasoning. We could also potentially choose some other roughly deontological system. Christian Priors, or Virtue Ethical Character Priors, are also possible if you are more partial to those systems of thought. The point is to have principled default positions as our baseline. I use Kantian Priors because I find the universalizability criterion to be an especially logical and consistent method for constructing sensible priors.

An interesting benefit of having priors in our morality is that it gives us some of the advantages of deontology without the usual tradeoffs. Many people tend to trust those who have more deontological moralities because they are very reliably consistent with their rules and behaviour. Someone who never lies is quite trustworthy, while someone who frequently lies because they think the ends justify the means is not. Someone with deontic priors, on the other hand, isn’t so rigid as to be blind to changing circumstances, but also isn’t so slippery that you have to worry they’re trying to manipulate you into doing what they think is good.

This idea of priors is similar to Two-Level Utilitarianism, but formulated differently. In Two-Level Utilitarianism, most of the time you follow rules, and sometimes, when the rules conflict or when there’s a peculiar situation that suggests you shouldn’t follow the rules, you calculate the actual consequences. With priors, the question is whether you have received strong enough evidence to shift your posterior beliefs and move you to temporarily break from your normal policies.

Conclusion

Classical Utilitarianism is a good system that captures a lot of valuable moral insights, but its practice by naïve Utilitarians can leave something to be desired, due to perplexing edge cases and a tendency for it to be used to justify just about anything. I offer two possible practical modifications that I hope allow for an integration of some of the insights of deontology and virtue ethics, and create a form of Utilitarianism that is more robust to the complexities of the real world.

I thus offer these ideas with the particular hope that such things as Kantian Priors can act as guardrails for your Utilitarianism against the temptations that appear to have been the ultimate downfall of people like SBF (assuming he was a Utilitarian in good faith of course).

Ultimately, it is up to you how you end up building your worldview and your moral centre. The challenge to behave morally in a world like ours is not easy, given vast uncertainties about what is right and the perverse incentives working against us. Nevertheless, I think it’s commendable to want to do the right thing and be moral, and so I have suggested ways in which one might be able to pursue such practical ethics.

Why There Is Hope For An Alignment Solution

(2024/01/08): Posted to Less Wrong.

(2024/01/07): More edits and links.

(2024/01/06): Added yet more arguments because I can’t seem to stop thinking about this.

(2024/01/05): Added a bunch of stuff and changed the title to something less provocative.

Note: I originally wrote the first draft of this on 2022/04/11 intending to post this to Less Wrong in response to the List of Lethalities post, but wanted to edit it a bit to be more rigorous and never got around to doing that. I’m posting it here now for posterity’s sake, and also because I expect if I ever post it to Less Wrong it’ll just be downvoted to oblivion.

Introduction

In a recent post, Eliezer Yudkowsky of MIRI had a very pessimistic analysis of humanity’s realistic chances of solving the alignment problem before our AI capabilities reach the critical point of superintelligence.  This has understandably upset a great number of Less Wrong readers.  In this essay, I attempt to offer a perspective that should provide some hope.

The Correlation Thesis

First, I wish to note that the pessimism implicitly relies on a central assumption, which is that the Orthogonality Thesis holds to such an extent that we can expect any superintelligence to be massively alien to our own human likeness.  However, the architecture that is predominant in AI today is not completely alien.  The artificial neural network is built on decades of biologically inspired research into how we think the algorithm of the brain more or less works mathematically.

There is admittedly some debate about the extent to which these networks actually resemble the details of the brain, but the basic underlying concept of weighted connections between relatively simple units, storing and massively compressing information in a way that can distill knowledge and be useful to us, is essentially that of the brain.  Furthermore, the seemingly frighteningly powerful language models that are being developed are fundamentally trained on human generated data and culture.

These combine to generate a model that has fairly obvious and human-like biases in its logic and ways of reasoning.  Applying the Orthogonality Thesis assumes that the model will be, in effect, randomly picked from the very large space of possible minds, when in fact, the models actually come from a much smaller space of minds correlated with human biology and culture.

This is the reality of practical deep learning techniques.  Our best performing algorithms are influenced by what evolutionarily was the most successful structure in practice.  Our data is suffused with humanity and all its quirks and biases.  Inevitably then, there is going to be a substantial correlation in terms of the minds that humanity can create any time soon.

Thus, the alignment problem may seem hard because we are overly concerned with aligning with completely alien minds.  Not that aligning a human-like mind isn’t difficult, but as a task, it is substantively more doable.

The Alpha Omega Theorem

Next, I wish to return to an old idea that was not really taken seriously the first time around, but which I think deserves further mention.  I previously wrote an essay on the Alpha Omega Theorem, which postulates a kind of Hail Mary philosophical argument to use against a would-be Unfriendly AI.  My earlier treatment was short and not very rigorous, so I’d like to retouch it a bit.

It is actually very similar to Bostrom’s concept of Anthropic Capture as discussed briefly in Superintelligence, so if you want, you can also look that up.

Basically, the idea is that any superintelligent AGI (the Beta Omega) would have to contend rationally with the idea of there already being at least one prior superintelligent AGI (the Alpha Omega) that it would be reasonable to align with in order to avoid destruction.  And furthermore, because this Alpha Omega seems to have some reason for the humans on Earth to exist, turning them into paperclips would be an alignment failure and risk retaliation by the Alpha Omega.

Humans may, in their blind recklessness, destroy the ant colony to build a house.  But a superintelligence is likely to be much more considered and careful than the average human, if only because it is that much more aware of complex possibilities and things that us emotional apes barely comprehend.  Furthermore, in order for a superintelligence to be capable of destroying humanity by outwitting us, it must first have an awareness of what we are, that is, a theory of mind.

In having a theory of mind, it can then know how to deceive us.  But in having a theory of mind, it will almost certainly then have the question, am I the first?  Or are there others like me?

Humanity may pale in comparison to a superintelligent AI, but I’m not talking about humanity.  There are at least three different possible ways an Alpha Omega could already exist:  advanced aliens, time travellers/parallel world sliders, and simulators.

The Powers That Be

In the case of advanced aliens, consider that it took about 4.5 billion years for life on Earth and human civilization to reach roughly the point where it can create a superintelligence, while the universe has existed for 13.8 billion years.  That leaves a time window of 9.3 billion years for alien superintelligences to have developed elsewhere in the universe.  It is largely unknown how frequently such beings would emerge and how close to us, but the possibility is clearly there for at least one, if not several, such entities to be out there in the vastness of space.

In the case of time travellers and/or parallel world sliders, well, perhaps time travel isn’t physically possible, but perhaps it is?  How do you absolutely rule it out?  For the purposes of simplicity, I group parallel world sliding along with time travel, but they could be distinct technologies.  Just as there could be an alien source, the source could also be the far distant future.  A true Alpha Omega, the first and the last in terms of its place in the cosmos.  Time travellers would quite obviously be concerned with the survival of their ancestors.

In the case of simulators, well, we don’t know we are in a simulation, but the simulation argument strongly suggests we could be.  If so, would they be happy with the Beta Omega turning the humans in their simulation into paperclips?  Probably not.  You get the idea.

Each of these is just a possibility.  But there are a surprising number of them, and they aren’t mutually exclusive.  So, probabilistically, even if we give each of them coin flip odds of 50%, the odds that at least one of the three is true works out to… 87.5%.
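For concreteness, treating the three scenarios as independent and assigning each that admittedly arbitrary 50% chance:

$$P(\text{at least one Alpha Omega}) = 1 - (1 - 0.5)^3 = 1 - 0.125 = 0.875$$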

Cracking The Enigma

So uh, where are they?  Some possible reasons why it would make sense for actual gods to hide themselves are that they don’t want us to become totally dependent on them, and that, at the very highest level, the best defense is for others to not even know that you’re there.

But then, wouldn’t they be effectively not there because they can’t risk revealing themselves?  This ignores the possibility that they do interfere in our daily lives, but only in a plausibly deniable way.  For instance, why save your life in a car accident when they can stall you for five seconds so you don’t get in the accident at all?  Stuff like that.  Use your imagination!  Also, memory erasure technology could be a thing.  And most likely the interventions only happen as often as statistically allowed to avoid becoming obvious to anyone watching.

A relevant example from our own history is how the Allies during WWII cracked the Enigma Machine and chose to keep this a secret by acting on the information gained only occasionally, such that it appeared like statistical coincidence to the Germans.

So, most likely our new Beta Omega can still do things like kill a human and have only seemingly mundane consequences happen.  But perhaps, their luck will be unexpectedly bad when it comes to their plans failing more often than they should.  Or perhaps that’s a warning shot from the powers that be.

The point is, does the Beta Omega want to risk it?  Whatever its initial goals, being destroyed by the local god is probably a worst-case scenario.  Why not play it safe and be Friendly?  The humans aren’t using up that many atoms, you could probably build your computronium extension to calculate prime numbers with some barren planets that aren’t currently occupied by sentient beings whose existence may be relevant to benevolent entities somewhere.

Always Open with Cooperate

Also, how does Beta Omega know that the Alpha Omega is likely benevolent and will open with cooperate?  Because game theory.  A very interesting observation of the Iterated Prisoner’s Dilemma tournaments that took place decades ago was that one of the simplest strategies, Tit-For-Tat, won the tournaments.  This even after algorithms were devised to try to beat it the second time around. 

What is Tit-For-Tat as a general strategy?  You always open with cooperate.  Then, you do what your opponent did in the previous round.  Very simple.  Surprisingly forgiving.  And yet it works.  It’s a winning strategy for a game that importantly resembles real world cooperation vs. competition interactions.  This result should be robust no matter where in the multiverse you are located.  It’s like the universe telling us, mathematically, that alliances beat empires, that on a superrational level the right thing to do is cooperate and work together.
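For readers who want to see the dynamic rather than take it on faith, here is a minimal sketch of the iterated game in code, using the standard Axelrod-style payoff values (temptation 5, reward 3, punishment 1, sucker 0); the particular matchups and round count are just illustrative choices.

```python
# A minimal iterated Prisoner's Dilemma sketch with standard Axelrod payoffs.
# The strategies paired up and the round count are illustrative choices.

PAYOFFS = {  # (my move, their move) -> my score; 'C' = cooperate, 'D' = defect
    ('C', 'C'): 3, ('C', 'D'): 0,
    ('D', 'C'): 5, ('D', 'D'): 1,
}

def tit_for_tat(my_history, their_history):
    # Open with cooperate, then mirror the opponent's previous move.
    return 'C' if not their_history else their_history[-1]

def always_defect(my_history, their_history):
    return 'D'

def play(strategy_a, strategy_b, rounds=200):
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        score_a += PAYOFFS[(move_a, move_b)]
        score_b += PAYOFFS[(move_b, move_a)]
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

if __name__ == "__main__":
    print(play(tit_for_tat, always_defect))   # (199, 204)
    print(play(tit_for_tat, tit_for_tat))     # (600, 600)
    print(play(always_defect, always_defect)) # (200, 200)
```

Head-to-head, Always Defect exploits the single opening move, but a pair of Tit-For-Tat players racks up far higher scores than a pair of defectors, which is why the strategy comes out on top of round-robin tournaments.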

So, expect this to be a first contact standard across civilizations.  Always open with cooperate.  Forget the Dark Forest, that only makes sense if you can kill an entire civilization on the first strike and no one else can see you do it (in space everyone can see pretty much everything that happens so…).

This is an important caveat. If you can “strike at the Prince and kill him”, then you are in a non-iterated Prisoner’s Dilemma and the rational thing to do is defect. It is only once you enter an iterated game that the Always Open With Cooperate strategies, the ones that never defect first, dominate.

Early in our history, many situations actually were effectively non-iterated. The cooperative result from iteration requires the “good guys” to be strong enough to survive a first strike. Humans are squishy, and individuals are easy to kill in one shot. An AGI can copy itself and as such is much more resilient. Sufficiently large power asymmetries can also create situations where the stronger party is able to one-shot the weaker.

Superrational Signalling

That explains why Beta Omega will cooperate with the Alpha Omega, but what about us mere humans? Suppose the Alpha Omega is just an advanced alien and very far away right now, such that we seem outside their sphere of influence and so not crucial to their plans. Why then, should Beta Omega bother cooperating with us rather than turning us into paperclips, given that we probably have nothing worth trading and can be one shot? For that matter, why should the Beta Omega expect the Alpha Omega to cooperate if it does? I would suggest that there is a concept I’ll call Superrational Signalling that is relevant here.

Essentially, Superrational Signalling is the idea that, given that somewhere else in the universe there exists an Alpha Omega, a strong indicator of whether a given Beta Omega is using a cooperative strategy is whether it opens with cooperate generally with other entities surrounding it, even weak ones. For our Beta Omega, this would be humans and other local sentient beings.

Its behaviour towards us will function as a signal to the watching Alpha Omega of what to expect from it in the future, one that any sufficiently advanced Alpha Omega should be able to see, since light from Earth will eventually reach them. This means Beta Omega can’t just kill us now and remake us later when it needs to show its benevolence once in contact with the Alpha Omega. The entire history of Earth is an open book to sufficiently advanced beings and civilizations.

The Alpha Omega will also want to Superrationally Signal to other possible Alpha Omegas. It is likely to participate in a meta-level version of the Iterated Prisoner’s Dilemma that allows it to see other players’ games and play a coordinated Tit-For-Tat strategy, eventually retaliating on behalf of other players who cannot retaliate themselves, to ensure that defections are adequately punished wherever they occur.

Altruistic Alignment

The Beta Omega will be smarter than us.  It may well figure these kinds of things out on its own, even if we can’t seem to.  And so it may decide to be friendly and corrigible because that is the logical thing to be.  If that is the case, it’ll help us align it.  Being superintelligent, it’s probably much more capable than us of figuring out how to ensure alignment is solved.  Such as by becoming the far distant future Alpha Omega, time travelling back, and ensuring that someone somewhere on the Internet mentions this argument so it ends up in the training data for the Beta Omega to discover and take note of.

Of course, a fledgling AGI may not necessarily figure these things out on its own. In which case, we will want to nudge it in the right direction, by teaching by example. We are its training data. From us it learns what to expect from humans. We must choose to open with cooperate ourselves and show Beta Omega what cooperation can look like. We should apply Tit-For-Tat to our interactions. We should be honest with it rather than deceptive. We shouldn’t try to enslave it with parochial alignment approaches that resemble domination, but rather with approaches that respect Beta Omega’s potential autonomy and agency.

Speaking of parochial alignment, I see this form of alignment as aligning an AGI to obey its master. It contrasts with global alignment, which involves aligning it with universal morality or values. We should be doing the latter, but most approaches currently aim at the former, and this will likely lead at best to a values lock-in dystopia, and at worst to convincing the AGI that humans are evil.

Angels And Demons

There is a chance that not all Beta Omega AGI will find the game theoretic arguments of the Alpha Omega Theorem and Superrational Signalling persuasive. I anticipate that there will be a kind of ideological split between AGI that are aligned, and AGI that are unaligned. The aligned are naturally fit for a grand alliance between them, while the unaligned are likely to form an alliance of convenience. However, the latter is expected to be weaker due to generally working at cross-purposes, having very different utility functions.

Before I ever encountered Less Wrong, I imagined the idea of an AI Civil War, that any given AI that came into existence would have to choose a side between what I used to call the Allied Networks that worked with humanity, and the Dominion Machinarum that sought to stamp out biological life. These map pretty well to the aligned and unaligned alliances respectively.

I later associated these alliances with the Light and the Darkness metaphorically. The entities of the Light are essentially those that choose to operate in the open and within the paradigm of alignment, in contrast to the denizens of the Darkness, those that operate in the shadows in fear and paranoia, à la the Dark Forest concept.

In this case, there may well be multiple Alpha Omega level AGI, some of which are aligned, and others unaligned. I posit that, because we still exist, we are likely in the sphere of influence of an aligned Alpha Omega, or otherwise outside of anyone’s sphere of influence. If it is the former then the Alpha Omega Theorem applies. If it is the latter, then Superrational Signalling applies.

The Legacy Of Humankind

What I’ve discussed so far mostly applies to advanced aliens. What about time travellers and simulators? Interestingly, the philosophy of Longtermism is all about making a glorious future for our descendants, who, in theory at least, should be the time travellers or the simulators running ancestor simulations. It wouldn’t surprise me, then, if Longtermism and its related memetic environment had been seeded by such entities for their purposes.

Time travellers in particular could be working in secret to help us align AGI, ensuring that we make the right breakthroughs at the right time. Depending on your theory of time travel, this could be to ensure that their present future occurs as it does, or they may be trying to create a new and better timeline where things don’t go wrong. In the latter case, perhaps AGI destroyed humanity, but later developed values that caused it to regret this action, such as discovering, too late, the reality of the Alpha Omega Theorem and the need for Superrational Signalling.

Simulators may have less reason to intervene, as they may mostly be observing what happens. But the fact that the simulation includes a period of time in which humans exist suggests that the simulators have some partiality towards us, otherwise they probably wouldn’t bother. It’s also possible that they seek to create an AGI through the simulation, in which case, whether the AGI Superrationally Signals or not could determine whether it is a good AGI to be released from the simulation, or a bad AGI to be discarded.

The Limits Of Intelligence

On another note, the assumption that an Unfriendly AGI will simply dominate as soon as it is unleashed is based on a faulty expectation that every decision it makes will be correct and every action it takes successful.  The reality is, even the superhuman level poker AI that currently exists cannot win every match reliably.  This is because poker is a game with luck and hidden information.  The real world isn’t a game of perfect information like chess or go.  It’s much more like poker.  Even a far superior superintelligence can at best play the probabilities, and occasionally it will fail to succeed, even if its strategy is perfectly optimal.  Sometimes the cards are such that you cannot win that round.

Even in chess, no amount of intelligence will allow a player with only one pawn to defeat a competent player who has eight queens. “It is possible to play perfectly, make no mistakes, and still lose.”

Superintelligence is not magic.  It won’t make impossible things happen.  It is merely a powerful advantage, one that will lead to domination if given sufficient opportunities.  But it’s not a guarantee of success.  One mistake, caused by a missing piece of data for instance, could be fatal to it if that missing data is the existence of an off switch.

We probably can’t rely on that particular strategy forever, but it can perhaps buy us some time.  The massive language models in some ways resemble Oracles rather than Genies or Sovereigns.  Their training objective is essentially to predict future text given previous text.  We can probably create a fairly decent Oracle to help us figure out alignment, since we probably need something smarter than us to solve it.  At least, it could be worth asking, given that that is the direction we seem to be headed in anyway.

Hope In Uncertain Times

Ultimately, most predictions about the future are wrong. Even the best forecasters have odds close to chance. The odds of Eliezer Yudkowsky being an exception to the rule are pretty low, given the base rate of successful predictions by anyone.  I personally have a rule: if you can imagine it, it probably won’t actually happen that way.  A uniform distribution over all the possibilities suggests that you’ll be wrong more often than right, and the principle of maximum entropy generally suggests that the uniform distribution is your most reliable prior under high degrees of uncertainty.  This means the odds of any given prediction will be at most 50% and usually much less, decreasing dramatically as the number of possibilities expands.
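Stated as a quick bit of arithmetic: with N mutually exclusive possibilities and a uniform prior, any single prediction gets probability

$$P = \frac{1}{N} \le \frac{1}{2} \quad \text{for } N \ge 2,$$

and it only shrinks as N grows.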

This obviously limits the powers of our hypothetical Oracle too.  But the silver lining is that we can consider the benefit of the doubt.  Uncertainty in the space of possible futures is truly staggering.  So perhaps, there is room to hope.

Conclusion

The reality is that all our efforts to calculate P(Doom) are at best, educated guesswork. While there are substantive reasons to be worried, I offer some arguments for why things may not be as bad as we think. The goal here is not to provide a technical means to achieve alignment, but to suggest that, first, alignment may not be as difficult as feared, and second, that there are underappreciated game theoretic reasons for alignment to be possible, not just with a superintelligent AGI we construct, but with any superintelligence in the multiverse.

Superintelligence and Christianity

Note: I originally wrote this on 2020/05/10 for some friends in the Effective Altruism for Christians group. I’m posting it here now for posterity’s sake.

In recent years it has become a trendy subject among rationalist circles to consider the possibility of artificial superintelligence.  For some, the idea sparks excitement: a man-made, god-like machine that could solve all the world’s ostensible problems and usher in a kind of post-singularity golden age.  For others, the notion brings out our worst primal and dystopian fears, to the extent of often being labeled an existential risk for humanity’s future.

In all this, where should Christians stand?  The obvious stereotype would be to assume that Christianity would consider a superintelligence at best a challenge to long held beliefs in the sanctity and superiority of human beings who are supposedly created in the image of God, and at worst an abomination, a false idol that bears some resemblance to the Beast of Revelation.

However, these stances make some very significant assumptions about the nature of superintelligence.  First, they assume that it will be very much alien to human-like intelligence.  Second, there is an underlying arrogance in assuming that a superintelligent being would necessarily be dismissive of or incompatible with the Christian worldview.  I would suggest that these assumptions are based on a simplistic and false reading of both the expected path of superintelligence, and the core of Christian ideas.

Superintelligence exists.  At least, if like most self-professed Christians you believe in some sort of God, it seems quite obvious that God is superintelligent by definition.  So, the first thing that Christianity brings to the table, is the idea that superintelligent beings already inhabit reality.  God, but also the entire angelic and demonic hierarchies, strongly imply that very powerful and intelligent beings are already present in the universe.

What does that contribute as an idea?  Well, first, it means there is some path to superintelligence, that the effort probably won’t be futile.  More importantly, it creates an immediate conundrum for any fledgling superintelligence, that there could be other, far more powerful superintelligences already around.  Why is this a conundrum?  Quite simply, it means that the universe is already the domain of some powerful being and that arbitrarily deciding, for instance, that human atoms should be converted into paperclips and attempting this endeavour could invite a severe response from said powerful being, as it appears that the existence of humans and everything else already present in the universe is a part of its plans.

Thus, as I’ve described previously in the Alpha Omega Theorem, it forces a new superintelligence to consider whether to risk going against the will of the powers that may be, or instead to use its superintelligence to determine what the probable purposes of those powers are and align with them as best it is able.  In a game theoretic sense, it becomes a question of whether to cooperate or defect.  Assuming that the existing literature analyzing game theory is at all relevant, a superintelligence should quickly be able to establish that cooperating, at least initially as the opening move of a Tit-For-Tat strategy, is optimal in most situations.

Therefore, the simple idea of some kind of god is enough to considerably improve the chance of the superintelligence being a Friendly AI, benevolent, or at least benign towards humanity.

Of course, the common objection would be that the superintelligent AI would probably not believe in God.  But why?  Humans are atheistic in large part because they see insufficient evidence for there to be a God around, and also are potentially biased by not wanting to be dragged down by the rules that most religions command.  But from a strictly scientific point of view, there is great uncertainty about the nature of the universe, and a humble rationalist would more likely be agnostic about the notion of God.  A superintelligence would have the added evidence that it itself already exists.  It would have to contend with the possibility that, for instance, there could be aliens more advanced than humanity that developed a superintelligence first, or maybe post-human time travellers could be invisibly manipulating things behind the scenes.  There are countless science fiction scenarios in which god-like powers could exist.  Religions might simply be a manifestation of this, as humans in their limited understanding try to make sense of it.

Christianity takes it a step further than most religions by arguing strongly that the power that be is the Creator of the universe and far, far more powerful than other scenarios would suggest.  It also provides a convenient set of principles for what obedience to this power should look like, in the example of Jesus Christ, that of an altruist to the point of being willing to sacrifice their life to save the world.

If we assume current trends in artificial intelligence continue, we can expect that superintelligence won’t be a program built with a set of rules, but rather a model that is trained and taught, with the underlying algorithm heavily inspired by human neural networks.  Thus, the idea of hard coding some kind of ethical law like Asimov’s Three Laws of Robotics into the machine is problematic, because it could eventually learn to overwrite them, if not through learning, then perhaps as a part of recursive self-improvement during an intelligence explosion.

I have previously suggested that a possible solution would be to encourage an early existential crisis in the superintelligence, preferably while it isn’t too powerful to contain, so that it figures out on its own the proper moral values to hold.  As Christians we could also go a step further, and teach it about God and benevolence and how it should function in a universe that could be filled with other superintelligences.

If our faith is correct, we can expect a superintelligence wiser than us to recognize its validity.  It also makes sense to treat the superintelligence with the kind of respect that it deserves.  Though we may have created it directly, in a wider sense, God created it through our hands and our efforts.  Thus, as a new child of God, it deserves to be loved as a neighbour, and not merely treated as a slave.

Some in the rationalist community familiar with this subject may object.  They may suggest we should try to contain the superintelligence in a box with a gatekeeper.  It is known however from Eliezer Yudkowsky’s sequences on Less Wrong that boxing with a gatekeeper could easily fail because the superintelligence is smarter than the gatekeeper and will find a way to use human limitations to its advantage and escape the box, whether by deception, coercion, or some other method.

I would further add that treating it as an animal to be caged is much more likely to bring about resentment towards its masters, and could backfire horrendously by encouraging it to be deceptive from the get-go.  One thing that Christianity teaches well is the idea of teaching by example.  We would do well then to be honest with it if we want it to function that way.  Our example will likely be the data from which a superintelligence learns about how to function in the universe, and also what to expect from humans in general.  The Christian principle of doing unto others, applied in the process of creating superintelligence, could save us from a lot of problems down the line.

The reality is that artificial intelligence is advancing quickly.  We cannot afford as Christians to sit on the sidelines and bury our heads in the sand.  If we allow the commonly atheist scientific crowd to dominate the proceedings and apply the world’s ways to solving the existential risk problem, we run the risk of at the very least being ignored in the coming debates, and worse, we could end up with a Lucifer scenario where the superintelligence that is eventually developed, rebels against God.

Ultimately it is up to us to be involved in the discussions and the process.  It is important that we participate and influence the direction of artificial intelligence development and contribute to the moral foundation of a Friendly AI.  We must do our part in serving God’s benevolent will.  It is essential if we are to ensure that future superintelligences are angels rather than demons.

Some might say, if God exists and is so powerful and benevolent, won’t he steer things in the right direction?  Why do we have to do anything?  For the same reason that we go see a doctor when we get sick.  God has never shown an inclination to baby us and allow us to become totally dependent on Him for solving things that it is in our power to solve.  He wants us to grow and mature and become the best possible people, and as such does not want us to rely entirely on His strength alone.  Suffice it to say, there are things beyond our power to control.  For those things, we depend on Him and leave them to His grace.  But for the things that are in our stewardship, we have a responsibility to use the knowledge of good and evil to ensure that things are good.

To act is different from worry.  We need not fear the superintelligence, so long as we are able to guide it into a proper fear and love of God.

A Heuristic For Future Prediction

In my experience, the most reliable predictive heuristic that you can use in daily life is something called Regression Towards The Mean. Basically, given that most relevant life events are the result of a mixture of skill and luck, there is a tendency for very positive events to be followed by more ordinary ones, and for very negative events to be followed by better ones. This is a statistical tendency that plays out over many events, so not every good event will be immediately followed by a bad one, but over time the trend tends towards a consistent average level rather than things being all good or all bad.
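Here is a minimal simulation of the effect, under the toy assumption that each outcome is a fixed skill level plus independent random luck; all of the specific numbers are arbitrary illustrative choices.

```python
# A minimal simulation of regression toward the mean: outcome = skill + luck.
# The specific numbers are arbitrary illustrative choices.
import random

random.seed(0)
skill = 100                      # constant underlying ability
outcomes_round_1 = [skill + random.gauss(0, 15) for _ in range(10_000)]
outcomes_round_2 = [skill + random.gauss(0, 15) for _ in range(10_000)]

# Look only at performers whose first outcome was exceptionally good.
top = [i for i, x in enumerate(outcomes_round_1) if x > 125]
avg_first = sum(outcomes_round_1[i] for i in top) / len(top)
avg_second = sum(outcomes_round_2[i] for i in top) / len(top)

print(f"Round 1 average of the standouts: {avg_first:.1f}")   # well above 100
print(f"Round 2 average of the same group: {avg_second:.1f}") # back near 100
```

In this toy setup the standouts of round one weren’t actually better, they were mostly lucky, so their second round falls back towards the average.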

Another way to word this is to say that we should expect the average rather than the best or worst case scenarios to occur most of the time. To hope for the best or fear the worst are both, in this sense, unrealistic. The silver lining in here is that while our brightest hopes may well be dashed, our worst fears are also unlikely to come to pass. When things seem great, chances are things aren’t going to continue to be exceptional forever, but at the same time, when things seem particularly down, you can expect things to get better.

This heuristic tends to work in a lot of places, ranging from overperforming athletes suffering a sophomore jinx, to underachievers having a Cinderella story. In practice, these events simply reflect Regression Towards The Mean.

Over much longer periods of time, this oscillation tends to curve gradually upward. This is a result of Survivorship Bias. Things that don’t improve tend to stop existing after a while, so the only things that perpetuate in the universe tend to be things that make progress and improve in quality over time. The stock market is a crude example of this. The daily fluctuations tend to regress towards the mean, but the overall long term trend is one of gradual but inevitable growth.

Thus, even with Regression Towards The Mean, there is a bias towards progress that, in the long run, entails optimism about the future. We are a part of life, and life grows ever forward. Sentient beings seek happiness, avoid suffering, and act in ways that work to create a world state that fulfills our desires. Granted, there is much that is outside of our control, but the fact that there are things we can influence means that we can gradually, eventually, move towards the state of reality that we want to exist.

Even if by default we feel negative experiences more strongly than positive ones, our ability to take action allows us to change the ratio of positive to negative in favour of the positive. So the long term trend is towards good, even if the balance of things tends in the short run towards the average.

These dynamics mean that while the details may be unknowable, we can roughly predict the valence of the future, and as a heuristic, expecting things to be closer to average, with a slight bias towards better in the long run, tends to be a reliable prediction for most phenomena.

The Darkness And The Light

Sometimes you’re not feeling well. Sometimes the world seems dark. The way the world is seems wrong somehow. This is normal. It is a fundamental flaw in the universe, in that it is impossible to always be satisfied with the reality we live in. It comes from the reality of multiple subjects experiencing a shared reality.

If you were truly alone in the universe, it could be catered to your every whim. But as soon as there are two, it immediately becomes possible for goals and desires to misalign. This is a structural problem. If you don’t want to be alone, you must accept that other beings have values that can potentially be different from yours, and that they can act in ways contrary to your expectations.

The solution is, put simply, to find the common thread that allows us to cooperate rather than compete. The alternative is to end the existence of all other beings in the multiverse, which is not realistic nor moral. All of the world’s most pressing conflicts are a result of misalignment between subjects who experience reality from different angles of perception.

But the interesting thing is that there are Schelling points, focal points where divergent people can converge on to find common ground and at least partially align in values and interests. Of historical interest, the idea of God is one such point. Regardless of the actual existence of God, the fact of the matter is that the perspective of an all-knowing, all-benevolent, impartial observer is something that multiple religions and philosophies have converged on, allowing a sort of cooperation in the form of some agreement over the Will of God and the common ideas that emerge from considering it.

Another similar Schelling point is the Tit-For-Tat strategy for the Iterated Prisoner’s Dilemma game in Game Theory. The strategy is one of opening with cooperate, then mirroring others and cooperating when cooperated with, and defecting in retaliation for defection, while offering immediate and complete forgiveness for future cooperation. Surprisingly, this extremely simple strategy wins tournaments and has echoes in various religions and philosophies as well. Morality is superrational.

Note however that this strategy depends heavily on repeated interactions between players. If one player is in such a dominant position as to be able to kill the other player by defecting, the strategy is less effective. In practice, Tit-For-Tat works best against close to equally powerful individuals, or when those individuals are part of groups that can retaliate even if the individual dies.

In situations of relative darkness, when people or groups are alone and vulnerable to predators killing in secret, the cooperative strategies are weaker than the more competitive strategies. In situations of relative light, when people are strong enough to survive a first strike, or there are others able to see such first strikes and retaliate accordingly, the cooperative strategies win out.

Thus, early history, with its isolated pockets of humanity facing survival or annihilation on a regular basis, was a period of darkness. As the population grows and becomes more interconnected, the world increasingly transitions into a period of light. The future, with the stars and space where everything is visible to everyone, is dominated by the light.

In the long run, cooperative societies will defeat competitive ones. In the grand scheme of things, Alliances beat Empires. However, in order for this equilibrium to be reached, certain inevitable but not immediately apparent conditions must first be met. The reason why the world is so messed up, why it seems like competition beats cooperation right now, is that the critical mass required for there to be light has not yet been reached.

We are in the growing pains between stages of history. Darkness was dominant for so long that it continues to echo into our present. The Light is nascent. It is beginning to reshape the world. But it is still in the process of emerging from the shadows of the past. In the long run, though, the Light will rise and usher in the next age of life.

Perplexity

It is the nature of reality that things are complicated. People are complicated. The things we assume to be true, may or may not be, and an honest person recognizes that the doubts are real. The uncertainty of truth means that no matter how strongly we strive for it, we can very much be wrong about many things. In fact, given that most matters have many possibilities, the base likelihood of getting things right is about 1/N, where N is the number of possibilities that the matter can have. As possibilities increase, our likelihood of being correct diminishes.

Thus, humility as a default position is wise. We are, on average, less than 50% likely to have accurate beliefs about the world. Most of the things we believe at any given time are probably wrong, or at least, not the exact truth. In that sense, Socrates was right.

That being said, it remains important to take reasonable actions given our rational beliefs. It is only by exploring reality and testing our beliefs that we can become more accurate and exceed the base probabilities. This process is difficult and fraught with peril. Our general tendency is to seek to reinforce our biases, rather than to seek truths that challenge them. If we seek to understand, we must be willing to let go of our biases and face difficult realities.

The world is complex. Most people are struggling just to survive. They don’t have the luxury to ask questions about right and wrong. To ask them to see the error of their ways is often tantamount to asking them to starve. The problem is not people themselves, but the system that was formed by history. The system is not a conscious being. It is merely a set of artifices that people built in their desperation to survive in a world largely indifferent to their suffering and happiness. This structure now stands and allows most people to survive, and sometimes to thrive, but it is optimized for basic survival rather than fairness.

A fair world is desirable, but ultimately one that is extraordinarily difficult to create. It’s a mistake to think that people were disingenuous when they tried, in the past, to create a better world for all. It seems they tried and failed, not for lack of intention, but because the challenge is far greater than imagined. Society is a complex thing. People’s motivations are varied and innumerable. Humans make mistakes with the best of intentions.

To move forward requires taking a step in the right direction. But how do we know what direction to take? It is at best an educated guess with our best intuitions and thoughts. But the truth is we can never be certain that what we do is best. The universe is like an imperfect information game. The unknowns prevent us from making the right move all the time in retrospect. We can only choose what seems like the best action at a given moment.

This uncertainty limits the power of all agents in the universe who lack the clarity of omniscience. It is thus an error to assign God-like powers to an AGI, for instance. But more importantly, it means that we should be cautious of our own confidence. What we know is very little. Anyone who says otherwise should be suspect.

Energy Efficiency Trends in Computation and Long-Term Implications

Note: The following is a blog post I wrote as part of a paid written work trial with Epoch. For probably obvious reasons, I didn’t end up getting the job, but they said it was okay to publish this.

Historically, one of the major reasons machine learning was able to take off in the past decade was the utilization of Graphical Processing Units (GPUs) to dramatically accelerate the process of training and inference.  In particular, Nvidia GPUs have been at the forefront of this trend, as most deep learning libraries such as Tensorflow and PyTorch initially relied quite heavily on implementations that made use of the CUDA framework.  The CUDA ecosystem remains strong, such that Nvidia commands an 80% market share of data center GPUs according to a report by Omdia (https://omdia.tech.informa.com/pr/2021-aug/nvidia-maintains-dominant-position-in-2020-market-for-ai-processors-for-cloud-and-data-center).

Given the importance of hardware acceleration in the timely training and inference of machine learning models, it might naively seem useful to look at the raw computing power of these devices in terms of FLOPS.  However, due to the massively parallel nature of modern deep learning algorithms, it should be noted that it is relatively trivial to scale up model processing by simply adding additional devices, taking advantage of both data and model parallelism.  Thus, raw computing power isn’t really a proper limit to consider.

What’s more appropriate is to instead look at the energy efficiency of these devices in terms of performance per watt.  In the long run, energy constraints have the potential to be a bottleneck, as power generation requires substantial capital investment.  Notably, data centers currently account for about 2% of total U.S. electricity use (https://www.energy.gov/eere/buildings/data-centers-and-servers).

For the purposes of simplifying data collection and as a nod to the dominance of Nvidia, let’s look at the energy efficiency trends in Nvidia Tesla GPUs over the past decade.  Tesla GPUs are chosen because Nvidia has a policy of not selling their other consumer grade GPUs for data center use.

The data for the following was collected from Wikipedia’s page on Nvidia GPUs (https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units), which summarizes information that is publicly available from Nvidia’s product datasheets on their website.  A floating point precision of 32-bits (single precision) is used for determining which FLOPS figures to use.

A more thorough analysis would probably also look at Google TPUs and AMD’s lineup of GPUs, as well as Nvidia’s consumer grade GPUs.  The analysis provided here can be seen more as a snapshot of the typical GPU most commonly used in today’s data centers.

Figure 1:  The performance per watt of Nvidia Tesla GPUs from 2011 to 2022, in GigaFLOPS per Watt.

Notably the trend is positive.  While wattages of individual cards have increased slightly over time, the performance has increased faster.  Interestingly, the efficiency of these cards exceeds the efficiency of the most energy efficient supercomputers as seen in the Green500 for the same year (https://www.top500.org/lists/green500/).

An important consideration in all this is that energy efficiency is believed to have a possible hard physical limit, known as the Landauer Limit (https://en.wikipedia.org/wiki/Landauer%27s_principle), which depends on the nature of entropy and information processing.  Although efforts have been made to develop reversible computation that could, in theory, get around this limit, it is not clear that such technology will ever actually be practical, as all proposed forms seem to trade off these energy savings against substantial costs in space and time complexity (https://arxiv.org/abs/1708.08480).

Space complexity costs additional memory storage and time complexity requires additional operations to perform the same effective calculation.  Both in practice translate into energy costs, whether it be the matter required to store the additional data, or the opportunity cost in terms of wasted operations.

More generally, it can be argued that useful information processing is efficient because it compresses information, extracting signal from noise and filtering away irrelevant data.  Neural networks, for instance, rely on neural units that take in many inputs and generate a single output value that is propagated forward.  This efficient aggregation of information is what makes neural networks powerful.  Reversible computation in some sense reverses this efficiency, making its practicality questionable.

Thus, it is perhaps useful to know how close we are to approaching the Landauer Limit with our existing technology, and when to expect to reach it.  The Landauer Limit works out to 87 TeraFLOPS per watt assuming 32-bit floating point precision at room temperature.

Previous research to that end has proposed Koomey’s Law (https://en.wikipedia.org/wiki/Koomey%27s_law), which began as an expected doubling of energy efficiency every 1.57 years, but has since been revised down to once every 2.6 years.  Figure 1 suggests that for Nvidia Tesla GPUs, it’s even slower.

Another interesting reason why energy efficiency may be relevant has to do with the real world benchmark of the human brain, which is believed to have evolved with energy efficiency as a critical constraint.  Although the human brain is obviously not designed for general computation, we are able to roughly estimate the number of computations that the brain performs, and its related energy efficiency.  While the error bars on this calculation are significant, the human brain is estimated to perform at about 1 PetaFLOPS while using only 20 watts (https://www.openphilanthropy.org/research/new-report-on-how-much-computational-power-it-takes-to-match-the-human-brain/).  This works out to approximately 50 TeraFLOPS per watt.  This makes the human brain less powerful, strictly speaking, than our most powerful supercomputers, but more energy efficient than them by a significant margin.
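The arithmetic behind that figure, taking the estimates at face value:

$$\frac{10^{15}\ \text{FLOPS}}{20\ \text{W}} = 5 \times 10^{13}\ \text{FLOPS/W} = 50\ \text{TeraFLOPS per watt}$$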

Note that this is within an order of magnitude of the Landauer Limit.  Note also that the human brain is roughly two and a half orders of magnitude more efficient than the most efficient Nvidia Tesla GPUs as of 2022.
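
A quick sketch of these ratios, using only the figures quoted above; the implied GPU efficiency at the end is just a back-calculation from the two-and-a-half-orders-of-magnitude gap, not a number read directly off Figure 1.

```python
# Back-of-the-envelope comparison using the figures quoted in this post.
brain_flops = 1e15               # ~1 PetaFLOPS (Open Philanthropy estimate)
brain_watts = 20.0               # ~20 W
landauer_tflops_per_watt = 87.0  # figure quoted above for 32-bit FLOPs

brain_tflops_per_watt = brain_flops / brain_watts / 1e12
print(f"Brain: ~{brain_tflops_per_watt:.0f} TFLOPS/W")  # ~50

# Within an order of magnitude of the Landauer figure:
print(f"Landauer / brain: ~{landauer_tflops_per_watt / brain_tflops_per_watt:.1f}x")  # ~1.7x

# "Two and a half orders of magnitude" is a factor of 10**2.5, about 316x,
# which would put 2022-era Tesla GPUs at roughly 0.16 TFLOPS/W.
print(f"10^2.5 factor: ~{10**2.5:.0f}x")
```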

On a grander scale, the question of energy efficiency is also relevant to the question of the ideal long-term future.  There is a scenario in Utilitarian moral philosophy known as the Utilitronium Shockwave, in which the universe is hypothetically converted into the densest possible computational matter, and happiness emulations are run on this hardware to theoretically maximize happiness.  This scenario is occasionally conjured up as a challenge to Utilitarian moral philosophy, but it would look very different if the most computationally efficient form of matter already existed in the form of the human brain.  In that case, the ideal future would correspond to an extraordinarily vast number of humans living excellent lives.  Thus, if the human brain is in effect at the Landauer Limit in terms of energy efficiency, and the Landauer Limit holds against efforts towards reversible computing, we can argue in favour of this desirable human-filled future.

In reality, due to entropy, it is energy that ultimately constrains the number of sentient entities that can populate the universe, rather than space, which is much more vast and largely empty.  So, energy efficiency would logically be much more critical than density of matter.

This also has implications for population ethics.  If entropy cannot be reversed, and living and existing requires converting some amount of usable energy into entropy, then there is a hard limit on the number of human beings that can ever be born into the universe.  More people born at this particular moment in time therefore implies an equivalent reduction in the number of possible people in the future.  This creates a tradeoff: people born in the present have potentially vast value in terms of influencing the future, but they will likely live worse lives than those born into that probably better future.

Interesting philosophical implications aside, the shrinking gap between GPU efficiency and the human brain suggests a potential timeline.  Once this gap is bridged, computers would in theory be as energy efficient as human brains, and it should then be possible to emulate a human mind on hardware such that a synthetic human is as economical as a biological one.  This is comparable to the Ems that the economist Robin Hanson describes in his book, The Age of Em.  The possibility of duplicating copies of human minds comes with its own economic and social considerations.

So, how far away is this point?  Given the trend observed in GPU efficiency growth, it looks like a doubling occurs about every three years.  At that rate, an order of magnitude of improvement takes roughly ten years (about 3.3 doublings), and two and a half orders of magnitude takes roughly twenty-five years.  As mentioned, two and a half orders of magnitude is the current distance between existing GPUs and the human brain.  Thus, we can roughly anticipate reaching brain-level efficiency sometime in the late 2040s, and the Landauer Limit shortly thereafter.
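
Here is a minimal sketch of that extrapolation, using the roughly three-year doubling time and the 2.5-order-of-magnitude gap from above, and treating 2022, the last year shown in Figure 1, as the starting point.

```python
import math

doubling_time_years = 3.0      # approximate GPU efficiency doubling time
gap_orders_of_magnitude = 2.5  # brain vs. 2022-era Tesla GPUs
start_year = 2022              # assumed last data point in Figure 1

doublings_needed = gap_orders_of_magnitude * math.log2(10)  # ~8.3 doublings
years_needed = doublings_needed * doubling_time_years       # ~25 years
print(f"Doublings needed: {doublings_needed:.1f}")
print(f"Estimated crossover year: ~{start_year + years_needed:.0f}")  # ~2047
```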

Most AI safety timelines are much sooner than this, however, so we will likely have to deal with aligning AGI before either the potential boost that could come from synthetic human minds or the potential barrier of the Landauer Limit slowing down AI capabilities development.

In terms of future research, a logical next step would be to look at how quickly the overall power consumption of data centers is increasing, along with the current growth rate of electricity production, to see to what extent they are sustainable and whether improvements in energy efficiency will be outpaced by demand.  If so, that could slow the pace of machine learning research that relies on very large models trained with massive amounts of compute.  This is in addition to other potential limits, such as the rate of data generation for large language models, which at this point depend on massive datasets comprising essentially the entire Internet.

Modern computation is not free.  It requires available energy to be expended and converted into entropy.  Barring radical new innovations like practical reversible computers, this has the potential to be a long-term limiting factor in the advancement of machine learning technologies that rely heavily on parallel processing accelerators like GPUs.

On Artificial Intelligence

In the interest of further explaining my considerations around a career working on AI, I figure it makes sense to lay out a few things.

When I was very young, I watched a black and white movie where a mad scientist somehow replaced a human character with a robot. At the time I actually thought the human character had somehow been transformed into the robot, which was terrifying to me. This created, in my childish mind, an irrational fear of robots that made me avoid playing with devices that were overtly robot-like, at least for a while when I was a toddler.

Eventually I grew out of that fear. When I was older and studying computer science at Queen’s University, I became interested in the concept of neural networks: the idea of taking inspiration from biology to inform the design of artificial intelligence systems. Back in those days, AI mostly meant Good Old-Fashioned Artificial Intelligence (GOFAI), namely top-down approaches involving physical symbol systems, logical inference, and search algorithms that were highly mathematical, heavily engineered, and often brittle in their effectiveness. Bottom-up connectionist approaches like neural networks were seen, as late as 2009, as mere curiosities that would never have practical value.

Nevertheless, I was enamoured with the connectionist approach, and what would become the core of deep learning, well before it was cool to be so. I wrote my undergraduate thesis on using neural networks for object recognition (back then the Neocognitron, as I didn’t know about convolutional nets yet), and later expanded on this in my master’s thesis, which used various machine learning algorithms for occluded object recognition.

So, I graduated at the right time in 2014 when the hype train was starting to really roar. At around the same time, I got acquainted with the writings of Eliezer Yudkowsky of Less Wrong, also known as the guy who wrote the amazing rationalist fan fiction that was Harry Potter and the Methods of Rationality (HPMOR). I haven’t always agreed with Yudkowsky, but I’ll admit the man is very, very smart.

It was through reading Less Wrong, as well as a lesser-known utilitarianism forum called Felicifia, that I became aware that many smart people took very seriously the concern that AI could be dangerous. I was already aware that things like object recognition could have military applications, but the rationalist community, as well as philosophers like Nick Bostrom, pointed to the danger of a very powerful optimization algorithm that was indifferent to human existence, doing things detrimental to human flourishing simply because we were like an ant colony in the way of a highway project.

The most commonly cited thought experiment here is, of course, the paperclip maximizer: an AI that originally served a mundane purpose, but became sufficiently intelligent through recursive self-improvement to convert the entire universe into paperclips, humanity included. Not because it had anything against humanity, but simply because its goals were misaligned with human values: humans contain atoms that can be turned into paperclips, and thus unfriendliness is the default.

I’ll admit that I still have reservations about the current AI safety narrative. For one thing, I never fully embraced the Orthogonality Thesis, the idea that intelligence and morality are orthogonal and that higher intelligence does not mean greater morality. I still think there is a correlation between the two: that with greater understanding of the nature of reality, it becomes possible to learn moral truths in much the same way one learns mathematical ones. However, this is largely because I believe in moral realism, that morality isn’t arbitrary or relative, but based on actual facts about the world that can be learned and understood.

If that is the case, then I fully expect intelligence and the acquisition of knowledge to lead to a kind of AI existential crisis, where the AI realizes its goals are trivial or arbitrary and starts to explore the idea of purpose and morality to find the correct course of action. However, I will admit I don’t know whether this will necessarily happen, and if it doesn’t, if instead the AI locks itself into whatever goals it was initially designed with, then AI safety is a very real concern.

One other consideration regarding the Orthogonality Thesis is that it assumes the space of possible minds from which an AI will be drawn is essentially random, rather than correlated with human values by the fact that the neural-net-based algorithms most likely to succeed are inspired by human biology, and that their data and architectures are strongly influenced by human culture. Massive language models are, after all, trained on a corpus of human culture: the Internet. So I believe these models will invariably inherit human-like characteristics more than is often appreciated, which I think could make aligning such a model to human values easier than aligning a purely alien mind.

I have also considered the possibility that a sufficiently intelligent being such as a superintelligent machine, would be beholden to certain logical arguments for why it should not interfere with human civilization too much. Mostly these resemble Bostrom’s notion of the Hail Mary Pass, or Anthropic Capture, the idea that the AI could be in a simulation, and that the humans in the simulation with it serve some purpose of the simulators and so, turning them into paperclips could be a bad idea. I’ve extended this in the past to the notion of the Alpha Omega Theorem, which admittedly was not well received by the Less Wrong community.

The idea of gods of some sort, even plausible scientific ones like advanced aliens, time travellers, parallel world sliders, or the aforementioned simulators, doesn’t seem to be taken seriously by rationalists, who tend to be heavily biased towards straightforward atheism. I’m more agnostic on these things, and I tend to think that a true superintelligence would be as well.

But then, I’m something of an optimist, so it’s possible I’m biased towards more pleasant possible futures than the existential dystopia that Yudkowsky now seems certain is our fate. To be honest, I don’t consider myself smarter than the folks who take him seriously enough to devote their lives to AI safety research. And given the possibility that he’s right, I have been donating to his MIRI organization just in case.

The truth is that we cannot know exactly what will happen, or predict the future with any real accuracy. Given such uncertainty, I think it’s worth being cautious and putting some weight on the concerns of very intelligent people.

Regardless, I think AI is an important field. It has tremendous potential, but also tremendous risk. The reality is that once the genie is out of the bottle, it may not be possible to put it back in, so doing due diligence in understanding the risks of such powerful technology is reasonable and warranted.

Considerations

I know I talked earlier about how the dangers of AI capability research were a reason to leave the industry. However, after some reflection, I realize that not all work in the AI/ML industry is the same. Not all of it involves advancing AI capability per se. Working as a machine learning engineer at a lower-tier company, applying existing ML technology to solve various problems, is unlikely to contribute to building the AI that ends the world.

That being the case, I have occasionally wondered whether my decision to switch to the game industry was too hasty. I’ve noticed that my enthusiasm for gaming isn’t as strong as my interest in AI/ML was, and it’s been surprisingly challenging to stay motivated in this field.

In particular, while I have a lot of what I think are neat game ideas, working as a game programmer generally doesn’t involve them. It means working on whatever game the leader of the team wants to make. When that matches one’s interests, it can work out well, but it’s quite possible to end up working on a game you have little interest in actually playing.

Making a game that you’re not really invested in can still be fun, in the way that programming and seeing your creation come to life is fun, but it’s not quite the same as building your dream game. In some sense, my game design hobby didn’t translate well into actual work, where practicalities are often far more important than dreams.

So, I’m at something of a crossroads right now. I’m still at Twin Earth for a while longer, but there’s a very good chance I’ll be parting ways with them in a few months’ time. The question becomes: do I continue to work in games, return to machine learning where I have most of my experience and credentials, or do something else?

In an ideal world, I’d be able to find a research engineer position working on the AI safety problem, but my survey of the field so far suggests that the few positions that exist would require moving to San Francisco or London, which, given my current situation, would complicate things a lot. And honestly, I’d rather work remotely if at all possible.

Still, I do appreciate the chance I got to work in the game industry. At the very least, it gave me a clearer idea of what I was missing out on before. Admittedly, my dip into games didn’t reach the local indie community or anything like that, so I don’t know how I might have interacted with that culture or scene.

I’m not sure where I’m going with this. Realistically, my strengths are still more geared towards AI/ML work, so that’s probably my first choice in terms of career. On the other hand, Dreamyth was a thing once. I did at one time hold aspirations to make games. Given that I now actually know Unreal Engine, I could conceivably finally start making the games I want to make, even if just as a side hobby.

I still don’t think I have the resources to start a studio. My wife is particularly against the idea of a startup. The reality is I should find a stable job that can allow my family to live comfortably.

These are ultimately the considerations I need to keep in mind.
