In the interest of explaining further my considerations for having a career working on AI, I figure it makes sense to explain a few things.

When I was very young, I watched a black and white movie where a mad scientist somehow replaced a human character with a robot. At the time I actually thought the human character was somehow transformed into the robot, which was terrifying to me. This, to my childish mind, created an irrational fear of robots that made me avoid playing with such devices that were overtly robot-like, at least for the while when I was a toddler.

Eventually I grew out of that fear. When I was older and studying computer science at Queen’s University, I became interested in the concept of neural networks, the idea of taking the inspiration of biology to inform the design of artificial intelligence systems. Back in those days, AI mostly meant Good Old Fashioned Artificial Intelligence (GOFAI), namely top-down approaches that involve physical symbol systems, logical inference, and search algorithms that were highly mathematical, engineered, and often brittle in terms of its effectiveness. Bottom-up connectionist approaches like neural networks were seen as late as 2009 as being mere curiosities that would never have practical value.

Nevertheless, I was enamoured with the connectionist approach, and what would become the core of deep learning, well before it was cool to be so. I wrote my undergraduate thesis on using neural networks for object recognition (back then the Neocognitron, as I didn’t know about convolutional nets yet), and then would later expand on this for my master’s thesis, which was on using various machine learning algorithms for occluded object recognition.

So, I graduated at the right time in 2014 when the hype train was starting to really roar. At around the same time, I got acquainted with the writings of Eliezer Yudkowsky of Less Wrong, also known as the guy who wrote the amazing rationalist fan fiction that was Harry Potter and the Methods of Rationality (HPMOR). I haven’t always agreed with Yudkowsky, but I’ll admit the man is very, very smart.

It was my reading Less Wrong as well as a lesser known utilitarianism forum called Felificia that I became aware that there were many smart people who took very seriously the concern that AI could be dangerous. I was already aware that stuff like object recognition could have military applications, but the rationalist community, as well as philosophers like Nick Bostrom, pointed to the danger of a very powerful optimization algorithm that was indifferent to human existence, choosing to do things detrimental to human flourishing just because we were like an ant colony in the way of a highway project.

The most commonly cited thought experiment of this is of course, the paperclip maximizer that originally served a mundane purpose, but became sufficiently intelligent through recursive self-improvement to convert the entire universe into paperclips, including humanity. Not because it had anything against humanity, just that its goals were misaligned with human values in that humans contain atoms that can be turned into paperclips, and thus, unfriendliness is the default.

I’ll admit that I still have reservations about the current AI safety narrative. For one thing, I never fully embraced the idea of the Orthogonality Thesis, that intelligence and morality are orthogonal and higher intelligence does not mean greater morality. I still think there is a correlation between the two. That with greater understanding of the nature of reality, it becomes possible to learn the mathematics like notions of moral truths. However, this is largely because I believe in moral realism, that morality isn’t arbitrary or relative, but based on actual facts about the world that can be learned and understood.

If that is the case, then I fully expect intelligence and the acquisition of knowledge to lead to a kind of AI existential crisis where the AI realizes its goals are trivial or arbitrary, and starts to explore the idea of purpose and morality to find the correct course of action. However, I will admit I don’t know if this will necessarily happen, and if it doesn’t, if instead, the AI locks itself in to whatever goals its initially designed with, then AI safety is a very real concern.

One other consideration regarding the Orthogonality Thesis is that it assumes that the space of possible minds that the AI will potentially be drawn from is completely random rather than correlated with human values by the fact that the neural net based algorithms that are most likely to succeed are inspired by human biology, and the data and architecture are strongly influenced by human culture. Those massive language models are after all, trained on a corpus of human culture that is the Internet. So, invariably, the models, I believe, will inherit human-like characteristics more than is often appreciated. This I think could make aligning such a model to human values easier than aligning a purely alien mind.

I have also considered the possibility that a sufficiently intelligent being such as a superintelligent machine, would be beholden to certain logical arguments for why it should not interfere with human civilization too much. Mostly these resemble Bostrom’s notion of the Hail Mary Pass, or Anthropic Capture, the idea that the AI could be in a simulation, and that the humans in the simulation with it serve some purpose of the simulators and so, turning them into paperclips could be a bad idea. I’ve extended this in the past to the notion of the Alpha Omega Theorem, which admittedly was not well received by the Less Wrong community.

The idea of gods of some sort, even plausible scientific ones like advanced aliens, time travellers, parallel world sliders, or the aforementioned simulators, doesn’t seem to be taken seriously by rationalists who tend to be very biased towards straightforward atheism. I’m more agnostic on these things, and I tend to think that a true superintelligence would be as well.

But then, I’m something of an optimist, so it’s possible I’m biased towards more pleasant possible futures than the existential dystopia that Yudkowsky now seems certain is our fate. To be honest, I don’t consider myself smarter than the folks who take him seriously enough to devote their lives to AI safety research. And given the possibility that he’s right, I have been donating to his MIRI organization just in case.

The truth is that we cannot know exactly what will happen, or predict the future with any real accuracy. Given such uncertainty, I think it’s worth being cautious, and put some weight onto the concerns of very intelligent people.

Regardless, I think AI is an important field. It has tremendous potential, but also tremendous risk. The reality is that once the genie is out of the bottle, it may not be possible to put it back in, so doing due diligence in understanding the risks of such powerful technology is reasonable and warranted.