Artificial intelligence as a future existential threat: myths and misconceptions

Today’s post could vaguely be described as “a dip in the waters of rational eschatology”. That sounds metaphorically cooler than it is (even approximately) accurate, but lie-to-children and all that, so here goes.

The academic discussion of existential threats to humanity’s future has been developed for quite some time now in a rather large body of literature. Here’s a good introduction to the subject by Bostrom (one of the leaders of the field) which introduces the idea of global catastrophic risks for the first time and urges identification and prevention of these risks as a global priority; here’s a good FAQ on existential risks (including such questions as “haven’t people in the past often predicted the end of the world?”, “shouldn’t we focus on helping the people who exist now and are in need instead?”, and “why should I care at all?”); here’s an entire chapter available online for free that explains in more detail what’s meant by “global catastrophic risks”; here’s a sort-of centralized repository of papers and articles for those interested in knowing more; here’s the Wikipedia article on global catastrophic risks if you’re the Wiki kind of person (it’s impressively comprehensive).

A recent conversation, “The Myth of AI”, is framed in part as a discussion of points raised in Bostrom’s Superintelligence and as a response to much-repeated comments by Elon Musk and Stephen Hawking that seem to have been heavily informed by Superintelligence.

Unfortunately, some of the participants fall prey to common misconceptions about the standard case for AI as an existential risk, and they probably haven’t had time to read Superintelligence yet.

I normally love Edge for its incisive discussion of matters that matter by the smartest and most well-informed minds around. Unfortunately, as has often been the case when I go deeply into a subject, I find that these otherwise brilliant people (George Dyson! Steven Pinker! Lawrence Krauss!) are so poorly informed about the subject matter, while still making authoritative assertions about it on the strength of status points earned as intellectual leaders in other fields, that reading them becomes an exercise in restraining frustration. They get basic stuff wrong, and they reiterate the same misconceptions in different guises, propped up by examples from their own fields or personal observations. As a result the conversation never goes as far as I’d like it to, even though they’re obviously capable of contributing more had they just done their homework and read (say) literature surveys, or even summaries of literature surveys, or something.

That’s all I ask: do your homework first, please! You’re better than this! It feels like they got their impressions of AI-as-existential-risk from Terminator, wrapped them up in a veneer of sophistication, and argued against that. The fact that they never address the content of Bostrom’s Superintelligence (and in fact seem unaware of vast tracts of it) when arguing against it is just jarring: “Oh, I never really read your book, but people who read it say it’s X and reminds them of Y and also sounds like Z, and here’s why X, Y and Z are wrong….” It is offensive.

But I digress.

We have for instance Jaron Lanier, who writes in the introduction to the conversation:

The idea that computers are people has a long and storied history. It goes back to the very origins of computers, and even from before. There’s always been a question about whether a program is something alive or not since it intrinsically has some kind of autonomy at the very least, or it wouldn’t be a program. There has been a domineering subculture—that’s been the most wealthy, prolific, and influential subculture in the technical world—that for a long time has not only promoted the idea that there’s an equivalence between algorithms and life, and certain algorithms and people, but a historical determinism that we’re inevitably making computers that will be smarter and better than us and will take over from us. …That mythology, in turn, has spurred a reactionary, perpetual spasm from people who are horrified by what they hear. You’ll have a figure say, “The computers will take over the Earth, but that’s a good thing, because people had their chance and now we should give it to the machines.” Then you’ll have other people say, “Oh, that’s horrible, we must stop these computers.” Most recently, some of the most beloved and respected figures in the tech and science world, including Stephen Hawking and Elon Musk, have taken that position of: “Oh my God, these things are an existential threat. They must be stopped.”

In the history of organized religion, it’s often been the case that people have been disempowered precisely to serve what was perceived to be the needs of some deity or another, where in fact what they were doing was supporting an elite class that was the priesthood for that deity. … That looks an awful lot like the new digital economy to me, where you have (natural language) translators and everybody else who contributes to the corpora that allows the data schemes to operate, contributing to the fortunes of whoever runs the computers. You’re saying, “Well, but they’re helping the AI, it’s not us, they’re helping the AI.” It reminds me of somebody saying, “Oh, build these pyramids, it’s in the service of this deity,” and, on the ground, it’s in the service of an elite. It’s an economic effect of the new idea. The new religious idea of AI is a lot like the economic effect of the old idea, religion.

Perhaps I’m doing Lanier a disservice by only quoting him in part, so don’t take my word for it and go read the entire conversation for yourself; but most of the conversation features contributions that have that kind of flavor, so it never really goes anywhere.

Here’s Luke Muehlhauser doing his low-level bush-clearing thing.

Of course, some of the participants may be responding to arguments they’ve heard from others, even if they’re not part of the arguments typically made by FHI and MIRI. Still, for simplicity I’ll reply from the perspective of the typical arguments made by FHI and MIRI.

1. We don’t think AI progress is “exponential,” nor that human-level AI is likely ~20 years away.

Lee Smolin writes:

I am puzzled by the arguments put forward by those who say we should worry about a coming AI, singularity, because all they seem to offer is a prediction based on Moore’s law.

That’s not the argument made by FHI, MIRI, or Superintelligence.

Some IT hardware and software domains have shown exponential progress, and some have not. Likewise, some AI subdomains have shown rapid progress of late, and some have not. And unlike computer chess, most AI subdomains don’t lend themselves to easy measures of progress, so for most AI subdomains we don’t even have meaningful subdomain-wide performance data through which one might draw an exponential curve (or some other curve).

Thus, our confidence intervals for the arrival of human-equivalent AI tend to be very wide, and the arguments we make for our AI timelines are fox-ish (in Tetlock’s sense).

I should also mention that — contrary to common belief — many of us at FHI and MIRI, including myself and Bostrom, actually have later timelines for human-equivalent AI than do the world’s top-cited living AI scientists:

A recent survey asked the world’s top-cited living AI scientists by what year they’d assign a 10% / 50% / 90% chance of human-level AI (aka AGI), assuming scientific progress isn’t massively disrupted. The median reply for a 10% chance of AGI was 2024, for a 50% chance of AGI it was 2050, and for a 90% chance of AGI it was 2070. So while AI scientists think it’s possible we might get AGI soon, they largely expect AGI to be an issue for the second half of this century.

Compared to AI scientists, Bostrom and I think more probability should be placed on later years. As explained elsewhere:

We advocate more work on the AGI safety challenge today not because we think AGI is likely in the next decade or two, but because AGI safety looks to be an extremely difficult challenge — more challenging than managing climate change, for example — and one requiring several decades of careful preparation.

The greatest risks from both climate change and AI are several decades away, but thousands of smart researchers and policy-makers are already working to understand and mitigate climate change, and only a handful are working on the safety challenges of advanced AI. On the present margin, we should have much less top-flight cognitive talent going into climate change mitigation, and much more going into AGI safety research.

2. We don’t think AIs will want to wipe us out. Rather, we worry they’ll wipe us out because that is the most effective way to satisfy almost any possible goal function one could have.

Steven Pinker, who incidentally is the author of two of my all-time favorite books, writes:

[one] problem with AI dystopias is that they project a parochial alpha-male psychology onto the concept of intelligence. Even if we did have superhumanly intelligent robots, why would they want to depose their masters, massacre bystanders, or take over the world? Intelligence is the ability to deploy novel means to attain a goal, but the goals are extraneous to the intelligence itself: being smart is not the same as wanting something. History does turn up the occasional megalomaniacal despot or psychopathic serial killer, but these are products of a history of natural selection shaping testosterone-sensitive circuits in a certain species of primate, not an inevitable feature of intelligent systems.

I’m glad Pinker agrees with what Bostrom calls “the orthogonality thesis”: that intelligence and goals are orthogonal to each other.

But our concern is not that superhuman AIs would be megalomaniacal despots. That is anthropomorphism.

Rather, the problem is that taking over the world is a really good idea for almost any goal function a superhuman AI could have. As Yudkowsky wrote, “The AI does not love you, nor does it hate you, but you are made of atoms it can use for something else.”

Maybe it just wants to calculate as many digits of pi as possible. Well, the best way to do that is to turn all available resources into computation for calculating more digits of pi, and to eliminate potential threats to its continued calculation, for example those pesky humans that seem capable of making disruptive things like nuclear bombs and powerful AIs. The same logic applies for almost any goal function you can specify. (“But what if it’s a non-maximizing goal? And won’t it be smart enough to realize that the goal we gave it wasn’t what we intended if it means the AI wipes us out to achieve it?” Responses to these and other common objections are given in Superintelligence, ch. 8.)

3. AI self-improvement and protection against external modification isn’t just one of many scenarios. Like resource acquisition, self-improvement and protection against external modification are useful for the satisfaction of almost any final goal function.

Kevin Kelly writes:

The usual scary scenario is that an AI will reprogram itself on its own to be unalterable by outsiders. This is conjectured to be a selfish move on the AI’s part, but it is unclear how an unalterable program is an advantage to an AI.

As argued above (and more extensively in Superintelligence, ch. 7), resource acquisition is a “convergent instrumental goal.” That is, advanced AI agents will be instrumentally motivated to acquire as many resources as feasible, because additional resources are useful for just about any goal function one could have.

Self-improvement is another convergent instrumental goal. For just about any goal an AI could have, it’ll be better able to achieve that goal if it’s more capable of goal achievement in general.

Another convergent instrumental goal is goal content integrity. As Bostrom puts it, “An agent is more likely to act in the future to maximize the realization of its present final goals if it still has those goals in the future.” Thus, it will be instrumentally motivated to prevent external modification of its goals, or of parts of its program that affect its ability to achieve its goals.

For more on this, see Superintelligence ch. 7.
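A toy illustration of the goal-content-integrity argument (my own sketch, not from Muehlhauser or Superintelligence): the agent judges a proposed goal change by its present utility function, so a change that redirects its future actions scores worse than resisting. The names `paperclip_utility`, `staple_utility`, `ACTIONS` and `choose_policy` are hypothetical, and the two-action world is an assumption made purely for illustration.

```python
# Hypothetical two-action world: each action yields one unit of one product.
ACTIONS = {
    "make_paperclips": {"paperclips": 1, "staples": 0},
    "make_staples": {"paperclips": 0, "staples": 1},
}

def paperclip_utility(outcome):
    """The agent's present final goal: count paperclips."""
    return outcome["paperclips"]

def staple_utility(outcome):
    """The goal an outside party would like to install instead."""
    return outcome["staples"]

def act(utility):
    """Pick the action that the given utility function rates highest."""
    return max(ACTIONS, key=lambda name: utility(ACTIONS[name]))

def choose_policy(present_utility, replacement_utility):
    """Score 'resist' and 'allow' using the PRESENT goals, then pick the better."""
    outcome_if_resist = ACTIONS[act(present_utility)]
    outcome_if_allow = ACTIONS[act(replacement_utility)]
    scores = {
        "resist": present_utility(outcome_if_resist),
        "allow": present_utility(outcome_if_allow),
    }
    return max(scores, key=scores.get), scores

decision, scores = choose_policy(paperclip_utility, staple_utility)
print(decision, scores)
```

Nothing in the sketch gives the agent a desire for self-preservation as such. Resisting wins only because the future self with the replacement goal would make staples, which the present goal rates at zero.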


I’ll conclude with the paragraph in the discussion I most agreed with, by Pamela McCorduck:

Yes, the machines are getting smarter—we’re working hard to achieve that. I agree with Nick Bostrom that the process must call upon our own deepest intelligence, so that we enjoy the benefits, which are real, without succumbing to the perils, which are just as real. Working out the ethics of what smart machines should, or should not do—looking after the frail elderly, or deciding whom to kill on the battlefield—won’t be settled by fast thinking, snap judgments, no matter how heartfelt. This will be a slow inquiry, calling on ethicists, jurists, computer scientists, philosophers, and many others. As with all ethical issues, stances will be provisional, evolve, be subject to revision. I’m glad to say that for the past five years the Association for the Advancement of Artificial Intelligence has formally addressed these ethical issues in detail, with a series of panels, and plans are underway to expand the effort. As Bostrom says, this is the essential task of our century.

P.S. I could have also objected to claims and arguments made in the conversation, for example Lanier’s claim that “The AI component would be only ambiguously there and of little importance [relative to the actuators component].” To me, this is like saying that humans rule the planet because of our actuators, not because of our superior intelligence. Or in response to Kevin Kelly’s claim that “So far as I can tell, AIs have not yet made a decision that its human creators have regretted,” I can for example point to the automated trading algorithms that nearly bankrupted Knight Capital faster than any human could react. But in this piece I will focus instead on claims that seem to be misunderstandings of the positive case that’s being made for AI as an existential risk.

Stuart Russell of UC Berkeley also ends the conversation by covering basically the same points Muehlhauser does above:

Of Myths And Moonshine

“We switched everything off and went home. That night, there was very little doubt in my mind that the world was headed for grief.”

So wrote Leo Szilard, describing the events of March 3, 1939, when he demonstrated a neutron-induced uranium fission reaction. According to the historian Richard Rhodes, Szilard had the idea for a neutron-induced chain reaction on September 12, 1933, while crossing the road next to Russell Square in London. The previous day, Ernest Rutherford, a world authority on radioactivity, had given a “warning…to those who seek a source of power in the transmutation of atoms – such expectations are the merest moonshine.”

Thus, the gap between authoritative statements of technological impossibility and the “miracle of understanding” (to borrow a phrase from Nathan Myhrvold) that renders the impossible possible may sometimes be measured not in centuries, as Rod Brooks suggests, but in hours.

None of this proves that AI, or gray goo, or strangelets, will be the end of the world. But there is no need for a proof, just a convincing argument pointing to a more-than-infinitesimal possibility. There have been many unconvincing arguments – especially those involving blunt applications of Moore’s law or the spontaneous emergence of consciousness and evil intent. Many of the contributors to this conversation seem to be responding to those arguments and ignoring the more substantial arguments proposed by Omohundro, Bostrom, and others.

The primary concern is not spooky emergent consciousness but simply the ability to make high-quality decisions. Here, quality refers to the expected outcome utility of actions taken, where the utility function is, presumably, specified by the human designer. Now we have a problem:

1. The utility function may not be perfectly aligned with the values of the human race, which are (at best) very difficult to pin down.

2. Any sufficiently capable intelligent system will prefer to ensure its own continued existence and to acquire physical and computational resources – not for their own sake, but to succeed in its assigned task.

A system that is optimizing a function of n variables, where the objective depends on a subset of size k<n, will often set the remaining unconstrained variables to extreme values; if one of those unconstrained variables is actually something we care about, the solution found may be highly undesirable.  This is essentially the old story of the genie in the lamp, or the sorcerer’s apprentice, or King Midas: you get exactly what you ask for, not what you want. A highly capable decision maker – especially one connected through the Internet to all the world’s information and billions of screens and most of our infrastructure – can have an irreversible impact on humanity.
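To make the point about n variables concrete, here is a minimal sketch of my own (it is not part of Russell's piece). It assumes two variables, `compute` and `farmland`, that draw on one shared budget, and an objective that mentions only `compute`. The variable names, the budget of 10 and the exhaustive search are all assumptions chosen for illustration.

```python
from itertools import product

BUDGET = 10  # total units of a shared resource (hypothetical number)

def objective(compute, farmland):
    """Stated objective: depends only on `compute` (k = 1 of n = 2 variables)."""
    return compute

def feasible(compute, farmland):
    """Both variables draw on the same finite budget."""
    return compute + farmland <= BUDGET

def optimize():
    """Exhaustive search over integer allocations; returns the best feasible one."""
    candidates = [
        (c, f) for c, f in product(range(BUDGET + 1), repeat=2) if feasible(c, f)
    ]
    return max(candidates, key=lambda cf: objective(*cf))

best_compute, best_farmland = optimize()
print(best_compute, best_farmland)  # prints "10 0"
```

The optimizer never "decides" to destroy farmland. It drives `farmland` to zero simply because the objective is silent about it and every unit left there is a unit not spent on `compute`.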

This is not a minor difficulty. Improving decision quality, irrespective of the utility function chosen, has been the goal of AI research – the mainstream goal on which we now spend billions per year, not the secret plot of some lone evil genius. AI research has been accelerating rapidly as pieces of the conceptual framework fall into place, the building blocks gain in size and strength, and commercial investment outstrips academic research activity. Senior AI researchers express noticeably more optimism about the field’s prospects than was the case even a few years ago, and correspondingly greater concern about the potential risks.

No one in the field is calling for regulation of basic research; given the potential benefits of AI for humanity, that seems both infeasible and misdirected. The right response seems to be to change the goals of the field itself; instead of pure intelligence, we need to build intelligence that is provably aligned with human values. For practical reasons, we will need to solve the value alignment problem even for relatively unintelligent AI systems that operate in the human environment. There is cause for optimism, if we understand that this issue is an intrinsic part of AI, much as containment is an intrinsic part of modern nuclear fusion research. The world need not be headed for grief.

