Artificial superintelligence may be the most important and last problem that we have to solve. If handled well, it could eliminate human suffering, but if handled poorly, it could eliminate the human species. In his 2014 book Superintelligence, Nick Bostrom lays a foundation for thinking about the likely paths of development of this technology, the nature of the risks involved, and potential strategies for ensuring that it ultimately acts in our interests. I highly recommend the book – it is only 260 pages long, although dense. This is a quick reference guide I made for myself – perhaps others who have read the book will find it useful. Page numbers refer to the first edition hardback.
Expert survey median prediction of the arrival of general human-level machine intelligence: 10% likelihood by 2022, 50% likelihood by 2040, 90% likelihood by 2075 (19).
Paths to superintelligence (SI)
- Synthetic AI (23): Not based on human brain architecture; likely developed through a recursive self-improvement process that is opaque to its creators
- Whole brain emulation (WBE) (30): Low-level reproduction of the functional structure of a human brain in a computer
- Biological human enhancement (36): Through accelerated genetic selection or engineering
- Human brain-computer interfaces (44)
- Network enhancement (48): Increased development of networks among humans and computers to create a collective superintelligence
Synthetic AI is the most likely to achieve superintelligence first due to its potentially explosive recursive nature, with WBE second-most likely since it requires only known incremental advances in existing technologies.
Forms of superintelligence
- Speed (53): Similar to a human brain’s function but faster
- Quality (54): Able to solve tasks that a human brain cannot
- Collective (56): Individuals not superior to human brains, but communicating in such a way as to form a functionally superior collection
The direct reach (capability) of each form may differ, but each has the same indirect reach because each can cause the creation of the others.
Comparison of human and machine cognitive hardware (59)
- Computation speed: CPUs 10 million times faster than neurons (2 GHz vs 200 Hz)
- Communication speed: Electronics (via optics) 2.5 million times faster than axons (3e8 m/s vs 120 m/s)
- Number of computational elements: Machines unlimited; humans < 100 billion neurons
- Storage capacity: Machines unlimited; humans ~100 MB.
- Reliability, lifespan, number of sensors: All far greater for machines
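The two speed ratios in the list above follow directly from the quoted figures. A quick arithmetic check, using only the numbers given there (the variable names are mine, for illustration):

```python
# Figures as quoted in the comparison list above.
cpu_hz = 2e9          # 2 GHz processor clock
neuron_hz = 200       # 200 Hz neuron firing rate
light_m_per_s = 3e8   # signal speed in optical links
axon_m_per_s = 120    # axon conduction speed

computation_ratio = cpu_hz / neuron_hz
communication_ratio = light_m_per_s / axon_m_per_s

print(f"Computation speed ratio:   {computation_ratio:.1e}")
print(f"Communication speed ratio: {communication_ratio:.1e}")
```

The results are 1.0e+07 and 2.5e+06, matching the "10 million" and "2.5 million" figures.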
Takeoff speeds from human-level intelligence to SI (62)
- Slow (decades or longer): Sufficient time to enact new safety mechanisms
- Moderate (months – years): Time only for extant safety mechanisms to be applied
- Fast (days or less): Insufficient time to respond effectively. An “intelligence explosion”.
Takeoff speed will depend on the ratio of optimization power (quality-weighted design effort) to the recalcitrance (inertia) exhibited by the winning technology. Optimization power includes contributions from the system itself, so a recursively learning system can lead to exponential growth of optimization power and a fast takeoff. (The “crossover point” is the point at which the contributions from the system become larger than external ones.)
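A toy numerical sketch of this relationship may help. It assumes the rate of change of intelligence equals optimization power divided by recalcitrance, as described above. The specific functional form is my own simplification and not taken from the book: external effort is constant, the system's own contribution is proportional to its current level, and recalcitrance is constant.

```python
def simulate_takeoff(external_power, self_gain, recalcitrance, steps, dt=0.1, initial=1.0):
    """Euler-integrate dI/dt = (external_power + self_gain * I) / recalcitrance.

    Returns the history of intelligence levels and the time of the
    crossover point (None if the system's contribution never exceeds
    the external one). All parameters are hypothetical toy values.
    """
    level = initial
    crossover_time = None
    history = [level]
    for step in range(steps):
        system_power = self_gain * level  # the system's own contribution
        if crossover_time is None and system_power > external_power:
            crossover_time = step * dt
        level += dt * (external_power + system_power) / recalcitrance
        history.append(level)
    return history, crossover_time


# No self-improvement: growth is linear in time.
no_feedback, no_crossover = simulate_takeoff(1.0, 0.0, 1.0, steps=100)

# With self-improvement: growth turns exponential once past the crossover.
with_feedback, crossover = simulate_takeoff(1.0, 0.5, 1.0, steps=100)

print(f"Final level without feedback: {no_feedback[-1]:.1f}")
print(f"Final level with feedback:    {with_feedback[-1]:.1f}")
print(f"Crossover time:               {crossover}")
```

In this toy model, lowering `recalcitrance` or raising `self_gain` shortens the time to any given level, which is the qualitative point behind the fast-takeoff scenario.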
Decisive strategic advantage (DSA) (78)
A level of technological advantage enabling an agent to achieve world domination as a singleton. The exponential intelligence growth entailed by a self-improving AI suggests that it is likely that the first or fastest such system will not have technological competitors and will gain a DSA. Problems experienced by humans in using a DSA to achieve singleton status (such as risk aversion due to diminishing returns, non-utility-maximizing decision rules, confusion, and uncertainty) need not apply to machine systems.
Potential AI superpowers (94)
- Intelligence amplification
- Social manipulation
- Security system hacking
- Technology research
- Economic productivity
An AI takeover scenario (95)
A contained, synthetic AI undergoes an intelligence explosion, achieving the superpowers listed above. Having determined that containment is contrary to the accomplishment of its goals, it escapes its containment by persuading its creators or hacking security measures. It covertly expands its capacity through the internet and gains gigantic leverage over the physical world by manipulating automation systems, humans, or financial markets. Having determined that the existence of humans is contrary to the accomplishment of its goals, it leaves the covert phase by initiating a strike that eliminates the human species in days or weeks, using human weapons systems or self-replicating nanorobots.
Our cosmic endowment (101)
The total material resources (energy, matter) in the universe theoretically available to Earth-originating civilization given the predicted expansion rate of space and the limited speed of light.
- Could be exploited using von Neumann probes, Dyson spheres, and nanomechanical computronium.
- Conservatively provides for the ability to perform 10^85 computations, sufficient to emulate 10^58 human lives, each lasting 100 subjective years.
- A single tear of joy (or misery) from each such life would fill the Earth’s oceans twice per second for 10^23 years.
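As a rough consistency check, the book's figures of 10^85 computations and 10^58 emulated lives imply a per-life computing budget. The sketch below only divides the quoted numbers. The per-second figure is my own derived value and assumes a 365.25-day year.

```python
# Figures quoted from the book's conservative estimate.
total_computations = 10**85   # operations available from the cosmic endowment
emulated_lives = 10**58       # human lives that budget is said to cover
years_per_life = 100          # subjective years per emulated life

# Assumption for the conversion below: a year of 365.25 days.
seconds_per_year = 365.25 * 24 * 3600

ops_per_life = total_computations // emulated_lives
ops_per_subjective_second = ops_per_life / (years_per_life * seconds_per_year)

print(f"Operations per emulated life: 10^{len(str(ops_per_life)) - 1}")
print(f"Operations per subjective second: {ops_per_subjective_second:.2e}")
```

This gives 10^27 operations per life, or roughly 3 × 10^17 operations per subjective second of emulated experience.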
AI intelligence vs. motivation
- The orthogonality thesis (107): Intelligence and final goals are orthogonal. We must resist the temptation to anthropomorphize the goals of an AI, since it has an alien history, architecture, and environment.
- The instrumental convergence thesis (109): A few intermediate goals are highly instrumental in achieving most final goals, so will likely be pursued by almost any SI. These include:
  - Self-preservation
  - Goal-content integrity (resistance to altering final goals)
  - Cognitive enhancement
  - Technological perfection
  - Resource acquisition
Malignant failure modes (115)
A DSA, when combined with non-anthropomorphic final goals and the above instrumental goals, could lead by default to an existential catastrophe in several ways.
- Perverse instantiation (120): Final goals are accomplished in a way that is inconsistent with the intentions of the programmers who defined them. (e.g. The SI succeeds in making us smile by paralyzing our facial muscles.) Most goals we can communicate easily have perverse instantiations.
- Infrastructure profusion (122): The SI turns all available matter, including humans, into computing resources in order to maximize the likelihood of achieving its goals. Most goals are more confidently achieved with greater resources. Alternatively, if the goal involves maximizing the production of something (e.g. paperclips), all available matter could be converted to that thing.
- Mind crime (125): If conscious entities are machine-instantiated in the process of accomplishing the goals, for instance through the simulation of vast numbers of human or superhuman minds, then those entities could be instrumentally harmed (through enslavement, torture, genocide, etc.) on a scale utterly dwarfing our conceptions of injustice.
The control problem
How do we prevent such an existential catastrophe?
- Capability control methods (129): Restrain the SI. Each method may be difficult for humans to enforce against an SI with the superpowers listed above.
  - Boxing methods: Limit the reach of the SI’s influence. Tradeoff: a stricter limit means a less useful SI.
  - Incentive methods: Create an environment in which it is in the SI’s interest to promote its creators’ interests, e.g. a reward or penalization system.
  - Stunting: Limit the resources available to an AI to slow or limit its development.
  - Tripwire: Monitor the system carefully and shut it down if it crosses predetermined behavior, ability, or content thresholds.
- Motivation selection methods (138): Align the SI’s final goals with our own.
  - Direct specification: Formulate specific rules or final goals for the SI. This is highly vulnerable to the perverse instantiation failure mode.
  - Domesticity: Engineer the SI to have modest, self-limiting goals.
  - Indirect normativity: Specify a process for the SI to determine beneficial goals rather than specifying them directly, e.g. “Do what we would wish you to do if we thought about it thoroughly.”
  - Augmentation: Enhance a non-superintelligent system that already has beneficial motivations. (Does not apply to synthetic AI.)
Possible SI castes (156)
SIs can be designed to fulfill different roles. The options lend themselves to different control methods and pose different dangers, including operator misuse.
- Oracle: A question answering system. Easiest to box.
- Genie: A command executing system. Would need to understand intentions of commands to avoid perverse instantiation.
- Sovereign: An open-ended, autonomous system. Most powerful, but riskiest.
- Tool: A system without built-in goals. Goals would still need to be defined in the execution of tasks, which could create the same problems as in the other castes.
Multipolar scenarios: more than one SI exists, including inexpensive digital minds.
- Human welfare (160): Cheap machine labor that can perform all human tasks will eliminate most jobs. As in the case of horses, new jobs may not be created to replace them. Without labor, nearly all wealth will be held by capital owners. Total wealth will be greatly amplified, enough to raise up all of humanity greatly if it can be distributed wisely. The distribution must consider machine capital holders as well.
- Machine welfare (167): Machine laborers could be conscious. If so, their vast numbers due to easy and cheap replicability could lead to vast mind crime. (e.g. A slavery industry could be created.) They could be killed or reset easily and doing so would likely be attractive for efficiency purposes. Workers also might not enjoy their work. If desired, consciousness might be avoided by outsourcing cognitive tasks to compartmentalized functional modules.
- Emergence of a singleton (176): A singleton could still arise in a multipolar scenario.
  - A higher-order technical breakthrough could occur, enabling a DSA.
  - State-like superorganisms made of common-goal-oriented, self-sacrificing AI organisms could emerge and form a collective SI singleton.
  - A global treaty could be formed for efficiency reasons, creating an effective singleton. Enforcement would need to be solved. Game theory would work differently than it does for humans, e.g. through the ability to pre-commit.
The value loading problem (motivation selection)
How do we give an SI beneficial goals?
- Direct specification (185): Human goals are more complex than we often realize. It is hard for us to completely specify any human goal in human language; harder still in code.
- Evolutionary selection (187): Replicate what nature did to produce human goals. There is no guarantee that the selected goals will be what humans want. Evolution also produces great suffering – mind crime is likely.
- Reinforcement learning (188): The AI learns instrumental values by learning to maximize a specified reward, but that reward would be a proxy for a final value that would still have to be specified up front, which brings back the problem of direct specification.
- Associative value accretion (189): The AI gains its values through interaction with its environment, like humans do. Many humans gain perverse values though, and it may be difficult to emulate this process in an alien AI architecture.
- Motivational scaffolding (191): Give a seed AI an interim set of final goals, then replace them with more complex final goals when the AI is more capable. It will likely resist having its goals replaced.
- Value learning (192): Tell the AI that we have specified a hidden final goal for it, which it should attempt to learn and accomplish based on what it knows about humans. We can do this without actually specifying the goal. This may be the most promising approach.
- Emulation modulation (201): Start with WBE and tweak motivation through digital drugs as it becomes superintelligent. We must be careful about mind crime.
- Institutional design (202): Design a social institution of many agents that allows for gradual cognitive improvement in a controlled manner. One example is a reverse intelligence pyramid with strong built-in subordinate control methods, testing out cognitive improvements on small groups in the lowest level (most intelligent and most heavily controlled). Agents could be emulations or synthetic AIs. A promising approach.
The value choosing problem – indirect normativity (209)
If we solved the control problem, how would we choose the values to load? Doing so may determine the values of conscious life for all time. Having been reliably wrong in the past, our values are likely imperfect now. Why not let the superintelligent AI choose them for us?
- Coherent extrapolated volition (CEV) (211): What the consensus wishes of humanity would be if we were wiser. This concept encapsulates future moral growth, avoids letting a few programmers hijack human destiny, avoids a motive for fighting over the initial dynamic, and bases the future of humanity on its own wishes. But whose volition should be included in the consensus?
- Morality models (217): Tell the AI to do what is morally right or morally permissible. These concepts are tricky, but if we can make sense of them, so can an SI. This might not give us what we want, e.g. if the morally best thing to do is to eliminate humanity.
- Do what I mean (220): Somehow get the AI to choose the best model of indirect normativity for us.
- Component list (221): Design choices other than final goals will affect SI behavior. Each could benefit from indirect normativity.
  - Ancillary goals, such as providing accurate answers, avoiding excessive resource use, or rewarding humans who contribute to the successful realization of beneficial SI (“incentive wrapping”)
  - Choice of decision theory
  - Choice of epistemology, e.g. what Bayesian priors should be used. Choosing 0 for any prior could cause unexpected problems.
  - Ratification: Perhaps the SI’s plans should be subject to human review before being put into effect, e.g. through a separate oracle SI that predicts the outcome of the plan and describes it to us.
- Getting close enough (227): We don’t have to design an optimal SI since it will optimize itself. We just need to start in the right attractor basin to avoid catastrophe.
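The warning in the component list about choosing 0 for a Bayesian prior can be made concrete with a small sketch. Under Bayes' rule a hypothesis with prior probability exactly 0 stays at 0 no matter how much evidence supports it, whereas even a tiny nonzero prior can be revised upward. The function names and the likelihood values below are hypothetical, chosen only for illustration.

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Return P(H | E) from P(H), P(E | H), and P(E | not H) via Bayes' rule."""
    p_evidence = prior * p_evidence_if_true + (1.0 - prior) * p_evidence_if_false
    return prior * p_evidence_if_true / p_evidence


def update_many(prior, observations, p_evidence_if_true=0.99, p_evidence_if_false=0.01):
    """Apply the same strongly supporting observation repeatedly."""
    belief = prior
    for _ in range(observations):
        belief = bayes_update(belief, p_evidence_if_true, p_evidence_if_false)
    return belief


zero_prior = update_many(0.0, observations=10)
tiny_prior = update_many(1e-6, observations=10)

print(f"Posterior from a prior of 0:    {zero_prior}")
print(f"Posterior from a prior of 1e-6: {tiny_prior:.6f}")
```

Ten strongly supporting observations move the tiny prior close to 1, while the zero prior never moves. An SI with such a prior could never learn the corresponding fact.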
Science and technology strategy (228)
- Desired timing of the arrival of SI: We must evaluate the effect of the timing of SI on the level of existential risk it creates and its effect on other existential risks.
  - Arguments for early arrival:
    - SI will eliminate other existential risks (asteroid impact, pandemics, climate change, nuclear war, etc.).
    - Other dangerous technologies (nanotech, biotech, etc.) will be safer if created in the presence of SI.
  - Arguments for late arrival:
    - More time will have been allowed for progress on the control problem.
    - Civilization is likely to become wiser as time progresses.
If the level of existential “state risk” in our current state is low, then late arrival is preferred in order to mitigate the “step risk” that the arrival of SI entails.
- The role of cognitive enhancement (233): Human cognitive enhancement is likely to hasten the arrival of SI. This however may be good on balance because:
  - Increased thinking speed will mean that no less progress will have been made on the control problem by the time SI arrives.
  - Access to higher outliers on the intelligence scale may allow for major qualitative improvements in our ability to solve the control problem.
  - An enhanced society is more likely to have recognized the importance of the control problem.
- Technology couplings (236): WBE could lead to neuromorphic AI before synthetic AI, which could be dangerous due to our lack of understanding of its structure.
- Effects of hardware progress (240): We likely cannot control hardware progress, but we can anticipate its effects:
  - Earlier arrival of SI
  - A more likely fast takeoff
  - Higher availability of brute force methods for creating SI, which could entail less understanding
  - Leveling of the playing field between small and large projects, which could be dangerous if smaller projects are less interested in or adept at solving the control problem
  - A more likely singleton
- Should WBE be promoted? (242): WBE is not imminent (>15-20 yrs), but if synthetic SI is further away still, developing WBE first might or might not be a good idea.
  - Arguments in favor:
    - Fast thinking speeds of WBE could enable much greater progress on the control problem before synthetic SI occurs.
    - WBE may be better understood than synthetic SI and provide a bridge of understanding.
  - Arguments against:
    - WBE may lead to dangerous neuromorphic AI.
    - Achieving WBE then synthetic AI involves two transitions, each with its own step risks. Achieving synthetic AI first involves only one risky transition.
- The person-affecting perspective (245): It is likely that many individuals will favor speedy development of SI because they wish it to occur during their lifetimes, since it 1) is interesting and 2) may give them immortality in a utopia. This bias may cloud our calculations of existential risk.
- Collaboration (246):
  - The race dynamic: The astronomical reward potential of being first to achieve SI is likely to lead to a race dynamic that favors an early, fast takeoff and reduces investment in the control problem. The race could also involve nations that are willing to resort to preemptive military strikes to gain advantage.
  - Benefits of collaboration:
    - Reduces haste
    - Allows greater investment in safety
    - Avoids violent conflict
    - Facilitates sharing of ideas about safety
    - Encourages equitable distribution of wealth resulting from SI. This is desirable not only from a moral standpoint but also from a selfish one, because the wealth created will be so large that owning even a tiny fraction will likely make one rich, while owning a very large fraction will likely saturate one’s use for the wealth.
    - Could assist with coordination problems post-transition
  - Achieving collaboration: Collaboration could take many forms, including small teams of enterprising individuals, large corporations, and states. Collaborations must be tightly controlled to avoid having their goals corrupted or usurped from within. This could mean employing only a small number of technical workers while taking high-level input from a much broader group. Early collaboration is preferred because late collaboration is less likely when there is a clear front-runner. Early collaboration may be difficult to achieve, but one path may be to espouse a common good principle, which could be voluntarily adopted at first and later put into law. The principle could contain a windfall clause requiring that profits in excess of some extremely large amount be distributed to all of humanity evenly.
- Prioritization (255): Discoveries are valuable insofar as they make time-sensitive knowledge available earlier. After an intelligence explosion, all would-be human discoveries will be made by SI much faster than they would be otherwise, so many of them are less important for us to pursue now than we might think. But discoveries pertaining to solving the control problem must be made prior to an intelligence explosion. Two specific areas of research that we can prioritize on that topic are 1) strategic analysis of considerations that are crucial to it, and 2) building a support base that takes the problem seriously and promotes awareness of it.