Rapid advances in artificial intelligence (AI) over the past decade have been accompanied by several high-profile failures, highlighting the importance of ensuring that intelligent machines are beneficial to humanity. This realization has given rise to the new subfield of research known as AI safety and security, which encompasses a wide range of research areas and has seen a steady growth in publications in recent years.
The underlying assumption in this research is that the problem of controlling highly capable intelligent machines is solvable. But no rigorous mathematical proof or argumentation has been presented to demonstrate that the AI control problem is solvable in principle, let alone in practice. In computer science, it is standard to first determine whether a problem belongs to a class of “unsolvable” problems before investing resources in trying to solve it.
Despite the recognition that the problem of AI control may be one of the most important problems facing humanity, it remains poorly understood, poorly defined and poorly researched. A computer science problem may be solvable, unsolvable, undecidable or partially solvable, but we don’t know which of these describes the AI control problem. Some forms of control may be achievable in certain situations, but partial control may also be insufficient in many cases. Without a better understanding of the nature and feasibility of the AI control problem, it is difficult to determine an appropriate course of action.
The AI control problem
We define the problem of AI control as: How can humanity remain safely in control while benefiting from a superior form of intelligence? This is the fundamental problem in the field of AI safety and security, which aims to make intelligent systems secure from tampering and safe for all stakeholders involved.
Value alignment is the most studied approach to achieving security in AI. However, concepts such as “safety” and “security” are notoriously difficult to test or measure accurately, even for non-AI software, despite years of research. At best, we can probably distinguish between “perfectly safe” and “as safe as an average person performing a similar task.”
However, society is unlikely to tolerate machine errors, even if they occur with a frequency typical of human performance or less frequently. We expect machines to perform better and will not accept partial safety when dealing with such highly capable systems. The impact of AI (both positive and negative) is strongly related to its capability. With respect to possible existential impacts, there is no such thing as partial safety.
An initial understanding of the control problem may suggest designing a machine that accurately follows human commands. However, because of possible conflicting or paradoxical commands, ambiguity of human languages and perverse instantiation problems, this is not a desirable form of control (although some ability to integrate human feedback may be desirable). The solution is thought to require AI to act in the capacity of an ideal advisor, avoiding the problems of misinterpretation of direct commands and the possibility of malevolent commands.
Some argue that the consequences of an uncontrolled AI could be so severe (everyone is killed, or worse, everyone lives forever but is tortured) that even if there is a very small chance of a hostile AI emerging, it is still worthwhile to conduct AI safety research because the negative utility of such an AI would be astronomical. The common logic is that an extremely high (negative) utility multiplied by a small chance of the event still results in a large disutility and should be taken very seriously. But the chances of a misaligned AI are not small. In fact, in the absence of an effective safety program, that is the only outcome we will get.
The statistics therefore look very compelling in support of a major AI safety effort. We are looking at an almost guaranteed event with the potential to cause an existential catastrophe. This is not a low-risk, high-reward scenario, but a high-risk, negative-reward situation. No wonder many consider this to be the most important problem humanity has ever faced!
The outcome could be prosperity or extinction, and the fate of the universe hangs in the balance. A proof of the solvability or non-solvability of the AI control problem would be the most important proof ever.
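The expected-utility reasoning described above (a probability multiplied by the magnitude of harm) can be made concrete with a minimal sketch. The function name `expected_disutility` and every number below are hypothetical, chosen only to illustrate the arithmetic; none of them are estimates from this article.

```python
def expected_disutility(probability: float, harm: float) -> float:
    """Expected loss: probability of the event times the magnitude of its harm."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if harm < 0:
        raise ValueError("harm is expressed as a non-negative magnitude")
    return probability * harm


# Hypothetical magnitudes in arbitrary units (assumptions, not measured data).
CATASTROPHIC_HARM = 1e12  # stands in for an existential catastrophe
ORDINARY_HARM = 1e3       # stands in for an everyday, recoverable loss

# A tiny chance of catastrophe still yields a large expected loss.
small_chance = expected_disutility(1e-6, CATASTROPHIC_HARM)

# The article's claim: without an effective safety program the chance is not small.
near_certain = expected_disutility(0.99, CATASTROPHIC_HARM)

# A coin-flip chance of an ordinary loss, for comparison.
everyday_risk = expected_disutility(0.5, ORDINARY_HARM)

print(small_chance > everyday_risk)  # rare catastrophe outweighs a common small loss
print(near_certain > small_chance)   # a near-certain catastrophe is worse still
```

Under these assumed numbers, even the one-in-a-million case dominates the everyday risk, and raising the probability toward certainty only strengthens the case for taking the problem seriously.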
Obstacles to controlling AI
Controlling an artificial general intelligence (AGI) is likely to require a toolbox with certain capabilities, such as explainability, predictability and model verifiability. But it is likely that many of the desired tools are not available to us.
- The concept of “unexplainability” in AI refers to the impossibility of providing an explanation that is both 100 percent accurate and understandable for certain decisions made by an intelligent system. A complementary concept, the incomprehensibility of AI, addresses the inability of people to fully understand an explanation provided by an AI.
- “Unpredictability” of AI, also known as unknowability, is one of the many impossibility results in AI safety. It is defined as our inability to accurately and consistently predict what specific actions an intelligent system will take to achieve its goals, even if we know the ultimate goals of the system.
- “Non-verifiability” is a fundamental limitation in the verification of mathematical proofs, computer software, intelligent agent behavior and all formal systems. It is becoming increasingly obvious that just as we can only have probabilistic confidence in the correctness of mathematical proofs and software implementations, our ability to verify intelligent agents is at best limited.
Despite the absence of any evidence or proof, many researchers assume that the problem of AI control can be solved. Before embarking on a quest to build controlled AI, it is important to demonstrate that the problem can be solved so as not to waste valuable resources.
The burden of proof is on those who claim that the problem is solvable, and the current absence of such proof speaks loudly about the inherent dangers of the proposal to develop AGI. In fact, uncontrollability of AI is very likely to be the case, as can be demonstrated by reduction to the problem of human control.
There are many open questions to consider regarding the issue of controllability, such as: Can the control problem be solved? Can it be done in principle? Can it be done in practice? Can it be done with a sufficient level of accuracy? How long would it take to do it? Can it be done in time? What are the energy and computational requirements to do it?
There seems to be no published evidence that a less intelligent agent can indefinitely maintain control over a more intelligent agent. As long as the intelligent systems we develop are less intelligent than we are, we can maintain control, but once such systems become more intelligent than we are, we lose that ability.
Moreover, the problem of controlling ever more capable intelligences only becomes more challenging, and more obviously impossible, for agents whose own level of intelligence is static. Currently, it appears that our ability to produce intelligent software far outstrips our ability to control or even verify it.
Instead of asking “What can AI do for us?” we should be asking “What can AI do to us?”
Dr. Roman V. Yampolskiy is a tenured faculty member in the department of Computer Science and Engineering at the University of Louisville. He is the founding and current director of the Cyber Security Lab and an author of many books, including “Artificial Superintelligence: A Futuristic Approach.”