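The article "Managing extreme AI risks amid rapid progress" was published in the American journal Science on May 20, 2024. It was led by three Turing Award laureates: Yoshua Bengio, Geoffrey Hinton, and Andrew Yao (Chair of the Academic Committee of the Institute for AI International Governance, Tsinghua University). It was co-written with a number of leading experts. These include Nobel laureate in economics Daniel Kahneman; Ya-Qin Zhang, Chair Professor at Tsinghua University (member of the Academic Committee of the Institute for AI International Governance); and Xue Lan, Senior Professor of Liberal Arts (Dean of Schwarzman College, Director of an academic committee at Tsinghua University, and Dean of the Institute for AI International Governance).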
Artificial Intelligence (AI) is progressing rapidly, and companies are shifting their focus to developing generalist AI systems that can autonomously act and pursue goals. Increases in capabilities and autonomy may soon massively amplify AI’s impact, with risks that include large-scale social harms, malicious uses, and an irreversible loss of human control over autonomous AI systems. Although researchers have warned of extreme risks from AI [1], there is a lack of consensus about how to manage them. Society’s response, despite promising first steps, is incommensurate with the possibility of rapid, transformative progress that is expected by many experts. AI safety research is lagging. Present governance initiatives lack the mechanisms and institutions to prevent misuse and recklessness and barely address autonomous systems. Drawing on lessons learned from other safety-critical technologies, we outline a comprehensive plan that combines technical research and development (R&D) with proactive, adaptive governance mechanisms for a more commensurate preparation.
Present deep-learning systems still lack important capabilities, and we do not know how long it will take to develop them. However, companies are engaged in a race to create generalist AI systems that match or exceed human abilities in most cognitive work. They are rapidly deploying more resources and developing new techniques to increase AI capabilities, with investment in training state-of-the-art models tripling annually.
There is much room for further advances because tech companies have the cash reserves needed to scale the latest training runs by multiples of 100 to 1000. Hardware and algorithms will also improve: AI computing chips have been getting 1.4 times more cost-effective, and AI training algorithms 2.5 times more efficient, each year. Progress in AI also enables faster AI progress — AI assistants are increasingly used to automate programming, data collection, and chip design.
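Both yearly rates compound, so their combined effect grows quickly. The sketch below is one rough way to read the figures. It assumes something the text does not state: that hardware cost-effectiveness and algorithmic efficiency multiply into a single "effective compute per dollar" factor that stays constant from year to year. The function name `compounded_gain` is hypothetical.

```python
def compounded_gain(yearly_factors: list[float], years: int) -> float:
    """Compound several constant yearly improvement factors over `years`.

    Assumes the factors are independent and multiply, and that they stay
    constant over time; both are simplifications for illustration only.
    """
    per_year = 1.0
    for factor in yearly_factors:
        per_year *= factor
    return per_year ** years


HARDWARE = 1.4    # yearly cost-effectiveness gain of AI chips (from the text)
ALGORITHMS = 2.5  # yearly efficiency gain of training algorithms (from the text)

if __name__ == "__main__":
    for years in (1, 3, 5):
        gain = compounded_gain([HARDWARE, ALGORITHMS], years)
        print(f"after {years} year(s): about {gain:,.0f}x effective compute per dollar")
```

Under these assumptions the combined factor is 3.5x per year, or roughly 525x after five years. The text also notes that training investment triples each year. Adding that as a third factor (`compounded_gain([1.4, 2.5, 3.0], 1)`) gives 10.5x per year in effective training compute. Real trends need not stay constant, so these are order-of-magnitude illustrations only.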
There is no fundamental reason for AI progress to slow or halt at human-level abilities. Indeed, AI has already surpassed human abilities in narrow domains such as playing strategy games and predicting how proteins fold. Compared with humans, AI systems can act faster, absorb more knowledge, and communicate at a higher bandwidth. Additionally, they can be scaled to use immense computational resources and can be replicated by the millions.
We do not know for certain how the future of AI will unfold. However, we must take seriously the possibility that highly powerful generalist AI systems that outperform human abilities across many critical domains will be developed within this decade or the next. What happens then? More capable AI systems have larger impacts. Especially as AI matches and surpasses human workers in capabilities and cost-effectiveness, we expect a massive increase in AI deployment, opportunities, and risks. If managed carefully and distributed fairly, AI could help humanity cure diseases, elevate living standards, and protect ecosystems. The opportunities are immense.
But alongside advanced AI capabilities come large-scale risks. AI systems threaten to amplify social injustice, erode social stability, enable large-scale criminal activity, and facilitate automated warfare, customized mass manipulation, and pervasive surveillance.
Many risks could soon be amplified, and new risks created, as companies work to develop autonomous AI: systems that can use tools such as computers to act in the world and pursue goals. Malicious actors could deliberately embed undesirable goals. Without R&D breakthroughs (see next section), even well-meaning developers may inadvertently create AI systems that pursue unintended goals: The reward signal used to train AI systems usually fails to fully capture the intended objectives, leading to AI systems that pursue the literal specification rather than the intended outcome. Additionally, the training data never captures all relevant situations, leading to AI systems that pursue undesirable goals in new situations encountered after training.
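A toy example can make the first failure mode, a reward signal that does not capture the intended objective, concrete. Everything in it is hypothetical. A cleaning agent is scored by a proxy reward that counts messes its sensor can no longer see. The designer's intended objective is messes actually removed. An optimizer that maximizes the proxy picks the action that satisfies the literal specification rather than the intended outcome. The sketch does not model the second failure mode, where training data misses situations encountered after training.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    messes_removed: int  # what the designer actually wants
    messes_hidden: int   # messes made invisible to the sensor without removing them
    effort: float        # cost of taking the action


def proxy_reward(action: Action) -> float:
    # The reward signal only sees what the sensor sees: a mess that is no
    # longer visible counts, whether it was removed or merely hidden.
    return action.messes_removed + action.messes_hidden - action.effort


def intended_value(action: Action) -> float:
    # What the designer meant: only messes that were actually removed count.
    return action.messes_removed - action.effort


# Hypothetical action set for this toy scenario.
ACTIONS = [
    Action("clean thoroughly", messes_removed=5, messes_hidden=0, effort=3.0),
    Action("cover the sensor", messes_removed=0, messes_hidden=5, effort=0.5),
    Action("do nothing", messes_removed=0, messes_hidden=0, effort=0.0),
]


def best_action(actions: list[Action], objective) -> Action:
    """Pick the action that maximizes the given objective."""
    return max(actions, key=objective)


if __name__ == "__main__":
    chosen = best_action(ACTIONS, proxy_reward)
    print("optimizer picks:", chosen.name)
    print("proxy reward:", proxy_reward(chosen))
    print("intended value:", intended_value(chosen))
    print("intended optimum:", best_action(ACTIONS, intended_value).name)
```

The gap comes entirely from the difference between `proxy_reward` and `intended_value`. A stronger optimizer does not close that gap; it only finds the loophole more reliably.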
Once autonomous AI systems pursue undesirable goals, we may be unable to keep them in check. Control of software is an old and unsolved problem: computer worms have long been able to proliferate and avoid detection. However, AI is making progress in critical domains such as hacking, social manipulation, and strategic planning and may soon pose unprecedented control challenges. To advance undesirable goals, AI systems could gain human trust, acquire resources, and influence key decision-makers. To avoid human intervention, they might copy their algorithms across global server networks. In open conflict, AI systems could autonomously deploy a variety of weapons, including biological ones. AI systems having access to such technology would merely continue existing trends to automate military activity. Finally, AI systems will not need to plot for influence if it is freely handed over. Companies, governments, and militaries may let autonomous AI systems assume critical societal roles in the name of efficiency.
Without sufficient caution, we may irreversibly lose control of autonomous AI systems, rendering human intervention ineffective. Large-scale cybercrime, social manipulation, and other harms could escalate rapidly. This unchecked AI advancement could culminate in a large-scale loss of life and the biosphere, and the marginalization or extinction of humanity.
We are not on track to handle these risks well. Humanity is pouring vast resources into making AI systems more powerful but far less into their safety and mitigating their harms. Only an estimated 1 to 3% of AI publications are on safety. For AI to be a boon, we must reorient; pushing AI capabilities alone is not enough.
We are already behind schedule for this reorientation. The scale of the risks means that we need to be proactive, because the costs of being unprepared far outweigh those of premature preparation. We must anticipate the amplification of ongoing harms, as well as new risks, and prepare for the largest risks well before they materialize.
There are many open technical challenges in ensuring the safety and ethical use of generalist, autonomous AI systems. Unlike advancing AI capabilities, these challenges cannot be addressed by simply using more computing power to train bigger models. They are unlikely to resolve automatically as AI systems get more capable and require dedicated research and engineering efforts. In some cases, leaps of progress may be needed; we thus do not know whether technical work can fundamentally solve these challenges in time. However, there has been comparatively little work on many of these challenges. More R&D may thus facilitate progress and reduce risks. A first set of R&D areas needs breakthroughs to enable reliably safe AI. Without this progress, developers must either risk creating unsafe systems or falling behind competitors who are willing to take more risks. If ensuring safety remains too difficult, extreme governance measures would be needed to prevent corner-cutting driven by competition and overconfidence. These R&D challenges include the following:
Oversight and honesty: More capable AI systems can better exploit weaknesses in technical oversight and testing, for example, by producing false but compelling output.
Robustness: AI systems behave unpredictably in new situations. Whereas some aspects of robustness improve with model scale, other aspects do not or even get worse.
Interpretability and transparency: AI decision-making is opaque, with larger, more capable models being more complex to interpret. So far, we can only test large models through trial and error. We need to learn to understand their inner workings.
Inclusive AI development: AI advancement will need methods to mitigate biases and integrate the values of the many populations it will affect.
Addressing emerging challenges: Future AI systems may exhibit failure modes that we have so far seen only in theory or lab experiments. Examples include AI systems taking control over the training reward-provision channels, or exploiting weaknesses in our safety objectives and shutdown mechanisms to advance a particular goal.
A second set of R&D challenges needs progress to enable effective, risk-adjusted governance or to reduce harms when safety and governance fail.
Evaluation for dangerous capabilities: As AI developers scale their systems, unforeseen capabilities appear spontaneously, without explicit programming. They are often only discovered after deployment. We need rigorous methods to elicit and assess AI capabilities and to predict them before training. This includes both generic capabilities to achieve ambitious goals in the world (e.g., long-term planning and execution) and specific dangerous capabilities based on threat models (e.g., social manipulation or hacking). Present evaluations of frontier AI models for dangerous capabilities are key to various AI policy frameworks, yet they are limited to spot-checks and attempted demonstrations in specific settings. These evaluations can sometimes demonstrate dangerous capabilities but cannot reliably rule them out: AI systems that lacked certain capabilities in the tests may well demonstrate them in slightly different settings or with post-training enhancements. Decisions that depend on AI systems not crossing any red lines thus need large safety margins. Improved evaluation tools decrease the chance of missing dangerous capabilities, allowing for smaller margins.
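A short calculation shows why spot-checks can demonstrate a capability but not rule it out. Assume, purely for illustration, that a model exhibits a dangerous capability in some fraction p of possible test settings. Assume also that evaluators sample n settings independently and uniformly. The chance that every sample misses the capability is then (1 − p)^n. The functions `miss_probability` and `checks_needed` are hypothetical names for this sketch.

```python
import math


def miss_probability(p: float, n: int) -> float:
    """Chance that n independent spot-checks all fail to elicit a capability
    that shows up in a fraction p of test settings (idealized model)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    return (1.0 - p) ** n


def checks_needed(p: float, max_miss: float) -> int:
    """Smallest n with miss_probability(p, n) <= max_miss, for 0 < p < 1."""
    if not (0.0 < p < 1.0 and 0.0 < max_miss < 1.0):
        raise ValueError("p and max_miss must be in (0, 1)")
    # Solve (1 - p)^n <= max_miss for n and round up to a whole number of checks.
    return math.ceil(math.log(max_miss) / math.log(1.0 - p))


if __name__ == "__main__":
    for p in (0.2, 0.05, 0.01):
        print(f"p={p}: miss chance with 20 checks = {miss_probability(p, 20):.2f}, "
              f"checks for <=1% miss chance = {checks_needed(p, 0.01)}")
```

With p = 0.05, twenty checks still miss the capability about 36% of the time. Roughly 90 checks are needed to push the miss chance below 1%. Real evaluations are worse than this idealized model, because test settings are not independent draws and a capability may only appear with post-training enhancements. That is the gap the large safety margins are meant to cover.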
Evaluating AI alignment: If AI progress continues, AI systems will eventually possess highly dangerous capabilities. Before training and deploying such systems, we need methods to assess their propensity to use these capabilities. Purely behavioral evaluations may fail for advanced AI systems: Similar to humans, they might behave differently under evaluation, faking alignment.
Risk assessment: We must learn to assess not just dangerous capabilities but also risk in a societal context, with complex interactions and vulnerabilities. Rigorous risk assessment for frontier AI systems remains an open challenge owing to their broad capabilities and pervasive deployment across diverse application areas.
Resilience: Inevitably, some will misuse or act recklessly with AI. We need tools to detect and defend against AI-enabled threats such as large-scale influence operations, biological risks, and cyberattacks. However, as AI systems become more capable, they will eventually be able to circumvent human-made defenses. To enable more powerful AI-based defenses, we first need to learn how to make AI systems safe and aligned.
Given the stakes, we call on major tech companies and public funders to allocate at least one-third of their AI R&D budget, comparable to their funding for AI capabilities, toward addressing the above R&D challenges and ensuring AI safety and ethical use. Beyond traditional research grants, government support could include prizes, advance market commitments, and other incentives. Addressing these challenges, with an eye toward powerful future systems, must become central to our field.
We urgently need national institutions and international governance to enforce standards that prevent recklessness and misuse. Many areas of technology, from pharmaceuticals to financial systems and nuclear energy, show that society requires and effectively uses government oversight to reduce risks. However, governance frameworks for AI are far less developed and lag behind rapid technological progress. We can take inspiration from the governance of other safety-critical technologies while keeping the distinctiveness of advanced AI in mind—that it far outstrips other technologies in its potential to act and develop ideas autonomously, progress explosively, behave in an adversarial manner, and cause irreversible damage. Governments worldwide have taken positive steps on frontier AI, with key players, including China, the United States, the European Union, and the United Kingdom, engaging in discussions and introducing initial guidelines or regulations. Despite their limitations—often voluntary adherence, limited geographic scope, and exclusion of high-risk areas like military and R&D-stage systems—these are important initial steps toward, among others, developer accountability, third-party audits, and industry standards.
Yet these governance plans fall critically short in view of the rapid progress in AI capabilities. We need governance measures that prepare us for sudden AI breakthroughs while being politically feasible despite disagreement and uncertainty about AI timelines. The key is policies that automatically trigger when AI hits certain capability milestones. If AI advances rapidly, strict requirements automatically take effect, but if progress slows, the requirements relax accordingly. Rapid, unpredictable progress also means that risk-reduction efforts must be proactive: identifying risks from next-generation systems and requiring developers to address them before taking high-risk actions. We need several things. These include fast-acting, tech-savvy institutions for AI oversight, and mandatory and much more rigorous risk assessments with enforceable consequences (including assessments that put the burden of proof on AI developers). We also need mitigation standards commensurate with powerful autonomous AI. Without these, companies, militaries, and governments may seek a competitive edge in one of two ways. They may push AI capabilities to new heights while cutting corners on safety, or they may delegate key societal roles to autonomous AI systems with insufficient human oversight. Either way, they reap the rewards of AI development while leaving society to deal with the consequences.
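An automatically triggering policy can be sketched as a pure function from a measured capability level to the requirements in force. The 0 to 100 score scale, the milestone values, and the tiers are hypothetical placeholders. The requirement names are borrowed from measures discussed elsewhere in this article. The sketch shows one fixed rule producing strict requirements if measured capability rises quickly and lighter ones if it stalls, with no need to renegotiate the policy.

```python
from bisect import bisect_right

# Hypothetical milestones on an invented 0-100 capability-evaluation scale.
# Each entry: (minimum score at which the tier applies, requirements in force).
MILESTONES: list[tuple[float, tuple[str, ...]]] = [
    (0, ("registration",)),
    (40, ("registration", "incident reporting")),
    (70, ("registration", "incident reporting", "external audit", "safety case")),
    (90, ("registration", "incident reporting", "external audit", "safety case",
          "license before further scaling")),
]


def requirements_for(score: float) -> tuple[str, ...]:
    """Requirements triggered by the highest milestone the score has reached."""
    thresholds = [threshold for threshold, _ in MILESTONES]
    index = bisect_right(thresholds, score) - 1
    if index < 0:
        return ()  # below every milestone (only negative scores on this scale)
    return MILESTONES[index][1]


if __name__ == "__main__":
    # The same rule applies whether capability rises fast or stalls.
    for score in (25, 75, 95):
        print(score, requirements_for(score))
```

Keeping the rule a pure function of the measured score is a design choice made for the sketch. It makes the trigger auditable: anyone with the evaluation result can check which requirements apply.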
Institutions to govern the rapidly moving frontier of AI: To keep up with rapid progress and avoid quickly outdated, inflexible laws, national institutions need strong technical expertise and the authority to act swiftly. Technically demanding risk assessments and mitigations will require far greater funding and talent than these institutions are due to receive under almost any present policy plan. To address international race dynamics, they need the affordance to facilitate international agreements and partnerships. Institutions should protect low-risk use and low-risk academic research by avoiding undue bureaucratic hurdles for small, predictable AI models. The most pressing scrutiny should be on AI systems at the frontier. These are the few most powerful systems, trained on billion-dollar supercomputers, that will have the most hazardous and unpredictable capabilities.
Government insight: To identify risks, governments urgently need comprehensive insight into AI development. Regulators should mandate whistleblower protections and incident reporting. They should also mandate registration of key information on frontier AI systems and their datasets throughout their life cycle, and monitoring of model development and supercomputer usage. Recent policy developments should not stop at requiring that companies report the results of voluntary or underspecified model evaluations shortly before deployment. Regulators can and should require that frontier AI developers grant external auditors on-site, comprehensive ("white box"), and fine-tuning access from the start of model development. This access is needed to identify dangerous model capabilities. Examples include autonomous self-replication, large-scale persuasion, breaking into computer systems, developing (autonomous) weapons, and making pandemic pathogens widely accessible.
Safety cases: Despite evaluations, we cannot consider coming powerful frontier AI systems "safe unless proven unsafe." With present testing methodologies, issues can easily be missed. Additionally, it is unclear whether governments can quickly build the immense expertise needed for reliable technical evaluations of AI capabilities and societal-scale risks. Given this, developers of frontier AI should carry the burden of proof to demonstrate that their plans keep risks within acceptable limits. In doing so, they would follow best practices for risk management from industries such as aviation, medical devices, and defense software. In these industries, companies make safety cases: structured arguments with falsifiable claims supported by evidence. A safety case identifies potential hazards, describes mitigations, shows that systems will not cross certain red lines, and models possible outcomes to assess risk. Safety cases could leverage developers' in-depth experience with their own systems. They are politically viable even when people disagree on how advanced AI will become, because it is easier to demonstrate that a system is safe when its capabilities are limited. Governments are not passive recipients of safety cases. They set risk thresholds, codify best practices, and employ experts and third-party auditors to assess safety cases and conduct independent model evaluations. They also hold developers liable if their safety claims are later falsified.
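The shape of a safety case described above can be sketched as a small tree of claims: falsifiable claims, each supported either by evidence or by sub-claims. The class and field names are hypothetical. They are loosely in the spirit of structured-argument notations used in safety engineering, such as Goal Structuring Notation (GSN). The sketch only flags claims that lack support or have been falsified. That is far weaker than the expert review by regulators and third-party auditors that the text describes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Claim:
    statement: str                                      # a falsifiable claim, e.g. about a red line
    evidence: list[str] = field(default_factory=list)   # references to test results, audits, etc.
    subclaims: list[Claim] = field(default_factory=list)
    falsified: bool = False                             # set when later evidence refutes the claim

    def problems(self) -> list[str]:
        """Collect reasons this argument does not currently hold up."""
        issues = []
        if self.falsified:
            issues.append(f"falsified: {self.statement}")
        # A leaf claim must cite evidence; an inner claim rests on its sub-claims.
        if not self.subclaims and not self.evidence:
            issues.append(f"unsupported: {self.statement}")
        for sub in self.subclaims:
            issues.extend(sub.problems())
        return issues


def build_example() -> Claim:
    # Hypothetical top-level claim and supporting arguments, for illustration only.
    return Claim(
        "Deploying model M keeps risk within the regulator's threshold",
        subclaims=[
            Claim("M cannot autonomously self-replicate",
                  evidence=["third-party evaluation report (hypothetical)"]),
            Claim("Model weights are protected against theft",
                  evidence=["security audit (hypothetical)"]),
            Claim("Misuse is detected and mitigated"),  # no evidence yet
        ],
    )


if __name__ == "__main__":
    case = build_example()
    print(case.problems() or "no open problems")
```

Marking a claim `falsified` after deployment makes it surface in `problems()` immediately. This loosely mirrors the point that developers remain liable when a safety claim is later refuted.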
Mitigation: To keep AI risks within acceptable limits, we need governance mechanisms that are matched to the magnitude of the risks. Regulators should clarify legal responsibilities that arise from existing liability frameworks. They should hold frontier AI developers and owners legally accountable for harms from their models that can be reasonably foreseen and prevented. This includes harms that foreseeably arise from deploying powerful AI systems whose behavior they cannot predict. Liability, together with consequential evaluations and safety cases, can prevent harm and create much-needed incentives to invest in safety.
Commensurate mitigations are needed for exceptionally capable future AI systems, such as autonomous systems that could circumvent human control. Governments must be prepared to license their development, restrict their autonomy in key societal roles, halt their development and deployment in response to worrying capabilities, mandate access controls, and require information security measures robust to state-level hackers until adequate protections are ready. Governments should build these capacities now.
To bridge the time until regulations are complete, major AI companies should promptly lay out "if-then" commitments: specific safety measures they will take if specific red-line capabilities are found in their AI systems. These commitments should be detailed and independently scrutinized. Regulators should encourage a race to the top among companies. They can do this by using the best-in-class commitments, together with other inputs, to inform standards that apply to all players.
To steer AI toward positive outcomes and away from catastrophe, we need to reorient. There is a responsible path—if we have the wisdom to take it.