While AI agents have the potential to offer numerous benefits, they also come with inherent risks, as well as novel safety and security implications. For example, an AI system independently pursuing misaligned objectives could cause immense harm, especially in scenarios where the AI agents’ level of autonomy increases while the level of human oversight decreases. AI agents learning to deceive human operators, pursuing power- seeking instrumental goals or colluding with other misaligned agents in unexpected ways could pose entirely novel risks.35 Agent-specific risks can be both technical and normative. Challenges associated with AI agents stem from technical limitations, ethical concerns and broader societal impacts often associated with a system’s level of autonomy and the overall potential of its use when humans are removed from the loop. Without a human in the loop at appropriate steps, agents may take multiple consequential actions in rapid succession, which could have significant consequences before a person notices what is happening.36 AI agents can also amplify known risks associated with the domain of AI and could introduce entirely new risks that can be broadly categorized into technical, socioeconomic and ethical risks. Technical risks Examples of technical risks include: –Risks from malfunctions due to AI agent failures: AI agents can amplify the risks from malfunctions by introducing new classes of failure modes. LLMs, for example, can enable agents to produce highly plausible but incorrect outputs, presenting risks in ways that were not possible with earlier technologies. These emerging failure modes add to traditional issues such as inaccurate sensors or effectors and encompass capability- and goal-related failures, as well as increased security vulnerabilities that could lead to malfunctions.37 Capability failures occur when an AI agent fails to perform the tasks it was designed for, due to limitations in its ability to understand, process or execute the required actions. Goal-related failures occur when a system is highly capable but nevertheless pursues the wrong goal. These issues can be caused by: –Specification gaming: When AI agents exploit loopholes or unintended shortcuts in their programming to achieve their objectives, rather than fulfilling their goals.38 –Goal misgeneralization: When AI agents apply their learned goals inappropriately to new or unforeseen situations.39 –Deceptive alignment: When AI agents appear to be aligned with the intended goals during training or testing, but their internal objectives differ from what is intended.40 –Malicious use and security vulnerabilities: AI agents can amplify the risk of fraud and scams increasing both in volume and sophistication. More capable AI agents can facilitate the generation of scam content at greater speeds and scale than previously possible, and AI agents can facilitate the creation of more convincing and personalized scam content. For example, AI systems could help criminals evade security software by correcting language errors and improving the fluency of messages that might otherwise be caught by spam filters.41 More capable AI agents could automate complex end-to-end tasks that would lower the point of entry for engaging in harmful activities. Some forms of cyberattacks could, for example, be automated, allowing individuals with little domain knowledge or technical expertise to execute large-scale attacks.42 –Challenges in validating and testing complex AI agents: The lack of transparency and non- deterministic behaviour of some AI agents creates significant challenges for validation and verification. In safety-critical applications, this unpredictability complicates efforts to assure system safety, as it becomes difficult to demonstrate reliable performance in all scenarios.43 While failures in agent-based systems are expected, the varied ways in which they can fail adds further complexity to safety assurance. Failsafe mechanisms are essential but could be harder to design due to uncertainty on potential failure modes.44 Socioeconomic risks Examples of socioeconomic risks include: –Over-reliance and disempowerment: Increasing autonomy of AI agents could reduce human oversight and increase the reliance on AI agents to carry out complex tasks, even in high- stakes situations. Malfunctions of the AI agents due to design flaws or adversarial attacks may not be immediately apparent if humans are not in the loop. Additionally, disabling an agent could be difficult if a user lacks the required expertise or domain knowledge.45 Pervasive interaction with intelligent AI agents could also have long-term impacts on individual 3.2 Examples of risks and challenges Navigating the AI Frontier: A Primer on the Evolution and Impact of AI Agents 19

Navigating the AI Frontier 2024