Navigating the AI Frontier 2024
Page 19 of 28 · WEF_Navigating_the_AI_Frontier_2024.pdf
While AI agents have the potential to offer numerous
benefits, they also come with inherent risks, as
well as novel safety and security implications. For
example, an AI system independently pursuing
misaligned objectives could cause immense
harm, especially in scenarios where the AI agents’
level of autonomy increases while the level of
human oversight decreases. AI agents learning
to deceive human operators, pursuing power-
seeking instrumental goals or colluding with other
misaligned agents in unexpected ways could pose
entirely novel risks.35
Agent-specific risks can be both technical and
normative. Challenges associated with AI agents
stem from technical limitations, ethical concerns
and broader societal impacts often associated
with a system’s level of autonomy and the overall
potential of its use when humans are removed
from the loop. Without a human in the loop at
appropriate steps, agents may take multiple
consequential actions in rapid succession, which
could have significant consequences before a
person notices what is happening.36
AI agents can also amplify known risks associated
with the domain of AI and could introduce entirely
new risks that can be broadly categorized into
technical, socioeconomic and ethical risks.
Technical risks
Examples of technical risks include:
–Risks from malfunctions due to AI agent
failures: AI agents can amplify the risks from
malfunctions by introducing new classes of
failure modes. LLMs, for example, can enable
agents to produce highly plausible but incorrect
outputs, presenting risks in ways that were
not possible with earlier technologies. These
emerging failure modes add to traditional issues
such as inaccurate sensors or effectors and
encompass capability- and goal-related failures,
as well as increased security vulnerabilities that
could lead to malfunctions.37
Capability failures occur when an AI agent fails
to perform the tasks it was designed for, due to
limitations in its ability to understand, process
or execute the required actions. Goal-related
failures occur when a system is highly capable
but nevertheless pursues the wrong goal. These
issues can be caused by:
–Specification gaming: When AI agents
exploit loopholes or unintended shortcuts
in their programming to achieve their
objectives, rather than fulfilling their goals.38 –Goal misgeneralization: When AI agents
apply their learned goals inappropriately to
new or unforeseen situations.39
–Deceptive alignment: When AI agents
appear to be aligned with the intended goals
during training or testing, but their internal
objectives differ from what is intended.40
–Malicious use and security vulnerabilities: AI
agents can amplify the risk of fraud and scams
increasing both in volume and sophistication.
More capable AI agents can facilitate the
generation of scam content at greater speeds
and scale than previously possible, and AI
agents can facilitate the creation of more
convincing and personalized scam content.
For example, AI systems could help criminals
evade security software by correcting language
errors and improving the fluency of messages
that might otherwise be caught by spam
filters.41 More capable AI agents could automate
complex end-to-end tasks that would lower the
point of entry for engaging in harmful activities.
Some forms of cyberattacks could, for example,
be automated, allowing individuals with little
domain knowledge or technical expertise to
execute large-scale attacks.42
–Challenges in validating and testing complex
AI agents: The lack of transparency and non-
deterministic behaviour of some AI agents
creates significant challenges for validation
and verification. In safety-critical applications,
this unpredictability complicates efforts to
assure system safety, as it becomes difficult
to demonstrate reliable performance in all
scenarios.43 While failures in agent-based
systems are expected, the varied ways in which
they can fail adds further complexity to safety
assurance. Failsafe mechanisms are essential
but could be harder to design due to uncertainty
on potential failure modes.44
Socioeconomic risks
Examples of socioeconomic risks include:
–Over-reliance and disempowerment:
Increasing autonomy of AI agents could reduce
human oversight and increase the reliance on AI
agents to carry out complex tasks, even in high-
stakes situations. Malfunctions of the AI agents
due to design flaws or adversarial attacks may
not be immediately apparent if humans are not
in the loop. Additionally, disabling an agent
could be difficult if a user lacks the required
expertise or domain knowledge.45
Pervasive interaction with intelligent AI agents
could also have long-term impacts on individual 3.2 Examples of risks and challenges
Navigating the AI Frontier: A Primer on the Evolution and Impact of AI Agents
19
Ask AI what this page says about a topic: