AI Agents in Action: Foundations for Evaluation and Governance (2025)
2.3 Risk assessment

Risk assessment identifies and analyses potential harms, linking evaluation results to oversight.
Evaluation establishes how the system performs, whereas risk assessment determines whether the agent and its use present risks that need to be understood, assessed and mitigated. Evaluation then provides evidence as to whether the chosen mitigations are effective in implementation.
The goal of risk assessment is to identify, analyse and prioritize the ways an agent could fail or be misused, estimate likelihood and severity, and determine whether it can operate within acceptable boundaries with appropriate controls. This applies to single agents and multi-agent systems, software-based and embodied deployments, and covers both technical and organizational vulnerabilities.

Risk assessment draws on an agent's defined
classification dimensions to identify and analyse
potential risks, considering factors such as
cybersecurity threats, safety hazards, operational
vulnerabilities, legal and regulatory requirements,
and stakeholder impacts. It also incorporates
evidence from evaluation activities, such as
sandbox testing and pilot deployments, including
task success rates, error patterns and robustness.
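To illustrate how evaluation evidence might feed risk identification, the sketch below maps metrics such as task success rate and robustness to candidate risk findings. The metric names and thresholds are hypothetical, not drawn from the report; a real programme would calibrate them to the use case.

```python
# Hypothetical thresholds for turning evaluation evidence into candidate
# risks; real programmes would calibrate these to the use case.
def risks_from_evaluation(metrics: dict) -> list[str]:
    findings = []
    if metrics.get("task_success_rate", 1.0) < 0.95:
        findings.append("operational: task failure rate above tolerance")
    if metrics.get("edge_case_robustness", 1.0) < 0.90:
        findings.append("safety: brittle behaviour on edge cases")
    if metrics.get("tool_call_error_rate", 0.0) > 0.05:
        findings.append("security: unreliable or unsafe tool calls")
    return findings

# Example: evidence gathered from sandbox testing of a pilot deployment.
evidence = {"task_success_rate": 0.91, "edge_case_robustness": 0.97,
            "tool_call_error_rate": 0.08}
print(risks_from_evaluation(evidence))
```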
To make this process operational, organizations can
follow a five-step life cycle that can be scaled to the
complexity of the use case.
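The life cycle can be sketched as a simple risk register. The step names follow Figure 9; the 1–5 likelihood/severity scales, the acceptance threshold and the example entries are illustrative assumptions, not prescriptions from the report.

```python
from dataclasses import dataclass, field

@dataclass
class Risk:
    description: str
    likelihood: int   # illustrative scale: 1 (rare) .. 5 (almost certain)
    severity: int     # illustrative scale: 1 (negligible) .. 5 (critical)
    mitigations: list = field(default_factory=list)

    @property
    def score(self) -> int:
        # Common prioritization heuristic: likelihood x severity.
        return self.likelihood * self.severity

def run_life_cycle(context: str, risks: list[Risk],
                   threshold: int = 9) -> list[Risk]:
    """Steps per Figure 9: define context -> identify -> analyse
    -> evaluate -> manage."""
    # Analyse and evaluate: rank risks by score.
    prioritized = sorted(risks, key=lambda r: r.score, reverse=True)
    # Manage: flag risks above the acceptable boundary for mitigation.
    for risk in prioritized:
        if risk.score > threshold and not risk.mitigations:
            risk.mitigations.append("requires control before deployment")
    return prioritized

# Hypothetical register for a tool-using support agent.
risks = run_life_cycle(
    "customer-support agent with tool access",
    [Risk("prompt injection via tool output", likelihood=4, severity=4),
     Risk("benign task failure", likelihood=3, severity=2)],
)
print([(r.description, r.score) for r in risks])
```

The threshold stands in for the "acceptable boundaries" the paragraph describes; scaling the process to use-case complexity would mean adjusting the scales, threshold and review depth rather than the steps themselves.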
The life cycle outlined in Figure 9 links the
outputs of classification and evaluation directly
to risk management and progressive governance
practices. Table 1 provides an example of how the risk assessment process can be structured in practice.
FIGURE 9: Foundations for AI agent evaluation and governance – risk assessment life cycle

[Figure 9 links three sets of inputs and outputs around the life cycle:]

- Classification dimensions: Function, Predictability, Use case, Autonomy, Environment, Authority, Role
- Evaluation criteria: Capabilities, Tool call success, Edge case robustness, Trust indicators, Task success rate, Task completion time, Error types, and more
- Risk assessment life cycle: Define context → Identify risks → Analyse risks → Evaluate risks → Manage risks
- Progressive governance practices: Access control, Trustworthiness & explainability, Traceability & identity, Monitoring & logging, Legal & compliance, Manual redundancy, Long-term management, Human oversight, Testing & validation, and more