AI Agents in Action: Foundations for Evaluation and Governance (2025)


Risk assessment identifies and analyses potential harms, linking evaluation results to oversight. Evaluation establishes how the system performs, whereas risk assessment determines whether the agent and its use present risks that need to be understood, assessed and mitigated. Evaluation, in turn, provides evidence of whether the chosen mitigations are effective in implementation. The goal of risk assessment is to identify, analyse and prioritize the ways an agent could fail or be misused, estimate the likelihood and severity of each, and determine whether the agent can operate within acceptable boundaries given appropriate controls. This applies to single agents and multi-agent systems, to software-based and embodied deployments, and covers both technical and organizational vulnerabilities.

Risk assessment draws on an agent's defined classification dimensions to identify and analyse potential risks, considering factors such as cybersecurity threats, safety hazards, operational vulnerabilities, legal and regulatory requirements, and stakeholder impacts. It also incorporates evidence from evaluation activities, such as sandbox testing and pilot deployments, including task success rates, error patterns and robustness. To make this process operational, organizations can follow a five-step life cycle that scales to the complexity of the use case. The life cycle outlined in Figure 9 links the outputs of classification and evaluation directly to risk management and progressive governance practices.
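The core of this goal, estimating likelihood and severity per failure mode, prioritizing, and checking against acceptable boundaries, can be sketched as follows. This is an illustrative sketch only: the 1-5 scales, the multiplicative score, the threshold of 9, and the example risks are assumptions, not taken from the report.

```python
from dataclasses import dataclass

@dataclass
class Risk:
    """One identified way the agent could fail or be misused (illustrative)."""
    description: str
    likelihood: int  # assumed 1 (rare) .. 5 (almost certain)
    severity: int    # assumed 1 (negligible) .. 5 (critical)

    @property
    def score(self) -> int:
        # Assumed scoring rule: likelihood x severity, as in a simple risk matrix.
        return self.likelihood * self.severity

def within_acceptable_boundary(risk: Risk, threshold: int = 9) -> bool:
    """True if the risk falls inside an (assumed) acceptable-risk boundary."""
    return risk.score <= threshold

# Hypothetical risks for a tool-using agent, prioritized by score.
risks = [
    Risk("agent calls tool with wrong arguments", likelihood=4, severity=2),
    Risk("agent leaks credentials via tool output", likelihood=2, severity=5),
]
prioritized = sorted(risks, key=lambda r: r.score, reverse=True)
```

Under this rule, the credential-leak risk (score 10) exceeds the assumed boundary and would require additional controls, while the wrong-argument risk (score 8) would not; in practice, the scales and threshold would be set by the organization's risk appetite.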
Table 1 provides an example of how the risk assessment process can be structured in practice.

Figure 9: Foundations for AI agent evaluation and governance - risk assessment life cycle

- Classification dimensions: function, use case, role, autonomy, environment, authority, predictability
- Evaluation criteria: capabilities, task success rate, task completion time, tool call success, edge case robustness, trust indicators, error types, and more
- Risk assessment life cycle: define context, identify risks, analyse risks, evaluate risks, manage risks
- Progressive governance practices: access control, traceability & identity, trustworthiness & explainability, monitoring & logging, legal & compliance, manual redundancy, human oversight, testing & validation, long-term management, and more
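The way Figure 9 wires classification dimensions and evaluation evidence through the five life-cycle steps into governance practices can be sketched as below. All field names, thresholds, and decision rules are illustrative assumptions chosen to show the data flow, not the report's method.

```python
def run_life_cycle(classification: dict, evaluation: dict) -> dict:
    """Hypothetical walk through the five-step risk assessment life cycle."""
    # 1. Define context: pick the classification dimensions framing the assessment.
    context = {k: classification[k] for k in ("autonomy", "environment", "authority")}

    # 2. Identify risks: flag weak evaluation evidence as candidate risks
    #    (thresholds are assumptions for illustration).
    identified = []
    if evaluation["task_success_rate"] < 0.95:
        identified.append("task failure")
    if evaluation["tool_call_success"] < 0.99:
        identified.append("tool misuse")

    # 3. Analyse risks: tie each risk to the context that amplifies it.
    analysed = {
        r: ("high" if context["autonomy"] == "full" else "medium")
        for r in identified
    }

    # 4. Evaluate risks: decide which exceed the acceptable boundary.
    unacceptable = [r for r, level in analysed.items() if level == "high"]

    # 5. Manage risks: map unacceptable risks to progressive governance practices.
    controls = {r: ["human oversight", "monitoring & logging"] for r in unacceptable}
    return {"context": context, "risks": analysed, "controls": controls}

result = run_life_cycle(
    {"autonomy": "full", "environment": "open", "authority": "high"},
    {"task_success_rate": 0.90, "tool_call_success": 0.995},
)
```

In this sketch, a sub-threshold task success rate for a fully autonomous agent is escalated to a "high" risk and assigned controls drawn from the governance practices in Figure 9; a real assessment would replace the hard-coded rules with the organization's own criteria.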