AI Agents in Action: Foundations for Evaluation and Governance (2025)

FIGURE 7: Foundations for AI agent evaluation and governance – classification dimensions

[Figure. Panels and entries recoverable from the original layout:
Define the use
Classification dimensions: Function, Role, Predictability, Autonomy, Authority, Use case, Environment
Evaluation criteria: Task success rate, Task completion time, Error types, Tool call success, Edge case robustness, Trust indicators, Capabilities, and more
Risk assessment life cycle: Define context, Identify risks, Analyse risks, Evaluate risks, Manage risks
Progressive governance practices: Access control, Trustworthiness & explainability, Traceability & identity, Monitoring & logging, Legal & compliance, Manual redundancy, Long-term management, Human oversight, Testing & validation, and more]

As agents become more embedded in tools, platforms and workflows, the proposed dimensions can help organizations define specific agent roles and levels of integration while evaluating benefits and limitations in context and implement oversight mechanisms that match their capabilities. Taking these dimensions into consideration can help providers and adopters to:

– Clarify functional scope: Define what an agent is designed to do, under what conditions and where its responsibilities begin and end.
– Support assessment: Evaluate the technical, organizational, safety and security implications of deploying specific agents in their contexts.
– Guide governance and oversight: Align safeguards, controls and monitoring mechanisms with the nature and complexity of the agent’s role.
– Support interoperability and scaling: Structure agent types in ways that facilitate coordination in multi-agent environments and integration across systems.

Without clear classification, organizations may adopt AI agents without fully understanding what they are designed to do, how they operate, the impact they may have on their environment or the oversight mechanisms they require. This lack of clarity could result in gaps in safety, security, control, privacy, reliability and accountability.
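One way an adopter might put the classification dimensions to work is to record them per agent and flag any that are left undefined before deployment. The following is a minimal sketch, not something the report prescribes: the `AgentProfile` class, the `missing_dimensions` helper and the example agent are all hypothetical, and the dimensions are stored as free-text descriptions because the page itself does not define scales for them.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class AgentProfile:
    """Hypothetical record of the seven classification dimensions in Figure 7."""
    name: str
    function: Optional[str] = None
    role: Optional[str] = None
    predictability: Optional[str] = None
    autonomy: Optional[str] = None
    authority: Optional[str] = None
    use_case: Optional[str] = None
    environment: Optional[str] = None


def missing_dimensions(profile: AgentProfile) -> List[str]:
    """Return the names of dimensions that are unset or blank."""
    missing = []
    for f in fields(profile):
        if f.name == "name":
            continue  # the agent's label is not a classification dimension
        value = getattr(profile, f.name)
        if value is None or not value.strip():
            missing.append(f.name)
    return missing


# Illustrative agent with only some dimensions described (assumed values).
profile = AgentProfile(
    name="invoice-triage-agent",
    function="Sort incoming invoices into approval queues",
    role="Assistant to the accounts payable team",
    autonomy="Acts without approval only for low-value invoices",
    environment="Internal finance platform",
)

print(missing_dimensions(profile))
# → ['predictability', 'authority', 'use_case']
```

A check of this kind only shows that a dimension has been described, not that the description is adequate; judging adequacy remains a matter for the assessment and oversight steps discussed above.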