AI Agents in Action Foundations for Evaluation and Governance 2025
FIGURE 7: Foundations for AI agent evaluation and governance – classification dimensions
[Figure: starting from "Define the use", four components and their elements]
Classification dimensions: function, role, predictability, autonomy, authority, use case, environment
Evaluation criteria: task success rate, task completion time, error types, tool call success, edge case robustness, trust indicators, capabilities, and more
Risk assessment life cycle: define context, identify risks, analyse risks, evaluate risks, manage risks
Progressive governance practices: access control, legal & compliance, testing & validation, monitoring & logging, traceability & identity, human oversight, manual redundancy, trustworthiness & explainability, long-term management, and more

As agents become more embedded in tools,
platforms and workflows, the proposed dimensions
can help organizations define specific agent roles
and levels of integration, evaluate benefits and
limitations in context, and implement oversight
mechanisms that match the agents’ capabilities. Taking
these dimensions into consideration can help
providers and adopters to:
– Clarify functional scope: Define what an agent
is designed to do, under what conditions and
where its responsibilities begin and end.
– Support assessment: Evaluate the technical,
organizational, safety and security implications
of deploying specific agents in their contexts.
– Guide governance and oversight: Align
safeguards, controls and monitoring
mechanisms with the nature and complexity of
the agent’s role.
– Support interoperability and scaling:
Structure agent types in ways that facilitate
coordination in multi-agent environments and
integration across systems.
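
As an illustration of how the points above might be put into practice, the following minimal sketch records an agent against the seven dimensions named in Figure 7 and derives a list of governance practices from that record. Everything beyond the dimension and practice names is an assumption made for illustration: the three-step `Level` scale, the `AgentProfile` and `suggested_safeguards` names, the example field values and the thresholds are hypothetical and are not prescribed by the report.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical three-step scale; the report does not prescribe one."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class AgentProfile:
    """One record per agent, with a field for each dimension in Figure 7."""
    function: str          # what the agent is designed to do
    role: str              # illustrative free-text value, e.g. "assistant"
    use_case: str
    environment: str
    autonomy: Level
    authority: Level
    predictability: Level


def suggested_safeguards(profile: AgentProfile) -> list[str]:
    """Illustrative mapping from a classification to governance practices.

    The thresholds are assumptions chosen for this sketch only.
    """
    safeguards = ["monitoring and logging", "access control"]
    if profile.autonomy >= Level.MEDIUM or profile.authority >= Level.MEDIUM:
        safeguards.append("human oversight")
    if profile.predictability <= Level.LOW:
        safeguards.append("testing and validation")
    if profile.autonomy == Level.HIGH and profile.authority == Level.HIGH:
        safeguards.append("manual redundancy")
    return safeguards


# Example: a support-ticket triage agent with moderate autonomy.
profile = AgentProfile(
    function="triage support tickets",
    role="assistant",
    use_case="customer support",
    environment="internal ticketing system",
    autonomy=Level.MEDIUM,
    authority=Level.LOW,
    predictability=Level.HIGH,
)
print(suggested_safeguards(profile))
# prints ['monitoring and logging', 'access control', 'human oversight']
```

Keeping the classification as an explicit record means the same profile can serve scoping, assessment and oversight decisions, rather than each being documented separately.
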
Without clear classification, organizations may
adopt AI agents without fully understanding what
they are designed to do, how they operate, the
impact they may have on their environment or
the oversight mechanisms they require. This lack
of clarity could result in gaps in safety, security,
control, privacy, reliability and accountability.