AI Agents in Action: Foundations for Evaluation and Governance, 2025
2 Foundations for AI agent evaluation and governance
A structured foundation for evaluating and
governing AI agents enables consistent
assessment and oversight across contexts.
As AI agents mature and adoption increases, a
functional understanding of their roles and properties
is beginning to take shape. Rather than classifying
agents solely by modality (e.g. text, speech, vision)
or domain (e.g. customer service, decision support,
workflow orchestration), it is more effective to
evaluate them according to their intended purpose,
core properties and operating context. This approach
creates a clearer foundation for assessing impacts
and designing safeguards that are proportionate to
an agent’s role. Systematic classification is important
because it provides a common basis for comparing
agents, anticipating risks and linking evaluation
and governance decisions to the realities of how
an agent operates. Without it, oversight risks
becoming inconsistent, reactive or disconnected
from an agent’s actual capabilities and environment.
To establish this foundation, this report
introduces four foundational pillars which,
in combination, provide a structured
approach to assessment and adoption:
– Classification: Establish the agent’s characteristics and operational context to inform downstream assessment.
– Evaluation: Generate evidence of performance and limitations in representative settings.
– Risk assessment: Analyse potential harm using classification and evaluation as inputs.
– Governance: Translate classification, evaluation and risk assessment results into safeguards and accountability proportionate to the agent’s profile.
These foundations apply to diverse AI agents,
encompassing both virtual and embodied
systems in different operational contexts.
They provide a consistent basis for assessing performance, identifying risks and establishing
governance mechanisms that scale with an
agent’s autonomy, authority and function.
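The relationship between the four pillars can be sketched as a minimal data model. This is an illustrative sketch only: the class names, fields and the safeguard rule below are hypothetical assumptions, not part of the report's framework.

```python
from dataclasses import dataclass

# Hypothetical sketch of the four pillars as a data model.
# All names, fields and rules here are illustrative assumptions.

@dataclass
class AgentProfile:
    """Classification: the agent's characteristics and operating context."""
    purpose: str
    autonomy: str   # e.g. "suggest-only", "act-with-approval", "fully-autonomous"
    authority: str  # scope of actions the agent may take
    context: str    # operating environment

@dataclass
class RiskFinding:
    """Risk assessment: a potential harm, informed by classification
    and by evaluation evidence gathered in representative settings."""
    harm: str
    severity: str   # e.g. "low", "medium", "high"

def governance_safeguards(profile: AgentProfile, risks: list[RiskFinding]) -> list[str]:
    """Governance: translate the profile and risk findings into safeguards
    proportionate to the agent's autonomy and authority (illustrative rule)."""
    safeguards = ["audit logging"]
    if profile.autonomy != "suggest-only":
        safeguards.append("human approval for high-impact actions")
    if any(r.severity == "high" for r in risks):
        safeguards.append("pre-deployment review")
    return safeguards

profile = AgentProfile("workflow orchestration", "act-with-approval",
                       "internal ticketing system", "IT operations")
risks = [RiskFinding("incorrect ticket closure", "high")]
print(governance_safeguards(profile, risks))
```

The point of the sketch is the dependency order the report describes: classification feeds risk assessment, and both feed governance, so safeguards scale with the agent's autonomy, authority and risk profile rather than being chosen in isolation.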
To address classification, evaluation, risk assessment
and governance, it is useful to distinguish between
two main stakeholder perspectives:13
– Provider: Refers to organizations or individuals
that supply AI systems, platforms or tools. Their
responsibilities include ensuring that products
are developed and maintained in accordance
with responsible and ethical guidelines, and
that the necessary documentation and support
are provided.
– Adopter: Refers to individuals within an
organization who use AI systems, with
responsibilities spanning procurement and
deployment. Procurement covers acquiring
AI solutions for organizational use, conducting
due diligence and ensuring that all AI agent
solutions comply with organizational policies
and regulatory requirements. Deployment covers
implementing AI systems in accordance with
documented requirements and plans, while
ensuring that the risks and impacts of the AI
agent are properly assessed and managed.
The adopter depends on the provider for
transparent documentation, model and system
specifications, and sufficient performance and risk
information to support responsible deployment and
oversight throughout the system life cycle.
The four pillars form a continuous and parallel
progression in which classification provides
structure, evaluation establishes evidence, risk
assessment identifies and mitigates potential
harms, and governance translates those insights
into safeguards and accountability.