Conclusion

Agents have already begun moving into production across various domains, including customer support, workflow automation and autonomous research. As adoption advances, and as early use cases move from single agents to more complex interconnected systems, expectations for scalable oversight grow.

This report has outlined the foundations for AI agent evaluation and governance, presenting a conceptual approach to classification, evaluation, risk assessment and governance that supports responsible adoption. The proposed dimensions aim to help organizations better understand what an agent does, how it operates and where it sits within the broader organization. Evaluation provides evidence of performance and reliability, while risk assessment identifies potential harms and their mitigations. Governance translates these insights into safeguards and concrete accountability mechanisms, which can then scale as the agent's capabilities are extended to more complex use cases and scenarios.

As the development of agents advances towards multi-agent ecosystems, the need for shared protocols, interoperability standards and coordinated oversight will only increase. Cross-functional governance that links technical assurance with organizational accountability is considered key to preventing cascading failures and ensuring responsible oversight at scale.

At the core of this long-term transition is effective human-AI collaboration. As governance practices evolve, clear responsibility for objectives, supervision and outcomes must be supported by new tools and processes that keep systems understandable, safe and secure in practice. Ultimately, the responsible deployment of agentic systems depends on the same baseline of trust, transparency and accountability that holds for all digital systems.
With thoughtful design, careful evaluation and proportionate governance, AI agents are likely to amplify human capabilities, improve productivity and, over time, meaningfully contribute to both public and private value.

AI Agents in Action Foundations for Evaluation and Governance 2025