AI Agents in Action Foundations for Evaluation and Governance 2025
Endnotes
1. Capgemini Research Institute. (2024). Harnessing the value of generative AI. https://www.capgemini.com/wp-content/uploads/2024/05/Final-Web-Version-Report-Gen-AI-in-Organization-Refresh.pdf.
2. Organisation for Economic Co-operation and Development (OECD). (2024). Recommendation of the Council on Artificial Intelligence. https://legalinstruments.oecd.org/en/instruments/oecd-legal-0449.
3. National Institute of Standards and Technology (NIST). (2023). Artificial Intelligence Risk Management Framework (AI RMF 1.0). https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=936225.
4. International Organization for Standardization (ISO). (2023). ISO/IEC 23894:2023: Information technology — Artificial intelligence — Guidance on risk management. https://www.iso.org/standard/77304.html.
5. Claude Docs. (n.d.). Features overview. https://docs.claude.com/en/docs/build-with-claude/overview.
6. Anthropic. (2024). Introducing the Model Context Protocol. https://www.anthropic.com/news/model-context-protocol.
7. Surapaneni, R., M. Jha, M. Vakoc and T. Segal. (2025). Announcing the Agent2Agent Protocol (A2A). Google for Developers. https://developers.googleblog.com/en/a2a-a-new-era-of-agent-interoperability/.
8. Mitchell, M., S. Wu, A. Zaldivar, P. Barnes, et al. (2019). Model Cards for Model Reporting. FAT* '19: Proceedings of the Conference on Fairness, Accountability, and Transparency, pp. 220-229. https://dl.acm.org/doi/10.1145/3287560.3287596.
9. Parikh, S. and R. Surapaneni. (2025). Powering AI commerce with the new Agent Payments Protocol (AP2). Google Cloud. https://cloud.google.com/blog/products/ai-machine-learning/announcing-agents-to-payments-ap2-protocol.
10. Cloudflare. (n.d.). Zero Trust security | What is a Zero Trust network? https://www.cloudflare.com/en-gb/learning/security/glossary/what-is-zero-trust/.
11. Hasan, M. M., L. Hao, E. Fallahzadeh, B. Adams, et al. (2025). Model Context Protocol (MCP) at First Glance: Studying the Security and Maintainability of MCP Servers. https://arxiv.org/abs/2506.13538v1.
12. Lynch, B. and R. Harang. (2025). From Prompts to Pwns: Exploiting and Securing AI Agents. https://i.blackhat.com/BH-USA-25/Presentations/US-25-Lynch-From-Prompts-to-Pwns.pdf.
13. Adapted from: International Organization for Standardization (ISO). (2023). ISO/IEC 42001:2023: Information technology — Artificial intelligence — Management system. https://www.iso.org/standard/81230.html; National Institute of Standards and Technology (NIST). (2023). Artificial Intelligence Risk Management Framework (AI RMF 1.0). https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=936225.
14. Ibid.
15. Capgemini. (n.d.). Business, meet agentic AI. https://www.capgemini.com/wp-content/uploads/2025/05/Confidence-in-autonomous-and-agentic-systems_19May.pdf.
16. SAE International. (2021). J3016_202104: Taxonomy and Definitions for Terms Related to Driving Automation Systems for On-Road Motor Vehicles. https://www.sae.org/standards/j3016_202104-taxonomy-definitions-terms-related-driving-automation-systems-road-motor-vehicles.
17. Russell, S. J. and P. Norvig. (2021). Artificial Intelligence: A Modern Approach. Pearson.
18. Hendrycks, D., C. Burns, S. Basart, A. Zou, et al. (2021). Measuring Massive Multitask Language Understanding. https://arxiv.org/abs/2009.03300.
19. Srivastava, A., A. Rastogi, A. Rao, A. A. Shoeb, et al. (2022). Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models. Transactions on Machine Learning Research (TMLR). https://arxiv.org/abs/2206.04615.
20. Liang, P., R. Bommasani, T. Lee, D. Tsipras, et al. (2022). Holistic Evaluation of Language Models. Transactions on Machine Learning Research (TMLR). https://arxiv.org/abs/2211.09110.
21. Liu, X., H. Yu, H. Zhang, Y. Xu, et al. (2024). AgentBench: Evaluating LLMs as Agents. International Conference on Learning Representations (ICLR). https://arxiv.org/abs/2308.03688.
22. Jimenez, C. E., J. Yang, A. Wettig, S. Yao, et al. (2023). SWE-bench: Can Language Models Resolve Real-World GitHub Issues? https://arxiv.org/abs/2310.06770.
23. Rein, D., J. Becker, A. Deng, S. Nix, et al. (2025). HCAST: Human-Calibrated Autonomy Software Tasks. https://arxiv.org/abs/2503.17354.
24. “Risk” refers to the composite measure of an event’s probability (or likelihood) of occurring and the magnitude or degree of the consequences of the corresponding event. Source: National Institute of Standards and Technology (NIST). (2024). Artificial Intelligence Risk Management Framework: Generative Artificial Intelligence Profile. https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.600-1.pdf.
25. Adapted from: International Organization for Standardization (ISO). (2023). ISO/IEC 42001:2023: Information technology — Artificial intelligence — Management system. https://www.iso.org/standard/81230.html; National Institute of Standards and Technology (NIST). (2023). Artificial Intelligence Risk Management Framework (AI RMF 1.0). https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=936225.