A Blueprint for Intelligent Economies 2024
Diverse and inclusive data
Equitable data is not a luxury; diverse and inclusive
datasets are essential for creating AI that reflects
and serves all of humanity. The World Economic
Forum’s Global Future Council on Data Equity
defines data equity as the shared responsibility for
fair data practices that respect and promote human
rights, opportunity and dignity. Data equity is a
fundamental responsibility that requires strategic,
participative, inclusive, proactive and coordinated
action. It aims to create a world where data-based
systems promote fair, just and beneficial outcomes
for all individuals, groups and communities.15
National language models are an important new
way to support data equity. A PPP between the
United Arab Emirates government and G42 has
developed one of the world’s first LLMs based
specifically on Modern Standard Arabic (understood
across the Middle East) and diverse regional spoken
dialects.16 Known as “Jais”, the LLM draws on local
media reports and social media posts to ensure that
locally spoken languages are included within the
LLM development while also considering cultural
norms. Taking inspiration from Google Research’s
language inclusion work and the concept of digital
language banks, Jais should act as a catalyst for
meeting region-specific model requirements.
Additionally, Cohere has developed Aya, a dataset
(more specifically, a digital language bank) that
represents one of the largest multilingual collections,
covering 114 languages, including rare and
local dialects.17 The Aya models and datasets have
been released publicly with the intention of safely
advancing the R&D of multilingual capabilities.
Data ownership and sharing
The controlled ownership of data enables
governments to regulate how data is shared
internationally, thereby reducing misuse and
promoting trust in AI applications. The complexity of
data ownership is now increasing with the emergence
of the agent economy and multi-agent interactions,
where data is modified many times during use.
The past few years have seen a shift to data
residency restrictions, often justified as essential
to national security. These restrictions are now
shaping data centre investment as tech companies
look to comply with data residency requirements,
operational compliance and, in some cases, the
need for individual consent. Microsoft’s recent
announcement of significant investment into cloud
services in Saudi Arabia,18 for example, is partly
driven by market demand and partly by evolving
regional data residency requirements.
Data protection and privacy
Emerging privacy challenges such as deepfakes,
AI-generated misinformation and high-profile
data breaches are increasing mistrust in AI. Tools
such as the World Economic Forum’s Digital Trust
Framework can support regulators and industry
leaders in considering shared goals and values in
the development, use and application of AI.19
Disclosure requirements oblige organizations
to share information about their data practices,
including how data is collected, used and protected.
Broadening these requirements to include AI-
derived data enhances data protection. It requires
companies to clarify how they use AI to process and
generate insights from personal information.20
Expanding these requirements in this way may
mean that companies need to offer their users
an opt-in/out option to consent to expanding the
purpose for which their data is used.
Data life cycle management
Regulatory tools remain key to safeguarding
the privacy and security of data. Existing
national data governance frameworks can
be adapted and employed to ensure data
is managed responsibly in the context of
AI development, deployment and use.
International agreements on cross-border data
flows are becoming increasingly vital tools for
minimizing regulatory obstacles, enhancing
collaborative research and knowledge sharing
related to AI, and building trust in data sharing.
Collaboration with stakeholders at regional and
global levels can lead to the development of shared
terminology for concepts relating to privacy and data
protection, thereby promoting clarity and effective
communication between all stakeholders. Data
intermediaries and stewards, along with leadership
from chief data officers, have an important role
to play in guiding the data strategy for collecting,
sharing and using data.
Data free flow with trust (DFFT) policies enforce
the need to govern the flow of data, in terms of both
the data type and how it is used;21 however, more
comprehensive data governance is required
to ensure that AI is developed responsibly and
ethically. This does not happen organically, and,
given the recent advancement of AI, governments
must address their wider data governance
approach to ensure that data is managed
responsibly, with safeguards to protect privacy,
security and ownership.