Synthetic Data 2025
Recommendations
Given the outlined risks and challenges, key decision-
makers must carefully consider the trade-offs when
deciding whether and how to leverage synthetic data.
In organizations, the use of synthetic data is typically
shaped by collaboration between two groups: developers
and adopters (e.g. researchers and data scientists), and
regulators and policy-makers (e.g. legal advisors,
executive leadership and policy teams).
For developers and adopters:
–Prioritize model quality: Implement quality
assessment protocols to ensure that synthetic
data generation models reflect relevant dimensions
while also preserving real-world data distributions
and meeting privacy and fairness standards.
–Invest in robust traceability and provenance:
Implement robust systems to track data origins
and transformations, using metadata to identify
synthetic elements and their sources. Upfront
investment is essential, as retroactive tracing
can be costly or impossible.
–Ensure transparency: Make data generation
processes transparent to distinguish synthetic
data from organic data.
–Implement technical safeguards: Techniques
like watermarking, cryptographic provenance
or dataset “nutrition labels” build trust and
should be combined with human oversight for
high-risk applications.
–Diversify stakeholder engagement: Involve
diverse communities in governance to identify
risks, enhance legitimacy and assess for biases,
especially with marginalized groups.
–Mitigate model collapse: Avoid relying solely
on synthetic data for training AI models. Use
hybrid approaches that combine synthetic
and organic data,24 and incorporate self-
correction mechanisms based on organic
data distributions.25
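To make two of the recommendations above more concrete (provenance metadata and hybrid training sets), the following is a minimal sketch, not a prescribed implementation. The `Record` fields, the `build_hybrid_set` helper and the `max_synthetic_fraction` parameter are hypothetical names introduced here for illustration; the report does not specify a schema or a mixing ratio.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One data point plus the provenance metadata travelling with it."""
    value: float
    is_synthetic: bool           # flags synthetic elements explicitly
    source: str                  # hypothetical dataset or generator identifier
    transformations: tuple = ()  # ordered log of steps applied since origin


def add_step(record, step):
    """Return a copy of the record with one more transformation logged."""
    return Record(record.value, record.is_synthetic, record.source,
                  record.transformations + (step,))


def build_hybrid_set(organic, synthetic, max_synthetic_fraction=0.5, seed=0):
    """Combine organic and synthetic records, capping the synthetic share.

    The cap is an illustrative assumption; a suitable value depends on
    the application.
    """
    if not organic:
        raise ValueError("hybrid training needs at least some organic data")
    if not 0 <= max_synthetic_fraction < 1:
        raise ValueError("max_synthetic_fraction must be in [0, 1)")
    # Largest synthetic count s with s / (len(organic) + s) <= the cap.
    limit = int(len(organic) * max_synthetic_fraction
                / (1 - max_synthetic_fraction))
    rng = random.Random(seed)
    chosen = rng.sample(synthetic, min(limit, len(synthetic)))
    mixed = list(organic) + chosen
    rng.shuffle(mixed)
    return mixed


def synthetic_share(records):
    """Fraction of records flagged as synthetic."""
    return sum(r.is_synthetic for r in records) / len(records)
```

Because every record carries its own `is_synthetic` flag and transformation log, the synthetic share of any derived dataset can be audited later without retroactive tracing. For example, `synthetic_share(build_hybrid_set(organic, synthetic, 0.25))` reports the realized share after mixing.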
For regulators and policy-makers:
–Tailor governance: Not all synthetic data is
created equal. Governance frameworks must
distinguish between synthetic data intended
to replicate real-world distributions and AI-
generated data created for entertainment,
expression or model training. Regulations
should specify intended use, impact and
safeguards for each category.
–Develop context-aware standards: Support
the creation of sector-specific standards
and related benchmarks, such as those
for responsible AI, which safeguard an
organization’s long-term ability to innovate
responsibly. It would be useful to build on
efforts by privacy regulators (e.g. the European
Commission,26 Personal Data Protection
Commission Singapore27 and the Information
Commissioner’s Office in the United Kingdom28)
and international organizations (e.g. the United
Nations), especially for confidentiality-protecting
public data releases.
–Promote education and capacity-building:
Provide guidance for developers, regulators
and decision-makers on when and how to use
synthetic data responsibly, by using tools like
impact assessments, provenance checklists
and red-teaming exercises.
Synthetic Data: The New Data Frontier