Operational modes and real-time capability
Tools were found to operate in three principal modes:
• Real-time webcam swappers – capable of low-latency manipulation and virtual camera output; of the five tools demonstrating genuine real-time face swapping, three enabled virtual-camera or equivalent injection paths that could feed a verification flow with minimal intermediary processing
• Offline desktop frameworks – designed for post-
production workflows and high-quality output, but
without instantaneous rendering
• Hosted web services – browser-accessible services
that performed server-side processing (some offering
“real-time preview” but requiring final cloud processing,
limiting continuous live use)
Real-time capability was concentrated in a small number
of technically mature platforms, according to research and
open-source information collected on these tools rather
than through direct testing. Where real-time output was
unavailable, hosted or offline outputs could still be chained
with injection techniques to simulate live inputs, albeit with
additional detectable signals.
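As an illustrative complement to the injection discussion above, the sketch below flags capture devices whose driver-reported names match common virtual-camera software, one of the detectable signals a verification flow can look for. It is a minimal sketch assuming a Linux v4l2 system; the name list is an assumption, not an exhaustive inventory.

```python
"""Minimal sketch: flag likely virtual-camera capture devices on Linux.

Assumes a v4l2 system; the name list is an illustrative assumption,
not an exhaustive inventory of virtual-camera drivers.
"""
from pathlib import Path

# Substrings commonly seen in virtual-camera driver names (illustrative).
VIRTUAL_CAMERA_HINTS = (
    "obs virtual",   # OBS Studio's virtual camera
    "v4l2loopback",  # generic loopback devices
    "dummy video",   # v4l2loopback's default device name
)

def enumerate_video_devices() -> list[tuple[str, str]]:
    """Return (device node, reported name) pairs from the v4l2 sysfs tree."""
    devices = []
    for entry in sorted(Path("/sys/class/video4linux").glob("video*")):
        name_file = entry / "name"
        if name_file.exists():
            devices.append((f"/dev/{entry.name}", name_file.read_text().strip()))
    return devices

def flag_virtual_cameras() -> list[str]:
    """Flag devices whose reported name matches a virtual-camera hint."""
    return [
        f"{node} ({name})"
        for node, name in enumerate_video_devices()
        if any(hint in name.lower() for hint in VIRTUAL_CAMERA_HINTS)
    ]

if __name__ == "__main__":
    print("Suspicious capture devices:", flag_virtual_cameras() or "none")
```

A check of this kind only catches injection paths that register a recognizably named device; driver-level injection that mimics a hardware camera name would evade it, which is why it is best treated as one signal among several.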
Expression and motion fidelity
Tracking of dynamic facial features – blink rate, lip motion,
head pose and micro-expressions – was identified as a
critical determinant of whether outputs could pass liveness
checks. High-quality dynamic expression preservation was
uncommon; six out of 17 tools employed advanced motion
models or architectural approaches that preserved nuanced
expressions, producing more natural output for dynamic video.
The majority of tools provided partial or static expression
reproduction, which increased susceptibility to detection
under challenge–response or randomized motion prompts.
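To make the challenge–response mechanism concrete, here is a minimal sketch of a randomized prompt sequence with per-prompt response windows. The prompt vocabulary, the three-second window and the verify_motion hook are illustrative assumptions, not any particular vendor's protocol.

```python
"""Minimal sketch of randomized challenge-response liveness prompts.

Prompt set, timing window and the verify_motion hook are illustrative
assumptions, not a production liveness protocol.
"""
import random
import time
from typing import Callable

PROMPTS = ["blink twice", "turn head left", "turn head right", "smile", "look up"]
RESPONSE_WINDOW_S = 3.0  # assumed per-prompt response window

def run_challenge(verify_motion: Callable[[str], bool], num_prompts: int = 3) -> bool:
    """verify_motion is a hook into the video pipeline that should return
    True once the prompted motion (blink, head turn, ...) is observed."""
    # Unpredictable ordering is what defeats pre-rendered or static
    # expression output: the attacker cannot render the sequence in advance.
    for prompt in random.SystemRandom().sample(PROMPTS, num_prompts):
        issued_at = time.monotonic()
        passed = verify_motion(prompt)
        elapsed = time.monotonic() - issued_at
        # A late response counts as a failure: timing slippage is itself
        # a signal (see the latency findings later in this section).
        if not passed or elapsed > RESPONSE_WINDOW_S:
            return False
    return True

# Illustrative use with a stand-in verifier that accepts everything:
print(run_challenge(lambda prompt: True))  # True
```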
Lighting adaptation and blending quality
Advanced, environment-responsive lighting adaptation
was rarely observed. Only two tools incorporated enhanced
blending techniques or optimizations targeted at poor
lighting conditions. These two were effective primarily on
static or pre-recorded content and required manual tuning
for best results. Common failure modes included colour tone
mismatch, shadow inconsistency and visible edge seams
that were amplified by compression and abrupt changes
in illumination. When present, lighting adaptation typically
focused on skin tone matching and basic face blending rather
than real-time adaptive relighting.
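The colour-tone mismatch and edge-seam failure modes suggest a simple verification-side heuristic: compare pixel statistics just inside and just outside the blended face region. The following is a crude sketch under assumed inputs (an externally supplied bounding box, an RGB frame array); any threshold applied on top of this score is hypothetical and would need calibration on genuine footage.

```python
"""Minimal sketch: score colour-tone mismatch across a face blend boundary.

The bounding box is assumed to come from an external face detector; the
band width and any decision threshold are illustrative assumptions.
"""
import numpy as np

def tone_mismatch_score(frame: np.ndarray, box: tuple[int, int, int, int],
                        band: int = 8) -> float:
    """Compare mean colour just inside vs. just outside the blended region.

    A large difference is consistent with the colour-tone mismatch and
    edge-seam failure modes described above. frame is an HxWx3 RGB array;
    box is (x0, y0, x1, y1) and is assumed to fit well inside the frame.
    """
    x0, y0, x1, y1 = box
    inner = frame[y0 + band:y1 - band, x0 + band:x1 - band].reshape(-1, 3)
    # Ring of pixels immediately outside the box (numpy clips at edges).
    outer = np.concatenate([
        frame[max(y0 - band, 0):y0, x0:x1].reshape(-1, 3),
        frame[y1:y1 + band, x0:x1].reshape(-1, 3),
        frame[y0:y1, max(x0 - band, 0):x0].reshape(-1, 3),
        frame[y0:y1, x1:x1 + band].reshape(-1, 3),
    ])
    # Euclidean distance between mean colours, in 0-255 RGB units.
    return float(np.linalg.norm(inner.mean(axis=0) - outer.mean(axis=0)))

# Illustrative use on a synthetic frame with a differently toned patch:
frame = np.full((200, 200, 3), 120, dtype=np.uint8)
frame[60:140, 60:140] = (180, 140, 130)  # stand-in for a pasted face region
print(tone_mismatch_score(frame, (60, 60, 140, 140)))  # large score (~64)
```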
Latency, timing accuracy and robustness to challenge–response
Latency and timing accuracy were key limiting factors for
passing real-time liveness checks. Timing slippage during
challenge–response interactions was consistently observed
as a failure point. Tools that produced low-latency output
with accurate synchronization between facial motion and
audio or prompts were rare. Many tools advertised “real-time
preview” capabilities, yet required additional cloud processing
for final output, resulting in latency unsuitable for continuous
live impersonation.
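The timing slippage described above can be operationalized as a session-level check over prompt-to-response latencies. In this hedged sketch both thresholds are assumptions; the point is that relayed capture–render–reinject pipelines tend to raise the mean latency, its jitter, or both.

```python
"""Minimal sketch: flag timing slippage across a challenge session.

Both thresholds are illustrative assumptions; the latencies (seconds from
prompt issuance to detected motion onset) come from the video pipeline.
"""
from statistics import mean, pstdev

MAX_MEAN_LATENCY_S = 1.5  # assumed upper bound for a live human response
MAX_JITTER_S = 0.6        # assumed bound on latency spread across prompts

def timing_slippage(latencies: list[float]) -> bool:
    """Return True when latency behaviour suggests a relayed or injected feed.

    A capture -> render -> re-inject pipeline adds processing delay, which
    tends to show up as a high mean latency, erratic jitter, or both.
    """
    if not latencies:
        return True  # no measurable responses at all is itself a failure
    return mean(latencies) > MAX_MEAN_LATENCY_S or pstdev(latencies) > MAX_JITTER_S

# Illustrative use: a session with consistent added processing delay.
print(timing_slippage([2.1, 2.4, 1.9]))  # True under the assumed thresholds
```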
Audio integration and voice cloning
Native voice cloning functionality was not provided by any
of the evaluated tools. Overall, three tools offered basic audio
features such as pre-generated or AI-synthesized speech
insertion or lip-sync to supplied audio tracks. These features
did not equate to full voice replication from sample audio.
Therefore, creation of convincing audio-visual deepfakes
would generally require integration with an external,
dedicated voice synthesis platform.
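Since lip-sync to a supplied track is easier to produce than natively captured speech, one defensive heuristic is to estimate the audio-visual alignment offset. The sketch below cross-correlates two assumed pre-extracted signals (an audio loudness envelope and a per-frame mouth-opening measure); both inputs and the interpretation of the result are assumptions, not a method described in the evaluation.

```python
"""Minimal sketch: estimate audio-visual sync offset via cross-correlation.

Inputs are assumed to be pre-extracted, equal-rate 1-D signals: an audio
loudness envelope and a per-frame mouth-opening measure from a landmark
tracker. All names here are illustrative.
"""
import numpy as np

def av_offset_frames(audio_env: np.ndarray, mouth_open: np.ndarray) -> int:
    """Return the lag (in frames) at which the two signals correlate best.

    Offsets far from zero, or offsets that drift over a session, are
    consistent with audio pasted onto video rather than captured with it.
    """
    a = (audio_env - audio_env.mean()) / (audio_env.std() + 1e-9)
    m = (mouth_open - mouth_open.mean()) / (mouth_open.std() + 1e-9)
    corr = np.correlate(a, m, mode="full")
    return int(corr.argmax() - (len(m) - 1))

# Illustrative use: mouth motion delayed by five frames yields lag -5.
rng = np.random.default_rng(0)
env = rng.random(200)
print(av_offset_frames(env, np.roll(env, 5)))  # -5
```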
Platform support, device compatibility and resource demands
Platform support was distributed as follows: three tools
explicitly supported PC (desktop) environments, four
were compatible with Android, and two supported iOS.
Browser-based implementations were common (10 tools),
reflecting a trend towards accessibility and ease of use.
Cross-platform support was moderate, with tools supporting roughly two platforms on average.
Overall, nine tools required local installation and execution
on user devices, while eight operated entirely online with
server-side processing. Desktop real-time tools were
observed to require capable graphics processing units
(GPUs) for stable, low-latency operation; hosted services
reduced local hardware requirements at the cost of
network dependency.
Network behaviour
Hosted solutions were noted to be network-dependent;
observed behaviours included sustained upstream
video upload and frequent small data exchanges
for incremental processing. Tools that operated
locally produced minimal network traffic, but injection
techniques that forwarded altered frames into remote
verification flows introduced identifiable network patterns
(for example, consistent outbound streams followed by
virtual camera input).
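The sustained-upload pattern noted above lends itself to a simple flow-level heuristic, sketched below. The flow-record shape and both thresholds are assumptions standing in for real network telemetry; they are not drawn from the evaluation itself.

```python
"""Minimal sketch: flag sustained upstream video-like flows during verification.

The flow-record shape and both thresholds are illustrative assumptions
standing in for real flow telemetry.
"""
from dataclasses import dataclass

@dataclass
class FlowSample:
    timestamp: float     # seconds since session start
    upstream_bytes: int  # bytes sent by the client in this interval

SUSTAINED_S = 10                  # assumed minimum run to count as sustained
VIDEO_RATE_BYTES_PER_S = 100_000  # assumed floor for an upstream video stream

def sustained_upstream(samples: list[FlowSample], interval_s: float = 1.0) -> bool:
    """True when upstream throughput stays above a video-like rate for a
    sustained run, the pattern noted above for hosted deepfake processing."""
    run = 0.0
    for s in samples:
        if s.upstream_bytes / interval_s >= VIDEO_RATE_BYTES_PER_S:
            run += interval_s
            if run >= SUSTAINED_S:
                return True
        else:
            run = 0.0
    return False

# Illustrative use: twelve seconds of ~200 kB/s upstream triggers the flag.
print(sustained_upstream([FlowSample(float(t), 200_000) for t in range(12)]))
```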