
How should security teams handle human verification when voice and video can be faked?

Treat voice and video as untrusted indicators, not proof. For high-risk human verification, use cryptographic confirmation tied to a device-bound private key and a single-use session artifact. Keep the human conversation for context, but move the trust decision to a deterministic signature check that synthetic media cannot reproduce.

Why This Matters for Security Teams

Voice and video can now be synthesized convincingly enough to defeat informal checks, which means “I heard the person” or “I saw the person” is no longer a reliable trust signal. Security teams that still rely on visual confirmation are effectively treating a mutable presentation layer as proof of identity. Current guidance from the NIST Cybersecurity Framework 2.0 favours verifiable, repeatable controls over subjective assurance, and that applies directly here.

The practical risk goes beyond impersonation. A fake executive voice can be used to approve a reset, authorise a transfer, or pressure an analyst into bypassing process. That is why NHI Management Group treats media as context, not evidence. The same pattern of over-trusting exposed identity surfaces appears in NHI incidents such as the JetBrains GitHub plugin token exposure, where a convenient trust shortcut became a breach path. In practice, many security teams discover voice or video fraud only after a high-risk approval has already been misused, rather than through verification deliberately designed to catch it.

How It Works in Practice

The workflow should separate interaction from assurance. First, the verifying human (for example, a help-desk analyst or approver) and the requester can communicate over voice or video for context, but the platform must not treat that channel as authoritative. The trust decision happens in a second step: the requester proves possession of a device-bound private key by signing a single-use challenge, and presents a short-lived session artifact that is valid only for that interaction.
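The two-step split can be sketched as a minimal, in-memory challenge store. This is an illustration under stated assumptions, not a reference implementation: the class and method names are hypothetical, the 60-second lifetime is an arbitrary default, and the signature check itself is delegated to a verify_signature callable you supply (for example, an Ed25519 or ECDSA verification against the public half of the requester's device-bound key). A production service would keep pending challenges in shared storage with an atomic delete rather than a Python dict.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Assumed verifier shape: (public_key, message, signature) -> bool.
# In production this would wrap an asymmetric check (for example Ed25519 or
# ECDSA) against the public half of the requester's device-bound key.
SignatureVerifier = Callable[[bytes, bytes, bytes], bool]


@dataclass(frozen=True)
class Challenge:
    challenge_id: str
    session_id: str
    nonce: bytes
    expires_at: float


class ChallengeStore:
    """Issues short-lived challenges that can be answered exactly once."""

    def __init__(self, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._pending: Dict[str, Challenge] = {}

    def issue(self, session_id: str) -> Challenge:
        """Create a fresh random nonce tied to one session."""
        challenge = Challenge(
            challenge_id=secrets.token_hex(16),
            session_id=session_id,
            nonce=secrets.token_bytes(32),
            expires_at=self._clock() + self._ttl,
        )
        self._pending[challenge.challenge_id] = challenge
        return challenge

    def verify(self, challenge_id: str, session_id: str, signature: bytes,
               public_key: bytes, verify_signature: SignatureVerifier) -> bool:
        # pop() consumes the challenge on the first attempt, so a replayed or
        # retried response finds nothing and fails.
        challenge: Optional[Challenge] = self._pending.pop(challenge_id, None)
        if challenge is None or challenge.session_id != session_id:
            return False
        if self._clock() > challenge.expires_at:
            return False
        # Only the signature decides; nothing from the voice/video channel
        # is an input to this method.
        return verify_signature(public_key, challenge.nonce, signature)
```

Because verify() removes the challenge before checking anything else, a second submission of the same response fails even if the first one succeeded, which is what makes the session artifact single-use.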

That approach aligns with the broader NHI and Zero Trust pattern in NHI Management Group’s Ultimate Guide to NHIs, where the point is not “who sounded right” but “what cryptographic identity was actually present.” In mature implementations, the challenge should be bound to the session, the action, and the risk tier. For example:

  • High-risk actions require a fresh signature, not a reused login token.
  • The private key should live in a hardware-backed or device-bound store whenever possible.
  • The session artifact should expire quickly and be unusable outside the original workflow.
  • Verification logs should record the challenge, signature outcome, and policy decision, not raw biometric media.
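The binding and logging points above can be illustrated with a short sketch. It assumes a JSON encoding with sorted keys as the canonical form the device signs; the field names, the action and tier labels, and the verify_and_log helper are hypothetical, and the signature primitive is again passed in as a callable. The sketch does not track nonce consumption, which is assumed to be handled by a separate single-use store.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Callable, Dict, Optional, Tuple

SignatureVerifier = Callable[[bytes, bytes, bytes], bool]


@dataclass(frozen=True)
class BoundChallenge:
    nonce_hex: str     # fresh random value, never reused
    session_id: str
    action: str        # hypothetical action label, e.g. "reset_mfa"
    risk_tier: str     # hypothetical tier label, e.g. "high"
    expires_at: int    # Unix time in seconds


def signing_payload(ch: BoundChallenge) -> bytes:
    """Canonical bytes the device signs; altering any field invalidates it."""
    return json.dumps(asdict(ch), sort_keys=True, separators=(",", ":")).encode()


def verify_and_log(ch: BoundChallenge, requested_action: str, session_id: str,
                   now: int, signature: bytes, public_key: bytes,
                   verify_signature: SignatureVerifier
                   ) -> Tuple[str, Dict[str, object]]:
    payload = signing_payload(ch)
    # The signed challenge must match the action and session actually in play.
    binding_ok = (ch.action == requested_action
                  and ch.session_id == session_id
                  and now <= ch.expires_at)
    # None means the signature was not checked because the binding failed.
    signature_ok: Optional[bool] = (
        verify_signature(public_key, payload, signature) if binding_ok else None
    )
    decision = "allow" if binding_ok and signature_ok else "deny"
    # Audit record: challenge digest, outcomes and decision; no audio or video.
    log_entry: Dict[str, object] = {
        "challenge_sha256": hashlib.sha256(payload).hexdigest(),
        "action": ch.action,
        "risk_tier": ch.risk_tier,
        "binding_ok": binding_ok,
        "signature_ok": signature_ok,
        "decision": decision,
    }
    return decision, log_entry
```

Because the action and risk tier sit inside the signed bytes, a signature collected for one request cannot be reused to approve a different action, and the log entry stores a hash of the challenge rather than any recording of the call.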

Where available, teams can layer in policy-as-code so that approval thresholds, device posture, and action sensitivity are evaluated at request time rather than hard-coded into a one-time manual process. This is consistent with NIST Cybersecurity Framework 2.0 principles for governed, repeatable control enforcement. These controls tend to break down when legacy help desks or ad hoc executive exception paths still permit identity resets based on a single live call.
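Policy-as-code can be as simple as a reviewed data table evaluated on every request. The following sketch is illustrative only: the tier names, the managed-device signal and the approver counts are assumptions rather than a recommended baseline, and a real deployment might use a dedicated policy engine instead.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class TierRule:
    fresh_signature: bool   # require a new device-bound signature
    managed_device: bool    # require a compliant, managed device
    min_approvers: int      # independent approvals needed


# Hypothetical policy table. Because it is data, it can be versioned and
# reviewed like code instead of living in a manual runbook.
POLICY: Dict[str, TierRule] = {
    "low": TierRule(fresh_signature=False, managed_device=False, min_approvers=0),
    "medium": TierRule(fresh_signature=True, managed_device=True, min_approvers=0),
    "high": TierRule(fresh_signature=True, managed_device=True, min_approvers=1),
}


@dataclass(frozen=True)
class Request:
    sensitivity: str          # action sensitivity label
    signature_verified: bool  # outcome of the cryptographic check
    device_managed: bool      # device posture at request time
    approvers: int            # approvals collected so far


def evaluate(req: Request) -> List[str]:
    """Return unmet requirements; an empty list means the request may proceed."""
    rule = POLICY.get(req.sensitivity)
    if rule is None:
        return ["unknown sensitivity"]  # fail closed on unrecognised input
    unmet: List[str] = []
    if rule.fresh_signature and not req.signature_verified:
        unmet.append("fresh device-bound signature required")
    if rule.managed_device and not req.device_managed:
        unmet.append("managed device required")
    if req.approvers < rule.min_approvers:
        unmet.append(f"need {rule.min_approvers} approver(s)")
    return unmet
```

Request deliberately has no field for what the caller looked or sounded like: media is simply not an input to the decision.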

Common Variations and Edge Cases

Tighter verification often increases friction, so organisations have to balance fraud resistance against operational speed. That tradeoff is real, especially for customer support, incident response, and executive communications where delays can create business pressure. Best practice is evolving, and there is no universal standard for this yet, but current guidance favours stronger proof for higher-risk actions and lighter checks for low-risk interaction.

One edge case is emergency access. If a leader needs urgent approval during an incident, the process should still require deterministic proof, but the artifact can be shortened and the policy can permit rapid escalation under a pre-approved emergency path. Another edge case is accessibility: not every user can reliably complete video-based checks, which is another reason media should never be the trust anchor.
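The risk-tiered and emergency rules can be expressed as one small function. This sketch reads “the artifact can be shortened” as a shorter validity window, which is an assumption; the lifetimes, tier labels and roster mechanism are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Hypothetical artifact lifetimes; the real values are a policy decision.
NORMAL_TTL_SECONDS = 300
EMERGENCY_TTL_SECONDS = 60


@dataclass(frozen=True)
class Requirement:
    signature_required: bool
    artifact_ttl_seconds: int
    path: str


def requirement_for(risk_tier: str, emergency: bool, requester: str,
                    emergency_roster: FrozenSet[str]) -> Requirement:
    # The emergency path is honoured only for people pre-approved to use it.
    # It keeps the signature requirement and shortens the artifact lifetime.
    if emergency and requester in emergency_roster:
        return Requirement(True, EMERGENCY_TTL_SECONDS, "emergency")
    # Low-risk interaction may rely on the existing authenticated session;
    # every other tier, including an unrecognised one, requires a fresh
    # signature (fail closed). Unapproved emergency claims land here too.
    if risk_tier == "low":
        return Requirement(False, NORMAL_TTL_SECONDS, "standard")
    return Requirement(True, NORMAL_TTL_SECONDS, "standard")
```

None of the returned requirements depends on a voice or video check, so the same flow works for users who cannot complete one.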

Teams should also assume that attackers will chain social engineering with synthetic media. That means the control must verify the session, not the persona. If the organisation already uses phishing-resistant authentication, the same device-bound identity can often be extended to human verification flows instead of inventing a separate trust model. In environments with remote work, contact-center outsourcing, or multiple regional approval chains, these controls become harder to operationalise because identity ownership, device trust, and approval authority are rarely unified.
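Reusing an existing phishing-resistant enrolment might look like the following sketch, which checks that the key that signed is enrolled, unrevoked, and owned by the person being claimed. The registry shape and function name are hypothetical. It is also a simplification: a real WebAuthn assertion signs authenticator data plus a hash of the client data rather than the raw challenge, so a production verifier would reconstruct those bytes before checking the signature.

```python
from dataclasses import dataclass
from typing import Callable, Dict

SignatureVerifier = Callable[[bytes, bytes, bytes], bool]


@dataclass(frozen=True)
class EnrolledCredential:
    user_id: str
    public_key: bytes
    revoked: bool = False


# Hypothetical: the credential registry already maintained for
# phishing-resistant sign-in, keyed by credential ID.
Registry = Dict[str, EnrolledCredential]


def verify_human(registry: Registry, claimed_user_id: str, credential_id: str,
                 challenge: bytes, signature: bytes,
                 verify_signature: SignatureVerifier) -> bool:
    cred = registry.get(credential_id)
    if cred is None or cred.revoked:
        return False
    # The persona on the call is only a claim; it must match the owner of
    # the enrolled key that produced the signature.
    if cred.user_id != claimed_user_id:
        return False
    return verify_signature(cred.public_key, challenge, signature)
```

Reusing the sign-in registry means revoking a lost device removes it from both login and human verification at once, instead of maintaining two trust models that can drift apart.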

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while the NIST AI RMF sets the governance and control requirements practitioners need to meet.

Framework | Control / Reference | Relevance
OWASP Agentic AI Top 10 | A1 | Synthetic media abuse fits agentic impersonation and trust-boundary failures.
CSA MAESTRO | TR-1 | MAESTRO addresses trustworthy runtime verification for autonomous and assisted workflows.
NIST AI RMF | | AI RMF covers governance of synthetic-media risk and verification controls.

Treat human-facing voice/video as untrusted input and require cryptographic proof for high-risk actions.