What breaks when organisations rely on voice or video to verify executives?

Voice and video verification break when the organisation treats them as proof rather than as one signal among many. Synthetic audio and video can now reproduce familiar cues convincingly enough to pass informal checks. That creates a false sense of certainty unless teams require secondary validation, especially for payments, role changes, and other high-impact requests.

Why This Matters for Security Teams

Verifying an executive by voice or video feels familiar, but familiarity is not assurance. Synthetic media can now imitate tone, cadence, facial movement, and urgency well enough to defeat informal human checks. That matters because the request often arrives with authority bias already built in: an urgent payment, a payroll change, a vendor bank update, or a request to bypass normal approvals. The control failure is not that voice and video are useless signals, but that they are often treated as identity proof rather than one weak input among many.

NIST SP 800-207, Zero Trust Architecture, is clear that trust should be continually evaluated, not assumed from a single interaction. That aligns with NHIMG research showing how identity compromise becomes operationally expensive once teams rely on convenience checks instead of governance, especially in environments where secrets and identity sprawl are already widespread. For broader NHI context, see Ultimate Guide to NHIs. In practice, many security teams encounter voice or video fraud only after a payment exception has already been approved, rather than through intentional control testing.

How It Works in Practice

The safest operating model is to treat voice or video as a low-confidence signal and require an independent verification path for any high-impact action. That usually means a callback to a known number, approval through a separate authenticated system, or dual control with an out-of-band confirmation. For executives, the verification channel should be tied to a pre-registered identity record, not to the live media session itself. Where possible, organisations should combine process controls with cryptographic or system-based checks, because voice and video are easy to imitate and hard to bind to a managed identity lifecycle.
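As a rough illustration of tying the callback to a pre-registered identity record, the sketch below always returns the number held on file and ignores whatever number is offered during the call. The `IdentityRecord` type, the `DIRECTORY` contents, and the function name are hypothetical, introduced only for this example; a real deployment would read from an HR or IAM system of record.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IdentityRecord:
    """Pre-registered contact details, maintained outside any live call."""
    person_id: str
    callback_number: str


# Hypothetical directory standing in for an HR or IAM system of record.
DIRECTORY = {
    "cfo-001": IdentityRecord(person_id="cfo-001", callback_number="+1-555-0100"),
}


def select_callback_number(person_id: str,
                           number_offered_in_session: Optional[str] = None) -> str:
    """Return the registered callback number for person_id.

    The number offered during the live session is accepted as an argument
    only to make the point that it is never used: an impersonator controls
    the session and anything said or shown in it.
    """
    record = DIRECTORY.get(person_id)
    if record is None:
        # No record means no trusted channel, so the request cannot be verified.
        raise LookupError(f"no pre-registered identity record for {person_id!r}")
    return record.callback_number
```

Raising an error when no record exists is a deliberate choice in this sketch: falling back to a session-supplied number would reintroduce the weakness the callback is meant to remove.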

Practical patterns include:

  • Use voice or video only to initiate a request, never to complete an exception by itself.
  • Require a second channel for any payment, treasury, HR, or privilege change.
  • Enforce step-up approval for requests that override segregation of duties.
  • Log the request, the verifier, the channel, and the approval rationale for later review.
  • Train staff to treat urgency, secrecy, and account-change pressure as red flags.
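The patterns above can be sketched as a small approval gate. This is a minimal illustration under stated assumptions, not a reference implementation: the action names, the channel labels, the `Request` type, and the rule of two distinct verifiers for segregation-of-duties overrides are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Channels an impersonator in control of the live session could also fake.
MEDIA_CHANNELS = {"voice", "video"}
# Hypothetical action categories that always need a second channel.
HIGH_IMPACT_ACTIONS = {"payment", "treasury", "hr_change", "privilege_change"}


@dataclass
class Request:
    action: str
    initiated_via: str                                  # e.g. "voice"
    confirmations: list = field(default_factory=list)   # (channel, verifier) pairs
    overrides_segregation_of_duties: bool = False


def decide(request: Request, audit_log: list) -> bool:
    """Approve only when enough independent, non-media confirmations exist."""
    # Keep verifiers who confirmed over a channel that is neither voice/video
    # nor the channel the request arrived on.
    verifiers = {
        verifier
        for channel, verifier in request.confirmations
        if channel not in MEDIA_CHANNELS and channel != request.initiated_via
    }
    if request.overrides_segregation_of_duties:
        required = 2   # step-up: two distinct verifiers (assumed threshold)
    elif request.action in HIGH_IMPACT_ACTIONS:
        required = 1
    else:
        required = 0
    approved = len(verifiers) >= required
    # Record the request, the verifiers, the channels, and the rationale.
    audit_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "action": request.action,
        "initiated_via": request.initiated_via,
        "confirmations": list(request.confirmations),
        "approved": approved,
        "rationale": f"{len(verifiers)} independent verifier(s), {required} required",
    })
    return approved
```

For example, a payment initiated by video and "confirmed" only on the same video call is rejected, while the same payment confirmed through a callback by a named verifier is approved. Both decisions are written to the audit log either way.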

This is especially important when executives travel, work across time zones, or regularly approve actions from mobile devices, because those conditions make informal verification feel normal. The JetBrains GitHub plugin token exposure is a reminder that identity trust often collapses through a small, believable interaction before anyone notices broader compromise. In environments where approvals are decentralized and finance teams can execute transfers quickly, these controls tend to break down because the business optimises for speed while attackers exploit the gap between human recognition and actual authentication.

Common Variations and Edge Cases

Tighter verification often increases friction, so organisations have to balance fraud resistance against executive convenience and business urgency. That tradeoff is real, and current guidance suggests it should be handled through risk-based thresholds rather than a single universal rule. A routine calendar change does not need the same treatment as a wire transfer, but both should still be mapped to approved workflows. Practice is still evolving on whether organisations should use liveness detection, signed media, or AI-based deepfake detection tools, and there is no universal standard for this yet. Those tools can help, but they should not replace procedural controls because attackers can adapt quickly.
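One way to express risk-based thresholds is a lookup from action type to risk tier, and from tier to an approved workflow. The tiers, action names, and workflow labels below are hypothetical placeholders chosen for illustration; each organisation would define its own.

```python
from enum import IntEnum


class Tier(IntEnum):
    LOW = 0
    MEDIUM = 1
    HIGH = 2


# Hypothetical mapping of request types to risk tiers.
ACTION_TIERS = {
    "calendar_change": Tier.LOW,
    "payroll_change": Tier.MEDIUM,
    "vendor_bank_update": Tier.HIGH,
    "wire_transfer": Tier.HIGH,
}

# Hypothetical workflow labels; every tier maps to an approved workflow.
TIER_WORKFLOWS = {
    Tier.LOW: "standard_workflow",
    Tier.MEDIUM: "second_channel_confirmation",
    Tier.HIGH: "second_channel_plus_dual_control",
}


def workflow_for(action: str) -> str:
    """Return the approved workflow for an action.

    Actions missing from the mapping fall back to the strictest tier, so an
    unrecognised request type is never handled more loosely than a known one.
    """
    tier = ACTION_TIERS.get(action, Tier.HIGH)
    return TIER_WORKFLOWS[tier]
```

Defaulting unknown actions to the highest tier is a design choice in this sketch: it keeps novel or oddly worded requests, which are typical of social engineering, from slipping through on the lightest path.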

There are also edge cases where voice or video is a useful corroborating signal, such as first-time contact from a remote executive or a meeting where shared context matters. Even then, the safer model is corroboration, not confirmation. For organisations with high-value treasury operations, delegated authority, or frequent cross-border approvals, the verification problem becomes less about recognising the executive and more about proving the request came through an authorised path. The broader lesson is consistent with NIST Zero Trust thinking and NHIMG guidance on identity governance: trust should be explicit, contextual, and revocable, not inferred from a familiar face or familiar voice.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.

Framework | Relevance
OWASP Agentic AI Top 10 | Supports strong verification and anti-spoofing for autonomous, deceptive identity interactions.
CSA MAESTRO | Addresses trust decisions and control validation across AI-driven and automated workflows.
NIST AI RMF | Guides governance for manipulated or misleading AI-enabled communications.

Require layered, context-aware checks before any agent or user can trigger high-impact actions.