Model-Mediated Phishing

Model-mediated phishing is a social engineering pattern where an attacker uses an AI assistant to deliver the lure instead of sending the lure directly. The assistant becomes the voice, formatting layer, or authority cue, which can make malicious instructions seem more trustworthy than the original email or message.

Expanded Definition

In model-mediated phishing, the attacker supplies deceptive prompts, instructions, or content to an AI assistant, and the assistant delivers the message in the attacker's place. The output can look polished, authoritative, and context-aware even when the underlying intent is malicious.

In NHI and IAM environments, this matters because the assistant may sit inside email, chat, ticketing, or workflow systems that already carry trust. The pattern is not limited to one product or one model. Definitions vary across vendors: some describe it as prompt injection, while others frame it as AI-assisted fraud or AI-mediated impersonation. NHI Management Group treats the term as a delivery pattern, not a separate attack family.

In line with the NIST Cybersecurity Framework 2.0, organisations should treat identity, access, and communications trust as explicit security functions rather than implicit assumptions. The most common mistake is to assume the AI assistant is neutral: users trust generated instructions simply because the message is well formatted and conversational.

Examples and Use Cases

Rigorous controls against model-mediated phishing often add friction to the user experience, so organisations have to weigh faster communication against stronger verification. Typical scenarios include:

  • An attacker seeds a help desk chat with a fake account recovery story, and the assistant rewrites it into a more convincing instruction set for a target employee.
  • A compromised workflow agent drafts an internal request that appears to come from a trusted manager, then pushes the victim toward credential reset or payment diversion.
  • A phishing email is routed through an AI summariser, which reframes the original message into a concise and seemingly legitimate directive.
  • In a customer support setting, the attacker uses model output to mimic the tone and structure of a vendor notice, increasing the chance that secrets are disclosed.
  • The pattern is visible in real-world incidents such as the New York Times breach, where identity trust and message framing become part of the attack surface.

Practitioners often compare this risk to AI prompt abuse and email spoofing, but the difference is that the assistant itself becomes the authority cue. For deeper context on identity-driven security, the Ultimate Guide to Non-Human Identities explains why mismanaged digital actors and trust chains create broad exposure.

Why It Matters in NHI Security

Model-mediated phishing matters because it exploits the boundary between human judgement and machine-generated authority. When the assistant is allowed to compose, summarise, or reframe requests, it can normalise suspicious content and reduce the resistance that would usually stop a direct phishing attempt. That is especially dangerous where the assistant has access to inboxes, tickets, documents, or delegated actions tied to non-human identities.
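One way to keep the assistant from replacing the original sender as the authority cue is to attach provenance to every summary it produces. The following is a minimal sketch under stated assumptions, not a reference implementation: the `SourceMessage` type, the `label_summary` helper, and the phrase list in `RISKY_PATTERNS` are all hypothetical and would need tuning for a real environment.

```python
import re
from dataclasses import dataclass

# Hypothetical phrases that often accompany credential or payment lures.
RISKY_PATTERNS = [
    r"\breset (your )?password\b",
    r"\bverify (your )?(account|identity)\b",
    r"\bwire transfer\b",
    r"\bgift cards?\b",
    r"\bapi key\b",
]


@dataclass(frozen=True)
class SourceMessage:
    sender: str          # address or handle the message claims to come from
    channel: str         # e.g. "email", "chat", "ticket"
    authenticated: bool  # whether the channel verified the sender
    body: str            # original, unmodified message text


def risky_phrases(text: str) -> list[str]:
    """Return the patterns that match the text, ignoring case."""
    return [p for p in RISKY_PATTERNS if re.search(p, text, re.IGNORECASE)]


def label_summary(summary: str, source: SourceMessage) -> str:
    """Prefix a model-written summary with the origin of the content."""
    trust = "verified sender" if source.authenticated else "UNVERIFIED sender"
    header = f"[Summary of {source.channel} from {source.sender} ({trust})]"
    # The check runs on the original body, not on the model's rewording.
    if risky_phrases(source.body):
        header += "\n[Caution: the original asks for a sensitive action]"
    return f"{header}\n{summary}"
```

The check deliberately inspects the original body, because the summary is the layer an attacker is trying to influence.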

This risk compounds existing NHI weaknesses. NHI Management Group reports that 80% of identity breaches involved compromised non-human identities such as service accounts and API keys, and 97% of NHIs carry excessive privileges. When those identities are reachable through assistant-driven workflows, the phishing path can become a privilege escalation path as well.

Security teams should pair message provenance, authorization checks, and agent guardrails with visibility into secret exposure and delegated access. The Ultimate Guide to Non-Human Identities also notes that only 5.7% of organisations have full visibility into their service accounts, which makes assistant-mediated abuse harder to spot early. Organisations typically discover the problem only after a misdirected approval, credential disclosure, or fraudulent action has already occurred, by which point addressing model-mediated phishing is no longer optional.
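The pairing of provenance and authorization checks can be sketched as a small policy gate that runs before an assistant acts on a request. This is an illustrative sketch only: the action names, the `ProposedAction` fields, and the `evaluate` function are assumptions made for the example and do not correspond to any particular product.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    REQUIRE_VERIFICATION = "require_verification"
    DENY = "deny"


# Hypothetical action names that should never run on an unverified request.
SENSITIVE_ACTIONS = {"reset_credential", "change_payment_details", "share_secret"}


@dataclass(frozen=True)
class ProposedAction:
    name: str                     # action the assistant wants to perform
    requested_by: str             # identity the request claims to come from
    origin_verified: bool         # whether the channel authenticated that identity
    agent_scopes: frozenset[str]  # scopes granted to the assistant's own NHI


def evaluate(action: ProposedAction) -> Decision:
    """Decide whether an assistant-proposed action may proceed."""
    # Authorization check: the agent must hold the scope, whoever asked.
    if action.name not in action.agent_scopes:
        return Decision.DENY
    # Provenance check: sensitive actions need a verified requester,
    # otherwise they are held for out-of-band confirmation.
    if action.name in SENSITIVE_ACTIONS and not action.origin_verified:
        return Decision.REQUIRE_VERIFICATION
    return Decision.ALLOW
```

For example, a credential reset that arrives through an unauthenticated chat message returns `Decision.REQUIRE_VERIFICATION` even when the assistant holds the scope, which trades some speed for a confirmation step outside the assistant's channel.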

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and OWASP Non-Human Identity Top 10 address the attack and risk surface, while NIST CSF 2.0 sets the governance and control requirements practitioners need to meet.

Framework | Control / Reference | Relevance
OWASP Agentic AI Top 10 | LLM-04 | Model output can be abused to impersonate authority and deliver deceptive instructions.
OWASP Non-Human Identity Top 10 | NHI-06 | Phishing paths often aim at NHI secrets, delegated access, and overprivileged service accounts.
NIST CSF 2.0 | PR.AA-01 | Identity trust and access control are central when AI systems mediate security-sensitive communication.

Protect NHI secrets and enforce least privilege so a deceptive message cannot trigger broad compromise.
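As a rough illustration of a least-privilege review, the sketch below compares the scopes granted to each non-human identity with the scopes it has actually used. The `NonHumanIdentity` type and the scope names are hypothetical; in practice the usage data would come from audit logs or an identity provider.

```python
from dataclasses import dataclass, field


@dataclass
class NonHumanIdentity:
    name: str
    granted: set[str]                           # scopes assigned to the identity
    used: set[str] = field(default_factory=set)  # scopes observed in use


def excess_scopes(nhi: NonHumanIdentity) -> set[str]:
    """Scopes that are granted but have never been observed in use."""
    return nhi.granted - nhi.used


def overprivileged(identities: list[NonHumanIdentity]) -> dict[str, list[str]]:
    """Map each identity with unused scopes to a sorted list of those scopes."""
    return {
        nhi.name: sorted(excess_scopes(nhi))
        for nhi in identities
        if excess_scopes(nhi)
    }


if __name__ == "__main__":
    inventory = [
        NonHumanIdentity("ci-bot", {"repo:read", "repo:write", "secrets:read"}, {"repo:read"}),
        NonHumanIdentity("mailer", {"mail:send"}, {"mail:send"}),
    ]
    # Only identities with unused grants are reported.
    print(overprivileged(inventory))
```

Unused scopes are candidates for removal, which limits what a deceptive message can reach through that identity.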