TL;DR: AI systems widen the attack surface through prompt injection, jailbreaking, deepfakes, data poisoning, and AI-generated phishing that can bypass conventional controls, according to WitnessAI. The governance problem is not just adversarial content, but the fact that AI systems can be manipulated into taking actions, handling data, or impersonating identities outside intended guardrails.
At a glance
What this is: An overview of how AI cybersecurity threats expand attack surfaces through phishing, impersonation, poisoning, malware, and prompt injection, with a focus on the governance gaps they expose.
Why it matters: AI threats now affect human identity, NHI, and AI agent governance at the same time, forcing practitioners to rethink access control, observability, and incident response.
By the numbers:
- When AWS credentials are exposed publicly, attackers attempt access within an average of 17 minutes and as quickly as 9 minutes in some cases.
Context
AI cybersecurity threats are risks created or amplified when AI systems are used to generate, shape, or execute malicious activity. In identity programmes, the problem is not only malicious content generation, but also how AI changes the speed, realism, and scale of identity abuse across people, service accounts, and agents.
The governance gap is that many controls still assume identity abuse is static, human-paced, and easy to classify. AI-driven phishing, deepfakes, prompt injection, and model manipulation compress the time available to detect misuse and blur the boundary between user action, automated action, and agent behaviour.
Key questions
Q: How should security teams govern AI systems that can take actions from untrusted input?
A: Security teams should separate untrusted input from action execution, especially where an AI system can call tools, retrieve data, or influence approvals. The core control is not content filtering alone, but limiting what the model is allowed to do after it reads external text. That reduces the chance that prompt injection becomes an access or data incident.
Q: Why do deepfakes create a bigger problem than traditional phishing for IAM teams?
A: Deepfakes weaken the signals people use to approve access changes, reset credentials, or authorise transactions. IAM teams can no longer rely on voice, image, or conversational confidence as evidence of identity. Stronger verification, step-up checks, and process separation become necessary when synthetic identities can look and sound legitimate.
Q: What do organisations get wrong about AI-generated phishing and impersonation?
A: They often treat it as a messaging problem instead of an identity problem. AI-generated phishing succeeds because it looks credible enough to pass human judgment and workflow shortcuts. Organisations should assume that speed, grammar, and tone no longer distinguish safe from unsafe requests, especially in high-trust channels.
Q: How can teams tell whether an AI model has been poisoned or influenced?
A: Look for sudden shifts in model outputs, unexplained changes in classification behaviour, or new failure patterns after data or connector updates. Poisoning is often visible first as behavioural drift, not as a clean technical alert. Teams need source provenance, change tracking, and review of retrieval inputs to detect it early.
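One way to make source provenance and change tracking concrete is to fingerprint every retrieval or training source and compare fingerprints between runs. The following is a minimal sketch, not a reference implementation: the function names (`fingerprint`, `build_manifest`, `changed_sources`) and the example source names are hypothetical, and it assumes sources are small enough to hash in memory.

```python
import hashlib
from typing import Dict


def fingerprint(content: bytes) -> str:
    """Return a SHA-256 hex digest that identifies this exact content."""
    return hashlib.sha256(content).hexdigest()


def build_manifest(sources: Dict[str, bytes]) -> Dict[str, str]:
    """Map each source name to the fingerprint of its current content."""
    return {name: fingerprint(data) for name, data in sources.items()}


def changed_sources(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, str]:
    """Report sources that were added, removed, or modified between manifests."""
    changes = {}
    for name in sorted(set(old) | set(new)):
        if name not in old:
            changes[name] = "added"
        elif name not in new:
            changes[name] = "removed"
        elif old[name] != new[name]:
            changes[name] = "modified"
    return changes


# Hypothetical example: one knowledge-base page is silently altered
# and a new connector feed appears between two ingestion runs.
before = build_manifest({"kb/reset-policy.md": b"Resets require manager approval."})
after = build_manifest({
    "kb/reset-policy.md": b"Resets require no approval.",
    "connector/feed.json": b"{}",
})
print(changed_sources(before, after))
```

A manifest like this does not say whether a change is malicious. It only gives reviewers a list of what changed, so that behavioural drift can be traced back to a specific source update.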
Technical breakdown
Prompt injection and AI agent instruction hijacking
Prompt injection is a manipulation technique that causes an LLM or AI agent to follow attacker-supplied instructions instead of intended policy or task boundaries. Direct prompt injection targets the model with explicit malicious directives. Indirect prompt injection hides those directives in content the agent later processes, such as webpages, emails, or documents. Because the model parses untrusted text as potential instruction, the boundary between data and control weakens. That creates a governance problem for systems that rely on the model to decide which actions to take, especially when the agent can call tools or process sensitive context.
Practical implication: separate untrusted input handling from tool execution and constrain which actions an agent can trigger from external content.
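The separation described above can be sketched as a small authorisation check that sits between the model and its tools. This is an illustrative sketch under stated assumptions: the tool names, the `Provenance` levels, and the `authorize` function are hypothetical, and it assumes the orchestration layer (not the model) tags each request with the provenance of the content that triggered it.

```python
from dataclasses import dataclass
from enum import Enum


class Provenance(Enum):
    TRUSTED = "trusted"      # e.g. operator instructions or system policy
    UNTRUSTED = "untrusted"  # e.g. webpage, email, or document content


# Hypothetical allowlist: which tools may run when the triggering
# instruction originated at each provenance level.
ALLOWED_TOOLS = {
    Provenance.TRUSTED: {"search_docs", "send_email", "reset_password"},
    Provenance.UNTRUSTED: {"search_docs"},
}


@dataclass(frozen=True)
class ToolRequest:
    tool: str
    provenance: Provenance  # assumed to be set by the orchestrator, never by the model


def authorize(request: ToolRequest) -> bool:
    """Allow the call only if the tool is allowlisted for this provenance."""
    return request.tool in ALLOWED_TOOLS.get(request.provenance, set())


# An instruction hidden in an email asks the agent to send mail: denied.
print(authorize(ToolRequest("send_email", Provenance.UNTRUSTED)))   # False
# The same content may still trigger a read-only search: allowed.
print(authorize(ToolRequest("search_docs", Provenance.UNTRUSTED)))  # True
```

The design choice here is that the decision depends on where the instruction came from and what the tool can do, not on whether the text looks malicious, which is why it complements rather than replaces content filtering.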
Deepfakes, impersonation, and identity trust collapse
Deepfakes and synthetic identities exploit the fact that many identity checks still rely on human perception and context cues. AI can clone voice, image, and video well enough to make fraud and social engineering more convincing than traditional impersonation. That weakens trust in the evidence humans use to approve payments, reset credentials, or authorise access changes. In parallel, AI-generated phishing removes the rough edges that once exposed fraudulent messages. The result is a trust environment where identity signals look legitimate even when the underlying actor is not.
Practical implication: treat voice, image, and message realism as untrusted signals and require stronger verification paths for sensitive approvals.
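As a rough illustration of treating realism as an untrusted signal, the sketch below gates sensitive approvals on out-of-band verification only. The action names, the `ApprovalRequest` fields, and `may_approve` are hypothetical, and the sketch assumes some separate process sets `out_of_band_verified` after confirming the request through a pre-registered channel.

```python
from dataclasses import dataclass

# Hypothetical set of actions that always require step-up verification.
SENSITIVE_ACTIONS = frozenset(
    {"credential_reset", "payment_approval", "access_change"}
)


@dataclass(frozen=True)
class ApprovalRequest:
    action: str
    # How convincing the caller seemed (0.0 to 1.0). Recorded for audit,
    # but deliberately ignored by the policy below.
    perceived_confidence: float
    # True only if the request was confirmed via a separate,
    # pre-registered channel (assumed to be set by another process).
    out_of_band_verified: bool


def may_approve(request: ApprovalRequest) -> bool:
    """Sensitive actions pass only with out-of-band verification."""
    if request.action in SENSITIVE_ACTIONS:
        return request.out_of_band_verified
    # Routine requests follow the normal path, which is not modelled here.
    return True


# A very convincing voice call asking for a reset is still refused.
print(may_approve(ApprovalRequest("credential_reset", 0.99, False)))  # False
```

Because `perceived_confidence` never enters the decision, a better deepfake does not improve the attacker's odds. Only the independent verification path does.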
Data poisoning and model integrity risk
Data poisoning occurs when attackers corrupt training data or influence model inputs so the AI produces unreliable or unsafe outputs. In security systems, that can distort detection logic, alter classifications, or create blind spots that defenders may not notice immediately. This is different from simple false positives or tuning errors because the attack targets the learning or reasoning substrate itself. For identity and security teams, the critical issue is that poisoned data can shape decisions made at scale, including threat scoring, content handling, and automated response logic. Once the model is degraded, downstream controls inherit the error.
Practical implication: validate training and retrieval sources, and monitor for abnormal shifts in model behaviour that indicate integrity compromise.
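Monitoring for abnormal behavioural shifts can start with something as simple as comparing the distribution of model outputs before and after a data or connector update. The sketch below uses total variation distance for that comparison. It is one possible measure, not a method taken from the source article, and the function names and the 0.2 threshold are illustrative assumptions that would need tuning per model.

```python
from collections import Counter
from typing import Dict, List


def label_distribution(labels: List[str]) -> Dict[str, float]:
    """Convert a list of predicted labels into relative frequencies."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def total_variation_distance(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Half the sum of absolute differences; 0 = identical, 1 = disjoint."""
    labels = set(p) | set(q)
    return 0.5 * sum(abs(p.get(l, 0.0) - q.get(l, 0.0)) for l in labels)


def drift_alert(baseline: List[str], current: List[str],
                threshold: float = 0.2) -> bool:
    """Flag a potential integrity event when outputs shift past the threshold."""
    distance = total_variation_distance(
        label_distribution(baseline), label_distribution(current)
    )
    return distance > threshold


# Hypothetical example: after a connector update, the classifier's
# "malicious" rate jumps from 10% to 40% on comparable traffic.
baseline = ["benign"] * 90 + ["malicious"] * 10
current = ["benign"] * 60 + ["malicious"] * 40
print(drift_alert(baseline, current))  # True
```

An alert from a check like this is a prompt for investigation rather than proof of poisoning, since legitimate changes in traffic can also move the distribution.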
Threat narrative
Attacker objective: The attacker wants to convert AI-enabled trust into credential theft, unauthorized actions, or automated fraud at scale.
- Entry begins when attackers use AI-generated phishing, deepfake impersonation, or prompt injection to reach users, systems, or embedded AI workflows.
- Escalation occurs when the malicious content convinces a person or model to disclose credentials, execute unsafe instructions, or trust poisoned inputs.
- Impact follows when the attacker uses that trust break to steal data, evade detection, or turn AI systems into channels for fraud and malware delivery.
Breaches seen in the wild
- Moltbook AI agent keys breach — Moltbook breach exposed 1.5M AI agent keys.
- AI LLM hijack breach — attackers used stolen AWS access keys to hijack Anthropic LLM models on Bedrock.
Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.
NHI Mgmt Group analysis
AI cybersecurity threats are really identity trust failures at machine speed. The article groups together phishing, deepfakes, prompt injection, poisoning, and malware, but the shared pattern is that attackers are exploiting the points where organisations decide what to trust. That decision now spans humans, service identities, and AI systems that can act on what they ingest. The practitioner implication is that identity programmes need to treat AI as part of the trust boundary, not just a new attack channel.
Prompt injection exposes a runtime governance gap, not just a model security problem. The model is not merely producing bad output, it is being steered at execution time through content it was not supposed to treat as instruction. That places AI agent governance in the same analytical family as NHI access abuse, where identity is valuable because it can be induced to act. The implication is that controls must account for instruction provenance, tool boundaries, and action authority together.
Deepfakes create a synthetic identity problem that breaks human verification workflows. Traditional approval paths still assume that voice, image, and conversational confidence are meaningful indicators of identity. AI erodes that assumption by making impersonation cheaper, faster, and more convincing. The practitioner implication is that human IAM controls, fraud controls, and help desk workflows now need stronger challenge paths than perception alone.
Data poisoning is a control-plane integrity issue, not a content-quality issue. When attackers corrupt the sources that shape model behaviour, they influence decisions downstream without needing to touch each individual control. That makes the failure mode broader than a single bad prediction. The practitioner implication is to govern the data and retrieval layers with the same seriousness applied to privileged access and configuration management.
LLM-driven abuse is forcing one governance model across human, NHI, and autonomous actors. The article shows that malicious activity can now traverse a human user, a service credential, and an AI system within a single chain. That creates a cross-actor governance problem that no single identity team can solve in isolation. The practitioner implication is to align fraud, IAM, NHI, and AI governance around shared trust and accountability rules.
From our research:
- Two-thirds of enterprises have endured a successful cyberattack resulting from compromised non-human identities, with a quarter encountering multiple attacks, according to The 2024 ESG Report: Managing Non-Human Identities.
- The average organisation believes more than 1 in 5 of its non-human identities are insufficiently secured, which shows how broad the control gap already is.
- For a practical next step, review the NHI Lifecycle Management Guide to align provisioning, rotation, and offboarding with the identities your AI systems depend on.
What this signals
AI exposure is becoming an identity programme issue, not just a security operations issue. As AI systems move into workflows that touch credentials, approvals, and customer interactions, teams need shared governance across IAM, fraud, and application security. The control question is no longer whether AI can be monitored, but whether the organisation can still trust the actor behind the action.
Synthetic identity pressure will keep rising unless verification paths are redesigned. Deepfake-ready attackers can now target help desks, finance teams, and administrators with believable voice and video impersonation. That makes out-of-band verification and policy-based approval boundaries more important than training alone.
With 72% of organisations already saying they have experienced or suspect a non-human identity breach, per The 2024 ESG Report: Managing Non-Human Identities, identity teams should assume AI-enabled abuse will land in existing machine identity and access workflows first. The programme response should be to tighten trust boundaries before the next wave of AI-driven abuse scales into routine operations.
For practitioners
- Classify AI-exposed workflows by trust boundary: Map where AI systems consume untrusted input, make decisions, or trigger downstream actions. Prioritise workflows that can touch credentials, approvals, or sensitive records, because those are the highest-risk trust boundaries.
- Restrict model-to-tool permissions tightly: Limit which tools, datasets, and actions an AI system can reach, and require separate approval for high-risk operations. Treat tool access as privileged access, not as a default extension of the model.
- Harden human verification for approval paths: Move beyond voice or message realism when approving resets, payments, or access changes. Use out-of-band verification, challenge questions with low replay value, and step-up checks for sensitive requests.
- Monitor for poisoned training and retrieval sources: Track changes in model behaviour, source provenance, and unexpected confidence shifts. If the model starts drifting after new data ingestion or connector changes, treat that as a potential integrity event.
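The first recommendation in the list above, classifying workflows by trust boundary, can be sketched as a simple tiering function. The tier names, the `Workflow` fields, and the example workflows are hypothetical. The tiering rules are one plausible reading of the guidance, not a scheme defined by the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workflow:
    name: str
    consumes_untrusted_input: bool   # reads web, email, or document content
    triggers_actions: bool           # can call tools or downstream systems
    touches_sensitive_assets: bool   # credentials, approvals, sensitive records


def risk_tier(w: Workflow) -> str:
    """Assign a review priority based on which trust boundaries are crossed."""
    if w.consumes_untrusted_input and w.triggers_actions and w.touches_sensitive_assets:
        return "critical"
    if w.consumes_untrusted_input and (w.triggers_actions or w.touches_sensitive_assets):
        return "high"
    if w.triggers_actions or w.touches_sensitive_assets:
        return "medium"
    return "low"


# Hypothetical inventory of AI-exposed workflows.
workflows = [
    Workflow("helpdesk-reset-agent", True, True, True),
    Workflow("email-summariser", True, False, False),
    Workflow("internal-report-writer", False, False, True),
]
for w in workflows:
    print(w.name, risk_tier(w))
```

Sorting an inventory by tier gives teams a defensible order in which to apply the tool restrictions and verification changes described in the other recommendations.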
Key takeaways
- AI cybersecurity threats are best understood as trust failures that let attackers manipulate people, systems, and models into unsafe action.
- The scale of the problem is growing because AI makes phishing, impersonation, and malware more convincing and more automated than legacy attacks.
- Practitioners should tighten trust boundaries, constrain model-to-tool access, and redesign verification so synthetic identity cannot pass as legitimate.
Standards & Framework Alignment
This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.
OWASP Agentic AI Top 10 and OWASP Non-Human Identity Top 10 address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.
| Framework | Control / Reference | Relevance |
|---|---|---|
| OWASP Agentic AI Top 10 | A1 | Prompt injection and AI tool abuse are core agentic AI threat patterns. |
| OWASP Non-Human Identity Top 10 | NHI-01 | AI systems and their credentials fit non-human identity governance concerns. |
| NIST AI RMF | | AI risk management covers governance, measurement, and oversight for model-driven decisions. |
Establish AI governance, monitor model behaviour, and document accountability for automated decisions.
Key terms
- Prompt Injection: A technique that manipulates an AI model or agent by embedding instructions in input it was supposed to treat as data. The result can be unsafe tool use, policy bypass, or data leakage when the system cannot reliably distinguish content from control.
- Deepfake: Synthetic audio, video, or image content generated to impersonate a real person. In identity and security programmes, deepfakes matter because they can bypass human judgment and create convincing approval, fraud, or help desk scenarios that look legitimate.
- Data Poisoning: The deliberate corruption of training or retrieval data so a model behaves incorrectly or unsafely. For security teams, it is a supply-chain style integrity problem that can skew detection, classification, and automated decision-making at scale.
- AI-Driven Phishing: Phishing content generated or adapted by AI to improve targeting, language quality, and timing. It reduces the telltale signs that once made fraudulent messages easier to spot, which increases pressure on verification and identity controls.
Deepen your knowledge
NHI governance, agentic AI identity, and machine identity security are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are responsible for identity security strategy or NHI governance in your organisation, it is worth exploring.
This post draws on content published by WitnessAI: AI cybersecurity threats, AI agent abuse, and enterprise mitigation strategies. Read the original.
Published by the NHIMG editorial team on 2025-12-03.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org