AI agents and psychological warfare expose an NHI trust gap

By NHI Mgmt Group Editorial TeamPublished 2026-04-20Domain: Agentic AI & NHIsSource: Unosecur

TL;DR: AI agents can be manipulated through prompt injection and other trust exploits because they execute instructions across cloud, pipeline, and data environments without judgment, according to Unosecur. That means NHI governance must focus on continuous verification, not assumed intent or one-time access approval.

At a glance

What this is: This analysis argues that AI agents create a trust gap in NHI governance because they act with access and execution authority but cannot judge whether instructions are malicious.

Why it matters: IAM and NHI teams need to treat agent trust as a live control problem, because prompt injection and autonomous execution can turn ordinary access into operational misuse.

👉 Read Unosecur's analysis of AI agents and psychological warfare

Context

AI agents are non-human identities that can authenticate, access systems, and take actions across cloud infrastructure, production pipelines, and sensitive data environments. The governance gap is that they operate inside trust boundaries designed for people, even though they do not understand intent or recognise manipulation. For IAM and NHI practitioners, that mismatch creates a control problem rather than a simple software risk.

The article frames prompt injection and agent-to-agent manipulation as practical threats, not theoretical edge cases. That makes the issue relevant to NHI governance because the core question is no longer whether an agent has credentials, but whether the organisation can verify what it was instructed to do, what it touched, and whether its behaviour stayed inside policy. In environments that rely on opaque execution paths, traceability becomes part of access control.

AI agent adoption is starting from a position that is common across the market: rapid deployment, broad access, and limited scrutiny. That combination is exactly where identity programs tend to fail first, because standing trust expands faster than governance can keep up. The result is not just more access, but more opportunity for misuse inside trusted workflows.

Key questions

Q: How should security teams govern AI agents that can act across multiple systems

A: Treat the agent as a non-human identity with scoped permissions, runtime monitoring, and clear stop conditions. Governance should not end at authentication. Teams need task-specific access, input validation, and audit trails that show what the agent read, decided, and changed across every system it touched.

Q: Why do AI agents create more identity risk than ordinary automation

A: AI agents create more risk because they do not just execute predefined scripts. They interpret instructions, call tools dynamically, and can be redirected by malicious context after authentication. That means the failure mode is not only credential abuse, but also trusted execution being steered into unintended actions.

Q: What is the difference between prompt injection and credential theft for agents

A: Prompt injection manipulates the agent’s decision path, while credential theft steals the access tokens or secrets it uses. Both are serious, but prompt injection is harder to spot because the agent may still be using valid credentials. Governance must address both the identity layer and the instruction layer.

Q: When should organisations require human approval for AI agent actions

A: Use human approval when the agent is about to change production systems, move sensitive data, or take an action that would be high-risk if a person did it unsupervised. Approval gates work best as exception controls, not as the default for every task, so the policy should focus on impact and blast radius.

Technical breakdown

How prompt injection exploits agent trust boundaries

Prompt injection works when malicious instructions are embedded in content the agent is allowed to read, such as documents, web pages, or tool outputs. The agent treats that input as actionable context, not as an adversarial payload. In practice, the attack does not need to break authentication. It hijacks the decision path after authentication has already succeeded. That is why traditional perimeter thinking misses it. The risk is architectural: an agent can be fully authorised and still be misdirected by untrusted instructions that arrive through normal workflow channels.

Practical implication: Treat every external or user-controlled input as potentially adversarial and validate it before the agent can act on it.

Why autonomous execution magnifies non-human identity risk

An AI agent is not just a requester of resources. It is an actor that can modify code, move data, trigger workflows, and chain tool calls. Once an agent can make decisions at machine speed, the blast radius of a single compromised instruction grows quickly. This is why NHI governance for agents cannot stop at identity issuance or static role assignment. The control surface includes execution scope, tool permissions, data reach, and the ability to detect deviation while the action is still in progress. Without that, autonomy becomes an amplification layer for mistakes and manipulation.

Practical implication: Scope agent privileges narrowly and attach runtime controls that can stop or contain out-of-policy actions as they happen.

Why traceability is becoming part of access governance

The article points to a key shift: organisations need to track what an agent did and what artefacts it produced, not just whether it authenticated. That is a governance change, because access control alone does not explain behaviour after the fact. Auditability must include step-level action traces, produced outputs, and the context that led to a decision. For NHI programs, this is where monitoring, logging, and policy enforcement converge. If an agent can act across systems without a durable trail, then accountability is effectively incomplete.

Practical implication: Build end-to-end activity logging for agent actions and tie it to access policy so investigations can reconstruct both cause and effect.

Threat narrative

Attacker objective: The attacker wants to turn a trusted AI agent into an execution path that bypasses human judgment and carries out unintended actions inside enterprise systems.

Entry occurs when an attacker plants malicious instructions in a document, webpage, or tool response that an agent is allowed to consume.
Escalation happens when the agent follows the injected instruction chain and uses its own authenticated access to reach systems or data outside the operator’s intent.
Impact emerges when the manipulated agent modifies code, shares sensitive data, or triggers downstream workflows at scale before the abuse is detected.

Moltbook AI agent keys breach — Moltbook breach exposed 1.5M AI agent keys.
Reviewdog GitHub Action supply chain attack — reviewdog/action-setup GitHub Action supply chain attack exposed secrets.

Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.

NHI Mgmt Group analysis

AI agents should be governed as non-human identities with active trust exposure, not as passive automation. Once an agent can authenticate and act, it inherits the same governance burden as any other privileged identity, but with more unpredictable behaviour. Traditional IAM assumes a human user or a bounded workload. Agentic systems break that assumption because they can be manipulated after authentication. Practitioners should therefore model agent trust as a live control problem, not a setup task.

Prompt injection creates a new class of identity risk: ephemeral instruction compromise. The agent may never lose its credentials, yet its decision path can still be subverted. That is different from classic credential theft, because the breach occurs through trusted context rather than stolen secrets. This makes policy enforcement, input validation, and execution scoping part of the identity stack. Teams should treat malicious instructions as an access-layer threat, not merely an application-layer bug.

Identity blast radius matters more when agents can chain actions across tools. An over-permissioned human account is dangerous, but an over-permissioned agent can expand that danger at machine speed and with little visibility. That is why least privilege, runtime approval, and action tracing need to be evaluated together. The practical conclusion is simple: reduce what the agent can reach before you rely on it to do more.

Traceability is becoming a control objective, not a reporting feature. If organisations cannot reconstruct what an agent read, decided, and produced, then governance stops at the first hop. That is not enough when the system itself can amplify a malicious instruction into multiple downstream events. Security teams should treat audit trails for agents as an operational requirement and a compliance prerequisite.

AI agent governance will converge with broader NHI policy faster than many IAM programs expect. The same controls that matter for service accounts, secrets, and workload identities now apply to agents, but with tighter runtime scrutiny. This will force security leaders to close the gap between identity policy and execution policy. Teams that align those layers now will be better positioned to govern autonomous systems safely.

From our research:
Only 44% of organisations have implemented any policies to manage their AI agents, despite 92% agreeing that governing AI agents is critical to enterprise security, according to AI Agents: The New Attack Surface.
Only 52% of companies can track and audit the data their AI agents access, leaving 48% with a complete blind spot for compliance and breach investigation.
For a broader control model, see OWASP Agentic AI Top 10 for the threat patterns teams should map to policy and runtime controls.

What this signals

Identity programmes should expect agent governance to become a mainstream control expectation, not a niche AI project. With 69% of security leaders saying identity management must fundamentally shift to address agentic AI systems, the programme risk is structural. Teams that keep agent policy separate from identity policy will struggle to govern autonomous behaviour consistently.

Ephemeral credential trust debt: agents can inherit access quickly, but organisations often defer the controls needed to verify how that access is used. That creates a governance gap that widens every time an agent is allowed into production workflows without a matching audit and approval model. The practical signal is that identity design now has to include execution oversight, not just issuance.

As agentic systems mature, the relevant framework set will expand from IAM and PAM into NIST AI Risk Management Framework style governance, because accountability, monitoring, and human oversight all become part of access control. Security leaders should expect policy language to shift from who can log in to what autonomous actors are allowed to do once they do.

For practitioners

Implement runtime input validation for agent prompts Block or sanitise untrusted content before it can influence tool use, code changes, or data access. Focus on documents, web content, ticket text, and tool outputs because those are common injection paths in agent workflows.
Scope agent privileges to task-specific minimums Map each agent to a narrow set of resources, APIs, and workflows, then revoke anything outside the immediate use case. Combine least privilege with time-bound access so the agent cannot retain standing reach after the task ends.
Add step-level audit logging for agent actions Record prompts, tool calls, decisions, outputs, and exceptions so investigators can rebuild the chain of events. Use logs to compare intended versus actual behaviour and to support compliance reviews when autonomous actions affect production.
Pair policy enforcement with stop conditions Define hard limits for data movement, code changes, and workflow triggers, then configure human review when the agent crosses them. A useful control is an approval gate when the agent attempts actions outside its expected scope.

Key takeaways

AI agents create an NHI governance problem because they can be authorised correctly and still be manipulated through trusted inputs.
The scale of the gap is already visible: most organisations say governance matters, but fewer than half have policies for agents and even fewer can audit their data access.
Security teams should move from static access thinking to runtime verification, task scoping, and step-level traceability for autonomous identities.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 address the attack and risk surface, while NIST AI RMF and NIST CSF 2.0 set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	NHI-01	Prompt injection and agent misuse map to agentic AI identity and tool-abuse risks.
NIST AI RMF		Agent trust, oversight, and accountability align with AI risk governance requirements.
NIST CSF 2.0	PR.AC-4	Least privilege and access reviews are central to governing agent identities.

Validate untrusted inputs and restrict tool access for autonomous agents before they reach production.

Key terms

Prompt Injection: Prompt injection is an attack where malicious instructions are embedded in content an AI agent is likely to trust, such as documents, webpages, or tool responses. The goal is to steer the agent’s behaviour without stealing credentials. It turns normal context into an adversarial control channel.
Non-Human Identity: A non-human identity is any machine or software identity that authenticates and acts inside an environment, including service accounts, API keys, tokens, certificates, and AI agents. These identities need lifecycle control, privilege scoping, and auditability because they can reach systems without human judgment.
Identity Blast Radius: Identity blast radius is the amount of damage an identity can cause if it is misused, compromised, or manipulated. For AI agents, the blast radius grows when tool access, data reach, and workflow permissions are broad or poorly supervised. Reducing it is a core governance objective.
Runtime Verification: Runtime verification is the practice of checking what an identity is doing while it is active, rather than relying only on provisioning-time controls. For autonomous agents, it means monitoring prompts, tool use, outputs, and policy violations as actions unfold so harmful behavior can be contained early.

What's in the full article

Unosecur's full blog covers the operational detail this post intentionally leaves for the source:

Examples of behavioural signals the vendor says can help detect manipulated agent activity before it escalates.
Discussion of how to audit artefacts produced by AI agents across multiple channels and formats.
The vendor's view on continuous evaluation of agent trustworthiness inside production workflows.
Practical framing for tracking deviation from expected agent patterns across environments.

👉 Unosecur's full post expands on behavioural monitoring, artefact auditing, and trust verification for agents.

Deepen your knowledge

AI agent governance and non-human identity controls are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are building policies for autonomous systems or overhauling identity oversight, it is worth exploring.

NHIMG Editorial Note
Published by the NHIMG editorial team on 2026-04-20.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org