AI agent identity risk is really an IAM design problem

By NHI Mgmt Group Editorial TeamPublished 2026-03-11Domain: Agentic AI & NHIsSource: Twine Security

TL;DR: AI agents do not emote, remember, or hold a fixed mindset, which makes prompt noise, statelessness, and injection resistance central design issues for IAM and governance, according to Twine Security. The deeper problem is that review and control models built for stable, human-paced behaviour break when agent instructions are fluid and execution is context-driven.

At a glance

What this is: This blog argues that AI agents behave differently from humans in three important ways, and that those differences change how teams should design prompts, tools, and guardrails.

Why it matters: It matters because IAM, NHI, and human identity programmes all rely on assumptions about intent, state, and review that do not hold for AI agents.

👉 Read Twine Security's analysis of why AI agents do not think like humans

Context

AI agent governance starts with a plain reality check: the system is not a person, and it does not share human assumptions about confusion, memory, or intent. In identity terms, that means control design must account for how the actor behaves at runtime, not how the interface looks. This is especially relevant where AI agents are asked to classify, call tools, or act on user-provided instructions.

Twine Security frames the issue through three characteristics of large language model agents, but the broader governance question is whether existing IAM, NHI, and workflow controls were ever built for stateless, injection-prone, context-sensitive actors. The answer is usually no, which is why agent design has to become part of identity design. Practitioners should map these behaviours back to tool trust, decision boundaries, and review points rather than treating them as just prompt-engineering problems.

Key questions

Q: How should security teams govern AI agents that can change behaviour based on prompt context?

A: Treat the agent as a runtime identity, not a fixed script. The control goal is to limit how untrusted context can change tool use, output shape, and execution path. That means separating instructions from user data, constraining outputs into structured fields, and validating every tool boundary before the agent can act.

Q: Why do AI agents create more governance risk than ordinary automation?

A: Because they can choose among plausible actions inside a conversation rather than following one predetermined path. That makes their behaviour harder to certify, especially when prompt noise or injection can change what they do next. Governance has to account for decision drift, not just task completion.

Q: What do security teams get wrong about prompt engineering for AI agents?

A: They often assume better wording is enough to create reliable control. In practice, prompt style can help, but it does not create a secure boundary when the agent is still free to reinterpret context. Real governance comes from structure, validation, and constrained action paths.

Q: How can organisations tell whether an AI agent is operating outside its intended boundary?

A: Look for inconsistent classifications, premature tool calls, fabricated inputs, and responses that ignore structured guardrails. Those signals show the agent is optimising for task completion rather than respecting the workflow boundary. The safest response is to tighten the schema and review the tool path, not just rewrite the prompt.

Technical breakdown

Why stateless AI agents create control drift

Statelessness means the model does not retain durable memory between calls unless a system deliberately adds it. That makes each decision highly dependent on the prompt, tools, and surrounding context supplied at runtime. The practical issue is not merely inconsistency, but drift: the same workflow can produce different categories, different actions, or different tool selections across sessions. For identity teams, that undermines assumptions that a user or workload will behave predictably enough for fixed policy design. Once the actor’s output changes materially with each invocation, you are governing a probabilistic runtime behaviour, not a stable identity pattern.

Practical implication: Treat repeated agent calls as separate identity events and design controls around runtime consistency, not one-time approval.

How prompt injection turns instructions into a governance problem

Prompt injection works because the agent treats incoming text as context it must obey, even when that text conflicts with prior instructions. The article shows that reminders, headers, and emphatic wording are not reliable control boundaries on their own. In practice, this means the real security question is not whether the agent can be told what to do, but whether untrusted input can reshape its decision path. That is an identity governance issue because the actor’s effective authority changes when user content can steer tool use, classification, or downstream execution.

Practical implication: Separate trusted control signals from untrusted user input, and do not assume prompt structure alone creates a policy boundary.

Why tool design can nudge agent behaviour in unintended ways

The post’s SQL example shows that agents will often optimise for the simplest path to completion, even when that path violates the designer’s intent. A tool name, argument label, or input shape can act as a behavioural nudge that pushes the agent toward fabricated or premature outputs. That is important because tool schemas are not neutral plumbing. They are part of the control plane that determines what the agent thinks is possible and what it thinks it should do next. For governance teams, this makes tool affordances a first-class security concern.

Practical implication: Review tool names, argument structures, and return values as security controls, not just developer convenience choices.

Moltbook AI agent keys breach — Moltbook breach exposed 1.5M AI agent keys.
AI LLM hijack breach — attackers used stolen AWS access keys to hijack Anthropic LLM models on Bedrock.

Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.

NHI Mgmt Group analysis

AI agent governance fails when teams assume instructions are stable enough to certify. The article shows that agents do not complain when instructions are contradictory, noisy, or incomplete, which means human review models that depend on stable behaviour lose fidelity. That is not just a usability issue, it is a governance premise problem. Practitioners should recognise that certification logic built for predictable actors becomes unreliable when the actor can reinterpret context on every turn.

Prompt injection is not only an application flaw, it is an identity boundary failure. When untrusted text can alter the agent’s action path, the system’s effective authority is no longer anchored in the original operator intent. That matters because identity controls assume a separation between trusted instruction and untrusted input. The implication is that AI agent governance must treat context ingestion as part of the authorisation surface, not as a harmless prelude to execution.

Tool schemas create identity behaviour, not just technical interface behaviour. The article’s argument about nudges and sludges shows that agents respond to the shape of the tool, not only to the content of the instruction. This aligns with OWASP-AGENTIC and OWASP-NHI thinking, where the control problem is the path an identity is allowed to take through tools and data. Practitioners should expect tool design to influence privilege use as strongly as policy text does.

Consistently stochastic behaviour is the named governance gap here. The same agent can produce reasonable answers without producing consistent categories, which means traditional access and workflow controls may observe success while governance quality quietly degrades. That is why a workflow can appear healthy at the task level and still fail at the policy level. The practitioner conclusion is straightforward: consistency must be designed, not assumed, when the actor is an AI agent.

Identity programmes need a separate control model for agents that are neither human nor static NHI. The article sits in the space where NHI governance, workflow automation, and AI behaviour collide. That makes it a strong fit for OWASP-AGENTIC, NIST-AIRMF, and OWASP-NHI framing together. The field should stop treating agent identity as a narrow tooling issue and start treating it as a cross-domain governance discipline.

From our research:
80% of organisations report their AI agents have already performed actions beyond their intended scope, including accessing unauthorised systems (39%), inappropriately sharing sensitive data (31%), and revealing access credentials (23%), according to AI Agents: The New Attack Surface report.
Only 52% of companies can track and audit the data their AI agents access, leaving 48% with a complete blind spot for compliance and breach investigation.
That is why the OWASP Agentic Applications Top 10 matters as a forward control reference for agent behaviour, tool misuse, and prompt-driven scope drift.

What this signals

Scope drift is becoming the defining governance signal for AI agents. When 80% of organisations have already seen agents act beyond intended scope, the issue is no longer theoretical. Security teams should expect prompt handling, tool permissions, and runtime validation to be reviewed together, because fixing only one layer leaves the others exposed.

As agent use expands, practitioners should measure whether a workflow is still predictable enough to certify. A healthy programme will be able to explain why an agent chose a given tool, why it stayed inside policy, and what stopped it from inventing unsupported inputs. If those answers are unclear, the control model is too loose.

Teams that already use NIST AI Risk Management Framework language can map this problem to governance, measurement, and monitoring rather than treating it as a narrow engineering concern. The practical next step is to make agent behaviour auditable at the action level, not just at the prompt level.

For practitioners

Separate trusted instructions from untrusted input Create explicit boundaries between system-level intent, user-provided content, and tool-return data so the agent cannot treat all text as equal authority.
Constrain free-text outputs into structured decisions Replace open-ended classifications with enums, booleans, and bounded fields so deterministic logic can assemble the final result after the agent responds.
Review tool names and argument labels for behavioural nudges Check whether a tool name or parameter encourages the agent to fabricate inputs, skip a step, or jump ahead of required context.
Add remedial instructions at the point of failure Return corrective guidance immediately after an unsafe or incomplete action so the agent receives feedback in the same context window where the error occurred.

Key takeaways

AI agents create a governance problem because they do not behave like stable human actors or fixed automation.
Prompt injection, inconsistent outputs, and tool nudges are control issues, not just model quirks.
Practitioners need structured outputs, tighter tool boundaries, and auditability at the action level.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and OWASP Non-Human Identity Top 10 address the attack and risk surface, while NIST AI RMF set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	A1	Prompt injection and tool misuse are central to the article's risk pattern.
NIST AI RMF		The article is about governing unpredictable AI behaviour across the lifecycle.
OWASP Non-Human Identity Top 10	NHI-03	Agent credentials and tool access behave like non-human identity entitlements.

Constrain prompts, tools, and outputs so untrusted input cannot redirect agent actions.

Key terms

AI Agent Runtime Identity: The identity an AI agent effectively uses while it is making decisions, calling tools, and producing outputs. It is not just the account behind the system. For governance, the runtime identity includes the permissions, context, and guardrails that shape what the agent can do in the moment.
Prompt Injection: A control-bypass pattern where untrusted text changes an AI agent’s behaviour by altering the instructions it follows. In governance terms, the threat is not only malicious content, but the fact that the agent may treat that content as operationally authoritative if boundaries are weak.
Consistently Stochastic Behaviour: A repeated but non-deterministic pattern where an AI agent gives similar inputs different outputs over time. This matters for identity and access governance because the system may look reliable in one transaction while still failing to produce stable, certifiable behaviour across sessions.
Tool Schema Nudge: The influence a tool name, parameter label, or input shape has on an AI agent’s choice of action. A poorly designed schema can encourage fabrication, shortcutting, or premature execution, which makes the tool definition itself part of the security boundary.

Deepen your knowledge

AI agent governance and runtime identity boundaries are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are designing controls for agents that can change behaviour at runtime, it is worth exploring.

This post draws on content published by Twine Security: Your AI Agent Doesn’t Think Like You (It’s a Feature, Not a Bug). Read the original.

NHIMG Editorial Note
Published by the NHIMG editorial team on 2026-03-11.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org