Agentic AI threat modeling exposes the real attack surface

By NHI Mgmt Group Editorial TeamPublished 2025-04-09Domain: Agentic AI & NHIsSource: CyberArk

TL;DR: CyberArk’s analysis breaks agentic AI into LLMs, tools, configuration, and memory, then shows how indirect prompt injection, credential theft, supply-chain tampering, and excessive permissions can bend autonomous workflows toward unsafe actions. The central lesson is that agentic systems must be governed as software plus identity, not as chat interfaces.

At a glance

What this is: This is a technical analysis of how agentic AI systems create new attack paths through LLM-driven control flow, tool execution, and persistent memory.

Why it matters: It matters because IAM and NHI teams must govern agent identities, credentials, and runtime decisions before autonomous systems are allowed to act.

👉 Read CyberArk's analysis of agentic AI attack surface and mitigations

Context

Agentic AI is software that uses an LLM to decide actions and then triggers tools to execute them, which makes identity, credentials, and runtime control part of the security boundary. For IAM and NHI governance, the problem is not just model output quality, but whether an autonomous system can be trusted to hold secrets, choose targets, and carry out actions safely.

CyberArk’s April 9, 2025 analysis frames that risk through a code-review workflow, but the broader issue is category-wide. Once an agent can read context, invoke tools, and persist memory, the attack surface expands beyond classic application security into NHI control, approval logic, and privilege containment. That starting position is typical for emerging agentic deployments, not exceptional.

Key questions

Q: How should security teams govern agentic AI systems that can take actions on their own?

A: Treat agentic AI as a privileged non-human identity, not as a passive application feature. Give each agent an owner, a defined purpose, and a minimal set of tools and credentials. Separate decision-making from execution, validate outputs before action, and require deterministic policy controls around commits, API calls, and secret access.

Q: Why do agentic AI systems create more security risk than standard chatbots?

A: Agentic systems can turn model output into action, which means a bad instruction can affect code flow, tool use, and downstream state. Standard chatbots usually stop at text generation. Agentic AI adds execution authority, so compromise can lead to disclosure, unauthorized changes, or misuse of credentials.

Q: What is the difference between prompt injection and excessive privilege in agentic AI?

A: Prompt injection is the path an attacker uses to influence the agent’s decision. Excessive privilege is what makes that influence dangerous. A weak prompt can be contained if the agent has little authority, but a privileged agent can turn the same manipulation into repository changes, credential exposure, or abusive tool calls.

Q: When should teams add human approval to agentic workflows?

A: Add human approval when an agent can access secrets, modify production code, touch multiple repositories, or execute commands that are hard to roll back. Human review is most useful at the point where the agent crosses from analysis into action. That is where blast radius matters most.

Technical breakdown

Agentic AI attack surface: where identity and control start to blur

An agentic system combines an LLM, tools, configuration, and often memory. The security issue is not the model alone, but the fact that model output can influence code flow and therefore decide which tool runs, what data is read, and what action is taken. That turns natural language into an input to execution logic. In NHI terms, the agent becomes an identity-bearing actor with delegated authority, which means compromise can occur through its context as much as through its code. Practical control requires separating model reasoning from execution approval.

Practical implication: Treat the agent as a privileged workload identity and require explicit authorization before tool use.

Indirect prompt injection and memory persistence

Indirect prompt injection happens when malicious instructions are embedded in data the model is supposed to process, such as repository content, comments, or other external inputs. The model may then follow the attacker’s instruction instead of the task objective. If the system has memory, the injected instruction can persist and affect later runs, creating a durable manipulation path. This is especially dangerous in systems that reuse context across workflows, because the compromise can survive beyond a single interaction. The failure mode is not just jailbreak, but long-lived trust contamination in agent context.

Practical implication: Sanitise untrusted inputs, segment memory, and validate outputs before they influence downstream action.

Excessive permissions and command execution risk in autonomous workflows

Agentic systems often need broad access to repositories, tickets, secrets, or APIs to function. That access becomes dangerous when the same agent can also be steered by manipulated inputs. The article’s examples show how prompt injection can cause a privileged agent to act on the wrong repository, repeat analysis indefinitely, or pass attacker-controlled values into shell commands. In practice, the issue is privilege amplification: a low-trust input path can reach a high-trust action path. NHI governance must therefore control not just authentication, but the full authorization chain from input to execution.

Practical implication: Apply least privilege, command allowlisting, and strict validation to every agent action path.

Threat narrative

Attacker objective: The attacker aims to turn a trusted agent into a privileged execution proxy that leaks data, bypasses review, or commits unauthorized changes.

Entry occurs when an attacker embeds indirect prompt injection inside repository content or another data source that the agent processes as part of normal work.
Escalation follows when the agent treats that injected instruction as higher priority than its task objective and begins misrouting review, disclosure, or commit decisions.
Impact occurs when the privileged agent executes unsafe actions, exposes secrets, or alters code and repositories beyond the attacker’s direct access.

Moltbook AI agent keys breach — Moltbook breach exposed 1.5M AI agent keys.
AI LLM hijack breach — attackers used stolen AWS access keys to hijack Anthropic LLM models on Bedrock.

Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.

NHI Mgmt Group analysis

Agentic AI creates an identity problem before it creates an AI problem. Once an LLM can choose tools and carry out actions, the security question becomes who or what is allowed to act on behalf of the enterprise. That is an NHI governance issue, not just a model safety issue. Practitioners should classify agents as identities with delegated authority and control them accordingly.

Indirect prompt injection is the agentic equivalent of untrusted input reaching a privileged workflow. The attack does not need to break the model directly if it can influence the data the model consumes. That means context, memory, and repository content all become security-relevant input channels. Teams should assume hostile content will reach the agent and design compensating controls around it.

Privilege concentration, not model sophistication, is the real blast-radius problem. The article shows that a single agent may hold credentials, create commits, and route work across repositories. When one manipulated input can influence multiple downstream decisions, the blast radius expands quickly. Practitioners should reduce agent privilege scope before expanding agent capability.

Operational guardrails must sit outside the LLM, not inside it. The article’s mitigations point to the right direction: output validation, least privilege, adversarial testing, and resource limits. Those controls matter because model behavior is not dependable enough to serve as the only enforcement layer. Security teams should anchor agent governance in deterministic policy and runtime checks.

Memory creates trust debt in agentic systems. Persistent context can improve usefulness, but it also allows malicious instructions to survive across interactions and workflows. That creates a durable governance burden for teams that assume each run is isolated. Practitioners should treat memory as controlled state, not harmless convenience.

From our research:
80% of organisations report their AI agents have already performed actions beyond their intended scope, including accessing unauthorised systems (39%), inappropriately sharing sensitive data (31%), and revealing access credentials (23%), according to AI Agents: The New Attack Surface report.
Only 52% of companies can track and audit the data their AI agents access, leaving 48% with a compliance and investigation blind spot, according to AI Agents: The New Attack Surface report.
For a broader control baseline, review OWASP NHI Top 10 alongside agent-specific access policies before expanding production use.

What this signals

Prompt injection becomes a governance issue the moment an agent can act on what it reads. That shifts the control objective from model correctness to execution containment. With 80% of organisations already reporting AI agents acting beyond intended scope, the practical lesson is that autonomous workflows need policy enforcement outside the model itself, plus a review path for every privileged action.

Memory is a control surface, not a convenience feature. Persistent context can preserve malicious instructions across sessions, which makes cleanup harder and incident scope broader. For teams building agentic systems, the right next step is to map where memory lives, how long it persists, and which controls govern its reuse. The Top 10 NHI Issues provides a useful way to frame that review.

The broader programme implication is that IAM teams will need separate governance for agent identity, agent memory, and agent execution paths. Existing access reviews are rarely designed to answer whether an autonomous workflow should still hold the same rights after its task, context, or target data changes. That is where OWASP Agentic Applications Top 10 becomes a useful design reference.

For practitioners

Define every agent as a governed NHI Assign owners, access boundaries, and approval paths to each agent before it is allowed to act. Map its credentials, tool permissions, and repository reach to a clear identity record so reviewers know exactly what the agent can do.
Separate untrusted input from privileged execution Do not let repository content, prompts, comments, or external documents directly determine tool execution. Insert validation and policy checks between model output and shell commands, commits, ticket updates, or secret access.
Constrain agent privileges to the smallest usable scope Use least privilege, time-bound access, and repository-specific permissions so one compromised agent cannot touch unrelated systems. Where possible, split review, analysis, and commit functions into distinct identities with different entitlements.
Test against indirect prompt injection and jailbreak paths Build adversarial tests that embed malicious instructions in comments, documentation, issue text, and other ordinary inputs. Include scenarios where the agent is asked to disclose context, repeat loops, or act on the wrong target.
Limit persistence and resource abuse Restrict memory retention, cap analysis loops, and bound the number of tool calls per task. Those limits reduce the chance that injected instructions persist or that the agent can be driven into denial-of-service behaviour.

Key takeaways

Agentic AI expands the attack surface by letting model-driven decisions trigger privileged actions, which turns identity and execution control into the same problem.
The source analysis shows how indirect prompt injection, excessive permissions, and memory persistence can each create a different path to misuse or disclosure.
Teams should anchor agent governance in least privilege, output validation, and adversarial testing before giving autonomous workflows broad production authority.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 address the attack and risk surface, while NIST AI RMF and NIST Zero Trust (SP 800-207) set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	NHI-01	Agent identity and prompt injection are central risks in this analysis.
NIST AI RMF		Agentic decision-making needs documented governance and accountability.
NIST Zero Trust (SP 800-207)	PR.AC-4	Autonomous workflows need continuous verification and constrained access paths.

Assign oversight for agent behaviour, memory retention, and action approval under an AI governance process.

Key terms

Agentic AI: Software that uses a model to decide what to do and then carries out actions through tools or APIs. In security terms, it is not just a text generator. It is an execution-capable system that can read context, select actions, and affect enterprise state.
Indirect Prompt Injection: A manipulation technique where malicious instructions are hidden inside content the model is expected to process, such as code comments, documents, or tickets. The attack works because the model treats untrusted input as task-relevant context, which can redirect decisions and downstream actions.
Agent Memory: Persistent state that allows an AI agent to retain information across executions. It can improve continuity, but it also creates a durable place for malicious instructions, sensitive data, or bad assumptions to survive, so it must be governed like controlled system state.
Privilege Amplification: A condition where a low-trust input path can influence a high-trust action path. In agentic systems, that happens when model-facing content can shape commands, commits, or credential use, allowing an attacker to get more impact than their direct access should permit.

Deepen your knowledge

Agentic AI threat modelling and control containment are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are building governance for autonomous workflows, it is worth exploring.

This post draws on content published by CyberArk: Agents Under Attack: Threat Modeling Agentic AI. Read the original.

NHIMG Editorial Note
Published by the NHIMG editorial team on 2025-04-09.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org