OWASP’s 2025 LLM risk update raises the bar on AI governance

By NHI Mgmt Group Editorial TeamPublished 2026-02-26Domain: Agentic AI & NHIsSource: Lasso Security

TL;DR: OWASP’s 2025 update to the Top 10 for LLM applications elevates prompt injection, sensitive information disclosure, excessive agency, RAG and embedding risks, misinformation, and unbounded consumption as core GenAI security concerns. The shift shows that AI governance now has to cover runtime behaviour, data leakage, and decision scope, not just model deployment.

At a glance

What this is: OWASP’s 2025 LLM Top 10 update reframes GenAI security around prompt injection, data leakage, excessive agency, and other runtime risks.

Why it matters: It matters because IAM, NHI, and AI governance teams need controls that account for model behaviour, delegated access, and sensitive-data exposure across the full AI lifecycle.

By the numbers:

96% of technology professionals identify AI agents as a growing security threat, and 66% believe this risk is immediate.

👉 Read Lasso Security’s analysis of the OWASP 2025 LLM Top 10 updates

Context

OWASP’s latest LLM guidance is really about a familiar governance problem in a new runtime: systems that can be manipulated through language, data retrieval, and tool use are only as safe as the controls around their inputs, outputs, and permissions. For identity teams, the question is no longer whether GenAI belongs in the environment, but which identity and access assumptions still hold once the model starts acting on behalf of users or systems.

The 2025 changes reflect a market that has moved beyond proof of concept deployments. Prompt injection, system prompt leakage, retrieval poisoning, and excessive agency all point to the same operational gap: security teams need controls that understand how the model is authorised, what data it can touch, and how far its influence reaches across applications and workflows.

Key questions

Q: How should security teams reduce prompt injection risk in LLM applications?

A: Security teams should separate system instructions from user-controlled content, constrain what the model can treat as authoritative, and sanitise retrieved data before it reaches the context window. The goal is to prevent attacker text from shaping policy-bearing behaviour. Prompt injection is best handled as a trust-boundary problem, not only as a content-filtering problem.

Q: When does excessive agency become a governance problem for AI systems?

A: Excessive agency becomes a governance problem when the model can trigger actions, select tools, or move data without sufficient human or policy oversight. At that point, the question is no longer only whether the output is accurate, but whether the system should have been allowed to act at all. Boundaries, approvals, and scoped permissions become essential.

Q: What do teams get wrong about system prompt leakage?

A: Teams often assume hidden prompts are protected because users cannot see them directly, but any text the application exposes to the model can potentially be recovered through crafted interactions. That means prompt secrecy is not a reliable security control. Sensitive information belongs outside the model context whenever disclosure would be unacceptable.

Q: How should organisations govern RAG-based AI workflows?

A: Organisations should govern retrieval sources, indexing pipelines, and embedding stores as part of the application’s trust chain. If malicious or low-quality content can enter retrieval, it can steer model outputs and downstream decisions. The practical test is whether the model can be influenced by content it should never have trusted in the first place.

Technical breakdown

Prompt injection and indirect prompt injection

Prompt injection is an input manipulation technique that causes an LLM to ignore its intended instructions or follow attacker-authored ones. Direct prompt injection targets the current session, while indirect prompt injection hides malicious instructions in retrieved documents, web pages, emails, or other content the model ingests. The security problem is not just malformed text. It is that the model cannot reliably distinguish trusted instructions from adversarial ones once both are represented in the same context window. That makes prompt handling a control plane issue, not only an application-layer issue.

Practical implication: treat all model inputs and retrieved context as untrusted and isolate instruction sources from user-controlled content.

Sensitive information disclosure and system prompt leakage

LLMs can reveal secrets when the application exposes them in prompts, retrieval results, memory, logs, or tool responses. System prompt leakage happens when hidden instructions, policy text, or embedded credentials are exposed through model behaviour or downstream integrations. The risk grows when developers assume that a hidden prompt is equivalent to a protected boundary. It is not. If the model, connector, or retrieval layer can access the text, an attacker may be able to coax it back out through carefully shaped queries or prompt inversion techniques.

Practical implication: remove secrets from prompts and centralise sensitive data controls outside the model context whenever possible.

Excessive agency, RAG, and unbounded consumption

Excessive agency describes LLM-enabled systems that can take actions with too much independence, especially when tied to tools, retrieval, or workflow automation. RAG expands the model’s effective knowledge by pulling in external data, but that also expands the attack surface through malicious documents and embedding poisoning. Unbounded consumption adds a different risk class: uncontrolled usage, cost spikes, and resource exhaustion. Together, these entries show that AI risk is now a blend of privilege, data integrity, and operational control. The model may not be autonomous in the strict identity sense, but it can still create autonomous-like security consequences when given tool access.

Practical implication: bound tool access, retrieval sources, and usage limits before scaling AI into operational workflows.

Moltbook AI agent keys breach — Moltbook breach exposed 1.5M AI agent keys.
AI LLM hijack breach — attackers used stolen AWS access keys to hijack Anthropic LLM models on Bedrock.

Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.

NHI Mgmt Group analysis

Prompt injection is now an identity control problem as much as an application security problem. Once a model can be steered through untrusted input, the real failure is that the system accepted attacker-authored language inside a trusted decision path. That makes input trust, instruction separation, and retrieval hygiene governance issues, not just model-tuning issues. Practitioners should treat prompt handling as part of access control design.

System prompt leakage exposes a broken confidentiality assumption, not just a bug. The assumption that hidden prompts behave like protected policy files fails when the model can surface them through outputs, tool calls, or indirect injection paths. That failure matters because teams often treat prompt text as security by obscurity. The practical conclusion is that prompt content must never carry data that would be unacceptable if disclosed.

Excessive agency shows that LLM behaviour can outgrow the oversight model built for conventional software. The article’s expansion of LLM06 reflects a broader shift: once models can act through tools, retrieval, and downstream workflows, security no longer stops at content moderation. NHI governance, workflow permissioning, and review boundaries all come into scope. Practitioners should re-evaluate which actions are safe to delegate to model-driven systems.

RAG introduces an identity-adjacent trust boundary around knowledge, not just data. Retrieved content now influences model behaviour in ways that can alter outputs, decisions, and automated actions. That makes embedding stores, document sources, and retrieval pipelines part of the trust chain. Security teams should govern those sources as if they were privileged inputs to decision-making systems.

Named concept: prompt-context trust collapse: The article points to a specific failure mode where teams assume the model can reliably separate trusted instructions from hostile text once both share the same runtime context. That assumption fails under direct and indirect injection, and the implication is not merely better filtering but a redesign of what the model is allowed to treat as authoritative. Practitioners should stop assuming that context windows preserve trust boundaries.

From our research:
98% of companies plan to deploy even more AI agents within the next 12 months, despite documented rogue behaviour in 80% of current deployments, according to AI Agents: The New Attack Surface report.
Only 44% of organisations have implemented any policies to govern AI agents, even though 92% say that governing them is critical to enterprise security.
For a broader governance lens, OWASP Agentic AI Top 10 maps the runtime risks that emerge once models can select actions and tools.

What this signals

Prompt-context trust collapse: teams should now assume that any model pipeline combining user input, retrieval, and action execution can be steered unless the trust boundary is explicitly engineered. That shifts the work from prompt hardening alone to governance of retrieval sources, tool permissions, and approval checkpoints.

With only 52% of companies able to track and audit the data their AI agents access, governance programmes will increasingly be judged on evidentiary visibility, not policy intent. Security leads should expect auditors and incident responders to ask where the model looked, what it touched, and who authorised the action path.

The practical signal is simple: if your AI workflow can retrieve content, call tools, and influence business decisions, it already behaves like a governed identity surface even if your IAM stack does not label it that way. Link it to identity review, access scoping, and logging now, before scale makes the gap harder to close.

For practitioners

Separate instructions from untrusted content Keep system instructions, policy text, and retrieved user content in distinct handling paths so attacker-controlled material cannot masquerade as governance text. Where possible, minimise the amount of instruction content exposed to the model.
Remove secrets from prompts and retrieval outputs Do not place API keys, tokens, or confidential policy data into prompt templates, memory, or retrieval results. If the model can see it, assume an attacker may eventually coerce it back out through output shaping.
Bound model tool access and action scope Limit what model-driven workflows can do, which tools they can call, and which retrieval sources they can use. Pair those limits with explicit approval gates for actions that change data, permissions, or external systems.
Review retrieval pipelines for poisoned inputs Assess vector stores, indexed documents, and external content sources for malicious or low-trust material that could alter model behaviour. Treat retrieval as a trust boundary, not a convenience layer.

Key takeaways

OWASP’s 2025 LLM update shows that prompt injection, leakage, and excessive agency are now core governance problems, not edge cases.
The main risk is structural: model context can blur trusted instructions, sensitive data, and execution authority in a single runtime path.
Practitioners should govern retrieval, tool access, and prompt content together, because controls that treat them separately will miss the failure mode.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 address the attack and risk surface, while NIST AI RMF and NIST Zero Trust (SP 800-207) set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10		Prompt injection and excessive agency map directly to agentic application risks.
NIST AI RMF		AI governance is needed where LLM behaviour influences decisions and actions.
NIST Zero Trust (SP 800-207)	PR.AC-4	The article is about limiting what model-driven systems can access and do.

Establish governance, testing, and monitoring for model-driven workflows before production scale.

Key terms

Prompt Injection: Prompt injection is an attack that manipulates an LLM by inserting instructions that override or redirect the intended task. The model treats attacker-authored text as if it were part of the trusted operating context, which can change outputs, tool use, or downstream actions.
Excessive Agency: Excessive agency occurs when an LLM-enabled system can take actions that exceed the scope intended by its designers or operators. The issue is not intelligence alone, but the combination of model output, tool access, and insufficient governance over what the system can do next.
System Prompt Leakage: System prompt leakage is the exposure of hidden instructions, policies, or embedded sensitive content that were assumed to remain private inside the model context. It becomes a security problem when the model or application reveals that content through output, retrieval, or indirect interaction.
Retrieval-Augmented Generation: Retrieval-augmented generation is a pattern where a model pulls external documents or records into its context before answering. It improves relevance, but it also expands the trust boundary because poisoned or low-quality sources can shape the model’s behaviour and any actions that follow.

Deepen your knowledge

Prompt injection, sensitive information disclosure, and excessive agency are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are building controls for AI-enabled systems that blend retrieval, data access, and action, it is worth exploring.

This post draws on content published by Lasso Security: OWASP Top 10 for LLM Applications and Generative AI, key updates for 2025. Read the original.

NHIMG Editorial Note
Published by the NHIMG editorial team on 2026-02-26.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org