Context engineering exposes the real attack surface in agentic AI

By NHI Mgmt Group Editorial TeamPublished 2025-09-10Domain: Agentic AI & NHIsSource: Pillar Security

TL;DR: Context engineering failures can let attackers poison prompts, retrieval pipelines, and business logic so AI systems leak data or take unsafe actions, even when the model itself is not compromised, according to Pillar Security. The security boundary has shifted upward into context, where access control, sanitation, and runtime policy now determine trust.

At a glance

What this is: This article argues that context engineering, not the model alone, is the decisive attack surface for agentic AI systems.

Why it matters: It matters because IAM, NHI, and AI governance teams must control who and what can feed trusted context into agents before context poisoning turns safe automation into unsafe execution.

By the numbers:

Today, these systems have failure rates ranging from 60% to 90%, often because they lack the situational data needed to ground their outputs.
When AWS credentials are exposed publicly, attackers attempt access within an average of 17 minutes , and as quickly as 9 minutes in some cases.

👉 Read Pillar Security's analysis of context engineering attacks on agentic AI

Context

Context engineering is the discipline of supplying an AI system with trusted situational data so it can make correct decisions. In agentic AI, the problem is not only model quality. It is the trust boundary around prompts, retrieval sources, orchestration tools, and business logic that determine what the system believes and does.

That boundary is now an identity and governance problem as much as a security engineering problem. If an attacker can shape the context an agent reads, they can influence actions without touching the model or core infrastructure, which means access control around retrieval and data flow becomes part of AI governance.

This is the same failure pattern that shows up in NHI and agentic AI programmes when systems trust external inputs too broadly. The difference is that the context itself becomes the control plane, so the risk moves from code compromise to decision compromise.

Key questions

Q: How should security teams control context in agentic AI systems?

A: Security teams should treat context as a governed input, not an implementation detail. That means classifying sources, restricting who can write or retrieve from them, filtering sensitive content, and logging every context change that affects agent behavior. If an agent can only act safely when its inputs are trusted, context provenance becomes a core control.

Q: Why do context poisoning attacks matter if the model itself is secure?

A: They matter because the model is often not the target. Attackers can manipulate the data an agent trusts and still force unsafe decisions, data leakage, or policy bypass. In other words, the failure sits in the trust boundary around retrieval and prompts, where context becomes a covert control plane.

Q: What do teams get wrong about securing AI agents?

A: Teams often secure the application code while leaving prompts, retrieval sources, and orchestration paths loosely governed. That creates a false sense of safety because the agent can still consume malicious or stale context and act within its permissions. The real question is whether the agent’s decision inputs are controlled as tightly as its code.

Q: How do organisations know whether their AI context controls are working?

A: They should be able to trace every agent decision back to a known context source, see which inputs were trusted at runtime, and detect when unapproved content was excluded or downgraded. If the team cannot reconstruct the context path during review or incident response, the controls are not yet effective.

Technical breakdown

Context poisoning in retrieval pipelines

Context poisoning happens when malicious or misleading data is inserted into the sources an AI system retrieves and trusts. The model does not need to be broken for this to work. If the retrieval layer surfaces poisoned content, the agent may treat it as authoritative context and make decisions that follow the attacker’s framing. This is especially dangerous in RAG-style systems, where external documents, tickets, or manuals are treated as runtime truth. The control failure is not just bad data quality. It is the absence of trust differentiation between approved and unapproved context sources.

Practical implication: classify retrieval sources and restrict which context can influence agent decisions.

Business logic manipulation through trusted inputs

Business logic manipulation is a narrower form of context poisoning in which the attacker crafts instructions that look legitimate inside the system’s own language. The content may appear to be a valid policy note, support instruction, or operational prompt, but it is designed to push the agent toward unsafe or unauthorized behavior. This works because the agent follows contextual cues rather than fixed code paths. In practice, the attack targets the interpretation layer between data and action, where policy, prompt, and retrieval all interact. The result is a governance bypass, not a software exploit.

Practical implication: test how agents respond to adversarial but plausible instructions in real context sources.

Shift Up for AI guardrails

Traditional shift-left security focuses on code, build, and test. That is necessary but incomplete for AI systems because the operational risk sits in the abstraction layer above code: prompts, retrieval pipelines, orchestration, and runtime policy. A Shift Up model treats context as first-class security scope and applies controls to what the model can see and use at decision time. For identity teams, this means runtime authorization and context governance must be aligned. If the wrong subject can feed the right context, the agent can act inside its permissions and still do the wrong thing.

Practical implication: extend governance to prompts, retrieval, and orchestration, not just application code.

Threat narrative

Attacker objective: The attacker wants to steer AI behavior through trusted context so the system leaks sensitive information, bypasses safeguards, or executes unsafe actions.

Entry occurs when an attacker places malicious instructions or poisoned documents into a context source the AI system trusts.
Credential or authority abuse follows when the agent treats that context as legitimate and applies it to decision-making or output generation.
Impact occurs when the manipulated context causes data leakage, unsafe actions, or policy-bypassing behavior without any direct model compromise.

Moltbook AI agent keys breach — Moltbook breach exposed 1.5M AI agent keys.
AI LLM hijack breach — attackers used stolen AWS access keys to hijack Anthropic LLM models on Bedrock.

Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.

NHI Mgmt Group analysis

Context engineering is now an identity governance problem, not just an AI quality problem. The article shows that the decisive control point is who can shape the data an agent trusts at runtime. That moves the boundary of governance from model-centric review to source, retrieval, and orchestration control. For IAM and NHI teams, the practical implication is that trust in context must be managed like trust in credentials.

Context poisoning is the AI equivalent of identity input tampering. The attacker does not need to break the model if they can alter the agent’s view of reality. That makes business logic manipulation a control-plane failure, because the agent acts on trusted but false context. The practitioner conclusion is that source trust, provenance, and runtime filtering now belong in the same conversation as access control.

Shift Up captures the real security change: the sensitive layer has moved above the codebase. Traditional shift-left methods still matter, but they do not address the retrieval and policy layers that decide what the model sees. This is where agentic AI governance starts to resemble NHI governance, because the system can be technically permitted while still being contextually unsafe. Practitioners should treat context pipelines as governed assets, not plumbing.

OWASP NHI Top 10 remains relevant because agentic systems inherit NHI-style trust failures at runtime. The article’s core issue is not model intelligence, but trust placement. When agents consume external context, they create a new form of privilege: the privilege to shape decisions through data influence. The implication is that identity controls must extend to the data-to-action path, not stop at authentication.

Context trust debt: the more sources an agent can consume, the more unreviewed assumptions accumulate in the decision path. This is the named failure mode practitioners should watch. Each additional retrieval source expands the chance that malicious or stale context will be treated as truth, so governance has to focus on provenance, minimization, and continuous review. The conclusion is simple: if context can be swapped, context can be weaponized.

From our research:
98% of companies plan to deploy even more AI agents within the next 12 months, despite documented rogue behaviour in 80% of current deployments, according to AI Agents: The New Attack Surface report.
Only 52% of companies can track and audit the data their AI agents access, leaving 48% with a complete blind spot for compliance and breach investigation.
OWASP Agentic AI Top 10 helps teams map context poisoning, tool misuse, and runtime trust failures to a broader control model.

What this signals

Context trust debt is the new operational risk for agentic programmes. As agents consume more documents, tickets, and tool outputs, every additional source expands the number of assumptions that can be silently poisoned. Teams should respond by treating context provenance as a living control surface, not a one-time design decision.

With only 52% of companies able to track and audit the data their AI agents access, the other half cannot reliably explain why an agent made a decision or where unsafe context entered the path. That gap will matter more as deployment scales and as attackers target retrieval layers rather than models.

Practitioners should align AI governance with runtime identity and access control, especially where context is fed from shared repositories, support systems, or external knowledge bases. For a broader control map, the OWASP Agentic AI Top 10 is a useful companion reference.

For practitioners

Map every runtime context source Inventory the documents, tickets, prompts, APIs, and orchestration inputs that can influence agent behavior, then classify which ones are trusted, untrusted, or conditional. Focus on retrieval paths, not just model endpoints.
Restrict context ingestion by role and purpose Apply role-based controls so only approved users, services, or agents can feed context into production workflows. Separate operational knowledge from unvetted content and require explicit approval for high-risk sources like shared repositories or public inputs.
Sanitize and classify all context before retrieval Remove unnecessary sensitive data such as PII, label source provenance, and prevent low-trust content from entering agent prompts or retrieval stores. Treat context classification as a security control, not a documentation task.
Audit prompt and retrieval histories continuously Keep logs of what was retrieved, when it changed, and which agent consumed it so injection attempts and provenance drift are visible during investigation. Pair monitoring with versioning so you can compare expected and actual context.
Test business logic with adversarial context Red-team agents using plausible but malicious instructions in manuals, tickets, and prompts to see whether the system accepts unsafe guidance. Use those tests to identify where policy-aware orchestration must block or downgrade context.

Key takeaways

Context engineering is a security boundary, because agents can be manipulated through trusted inputs without any model compromise.
Attackers only need to poison a small amount of trusted context to alter agent behavior at scale.
Teams should govern retrieval, prompts, and orchestration with the same discipline they apply to code and credentials.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 address the attack and risk surface, while NIST AI RMF and NIST CSF 2.0 set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	A2	Context poisoning and tool abuse are core agentic AI threats in this article.
NIST AI RMF		Runtime context governance affects trust, safety, and accountability in AI systems.
NIST CSF 2.0	PR.AC-4	Access control over context sources is central to limiting malicious data paths.

Map retrieval and prompt paths to agentic AI threat controls and block untrusted context sources.

Key terms

Context Engineering: The practice of selecting, curating, and delivering the information an AI system uses at runtime. In agentic environments, context engineering is a security function because the quality, provenance, and trust level of the inputs directly shape the system’s actions and outputs.
Context Poisoning: An attack in which malicious or misleading data is inserted into the sources an AI system trusts. The model may remain unchanged, but the agent’s behavior shifts because it is acting on corrupted context rather than verified information.
Shift Up: An AI security approach that extends protection beyond code into prompts, retrieval pipelines, orchestration, and runtime policy. It recognizes that for AI systems, the most important security controls often sit above the application layer where context is assembled and consumed.
Business Logic Manipulation: A technique that uses plausible-looking instructions or data to push an AI system into unsafe or unauthorized behavior. The attack targets how the system interprets context and applies rules, rather than exploiting a traditional software vulnerability.

Deepen your knowledge

NHI governance, agentic AI identity, and machine identity security are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are responsible for identity security strategy or governance in your organisation, it is worth exploring.

This post draws on content published by Pillar Security: Securing Context Engineering. Read the original.

NHIMG Editorial Note
Published by the NHIMG editorial team on 2025-09-10.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org