
How should security teams handle untrusted content in AI agent workflows?

Security teams should treat every retrieved page, file, message, or feed as untrusted until it is validated by provenance and policy checks. The control point is not the prompt alone. It is the retrieval, rendering, and tool-calling chain that turns external content into model context and then into action.

Why This Matters for Security Teams

Untrusted content becomes dangerous the moment an AI agent can retrieve it, interpret it, or pass it into a tool call. A poisoned web page, malformed document, deceptive chat message, or compromised feed can influence the agent’s context and change what it does next. That makes content security a workflow problem, not just a prompt-filtering problem. The OWASP Agentic AI Top 10 and the NIST AI Risk Management Framework both point toward provenance, validation, and context controls as essential safeguards.

For NHI Management Group, the important distinction is that agent workflows often convert external content into actionable authority. Once retrieved data reaches memory, planners, or tool selectors, the system may treat it as trusted internal input unless policy intervenes. That is why retrieval, rendering, and action need separate controls. The same pattern shows up in real incidents: attackers do not need to break the model first if they can shape the data the agent consumes, as highlighted in NHIMG coverage such as "AI LLM hijack breach". In practice, many security teams discover content poisoning only after an agent has already summarized, forwarded, or acted on the malicious material.

How It Works in Practice

Security teams should design controls around the full content path: acquisition, parsing, context injection, reasoning, and tool execution. The practical goal is to prevent untrusted material from silently gaining influence just because it was retrieved by a legitimate agent. That means treating provenance as a first-class security signal and applying policy before content is promoted into model context.

A workable pattern is to validate content at multiple checkpoints:

  • Source trust: confirm where the content came from, who published it, and whether the source is expected for that workflow.

  • Content normalization: strip active content, hidden instructions, and formatting that can manipulate parsers or downstream tools.

  • Context gating: label content by trust level so the agent cannot confuse retrieved text with system policy or operator intent.

  • Tool-call guardrails: require policy checks before any action that writes, sends, deletes, purchases, or escalates access.

  • Output inspection: review whether the agent is quoting, transforming, or acting on untrusted material in ways that violate policy.
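The first three checkpoints can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the `EXPECTED_HOSTS` allowlist, the `ContextItem` type, and the bracketed label format are all hypothetical, and the normalization step only removes invisible Unicode characters. A host that matches the allowlist is merely an expected source; its text should still be treated as data rather than instruction.

```python
import unicodedata
from dataclasses import dataclass
from enum import Enum
from urllib.parse import urlparse


class SourceTrust(Enum):
    UNTRUSTED = "untrusted"
    EXPECTED_SOURCE = "expected_source"


# Hypothetical allowlist: the hosts each workflow is expected to read from.
EXPECTED_HOSTS = {"research": {"docs.example.com", "kb.example.com"}}


@dataclass(frozen=True)
class ContextItem:
    text: str
    source_url: str
    trust: SourceTrust


def check_source(url: str, workflow: str) -> SourceTrust:
    """Source trust: is this host expected for this workflow?"""
    host = urlparse(url).hostname or ""
    if host in EXPECTED_HOSTS.get(workflow, set()):
        return SourceTrust.EXPECTED_SOURCE
    return SourceTrust.UNTRUSTED


def normalize(text: str) -> str:
    """Content normalization: fold compatibility forms, then drop invisible
    format (Cf) and control (Cc) characters, keeping newlines and tabs."""
    folded = unicodedata.normalize("NFKC", text)
    return "".join(
        ch for ch in folded
        if ch in "\n\t" or unicodedata.category(ch) not in ("Cf", "Cc")
    )


def gate(url: str, raw_text: str, workflow: str) -> ContextItem:
    """Context gating: content only enters context with a trust label attached."""
    return ContextItem(normalize(raw_text), url, check_source(url, workflow))


def render_for_model(item: ContextItem) -> str:
    """Wrap retrieved text so it is visibly separate from system policy."""
    return (
        f"[RETRIEVED CONTENT trust={item.trust.value} source={item.source_url}]\n"
        f"{item.text}\n"
        "[END RETRIEVED CONTENT]"
    )
```

The label is advisory to the model, so it should not be the only control. Enforcement belongs at the tool-call layer, where the trust value can be checked by code rather than interpreted by the agent.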

This is consistent with the direction of the CSA MAESTRO agentic AI threat modelling framework, which treats agentic workflows as systems with multiple decision points rather than one monolithic prompt. NHIMG’s research on the OWASP NHI Top 10 also reinforces that identity, context, and tool access are tightly coupled in agent systems. For organisations handling sensitive content, the safest model is to combine content provenance checks with short-lived credentials and explicit policy evaluation before each meaningful action. These controls tend to break down when agents chain multiple tools across loosely governed SaaS environments, because the trust boundary between retrieval and execution disappears.
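Explicit policy evaluation before each meaningful action could look like the following sketch. The action names, the `ToolCall` type, and the three decision strings are assumptions made for illustration; a real deployment would likely delegate this decision to a dedicated policy engine. The sketch denies by default, so an action that appears in neither set is refused.

```python
from dataclasses import dataclass

# Hypothetical action vocabulary for this sketch.
READ_ONLY_ACTIONS = {"read", "search"}
HIGH_IMPACT_ACTIONS = {"write", "send", "delete", "purchase", "escalate"}


@dataclass(frozen=True)
class ToolCall:
    tool: str
    action: str
    # True if any untrusted content was in context when the call was planned.
    influenced_by_untrusted: bool


def evaluate(call: ToolCall, operator_approved: bool = False) -> str:
    """Return 'allow', 'needs_approval', or 'deny' for a proposed tool call."""
    if call.action in READ_ONLY_ACTIONS:
        return "allow"
    if call.action not in HIGH_IMPACT_ACTIONS:
        return "deny"  # unknown actions are refused by default
    if call.influenced_by_untrusted and not operator_approved:
        return "needs_approval"
    return "allow"
```

For example, `evaluate(ToolCall("mail", "send", influenced_by_untrusted=True))` returns `"needs_approval"`, while the same call with `operator_approved=True` returns `"allow"`.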

Common Variations and Edge Cases

Tighter content controls often increase latency and operational overhead, so organisations have to balance safety against workflow speed and analyst productivity. That tradeoff is especially visible in high-volume environments where agents process large document sets, inbound messages, or continuously updated feeds.

There is no universal standard for every content type yet, so guidance is still evolving for some edge cases. For example, a public web page used for research may warrant stronger sandboxing than an internal knowledge article, but both can still be abused if the agent is allowed to treat text as instruction. Similarly, images, PDFs, and HTML can all carry hidden or misleading content that survives naive sanitisation. Security teams should assume that “read-only” content can still become executable influence once it enters retrieval-augmented generation or multi-agent orchestration.
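As one illustration of content that survives naive sanitisation, the sketch below uses Python's standard `html.parser` to keep only text that a reader would plausibly see, dropping comments, `script`, `style`, and `template` bodies, and elements hidden with the `hidden` attribute or inline `display:none` / `visibility:hidden` styles. It assumes reasonably well-formed markup and does not evaluate external stylesheets, off-screen positioning, or tiny or same-colour text, so it should be read as a partial filter rather than a complete defence.

```python
from html.parser import HTMLParser

# Void elements have no closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
NON_VISIBLE_TAGS = {"script", "style", "template"}


class VisibleTextExtractor(HTMLParser):
    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []
        self.skip_depth = 0  # > 0 while inside a hidden subtree

    @staticmethod
    def _is_hidden(tag: str, attrs: list) -> bool:
        attr_map = dict(attrs)
        style = (attr_map.get("style") or "").replace(" ", "").lower()
        return (
            tag in NON_VISIBLE_TAGS
            or "hidden" in attr_map
            or "display:none" in style
            or "visibility:hidden" in style
        )

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        # Once inside a hidden subtree, every nested element deepens it.
        if self.skip_depth or self._is_hidden(tag, attrs):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)

    # HTML comments are dropped: the default handle_comment does nothing.


def visible_text(html: str) -> str:
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return " ".join(" ".join(parser.parts).split())
```

Comparing `visible_text(page)` against the raw text an agent would otherwise ingest is also a cheap signal: a large difference suggests the page carries content intended for machines rather than readers.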

Prioritise the highest-risk cases first: external URLs, user-uploaded files, inbox messages, chat attachments, and feeds from third parties. The State of Non-Human Identity Security shows how often visibility gaps and weak controls create downstream exposure, and that pattern applies equally when untrusted content is routed through an agent with powerful credentials. In practice, teams get into trouble when they only validate the source system and ignore what the content does after ingestion. That gap becomes most dangerous in multi-agent workflows where one agent’s output becomes another agent’s trusted input.
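One way to keep an upstream agent's output from being silently promoted to trusted input is to propagate trust labels, so that any derived message inherits the lowest trust level among its inputs. The sketch below assumes a hypothetical three-level scale and a `Message` type; neither comes from a specific framework.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Iterable


class TrustLevel(IntEnum):
    UNTRUSTED = 0   # external URLs, uploads, inbox messages, third-party feeds
    INTERNAL = 1    # governed internal sources
    OPERATOR = 2    # system policy and operator instructions


@dataclass(frozen=True)
class Message:
    text: str
    trust: TrustLevel


def derive(text: str, inputs: Iterable[Message]) -> Message:
    """Build an agent's output message. It can never be more trusted than
    the least trusted input; with no inputs it defaults to UNTRUSTED."""
    level = min((m.trust for m in inputs), default=TrustLevel.UNTRUSTED)
    return Message(text, level)


# Usage: a summary built from operator instructions plus a web page stays
# untrusted when it is handed to the next agent.
instructions = Message("Summarise the page.", TrustLevel.OPERATOR)
web_page = Message("...retrieved page text...", TrustLevel.UNTRUSTED)
summary = derive("Summary of the page", [instructions, web_page])
```

A downstream agent or policy check can then read `summary.trust` instead of assuming that anything produced inside the workflow is safe.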

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.

Framework | Control / Reference | Relevance
OWASP Agentic AI Top 10 | A2 | Covers prompt and tool-chain injection through untrusted content.
CSA MAESTRO | T2 | Models agent workflows as layered trust decisions across content paths.
NIST AI RMF | - | Supports governance and risk controls for AI content ingestion.

Block unsafe content from reaching agent context and require policy checks before tool execution.