What breaks when untrusted content can influence agent decisions?

Why This Matters for Security Teams

When untrusted content can shape an agent’s next action, the old boundary between “input” and “execution” stops being reliable. A prompt, ticket, email, document, or web page is no longer just data if the agent can reinterpret it and invoke tools with real authority. That is the core failure mode behind indirect prompt injection, tool abuse, and source-to-sink compromise. Current guidance from the OWASP Agentic AI Top 10 and NIST AI governance work treats this as an execution-risk problem, not a content-filtering problem.

NHI Management Group’s research shows why the stakes are so high: 80% of identity breaches involved compromised non-human identities such as service accounts and API keys, and 97% of NHIs carry excessive privileges. Once an agent is allowed to browse, summarize, plan, and act, those privileges can be chained in ways a human reviewer would not anticipate. The relevant question is not whether the content is “malicious” in a human sense, but whether it can alter the agent’s decision path toward an unsafe sink. In practice, many security teams discover this gap only after an agent has already turned a benign-looking input into an approved privileged action, not while designing the control path.

How It Works in Practice

The practical fix is to treat agent decisions as a source-to-sink pipeline. “Source” includes every untrusted channel the agent can ingest, such as customer text, web pages, retrieved documents, issue comments, or chat history. “Sink” includes any privileged outcome, including sending email, changing records, deploying code, calling APIs, or retrieving secrets. If a source can influence a sink without policy checks, the system is exposed.
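The exposure rule above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `Source`, `ToolCall`, and `is_exposed` names, the two-level trust model, and the set of privileged sinks are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Trust(Enum):
    TRUSTED = "trusted"      # e.g. operator-authored instructions
    UNTRUSTED = "untrusted"  # e.g. web pages, tickets, emails, retrieved documents


@dataclass(frozen=True)
class Source:
    name: str
    trust: Trust


@dataclass(frozen=True)
class ToolCall:
    sink: str                          # hypothetical sink name, e.g. "send_email"
    influenced_by: tuple[Source, ...]  # every source that shaped this call


# Hypothetical list of privileged sinks; a real deployment would define its own.
PRIVILEGED_SINKS = frozenset(
    {"send_email", "update_record", "deploy_code", "call_api", "read_secret"}
)


def is_exposed(call: ToolCall, policy_checked: bool) -> bool:
    """True when an untrusted source reaches a privileged sink with no policy check."""
    tainted = any(s.trust is Trust.UNTRUSTED for s in call.influenced_by)
    return call.sink in PRIVILEGED_SINKS and tainted and not policy_checked
```

The point of the sketch is that exposure is a property of the path, not of the content: the same web page is harmless when it only feeds a summary and dangerous when it can reach `send_email` unchecked.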

Security teams should separate three layers: content handling, decision making, and execution. Content handling can sanitize or classify inputs, but it cannot be the only defense. Decision making should use runtime policy evaluation, not static IAM assumptions, because agents do not follow fixed access patterns. That is why pairing the NIST AI Risk Management Framework with agent-specific guidance such as the CSA MAESTRO agentic AI threat modeling framework is useful: both push teams toward context-aware controls at the moment of action.

Classify every inbound source by trust level before it reaches planning or tool selection.

Allow only explicit, narrow tool scopes for a given task, and revoke them as soon as the task ends.

Use workload identity for the agent itself, then evaluate policy at request time rather than relying on a standing role.

Log the full source-to-sink chain so reviewers can see which input influenced which execution.
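The four controls above can be combined into one small request-time check. The following Python sketch is illustrative only: `ToolGrant`, `PolicyEngine`, the string trust labels, and the rule that only trusted input may drive a granted tool are assumptions chosen for the example, and a production system would back this with a real workload identity provider and policy store.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToolGrant:
    """A narrow, task-bound permission (hypothetical structure)."""
    task_id: str
    tool: str
    expires_at: float  # epoch seconds
    revoked: bool = False

    def is_active(self, now: float) -> bool:
        return not self.revoked and now < self.expires_at


@dataclass
class PolicyEngine:
    grants: list[ToolGrant] = field(default_factory=list)
    audit_log: list[dict] = field(default_factory=list)

    def grant(self, task_id: str, tool: str, ttl_seconds: float,
              now: Optional[float] = None) -> ToolGrant:
        """Issue an explicit, short-lived scope for one tool on one task."""
        now = time.time() if now is None else now
        g = ToolGrant(task_id, tool, expires_at=now + ttl_seconds)
        self.grants.append(g)
        return g

    def end_task(self, task_id: str) -> None:
        """Revoke every grant as soon as the task ends."""
        for g in self.grants:
            if g.task_id == task_id:
                g.revoked = True

    def authorize(self, task_id: str, tool: str, source_trust: str,
                  influenced_by: list[str], now: Optional[float] = None) -> bool:
        """Evaluate policy at request time and log the source-to-sink chain."""
        now = time.time() if now is None else now
        has_grant = any(
            g.task_id == task_id and g.tool == tool and g.is_active(now)
            for g in self.grants
        )
        # Assumed rule: untrusted input may never drive a granted tool call.
        allowed = has_grant and source_trust == "trusted"
        self.audit_log.append({
            "time": now,
            "task_id": task_id,
            "tool": tool,
            "source_trust": source_trust,
            "influenced_by": list(influenced_by),
            "allowed": allowed,
        })
        return allowed
```

Because every decision is appended to `audit_log` whether or not it is allowed, a reviewer can reconstruct which input influenced which execution, including the attempts that were blocked.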

NHI Management Group’s broader guidance on the Ultimate Guide to NHIs — 2025 Outlook and Predictions reinforces the same point: long-lived credentials and broad entitlements are the wrong shape for autonomous systems. These controls tend to break down when the agent can chain multiple tools across loosely governed systems because the policy boundary is no longer where the developer expected it to be.

Common Variations and Edge Cases

Tighter source-to-sink control often increases operational friction, requiring organisations to balance safety against latency, developer productivity, and task completion rates. That tradeoff is real, especially in agentic workflows that depend on retrieval, automation, and external APIs. Best practice is evolving, and there is no universal standard for how aggressively to block or inspect every untrusted source.

One common edge case is retrieval-augmented workflows. A document can be “trusted” for search purposes yet still be unsafe for instruction-following if it contains embedded prompt injection. Another is multi-agent orchestration, where one agent produces intermediate output that becomes another agent’s source. In those environments, source-to-sink analysis must extend across the full workflow, not just the initial user message. This is where the OWASP NHI Top 10 and the Anthropic AI-orchestrated cyber espionage report are especially relevant, because they show how adversaries exploit the gap between what the model reads and what the system executes.
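One way to extend source-to-sink analysis across a multi-agent workflow is to let each intermediate output inherit the lowest trust level of anything that contributed to it. The Python sketch below assumes a three-level `TrustLevel` scale and hypothetical `handoff` and `may_follow_instructions` helpers; none of these names come from a specific framework.

```python
from dataclasses import dataclass
from enum import IntEnum


class TrustLevel(IntEnum):
    UNTRUSTED = 0  # external content: web pages, emails, retrieved documents
    INTERNAL = 1   # content produced inside the organisation
    OPERATOR = 2   # instructions authored by the system operator


@dataclass(frozen=True)
class Message:
    content: str
    trust: TrustLevel


def handoff(content: str, inputs: list[Message]) -> Message:
    """Wrap one agent's output for the next agent, inheriting the lowest input trust."""
    # With no recorded inputs, provenance is unknown, so assume the worst case.
    lowest = min((m.trust for m in inputs), default=TrustLevel.UNTRUSTED)
    return Message(content=content, trust=lowest)


def may_follow_instructions(message: Message) -> bool:
    """Only operator-level content may steer planning; everything else is data."""
    return message.trust is TrustLevel.OPERATOR
```

Under this assumption, a summary built from an operator request plus a retrieved document is still treated as untrusted by the downstream agent, which closes the gap where injected text is laundered through an intermediate agent. The same labels cover the retrieval case: a document can be indexed and searched at any trust level while still being barred from instruction-following.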

Another important exception is fully autonomous remediation agents. In those systems, blocking all uncertain input can make the agent unusable, so teams often adopt step-up approval, JIT credentials, or human-in-the-loop review for high-impact sinks. The right answer depends on blast radius and reversibility, not on whether the content looks suspicious at first glance.
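A decision rule keyed on blast radius and reversibility, rather than on how suspicious the content looks, might be sketched as follows. The 1-to-3 blast radius scale, the thresholds, and the `SinkProfile` and `required_control` names are assumptions for illustration; each team would calibrate its own.

```python
from dataclasses import dataclass
from enum import Enum


class Control(Enum):
    AUTO = "auto"                      # agent may proceed unattended
    JIT_CREDENTIAL = "jit_credential"  # issue a short-lived credential for this action
    HUMAN_REVIEW = "human_review"      # step-up approval by a person


@dataclass(frozen=True)
class SinkProfile:
    name: str
    blast_radius: int  # hypothetical scale: 1 = single record, 2 = one system, 3 = org-wide
    reversible: bool   # can the action be cleanly undone?


def required_control(sink: SinkProfile) -> Control:
    """Pick the control from impact alone, ignoring how the input content looks."""
    if sink.blast_radius >= 3 or not sink.reversible:
        return Control.HUMAN_REVIEW
    if sink.blast_radius == 2:
        return Control.JIT_CREDENTIAL
    return Control.AUTO
```

For example, restarting a single service would fall under `AUTO` in this sketch, while deleting a database would require `HUMAN_REVIEW` because it is irreversible, regardless of how benign the triggering input appears.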

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	A3	Directly addresses prompt injection and tool abuse from untrusted content.
CSA MAESTRO	M3	Covers runtime governance for agent decisions and action boundaries.
NIST AI RMF		Supports risk-based control design for unpredictable model behaviour.

Map every source-to-sink path and block untrusted inputs from influencing privileged tool calls.
