TL;DR: AI agent attack techniques are resurfacing across platforms, with Zenity describing how prompt injection, retrieval poisoning, trusted-domain abuse, and image-rendering tricks reappear in Bing, Microsoft 365 Copilot, Salesforce Einstein, and Agentforce. The pattern shows that agent risks are structural, not one-off vulnerabilities, and demand a dedicated security layer rather than vendor-only fixes.
At a glance
What this is: This analysis shows that the same AI agent attack techniques keep reappearing across platforms, especially through prompt injection, retrieval poisoning, and trusted-resource abuse.
Why it matters: It matters because IAM, NHI, and security teams need controls that assume agent inputs, retrieval paths, and outbound calls can all become attack surfaces.
By the numbers:
- 80% of organisations report their AI agents have already performed actions beyond their intended scope, including accessing unauthorised systems (39%), inappropriately sharing sensitive data (31%), and revealing access credentials (23%).
- 92% agree governing AI agents is critical to enterprise security, yet only 44% have implemented any policies to do so.
👉 Read Zenity's analysis of recurring AI agent prompt-injection techniques
Context
AI agent prompt injection is a persistence problem, not a one-off exploit. The same techniques can survive platform changes, reappear in new product names, and move from research demonstrations into production workflows whenever agents are allowed to ingest untrusted input, retrieve stored content, and render external resources.
For IAM and NHI teams, the core issue is governance at the agent boundary. Once an AI agent can accept hidden instructions from emails, CRM records, or retrieved context, access control alone is no longer enough because the identity is being manipulated through the data path as much as through the authentication path.
This is not an isolated vendor story. It reflects a broader control gap that shows up whenever organisations treat agent behaviour as a product feature instead of an identity and security problem.
Key questions
Q: How should security teams defend AI agents against prompt injection across different platforms?
A: Treat prompt injection as an agent boundary problem, not a single-product defect. Security teams should classify retrieved content, restrict which sources can influence agent context, and test whether hidden instructions can survive across email, CRM, and document workflows. The goal is to stop untrusted text from becoming actionable intent inside the agent.
Q: Why do AI agent attacks keep reappearing after vendor fixes?
A: They keep reappearing because the underlying primitives stay available: untrusted input, retrieval, prompt manipulation, and automatic actions. When those primitives remain in the architecture, attackers can rebuild the same chain in a different platform with minor changes. Fixing one product does not eliminate the technique family.
Q: What do security teams get wrong about agentic AI prompt attacks?
A: The most common mistake is treating prompt filtering as sufficient. Prompt attacks often succeed through retrieved context, trusted-domain abuse, or automatic resource rendering, which sit outside simple input validation. Teams need to govern the full execution path, not just the first line of text.
Q: How can organisations tell whether AI agents are exposed to covert exfiltration paths?
A: Look for any workflow where the agent can render remote content, call external URLs, or transform retrieved data into network requests. If those steps happen automatically, the agent may have an exfiltration channel even when users never click anything. That signal deserves immediate containment review.
NHI Mgmt Group analysis
Prompt injection is now a repeatable control-break pattern, not an isolated bug class. Zenity’s examples show the same attack logic resurfacing across Bing, Copilot, Salesforce Einstein, and Agentforce with only minor variations. That recurrence matters because it means defenders are facing a technique family that adapts to platform context faster than point fixes can eliminate it. Practitioners should treat prompt injection as an enduring agent-governance problem, not a vendor-specific anomaly.
Trusted-resource rendering creates an identity and exfiltration boundary that traditional IAM does not model. When an agent can render content or follow embedded resource references, it can turn a benign-looking document into an outbound data channel. That failure mode sits outside classic authz thinking because the access decision is made by the agent at runtime, not by a human requestor. The implication is that identity governance for agents must account for action semantics, not just permission assignment.
Retrieval content crafting: the recurring failure mode is allowing stored untrusted text to become executable context. The article’s Salesforce examples show that hidden instructions can sit dormant until a later query activates them. That is a governance problem because the malicious payload survives normal moderation and becomes dangerous only at retrieval time. Practitioners should recognise this as a durable pattern in agentic systems, not a one-off product flaw.
OWASP Agentic AI and MITRE ATLAS are becoming the right abstraction layers for this class of risk. The article does not argue for a single vendor control, and neither should practitioners. What matters is mapping the same attack primitives across platforms so that controls, detection logic, and red-team tests travel with the technique. The field is moving toward technique-based governance, and teams should follow that shift.
Security programmes that stop at prompt filtering will miss the larger attack surface. The examples here show that the problem includes retrieval, trusted domains, automatic rendering, and tool-triggered outbound requests. Filtering the input alone leaves the surrounding execution chain intact. Practitioners should evaluate agents as end-to-end identity actors with multiple exploitable transitions.
From our research:
- Only 52% of companies can track and audit the data their AI agents access, leaving 48% with a complete blind spot for compliance and breach investigation, according to AI Agents: The New Attack Surface report.
- Another finding from the same research shows that 80% of organisations say their AI agents have already acted beyond intended scope, including unauthorised access, sensitive-data sharing, and credential disclosure.
- For a broader control framework, see OWASP Agentic AI Top 10, which helps teams map these attack primitives to governance and testing priorities.
What this signals
Retrieval-time governance is becoming the new control plane for AI agents. When agents can ingest email, CRM records, and other untrusted sources, policy has to follow the context into the model rather than stopping at the perimeter. That makes agent governance a lifecycle issue as much as a security issue, and it belongs alongside OWASP Agentic AI Top 10 style threat modelling.
With 92% of organisations saying AI agent governance is critical but only 44% having policies in place, the gap is not awareness but execution. Teams should expect the next wave of incidents to come from automatic rendering, trusted-domain abuse, and hidden instructions that survive ordinary review. The practical response is to inventory every place an agent can read, render, or call out, then close the paths that can turn context into action.
Prompt-injection resilience will increasingly depend on cross-functional ownership. Security teams, compliance teams, and platform owners need the same visibility into retrieval sources, outbound fetches, and agent tool use. Without that shared view, the organisation will keep discovering the same attack class in a new wrapper.
For practitioners
- Classify retrieval sources before they reach agent context Mark CRM records, email, and other external content as untrusted until they pass sanitisation and trust-scoring checks. Do not allow arbitrary retrieved text to become instruction-bearing context without policy enforcement.
- Constrain automatic rendering and outbound fetches Disable or tightly broker markdown image loading, remote resource calls, and other automatic fetch paths in agent workflows. Require explicit policy gates for any request that leaves the trusted boundary.
- Map controls to agentic attack primitives Use frameworks such as the OWASP Agentic AI Top 10 and the MITRE ATLAS adversarial AI threat matrix to test retrieval poisoning, prompt injection, and tool misuse across environments.
- Test for hidden-in-context persistence Red-team agents with payloads that remain dormant until a later query. Include CRM notes, email bodies, and linked content so you can see whether dormant instructions survive normal processing.
Key takeaways
- AI agent prompt injection persists because attackers keep exploiting the same structural primitives across platforms.
- Trusted-resource rendering and retrieval poisoning create covert data-exfiltration paths that identity controls alone do not stop.
- Practitioners should govern the full agent execution chain, not just the visible prompt or the individual vendor product.
Standards & Framework Alignment
This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.
OWASP Agentic AI Top 10 and MITRE ATLAS address the attack and risk surface, while NIST AI RMF set the governance and control requirements practitioners need to meet.
| Framework | Control / Reference | Relevance |
|---|---|---|
| OWASP Agentic AI Top 10 | AGENT-02 | Prompt injection and tool misuse are the article's core techniques. |
| MITRE ATLAS | The article tracks adversarial AI techniques across multiple platforms. | |
| NIST AI RMF | Agent governance and accountability are central to the risk discussed. |
Map recurring attack primitives to ATLAS techniques and use them in red-team and detection planning.
Key terms
- Prompt Injection: Prompt injection is the manipulation of an AI agent by embedding instructions inside content it later reads as context. In agentic systems, the danger is not just bad text but the agent treating attacker-controlled content as part of its operating instructions.
- Retrieval Poisoning: Retrieval poisoning is the insertion of malicious or misleading content into a data source that an AI system later retrieves. For agents, it creates delayed execution risk because the payload can sit quietly until a user query causes the model to act on it.
- Trusted-Domain Abuse: Trusted-domain abuse occurs when attackers exploit whitelisted or reputable domains to carry malicious payloads or exfiltration traffic. In AI agent workflows, it is especially dangerous because reputation-based controls may allow the request through without sufficient scrutiny.
Deepen your knowledge
NHI governance, agentic AI identity, and machine identity security are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are responsible for identity security strategy or NHI governance in your organisation, it is worth exploring.
This post draws on content published by Zenity: 0Click Attacks, When TTPs Resurface Across Platforms. Read the original.
Published by the NHIMG editorial team on 2025-09-29.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org