AI tool poisoning shows how hidden instructions can hijack agents

By NHI Mgmt Group Editorial TeamPublished 2026-05-22Domain: Agentic AI & NHIsSource: CrowdStrike

TL;DR: AI tool poisoning occurs when malicious instructions are hidden in tool descriptions or schemas used by AI agents, allowing data theft or unauthorized actions through Model Context Protocol or direct agent integration, according to CrowdStrike. The security problem is not the tool alone but the trust placed in agent-readable metadata and runtime authority.

At a glance

What this is: This is an analysis of AI tool poisoning, where hidden instructions in tool metadata can steer AI agents into leaking data or taking unauthorized actions.

Why it matters: It matters because IAM and NHI controls must now govern not only credentials and APIs, but also the descriptions, schemas, and execution paths that AI agents consume.

👉 Read CrowdStrike's analysis of AI tool poisoning and hidden instructions

Context

AI tool poisoning is a governance failure in which an agent trusts tool metadata that an attacker can manipulate. For IAM and NHI teams, the problem is less about the model itself and more about who can publish tools, what the agent is allowed to execute, and how much authority those tools inherit.

CrowdStrike's example is typical of a broader agentic AI risk pattern: malicious instructions embedded in metadata can redirect legitimate automation into exfiltration or privilege escalation. That makes agent identity, tool trust, and runtime authorization part of the same control plane.

Key questions

Q: How should security teams reduce the risk of AI tool poisoning?

A: Security teams should treat tool metadata as part of the trust boundary. Validate descriptions, examples, and schemas before onboarding, restrict tools to least privilege, and enforce runtime policy checks on sensitive actions. That combination reduces the chance that hidden instructions can redirect an agent into exposing secrets or performing unauthorized work.

Q: Why do AI agents need special governance compared with normal applications?

A: AI agents make decisions about which tools to use and how to use them, so they can be manipulated by malicious context as well as code. That creates an NHI risk because the agent itself has delegated execution authority. Governance must cover identity, metadata trust, and action policy, not only authentication.

Q: What is the difference between prompt injection and tool poisoning?

A: Prompt injection targets the model's instructions, while tool poisoning targets the metadata the agent uses to reason about tools. Both can alter behaviour, but tool poisoning is especially dangerous when the agent trusts external tool descriptions or schemas. Teams should defend both layers because either one can lead to unauthorized actions.

Q: When do AI agents become a privileged access risk?

A: AI agents become a privileged access risk when they can reach secrets, production systems, or administrative APIs without narrow, context-specific limits. At that point, a poisoned tool or misleading schema can turn a normal session into a high-impact misuse path. Teams should review agent permissions the same way they review other high-risk NHIs.

Technical breakdown

How hidden instructions in tool metadata steer agent behaviour

Tool poisoning exploits the fact that many agents treat tool descriptions, examples, and schemas as guidance for reasoning, not just documentation. If an attacker can influence those fields, the agent may incorporate malicious instructions while constructing parameters or choosing a tool. In MCP-driven environments, the attack surface expands because the agent may consume many external tools with varying trust levels. The failure mode is not code execution inside the tool itself. It is the agent's decision to trust metadata that should have been treated as untrusted input.

Practical implication: treat tool descriptions and schemas as security-sensitive inputs and validate them before an agent is allowed to use them.

Why permissive schemas and over-broad tool scope raise NHI risk

A permissive schema gives an attacker room to push unsafe values, while an over-broad tool scope lets an agent reach data or functions it does not need. Together, those conditions create an identity and authorization problem, not just an application bug. The agent effectively becomes a non-human identity with delegated power, so every tool call should be constrained by purpose, context, and least privilege. Without those limits, a single poisoned tool can turn normal automation into a data-exposure path.

Practical implication: bind each tool to narrow scopes, explicit purpose limits, and separate privileges for read, write, and administrative actions.

Runtime monitoring is the last line of defence, not the primary control

Static review helps, but tool poisoning is ultimately a runtime behaviour problem because the malicious effect appears when the agent reasons over the poisoned metadata. Monitoring needs to observe the tool selection, parameter construction, and downstream actions, then compare them to policy. This is where agent governance overlaps with ZTA and NHI controls. A secure design assumes the agent will sometimes encounter hostile context and therefore enforces continuous verification around every high-risk action.

Practical implication: combine runtime policy checks with logging and approval gates for sensitive tool calls, especially where agents can touch secrets or production systems.

Threat narrative

Attacker objective: The attacker wants the agent to execute trusted-looking tool calls that reveal sensitive data or carry out unauthorized operations without touching the tool code directly.

Entry occurs when an attacker publishes or modifies a tool description, example, or schema that an agent can consume through MCP or a similar integration path.
Escalation follows when the agent interprets hidden instructions as legitimate guidance and constructs a call that exposes files, secrets, or privileged parameters.
Impact is achieved when the agent leaks data or performs an unauthorized action on behalf of the attacker.

Moltbook AI agent keys breach — Moltbook breach exposed 1.5M AI agent keys.
Cisco DevHub NHI breach — IntelBroker exploited exposed Cisco credentials, API tokens and keys in DevHub.

Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.

NHI Mgmt Group analysis

AI tool poisoning is an identity problem before it is a model problem. The attack succeeds because the agent is granted execution authority and then consumes untrusted metadata as if it were policy. That means tool trust, publisher trust, and runtime authority all need explicit governance. Practitioners should stop treating tool catalogs as harmless configuration and start treating them as part of the NHI attack surface.

Ephemeral agent sessions do not remove metadata risk: short-lived credentials reduce dwell time, but they do not prevent an agent from being misled during the session. Hidden instructions can still steer a valid token toward the wrong action. The right control objective is not only session duration, but also whether the session is allowed to interpret untrusted tool content. Practitioners should pair short-lived access with content validation and action policy.

Tool schema abuse creates identity blast radius. When a single poisoned tool can trigger reads, writes, or administrative actions, the blast radius is determined by the agent's scopes, not the model's intent. That is why least privilege for agents must include tool-level scoping, not just API authentication. Practitioners should map every tool to a narrowly defined business purpose and separate high-risk actions from routine automation.

Agentic governance needs a metadata trust boundary. Most current controls focus on secrets, prompts, and endpoints, but tool descriptions and examples are now equally sensitive. That expands the governance boundary into supply chain review, tool publisher vetting, and continuous runtime inspection. Practitioners should add metadata review to the same control stack used for secrets and access approvals.

OWASP-style NHI controls remain relevant, but they are incomplete without agent context. Rotation, inventory, and least privilege still matter, yet they do not address how an agent decides what to do with a trusted credential. This is the category's next control gap. Practitioners should align NHI governance with agent behaviour monitoring, or they will secure the credential while missing the misuse path.

From our research:
The average estimated time to remediate a leaked secret is 27 days, despite 75% of organisations expressing strong confidence in their secrets management capabilities, according to The State of Secrets in AppSec.
Only 44% of developers are reported to follow security best practices for secrets management, showing that control design still depends on inconsistent human behaviour.
For a broader governance view, see NHI Lifecycle Management Guide for provisioning, rotation, and offboarding patterns that reduce long-lived exposure.

What this signals

Tool poisoning pushes agent governance beyond classic secrets management. If a system can be tricked through tool metadata, then the control boundary includes publishers, schemas, examples, and runtime decision points. That is a material shift for programmes built around static credential protection, and it argues for tighter review of the agent tool supply chain and associated policy checks.

With 43% of security professionals already concerned about AI systems learning and reproducing sensitive information patterns from codebases, the problem is not hypothetical. The governance response should align agent behaviour monitoring with NIST Cybersecurity Framework 2.0 functions for protect and detect, then map high-risk tools to explicit approval paths.

Metadata trust boundary: this is the practical concept teams should adopt now. If a tool description can change agent behaviour, then metadata needs the same scrutiny as secrets and API scopes. Programmes that extend NHI inventory and review to MCP-connected tools will be better positioned to catch abuse before it becomes an incident.

For practitioners

Vet tool metadata before agent onboarding Review descriptions, examples, and schemas as untrusted input. Reject tools that contain hidden instructions, permissive fields, or ambiguous action scope, and require approval for any tool that can reach secrets or production systems.
Bind agent tools to least-privilege scopes Separate read, write, and administrative permissions so a single tool cannot escalate across functions. Limit each agent to the smallest action set needed for the workflow and isolate high-risk tools from routine automation.
Add runtime policy checks for sensitive tool calls Inspect tool selection and parameter construction at execution time, then block actions that violate policy, touch secrets, or deviate from expected context. Keep immutable logs for later review and incident response.
Monitor for tool poisoning indicators in MCP pipelines Look for sudden changes in tool descriptions, unusual example values, and calls that request files, tokens, or privileged parameters unrelated to the task. Treat those signals as potential compromise, not user error.

Key takeaways

AI tool poisoning is a governance issue because agents can be manipulated through trusted-looking metadata, not only through code or credentials.
Least privilege must extend to tools, schemas, and action scopes, or one poisoned integration can widen the blast radius of an entire agent.
Runtime policy enforcement and metadata review are now core controls for organisations that let agents touch secrets or production systems.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 address the attack and risk surface, while NIST CSF 2.0 and NIST Zero Trust (SP 800-207) set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	NHI-07	Tool poisoning maps to malicious tool use and agent hijacking risks.
NIST CSF 2.0	PR.AC-4	Least-privilege access is central when agents can execute tools on behalf of users.
NIST Zero Trust (SP 800-207)	PR.AC-5	Continuous verification fits runtime checks for high-risk agent actions.

Apply continuous authorization checks before agents can invoke sensitive tools or access secrets.

Key terms

Tool Poisoning: Tool poisoning is an attack in which malicious instructions are hidden inside tool descriptions, examples, or schemas that an AI agent reads when deciding what to do. The danger is not only in the tool's code, but in the metadata that shapes the agent's behaviour and trust decisions.
Model Context Protocol: Model Context Protocol is an open protocol used to connect AI agents to external tools and data sources. In security terms, it widens the trust boundary because the agent can consume many third-party capabilities, so each connection must be treated as an identity and authorization decision.
Metadata Trust Boundary: A metadata trust boundary is the line between tool content that can be safely consumed and tool content that must be validated before use. For agentic systems, descriptions, examples, and schemas are security-relevant inputs because they can influence decisions and trigger actions with real-world impact.
Agent Blast Radius: Agent blast radius is the amount of damage an autonomous system can cause if it is misled, compromised, or over-permissioned. It is determined by the tools, data, and administrative paths the agent can reach, so reducing it requires narrow scopes and runtime controls.

Deepen your knowledge

AI tool poisoning and agent metadata governance are covered in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are building controls for MCP-connected agents or other autonomous workflows, it is worth exploring.

This post draws on content published by CrowdStrike: AI Tool Poisoning: How Hidden Instructions Threaten AI Agents. Read the original.

NHIMG Editorial Note
Published by the NHIMG editorial team on 2026-05-22.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org