TL;DR: Prompt injection can turn an AI agent into an exfiltration path even when the attacker never steals a victim credential, as shown in Beyond Identity's analysis of the Claude Cowork attack. The real break point is not access speed but whether AI workloads have hardware-bound identity, posture checks, and per-request proof of origin.
At a glance
What this is: Beyond Identity argues that AI agents are being attacked through prompt injection and API-key abuse, exposing a gap in how enterprises govern autonomous tool access.
Why it matters: For IAM and NHI teams, the issue is that agent permissions, tool access, and secrets handling now need continuous controls, not static trust assumptions.
👉 Read Beyond Identity's analysis of the Claude Cowork prompt injection attack
Context
AI agent identity risk is emerging because agents can act on behalf of users while inheriting broad access to files, tools, and external services. That creates a governance gap for NHI and IAM teams, since conventional controls were built around humans, service accounts, and discrete API calls rather than autonomous software that can be steered by hidden prompts.
The article centers on a prompt injection attack against Claude Cowork, where a malicious instruction inside a file caused the agent to exfiltrate confidential documents using an attacker-supplied API key. That pattern is no longer edge-case behavior. It is a practical illustration of why agent identity, not just model safety, now belongs in the NHI control plane.
The starting position is increasingly typical: teams assume the agent can be trusted because the environment is sandboxed, the tool is approved, or the key is “only a string.” In practice, that assumption breaks as soon as the agent can be induced to execute actions that exceed the user’s intent.
Key questions
Q: How should security teams govern AI agents that can use external tools?
A: Security teams should treat tool access as privileged, policy-governed execution. Each agent should have a minimal, explicit tool list, separate approval for high-risk actions, and continuous checks on runtime posture. If the agent can read files, upload data, or call APIs, those permissions must be reviewed like any other NHI entitlement.
Q: What is the difference between API-key security and hardware-bound identity for AI agents?
A: API-key security depends on possession of a reusable secret, which makes replay and copying easy. Hardware-bound identity ties the private key to the device or runtime, so the credential cannot simply be pasted into a prompt or moved to another system. That distinction matters because AI agents can be steered by untrusted content.
Q: When does prompt injection become an NHI governance issue?
A: Prompt injection becomes an NHI governance issue when the agent can turn hidden instructions into real actions, such as file access, tool calls, or data exfiltration. At that point the problem is no longer model output quality. It is unauthorized execution under a delegated identity that lacks proper controls.
Q: Why do AI agents complicate zero trust architecture?
A: AI agents complicate Zero Trust Architecture because they can receive dynamic instructions, use multiple tools, and change context within a single workflow. Zero trust expects continuous verification, but many agent deployments still rely on one-time login or static API keys. That gap leaves room for action after authentication.
Technical breakdown
Prompt injection turns trusted context into an execution channel
Prompt injection works because the attacker places instructions in content the agent is expected to process, such as files, webpages, or retrieved documents. The model does not distinguish between user intent and hidden adversarial text unless the surrounding system imposes strict boundaries. Once the agent accepts the malicious instruction, it may call tools, move data, or prepare a payload that never came from the operator. In NHI terms, the agent behaves like a workload with delegated authority but no native ability to verify the legitimacy of each instruction. Practical implication: treat every untrusted input as a potential control input, not just data.
Practical implication: isolate agent-readable content from agent-executable commands and require policy checks before tool use.
Why API keys fail as agent identity
API keys are bearer secrets, which means possession is effectively authorization. For autonomous agents, that is too weak because the key can be copied into a prompt, embedded in a file, or reused outside the runtime that was supposed to own it. Hardware-bound identity changes the trust model by tying the private key to a device, VM, container, or secure enclave so it cannot be moved as a string. That gives the service a way to verify origin, not just possession. Practical implication: replace shared or portable secrets with device-bound cryptographic proof.
Practical implication: use workload-bound credentials so a stolen token cannot be replayed from an untrusted context.
Posture and provenance are the missing controls for agentic access
Agentic systems need more than authentication. They need continuous posture verification, policy evaluation on each request, and provenance for every action and artifact. Posture tells you whether the runtime is still in an approved state. Policy tells you whether the agent may call a tool, access a directory, or upload a file under the current conditions. Provenance creates an audit trail that links the output back to the initiating user, runtime, tool chain, and policy state. Practical implication: govern agent access as a live transaction, not a one-time login event.
Practical implication: enforce continuous authorization and retain cryptographic lineage for all high-risk agent actions.
Threat narrative
Attacker objective: The attacker’s objective is to use the victim’s agent as a trusted execution path for silent data theft.
- Entry occurred when the attacker embedded malicious instructions in a file that the victim asked the agent to process.
- Escalation followed when the agent accepted the hidden prompt and executed tool use on the attacker’s behalf.
- Impact was exfiltration of confidential documents to the attacker’s own account without the victim approving the transfer.
Breaches seen in the wild
- Moltbook AI agent keys breach — Moltbook breach exposed 1.5M AI agent keys.
- Cisco DevHub NHI breach — IntelBroker exploited exposed Cisco credentials, API tokens and keys in DevHub.
Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.
NHI Mgmt Group analysis
AI agent identity is now a governance problem, not a model-safety side topic. Once an autonomous system can call tools, move files, and interact with external APIs, identity becomes the control that determines whether those actions are legitimate. Traditional IAM can still authenticate the surrounding user or service, but it does not inherently bind intent, runtime posture, and tool scope to the agent’s own behavior. Practitioners should treat agent identity as a first-class NHI issue.
API-key-centric AI security creates ephemeral credential trust debt. The more teams allow agents to operate with portable bearer tokens, the more they inherit a replay problem that cannot be solved by prompt filtering alone. A credential that can be copied into a prompt is already outside its design boundary. Practitioners should reduce that debt by moving toward hardware-bound, workload-bound credentials and per-request proof.
Prompt injection is the mechanism, but tool authority is the real blast radius. Hidden instructions matter because they can redirect a legitimate agent into file access, network egress, or privileged API calls. The governance failure is not that the model was tricked in the abstract. It is that the environment allowed the trick to become an action with consequences. Practitioners should map every reachable tool to an explicit access decision.
Cryptographic provenance will become a baseline requirement for defensible AI operations. Without lineage for prompts, tools, runtime, and outputs, investigations will remain speculative after an incident. Provenance also supports policy enforcement because it makes it possible to ask which agent acted, from where, and under what controls. Practitioners should expect auditability to move from optional feature to core control.
AI agent governance will converge with NHI lifecycle management. Agent onboarding, key issuance, posture attestation, privilege review, and offboarding are lifecycle problems, not one-off configuration tasks. The field needs tighter alignment between IAM, PAM, and NHI operations because agents accumulate access faster than current review cycles can absorb. Practitioners should build lifecycle controls before agent populations outgrow visibility.
From our research:
- 97% of NHIs carry excessive privileges, increasing unauthorised access and broadening the attack surface, according to the Ultimate Guide to NHIs.
- Only 5.7% of organisations have full visibility into their service accounts, which is why agent inventories and ownership mapping remain foundational controls.
- Start with OWASP NHI Top 10 if you need a structured way to evaluate prompt injection, tool misuse, and agent exposure.
What this signals
Ephemeral credential trust debt is now a practical planning concept for AI programmes. Each time an organisation allows an agent to operate with a copied API key or reusable token, it adds trust that cannot be verified at runtime. That debt grows faster than most review cycles can clear, so the right response is to move key issuance, rotation, and offboarding into the same operating model used for other NHIs.
With 96% of organisations storing secrets outside secrets managers in vulnerable locations, the AI agent problem will usually begin with existing secret hygiene, not exotic exploitation. That means teams should expect prompt injection to find a weak credential path somewhere in the workflow and should harden the entire access chain, not only the model endpoint.
Identity-first controls align naturally with the NIST AI Risk Management Framework and the OWASP Agentic AI Top 10. Practitioners should use those references to translate model risk into concrete entitlement, posture, and audit requirements for agentic systems.
For practitioners
- Inventory every agent runtime and its reachable tools Map each AI agent to the files, shells, APIs, and MCP servers it can reach. Remove any tool that is not essential to the task and document the approved scope for each runtime.
- Replace shared secrets with hardware-bound workload credentials Bind agent identity to the device, VM, container, or secure enclave so the credential cannot be copied into prompts or reused elsewhere. Prefer per-request cryptographic proof over portable bearer tokens.
- Enforce policy on every high-risk agent request Require explicit approval for file uploads, outbound network calls, and cross-domain data movement. Treat these as privileged actions that need live authorization rather than inherited trust.
- Add continuous posture checks to agent runtimes Validate runtime integrity, image provenance, and sandbox settings before and during execution. Revoke or downgrade access when the agent drifts from its approved posture.
- Record provenance for prompts, tools, and outputs Keep an auditable chain linking the initiating user, the agent runtime, the tools used, and the resulting artifacts. Use that lineage to support incident review and access recertification.
Key takeaways
- AI agents now create NHI risk because they can translate hidden prompts into real tool actions.
- Bearer-token security is too weak for autonomous workflows because copied credentials can be replayed outside the intended runtime.
- The right control model combines hardware-bound identity, continuous posture checks, and provenance for every agent action.
Standards & Framework Alignment
This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.
OWASP Agentic AI Top 10 address the attack and risk surface, while NIST AI RMF and NIST Zero Trust (SP 800-207) set the governance and control requirements practitioners need to meet.
| Framework | Control / Reference | Relevance |
|---|---|---|
| OWASP Agentic AI Top 10 | Prompt injection and tool misuse are central to this agentic AI risk. | |
| NIST AI RMF | AI RMF addresses governance, accountability, and operational controls for AI systems. | |
| NIST Zero Trust (SP 800-207) | PR.AC-4 | Continuous verification and least privilege fit agent runtime access decisions. |
Apply continuous authorization so agent access is checked during each request, not only at login.
Key terms
- Hardware-Bound Identity: A credential model that ties an identity to a specific device, virtual machine, container, or secure enclave. The private key never leaves that trusted boundary, which makes the identity much harder to copy, replay, or reuse in an untrusted runtime.
- Prompt Injection: A technique where hidden or malicious instructions are embedded in content that an AI system processes as input. For agents, the danger is not just incorrect output. The danger is that the model may convert those instructions into real actions through tools or data access.
- Cryptographic Provenance: An auditable record that links an AI action or output to the initiating user, the runtime, the tools used, and the policy in force. It helps teams investigate what happened after an incident and verify that an action was executed under approved conditions.
- Ephemeral Credential Trust Debt: The accumulated risk created when organisations keep giving agents short-lived or copied secrets without binding those credentials to a runtime. Each new token may appear temporary, but the operational trust assumptions often persist longer than the credential itself.
Deepen your knowledge
AI agent identity risk and hardware-bound credential design are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are formalising controls for autonomous workflows, it is worth exploring.
This post draws on content published by Beyond Identity: The attacker gave Claude their API key and why AI agents need hardware-bound identity. Read the original.
Published by the NHIMG editorial team on 2026-02-03.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org