TL;DR: Prompt injection attacks exploit how large language models blur the line between trusted instructions and untrusted input, and when agents can call APIs or modify systems, the result becomes execution-layer compromise rather than bad text output, according to Keyfactor. The real failure is that conventional controls assume semantic intent can be filtered after the fact, but agentic systems can act before that boundary is validated.
NHIMG editorial — based on content published by Keyfactor: Prompt Signing, How Prompt Injection Attacks Work
Questions worth separating out
Q: How should security teams prevent prompt injection from triggering AI agent actions?
A: Security teams should separate untrusted text from executable instructions, then require a policy check before any agent can call tools or modify systems.
Q: Why is prompt injection a governance issue for IAM teams?
A: Prompt injection becomes an IAM issue when AI agents hold credentials, access APIs, or operate on enterprise data.
Q: When do signed prompts still leave organisations exposed?
A: Signed prompts still leave organisations exposed when replay is possible or when the signing party is allowed to authorize actions outside the intended scope.
Practitioner guidance
- Separate instruction channels from user content Keep system instructions, retrieved data, and user-provided text in distinct trust domains so the model never has to infer which text is executable.
- Require directive signing for privileged agent actions Use cryptographic signatures for prompts that can trigger tool use, API calls, or configuration changes.
- Add freshness checks to prevent replay Set a recency threshold for one-time or high-risk directives such as certificate enrollment, record deletion, or infrastructure changes.
What's in the full article
Keyfactor's full article covers the operational detail this post intentionally leaves for the source:
- Cryptographic signing workflow details for AI directives, including how private keys stay inside the signing infrastructure.
- Pre-launch verification steps for agentic workloads, including signature, certificate chain, and timestamp freshness checks.
- The replay-attack example for certificate enrollment and why recency thresholds matter for one-time operations.
- How the container-based verification flow blocks unsigned or stale directives before execution.
👉 Read Keyfactor's analysis of how prompt injection attacks work →
Prompt injection in AI agents: are your controls keeping up?
Explore further
Prompt injection is an execution-layer identity problem, not a content-moderation problem. The article makes clear that once an AI agent can call tools, the security boundary moves from text filtering to permissioned action. That means the governing question is no longer whether the model produced harmful output, but whether an untrusted instruction path was allowed to trigger enterprise action. Practitioners should treat agent execution rights as the control surface.
A few things that frame the scale:
- 98% of companies plan to deploy even more AI agents within the next 12 months, despite documented rogue behaviour in 80% of current deployments, according to AI Agents: The New Attack Surface report.
- Only 52% of companies can track and audit the data their AI agents access, leaving 48% with a complete blind spot for compliance and breach investigation.
A question worth separating out:
Q: What is the difference between prompt signing and prompt filtering?
A: Prompt signing proves a directive came from an approved source and was not changed. Prompt filtering tries to block suspicious text patterns after the fact. Signing is a provenance and authorization control. Filtering is a content control, and content controls do not reliably stop semantic attacks in agentic systems.
👉 Read our full editorial: Prompt injection attacks expose the execution layer in agentic AI