
What breaks when AI security stops at model scanning?

Model scanning helps identify tampering and unsafe dependencies before deployment, but it does not address runtime misuse. Once the system is live, prompt injection, unsafe tool use, and manipulated responses can still drive harmful behaviour. Without runtime controls, the most important security decisions happen after the pre-check has already passed.

Why This Matters for Security Teams

Model scanning is useful, but it only proves that a model, dependency, or artifact looked acceptable before release. It does not answer the operational question that now matters most: what happens when an AI system starts receiving adversarial prompts, chaining tools, or returning manipulated outputs in production. For agentic and tool-using systems, the security boundary moves from static artefacts to runtime behaviour.

This is why current guidance increasingly treats pre-deployment review as necessary but insufficient. A scan can flag known tampering or risky packages, yet it cannot constrain a live agent that can call APIs, browse data, or take follow-on actions after a user or attacker changes the context. NHI Management Group research on the DeepSeek breach is a useful reminder that the failure mode is often not the model file itself, but the surrounding identity, access, and runtime control plane. In practice, many security teams discover prompt injection and tool abuse only after a live workflow has already been trusted and used.

The industry is also still converging on how to govern these systems. The Anthropic Project Glasswing work reinforces the broader point: safety controls must follow the system into execution, not stop at inspection.

How It Works in Practice

Runtime security for AI systems starts with the assumption that the model can be steered after deployment. That means policy has to evaluate the request, the tool, the user context, and the current state of the workload before any action is approved. Static allowlists and pre-approved model scans are helpful, but they do not stop a malicious prompt from redirecting a legitimate workflow.
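A minimal sketch of that request-time evaluation follows. The role names, tool names, the `tainted_context` flag, and the `evaluate` function are all hypothetical and chosen for illustration; they do not come from any specific product or framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolRequest:
    user_role: str         # role of the user the agent is acting for
    tool: str              # tool the agent wants to call
    tainted_context: bool  # True if untrusted content (e.g. a fetched page) is in the prompt


# Hypothetical policy table: which roles may use which tools.
ALLOWED_TOOLS = {
    "analyst": {"search_docs", "read_ticket"},
    "admin": {"search_docs", "read_ticket", "send_email"},
}

# Hypothetical set of tools with side effects outside the system.
SENSITIVE_TOOLS = {"send_email"}


def evaluate(request: ToolRequest) -> str:
    """Return 'allow', 'deny', or 'review' for a single tool call."""
    # Deny anything the role was never granted.
    if request.tool not in ALLOWED_TOOLS.get(request.user_role, set()):
        return "deny"
    # A granted but sensitive tool is held for review when the prompt
    # contains untrusted content that could be steering the agent.
    if request.tool in SENSITIVE_TOOLS and request.tainted_context:
        return "review"
    return "allow"
```

The point of the sketch is that the decision depends on the live context of the call, not only on a grant made before deployment: the same admin request for `send_email` is allowed or held depending on whether untrusted content is present.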

Practitioners are increasingly using a layered control plane that combines workload identity, short-lived credentials, and policy-as-code. The goal is to give the AI system only the access needed for the current task, then revoke it immediately after use. In agentic environments, this often means JIT credentials, scoped API tokens, and explicit approval gates for sensitive actions such as sending data, changing records, or invoking external tools.
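The issue-then-revoke pattern can be sketched as a small in-memory broker. `TokenBroker`, `ScopedToken`, and the scope strings are hypothetical names; a real deployment would most likely delegate issuance to a secrets manager or identity provider rather than hold tokens in process memory.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ScopedToken:
    value: str
    scope: str          # the single permission this token carries
    expires_at: float   # Unix timestamp after which the token is invalid
    revoked: bool = False


class TokenBroker:
    """Hypothetical in-memory broker for per-task, short-lived tokens."""

    def __init__(self, ttl_seconds: float = 60.0) -> None:
        self.ttl_seconds = ttl_seconds
        self._tokens: Dict[str, ScopedToken] = {}

    def issue(self, scope: str, now: Optional[float] = None) -> ScopedToken:
        # `now` can be injected for testing; defaults to the wall clock.
        now = time.time() if now is None else now
        token = ScopedToken(secrets.token_urlsafe(16), scope, now + self.ttl_seconds)
        self._tokens[token.value] = token
        return token

    def is_valid(self, value: str, scope: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        token = self._tokens.get(value)
        return (
            token is not None
            and not token.revoked
            and token.scope == scope
            and now < token.expires_at
        )

    def revoke(self, value: str) -> None:
        # Revoke as soon as the task finishes instead of waiting for expiry.
        token = self._tokens.get(value)
        if token is not None:
            token.revoked = True
```

A caller would issue a token with one scope for the current task, pass it to the tool, and call `revoke` when the task completes. Expiry acts as a backstop if the revoke step is missed.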

That runtime model aligns with emerging guidance from the CSA MAESTRO agentic AI threat modeling framework, which emphasizes control over the agent lifecycle rather than just model integrity. It also fits the NHI reality described in The State of Non-Human Identity Security, where weak rotation, poor visibility, and over-privileged accounts are common attack drivers.

  • Use workload identity to establish what the agent is, not just what it was granted once.
  • Issue dynamic secrets per task instead of relying on long-lived credentials.
  • Evaluate each tool call at runtime with current context and least privilege.
  • Log prompt, tool, and decision paths so misuse can be investigated after the fact.
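The logging control in the list above could look like the following sketch. The field names are hypothetical. Storing a SHA-256 digest of the prompt rather than the raw text is an assumption made here to keep sensitive content out of logs; teams that need full replay may choose to store the prompt itself in a protected store.

```python
import hashlib
import json
from datetime import datetime, timezone


def audit_record(agent_id: str, prompt: str, tool: str, decision: str, reason: str) -> dict:
    """Build one structured record for a single tool-call decision."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "agent_id": agent_id,
        # Digest lets investigators match a prompt without logging its content.
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "tool": tool,
        "decision": decision,
        "reason": reason,
    }


def to_log_line(record: dict) -> str:
    """Serialize a record as one JSON line with a stable key order."""
    return json.dumps(record, sort_keys=True)
```

Writing one JSON line per decision keeps the prompt, tool, and decision path together, so an investigator can reconstruct which input led to which action.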

These controls tend to break down when agents are given broad network reach and persistent tokens, because the system can quietly chain one approved step into many unreviewed actions.
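One way to limit that kind of silent chaining is to attach a step budget to each approval, so that a single approved request cannot expand into an unbounded sequence of tool calls. The `StepBudget` class below is a hypothetical illustration of the idea, not a feature of any named framework.

```python
class StepBudget:
    """Caps how many tool calls may follow from one approved request."""

    def __init__(self, max_steps: int) -> None:
        self.max_steps = max_steps
        self.used = 0

    def consume(self) -> bool:
        """Record one tool call; return False once the budget is exhausted."""
        if self.used >= self.max_steps:
            return False
        self.used += 1
        return True
```

An orchestrator would call `consume()` before each tool invocation and stop, or ask for fresh approval, when it returns `False`.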

Common Variations and Edge Cases

Tighter runtime control often increases latency, engineering overhead, and governance friction, so organisations have to balance safety against operational throughput. That tradeoff is especially visible in multi-agent workflows, where one agent may depend on another agent’s output and a simple block can halt the whole pipeline.

There is no universal standard for this yet. Some teams use hard approval gates for high-risk actions, while others prefer policy scoring and step-up checks only when confidence drops. Best practice is evolving toward contextual authorization, and current guidance suggests that no single scanner can replace live enforcement for tool-using systems.
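The scoring approach can be sketched as follows. The signal names, weights, and thresholds are invented for illustration and would need tuning against real workloads. Integer weights are used so that threshold comparisons are exact.

```python
from typing import Iterable

# Hypothetical risk signals and weights (integer points, 0-100 scale).
RISK_WEIGHTS = {
    "external_recipient": 40,
    "untrusted_context": 30,
    "writes_data": 20,
    "off_hours": 10,
}


def risk_score(signals: Iterable[str]) -> int:
    """Sum the weights of the signals present; unknown signals count as 0."""
    return sum(RISK_WEIGHTS.get(signal, 0) for signal in set(signals))


def decide(signals: Iterable[str], step_up_at: int = 30, block_at: int = 70) -> str:
    """Map a risk score to 'allow', 'step_up' (extra check), or 'block'."""
    score = risk_score(signals)
    if score >= block_at:
        return "block"
    if score >= step_up_at:
        return "step_up"
    return "allow"
```

Compared with a hard approval gate on every sensitive action, this design only interrupts the workflow when the combined signals cross a threshold, which is the latency-versus-safety tradeoff described above.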

Edge cases matter. A model that is safe in a sandbox can become risky when connected to internal data, external SaaS, or privileged APIs. The same is true when vendors hide the runtime chain behind a single product label. In those environments, scanning the model artifact gives a false sense of coverage because the real attack surface is the orchestration layer, not the weights alone. The State of Secrets in AppSec is a useful parallel here: a strong pre-check means little if secrets and access paths remain exposed during execution.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.

Framework | Control / Reference | Relevance
OWASP Agentic AI Top 10 | AI-04 | Runtime prompt injection and tool abuse are core agentic AI threats.
CSA MAESTRO | | MAESTRO focuses on lifecycle threat modeling for agentic systems.
NIST AI RMF | | AI RMF applies governance to runtime AI risk, not just model artifacts.

Add request-time controls for prompts, tools, and outputs instead of relying on pre-release scans.