
What breaks when AI security is treated only as model security?

Model-only security misses the part of the system that actually touches tools, data, and workflows in production. A secure model can still produce unsafe outcomes if the surrounding agent, connectors, or permissions are not governed. Practitioners need controls that follow the operational identity, not just the model artefact.

Why Treating AI Security as Model Security Breaks Down

Model security matters, but it is only one layer of a larger operational system. A well-tuned model can still drive unsafe outcomes if an agent has broad tool access, weak connector governance, or over-permissive secrets. The real risk sits in the path from prompt to action: retrieval, APIs, execution rights, and downstream workflows. That is why NHI Management Group treats the operational identity as the control point, not just the model artefact.

This is where incident patterns start to diverge from traditional ML risk. Attackers do not need to poison the model if they can hijack the identity that calls it, steal a token, or abuse a connector. The LLMjacking research shows how compromised non-human identities can become the entry point for AI abuse, while Anthropic Project Glasswing illustrates how agentic systems expand the attack surface beyond the model itself. In practice, many security teams encounter this only after an exposed credential or over-broad tool permission has already been used in production.

How the Control Gap Shows Up in Production

Model-only controls usually focus on weights, training data, or prompt filtering. Those are necessary, but they do not govern what happens when an AI agent can read mail, query databases, invoke payment tools, or trigger automation. The control problem is runtime authorisation, not just model integrity. For autonomous systems, static RBAC is often too blunt because the agent’s actions depend on context, task state, and live policy conditions.
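The gap between static RBAC and runtime authorisation can be sketched in a few lines of Python. This is a minimal illustration, not a reference design: the role name, task names, `TASK_ACTIONS` mapping, and `REFUND_LIMIT` threshold are all hypothetical and are only there to show how a role-only check and a context-aware check can disagree about the same request.

```python
from dataclasses import dataclass

# Hypothetical static role table: the role alone decides access.
STATIC_ROLES = {"support-agent": {"read_mail", "query_db", "issue_refund"}}


def rbac_allows(role: str, action: str) -> bool:
    """Static RBAC: ignores what the agent is currently doing."""
    return action in STATIC_ROLES.get(role, set())


@dataclass(frozen=True)
class RequestContext:
    role: str
    action: str
    task: str            # the task the agent was asked to perform
    amount: float = 0.0  # illustrative live attribute of the request


# Hypothetical mapping from task to the actions that task needs.
TASK_ACTIONS = {
    "summarise_inbox": {"read_mail"},
    "process_refund": {"query_db", "issue_refund"},
}
REFUND_LIMIT = 100.0  # illustrative live policy condition


def runtime_allows(ctx: RequestContext) -> bool:
    """Request-time check: role, task state, and live policy must all agree."""
    if not rbac_allows(ctx.role, ctx.action):
        return False
    if ctx.action not in TASK_ACTIONS.get(ctx.task, set()):
        return False
    if ctx.action == "issue_refund" and ctx.amount > REFUND_LIMIT:
        return False
    return True
```

In this sketch, an agent summarising an inbox still passes the role check for `issue_refund`, but the request-time check denies it because the action does not belong to the task in progress.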

Current guidance points toward workload identity and just-in-time (JIT) privileges. A secure design gives the agent a cryptographic identity, short-lived credentials, and request-time policy checks tied to the task being performed. That approach aligns with the direction of the CSA MAESTRO agentic AI threat modeling framework, which treats tool use, memory, orchestration, and external dependencies as first-class security concerns. It also reflects NHIMG’s view in the DeepSeek breach analysis that exposure often emerges from the surrounding system, not the model alone.

  • Use workload identity for the agent, not shared service accounts.
  • Issue JIT credentials per task, with short TTLs and automated revocation.
  • Evaluate policy at request time against tool, data, and workflow context.
  • Scope secrets to the minimum connector or action required.
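The controls listed above can be illustrated with a small in-memory credential issuer. This is a sketch under stated assumptions: `JitIssuer`, `Credential`, the workload ID string, and the 300-second default TTL are hypothetical names and values, and a production system would rely on a real identity provider or secrets manager rather than a Python set for revocation.

```python
import secrets
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Credential:
    token: str
    workload_id: str   # identity of the agent workload, not a shared account
    scope: frozenset   # the only actions this credential may perform
    expires_at: float  # absolute expiry time (seconds since epoch)


class JitIssuer:
    """Hypothetical issuer: one short-lived, narrowly scoped credential per task."""

    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl_seconds = ttl_seconds
        self._revoked = set()

    def issue(self, workload_id, scope, now=None):
        now = time.time() if now is None else now
        return Credential(
            token=secrets.token_urlsafe(32),
            workload_id=workload_id,
            scope=frozenset(scope),
            expires_at=now + self.ttl_seconds,
        )

    def revoke(self, cred):
        # Called when the task finishes or is aborted.
        self._revoked.add(cred.token)

    def is_valid(self, cred, action, now=None):
        # Checked at request time: not revoked, not expired, action in scope.
        now = time.time() if now is None else now
        return (
            cred.token not in self._revoked
            and now < cred.expires_at
            and action in cred.scope
        )
```

A credential issued for a refund task would then pass `is_valid` only for the scoped action, only until the TTL elapses, and only until `revoke` is called at task completion.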

With these controls in place, the production system becomes governable without weakening the model itself. They tend to break down when legacy automation, mixed human and agent workflows, and long-lived integration tokens share the same execution path, because attribution and revocation become ambiguous.

Where the Model-Security Mentality Misses Real-World Edge Cases

Tighter control often increases integration overhead, requiring organisations to balance safer runtime governance against delivery speed and operational complexity. That tradeoff is real, especially where teams are retrofitting AI into existing business processes rather than designing agentic controls from scratch.

There is no universal standard for this yet, but best practice is evolving toward intent-based authorisation and continuous verification. A model may be locked down, yet the agent can still chain tools, exfiltrate data through legitimate APIs, or cause harm through an approved workflow. This is why security teams should not stop at model filters, red-team prompts, or training data hygiene. Those measures reduce one class of failure, but they do not address compromised NHIs, connector abuse, or escalation through delegated permissions.
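One way to picture intent-based authorisation is a session that tracks what the agent has already touched and evaluates each tool call against the declared intent. The sketch below is illustrative only: the intent name, tool names, and the rule "no external send after a sensitive read" are hypothetical, and the rule is deliberately coarse.

```python
# Hypothetical labels: which tools read sensitive data, which send data out.
SENSITIVE_SOURCES = {"query_customer_db", "read_mail"}
EXTERNAL_SINKS = {"send_email", "http_post"}

# Hypothetical mapping from declared intent to the tools it may use.
INTENT_TOOLS = {"answer_ticket": {"query_customer_db", "send_email"}}


class TaskSession:
    """Tracks one task so each call is judged against intent and history."""

    def __init__(self, intent: str):
        self.intent = intent
        self.allowed_tools = INTENT_TOOLS.get(intent, set())
        self.touched_sensitive = False
        self.history = []  # (tool, decision) pairs for later attribution

    def authorise(self, tool: str) -> bool:
        if tool not in self.allowed_tools:
            decision = False  # tool is outside the declared intent
        elif tool in EXTERNAL_SINKS and self.touched_sensitive:
            decision = False  # individually allowed, but the chain is risky
        else:
            decision = True
        if decision and tool in SENSITIVE_SOURCES:
            self.touched_sensitive = True
        self.history.append((tool, decision))
        return decision
```

Each tool here is individually permitted for the intent, yet the sequence of a sensitive read followed by an outbound send is denied. A real policy would need finer-grained data labels so that legitimate replies are not blocked.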

The most dangerous edge cases are hybrid environments: a human approves a request, an agent executes it, and a backend connector carries standing privileges far beyond what either intended. In those environments, model security is necessary but insufficient, because the failure mode is operational, not statistical.
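The hybrid case can be made concrete by treating each party's permissions as a set. In this hedged sketch, the permission names are hypothetical; the point is that what should be usable is the intersection of what the human approved, what the agent's task needs, and what the connector holds, while everything else the connector holds is standing excess.

```python
def effective_permissions(human_approved, agent_task_scope, connector_grants):
    """Permissions that all three parties agree on for this request."""
    return set(human_approved) & set(agent_task_scope) & set(connector_grants)


def standing_excess(human_approved, agent_task_scope, connector_grants):
    """Connector privileges beyond what the human and the agent both intended."""
    intended = set(human_approved) & set(agent_task_scope)
    return set(connector_grants) - intended


# Illustrative values only.
human = {"issue_refund"}
agent = {"issue_refund", "query_db"}
connector = {"issue_refund", "query_db", "delete_customer", "export_all"}

allowed = effective_permissions(human, agent, connector)
excess = standing_excess(human, agent, connector)
```

With these illustrative values, `allowed` contains only `issue_refund`, and `excess` lists the three connector privileges that neither the approval nor the task called for, which is the portion a review would aim to remove or gate.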

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.

Framework                 Control / Reference   Relevance
OWASP Agentic AI Top 10   A03                   Model-only security misses agent tool abuse and runtime misuse.
CSA MAESTRO               TA-03                 MAESTRO covers orchestration, tools, and memory beyond the model.
NIST AI RMF               -                     AI RMF addresses governance of the full AI system, not only the model.

Constrain agent tool access, monitor actions, and validate every external call at runtime.
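As a rough illustration of that closing recommendation, the wrapper below constrains tools to an allow-list, validates the destination of each external call, and records every decision. The names `guarded_call`, `ALLOWED_TOOLS`, `ALLOWED_HOSTS`, and the host `api.internal.example` are assumptions made for the example, and the in-memory `AUDIT_LOG` stands in for a real logging pipeline.

```python
from urllib.parse import urlparse

ALLOWED_TOOLS = {"http_get"}              # hypothetical tool allow-list
ALLOWED_HOSTS = {"api.internal.example"}  # hypothetical destination allow-list
AUDIT_LOG = []                            # stand-in for a real audit sink


class Denied(Exception):
    """Raised when a call fails the runtime checks."""


def guarded_call(agent_id, tool, url, invoke):
    """Validate a single external call, record the decision, then run it."""
    parsed = urlparse(url)
    if tool not in ALLOWED_TOOLS:
        reason = "tool not allowed"
    elif parsed.scheme != "https" or parsed.hostname not in ALLOWED_HOSTS:
        reason = "destination not allowed"
    else:
        reason = None

    # Monitor: every attempt is logged, whether allowed or denied.
    AUDIT_LOG.append(
        {"agent": agent_id, "tool": tool, "host": parsed.hostname,
         "allowed": reason is None}
    )
    if reason is not None:
        raise Denied(reason)
    return invoke(url)
```

The `invoke` argument is whatever callable actually performs the request, so the check sits in front of the connector rather than inside the model.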