What is the difference between AI safety and AI security?

Why This Matters for Security Teams

AI safety and AI security are often discussed together, but they fail in different ways and at different layers. Safety focuses on whether a model behaves as intended, avoids harmful outputs, and stays within acceptable content boundaries. Security focuses on whether the system can be abused, over-permissioned, or coerced into actions it should not perform. That distinction matters once an AI system is connected to tools, data, and production workflows.

When teams blur the two, they may add stronger prompts or content filters while leaving broad API access, weak secrets hygiene, and poor audit trails untouched. That creates a false sense of control. NHIMG research on the LLMjacking threat vector shows how attackers focus on compromised identities and exposed credentials rather than model output alone. In the same way, the Microsoft Azure OpenAI service breach illustrates how operational access can become the real attack surface.

Security teams need to treat safety controls as necessary but incomplete, because a well-behaved model can still be a dangerous workload if it has excessive access. In practice, many security teams encounter AI abuse only after an exposed key, overbroad connector, or agent workflow has already been used to move laterally.

How It Works in Practice

Safety controls answer questions like, "Should the model say or generate this?" Security controls answer, "Should this identity be allowed to do this action in this environment?" For standalone chat systems, safety and security controls often overlap. For agents, RAG pipelines, and tool-using workflows, they split sharply. An agent may be safe in conversation yet insecure if it can read internal documents, call cloud APIs, or invoke downstream systems without tight authorisation.
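The two questions can be kept apart as two independent checks. The following is a minimal sketch, not a reference implementation: the phrase list, the grant table, and the names `safety_allows`, `security_allows`, and `ToolCall` are all hypothetical and stand in for whatever moderation and authorisation services a team actually runs.

```python
from dataclasses import dataclass

# Hypothetical placeholder for a real content-moderation step.
BLOCKED_PHRASES = {"example banned phrase"}

# Hypothetical grants: (workload identity, environment) -> tools it may invoke.
TOOL_GRANTS = {
    ("support-agent", "production"): {"search_kb"},
    ("billing-agent", "production"): {"search_kb", "issue_refund"},
}


@dataclass(frozen=True)
class ToolCall:
    identity: str      # workload identity of the agent, not a human user
    tool: str
    environment: str


def safety_allows(text: str) -> bool:
    """Safety question: should the model say or generate this?"""
    lowered = text.lower()
    return not any(phrase in lowered for phrase in BLOCKED_PHRASES)


def security_allows(call: ToolCall) -> bool:
    """Security question: may this identity do this action in this environment?"""
    granted = TOOL_GRANTS.get((call.identity, call.environment), set())
    return call.tool in granted


reply = "I will issue the refund now."
call = ToolCall("support-agent", "issue_refund", "production")
print(safety_allows(reply))   # True: the text itself is harmless
print(security_allows(call))  # False: this identity holds no refund grant
```

The reply passes the content check while the tool call is refused, which is the split described above: a harmless sentence can still accompany an action the identity should not be able to take.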

Operationally, the best pattern is to separate model guardrails from workload identity and authorisation. Current guidance suggests evaluating policy at request time, issuing short-lived credentials, and explicitly auditing every action that crosses a trust boundary. The agent should authenticate as a workload, not as a human, and receive only the minimum capability needed for the task. That aligns with the framing in the Ultimate Guide to NHIs and with external implementation thinking from Anthropic Project Glasswing.

Use safety controls to reduce harmful generation, prompt injection impact, and policy-violating content.

Use security controls to govern identity, secrets, permissions, network reach, and tool invocation.

Issue just-in-time credentials for each task instead of long-lived static secrets.

Log tool calls, data access, and privilege changes separately from model prompts and outputs.

Apply policy-as-code so decisions are evaluated in context, not only at onboarding.

This distinction becomes especially important when the model can act on behalf of a user, since the security failure is usually one of privilege and reach, not of the text the model produced. These controls tend to break down when agents inherit human-grade permissions inside highly connected enterprise environments, because the damage path is then broader than any content filter can contain.
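The credential and logging items in the list above can be sketched together. This is an illustrative, in-memory example under stated assumptions: `TaskCredential`, `issue_credential`, `invoke_tool`, and the two log lists are hypothetical names, and a real deployment would rely on a secrets manager or token service and a tamper-resistant log store instead.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaskCredential:
    token: str
    identity: str
    scope: frozenset       # tools this credential may invoke
    expires_at: float      # epoch seconds


AUDIT_LOG = []    # actions that cross a trust boundary
PROMPT_LOG = []   # model prompts and outputs, kept separately


def issue_credential(identity: str, scope, ttl_seconds: float = 300.0,
                     now: Optional[float] = None) -> TaskCredential:
    """Issue a just-in-time credential limited to one task's scope."""
    now = time.time() if now is None else now
    return TaskCredential(secrets.token_urlsafe(32), identity,
                          frozenset(scope), now + ttl_seconds)


def invoke_tool(cred: TaskCredential, tool: str,
                now: Optional[float] = None) -> bool:
    """Check expiry and scope at request time, and audit the decision."""
    now = time.time() if now is None else now
    allowed = now < cred.expires_at and tool in cred.scope
    AUDIT_LOG.append({"identity": cred.identity, "tool": tool,
                      "allowed": allowed, "at": now})
    return allowed


def log_model_exchange(prompt: str, output: str) -> None:
    """Record model text in its own log, apart from the action audit trail."""
    PROMPT_LOG.append({"prompt": prompt, "output": output})


cred = issue_credential("report-agent", {"read_sales_db"},
                        ttl_seconds=60, now=1000.0)
print(invoke_tool(cred, "read_sales_db", now=1010.0))    # True: in scope, not expired
print(invoke_tool(cred, "delete_sales_db", now=1010.0))  # False: outside scope
print(invoke_tool(cred, "read_sales_db", now=1100.0))    # False: credential expired
```

Denied calls are logged as well as allowed ones, so the audit trail shows attempted reach and not only successful actions.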

Common Variations and Edge Cases

Tighter safety controls often increase friction for users and developers, so organisations must balance reduced harmful output against latency, complexity, and false positives. In practice, there is no universal standard for exactly where safety ends and security begins, especially in multi-agent systems where one agent’s output becomes another agent’s input.

Some edge cases blur the line. Prompt injection is both a safety and security concern because it can manipulate content generation and also redirect tool use. Data leakage sits in the overlap as well: a model may be safe in tone but insecure if retrieval scopes are too broad or if secrets appear in context windows. The CSA MAESTRO agentic AI threat modeling framework is useful here because it pushes teams to map risks across behaviours, tooling, and orchestration rather than treating everything as a prompt problem.
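The data-leakage case can be illustrated with a small filter that runs before retrieved text reaches the context window. This is a sketch only: the `label` and `text` document fields and the `build_context` helper are hypothetical, and the single regular expression matches just the common AWS access key ID format, whereas a real secret scanner would cover many more patterns.

```python
import re

# Illustrative pattern only: matches the AWS access key ID format.
SECRET_PATTERN = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def build_context(documents, allowed_labels):
    """Keep only documents inside the retrieval scope and redact key-like strings."""
    context = []
    for doc in documents:
        if doc["label"] not in allowed_labels:
            continue  # outside this workload's retrieval scope
        context.append(SECRET_PATTERN.sub("[REDACTED]", doc["text"]))
    return context


docs = [
    {"label": "public", "text": "Reset steps for the VPN client."},
    {"label": "hr", "text": "Salary bands for 2024."},
    {"label": "public", "text": "Old runbook, key AKIAIOSFODNN7EXAMPLE in use."},
]
print(build_context(docs, {"public"}))
```

Here the `hr` document never enters the context, and the key-like string in the runbook is replaced before the model sees it. Neither step depends on how the model phrases its answer.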

Best practice is evolving, but the practical rule is simple: safety governs what the model should produce, while security governs what the system is allowed to touch. That separation is most reliable when ownership is split clearly between AI governance, application security, and identity teams.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10		Separates model behaviour risks from tool and agent abuse risks.
CSA MAESTRO		Covers agentic threat modeling across behaviour, orchestration, and access paths.
NIST AI RMF		Supports governance of AI risk beyond content moderation alone.

Map safety controls to output moderation and security controls to agent permissions, secrets, and tool access.

