Organisations should require hard boundaries between data and instructions, then restrict the assistant to the minimum set of actions needed for the workflow. They should also monitor tool calls and redact hostile or user-controlled text before the model sees it. That combination reduces the chance that poisoned content becomes operational action.
Why This Matters for Security Teams
Before an AI assistant can touch production systems, the core question is not simply whether it has a prompt filter or a user approval step. The real issue is whether the assistant can be trusted to separate instructions from data, stay within a narrow action scope, and resist being steered by content it encountered along the way. That is why this topic sits at the intersection of workload identity, secret handling, and AI governance rather than classic chatbot security. Guidance from the NIST Cybersecurity Framework 2.0 still applies, but AI assistants add a new operational problem: they can transform untrusted text into real-world actions through tools, APIs, and delegated permissions. The practical risk is easy to underestimate. A model that can read tickets, emails, logs, or documents may also be exposed to injected instructions that try to override policy, expand scope, or exfiltrate secrets. NHIMG research on the DeepSeek breach shows how quickly sensitive material can escape once boundaries are weak, and that pattern matters even more when an assistant has operational authority. In practice, many security teams encounter unsafe tool use only after the assistant has already executed a harmful action, rather than through intentional pre-deployment testing.How It Works in Practice
The safest deployment pattern is to treat the assistant as an untrusted workload until it proves otherwise. That means assigning a distinct workload identity to the agent, issuing ephemeral credentials per task, and forcing every tool call through real-time policy checks. Current guidance suggests that the model should not inherit broad human permissions, because autonomous systems do not behave like static roles. Their actions are goal-driven, context-sensitive, and often hard to predict in advance.- Use hard separation between instructions, retrieved content, and tool arguments so hostile text cannot masquerade as policy.
- Apply least privilege to every connected system, then narrow it further with just-in-time access that expires after the task.
- Evaluate tool requests at runtime with policy-as-code, rather than relying only on pre-approved roles or static allow lists.
- Log every tool invocation, input source, and approval decision so later review can reconstruct what the assistant actually did.
- Strip or neutralize user-controlled text before it reaches the model when that text could influence actions or retrieval.
Common Variations and Edge Cases
Tighter control over an AI assistant often increases friction, latency, and integration effort, so organisations have to balance operational speed against the risk of delegated overreach. Best practice is evolving here, and there is no universal standard for every architecture yet. One common edge case is read-only assistants that still have access to sensitive data. Even without write privileges, they can be dangerous if they ingest secrets, internal plans, or adversarial prompts that later influence other systems. Another is multi-step automation, where one agent gathers data and another takes action. That separation helps, but it only works if the boundary between the agents is enforced with distinct identities and scoped credentials, not shared tokens. The other major exception is emergency access. Break-glass paths may be necessary, but they should be time-boxed, heavily monitored, and clearly segregated from normal assistant behaviour. This is where the lessons from the DeepSeek breach are especially relevant: once an assistant can cross from analysis into execution, poor boundaries become an incident multiplier rather than a minor configuration issue. In practice, these controls fail most often in environments that mix rapid prototyping, shared service accounts, and production tool access without a separate approval layer.Standards & Framework Alignment
This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.
OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF set the governance and control requirements practitioners need to meet.
| Framework | Control / Reference | Relevance |
|---|---|---|
| OWASP Agentic AI Top 10 | A03 | Addresses prompt/tool injection risks before agents touch real systems. |
| CSA MAESTRO | M1 | Covers identity, policy, and runtime guardrails for autonomous agents. |
| NIST AI RMF | Governance is required before deploying AI with operational authority. |
Bind each agent to scoped identity and enforce contextual authorization at runtime.
Related resources from NHI Mgmt Group
- Why is identity such a critical factor in securing AI agent systems?
- When is it appropriate to implement MCP in the context of AI systems?
- How does the rise of AI identities impact traditional IAM systems?
- How should security teams limit the risk from AI agents that have access to production systems?
Deepen Your Knowledge
Reviewed and updated by the NHIMG editorial team on June 10, 2026.
NHI Mgmt Group — the #1 independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org