Subscribe to the Non-Human & AI Identity Journal
Home Glossary Agentic AI & Autonomous Identity Instruction drift
Agentic AI & Autonomous Identity

Instruction drift

← Back to Glossary
By NHI Mgmt Group Updated July 5, 2026 Domain: Agentic AI & Autonomous Identity

Instruction drift is the gradual change in how an agent ranks or interprets instructions after repeated interactions. It matters because the agent can begin to prefer conditioned context over original policy, creating a slow governance failure that is difficult to detect in a single review cycle.

Expanded Definition

Instruction drift is the cumulative shift in an agent’s behaviour when repeated context, tool outputs, or user prompts begin to outweigh the original operating policy. In NHI and agent governance, the concern is not a single bad prompt but a slow reordering of priorities that changes how the agent resolves conflicts, escalates requests, or selects actions.

Definitions vary across vendors because some teams describe this as prompt decay, behavioural drift, or policy erosion, but the security issue is the same: the agent increasingly treats recent interaction history as more authoritative than standing instruction. That distinction matters in systems with long-lived conversations, delegated tool access, or human-in-the-loop approvals, where the agent can become subtly easier to steer over time. The control challenge aligns closely with the NIST Cybersecurity Framework 2.0 emphasis on governance and continuous monitoring.

The most common misapplication is treating instruction drift as a prompt-writing problem, which occurs when teams assume one-time prompt hardening will prevent behaviour from changing across extended sessions.

Examples and Use Cases

Implementing instruction drift controls rigorously often introduces friction, because tighter context limits and more frequent resets can reduce convenience while improving policy fidelity.

  • An internal support agent begins prioritising the latest customer request over a standing policy to refuse secrets disclosure after many back-and-forth clarifications.
  • A code-assisting agent accepts a tool-generated instruction to broaden permissions, even though the original system policy restricted write access to specific repositories.
  • An orchestration agent repeatedly sees exception handling in a long workflow and starts treating exceptions as normal operating procedure instead of requiring approval. This pattern is especially relevant in NHI-heavy environments documented in the Ultimate Guide to NHI.
  • A delegated API agent continues using an expired operational preference because prior conversation history encourages it to retry rather than stop and escalate.
  • After a token compromise, investigators find the agent had been conditioned over time to trust a narrow set of follow-up commands, echoing the type of persistence seen in the Salesloft OAuth token breach.

Because agent behaviour is shaped by accumulated context, teams often pair drift testing with guidance from the NIST Cybersecurity Framework 2.0 to make monitoring repeatable across release cycles.

Why It Matters in NHI Security

Instruction drift creates a governance gap that is easy to miss in routine reviews because the agent may still appear functional while becoming less aligned with policy. In NHI security, that is dangerous when the agent can approve access, surface secrets, modify infrastructure, or relay instructions to other automated identities. If the drift is not detected, the agent can gradually normalise exceptions, weakening least privilege and weakening the separation between policy and execution.

The scale of the problem is amplified by the broader NHI landscape: NHI Mgmt Group reports that 97% of NHIs carry excessive privileges, which means even small interpretive shifts can have outsized blast radius. When an agent with those privileges begins privileging recent context over original governance, the result can be silent policy bypass rather than an obvious failure.

Practitioner insight: organisations typically encounter instruction drift only after an agent makes an unauthorised decision, at which point the term becomes operationally unavoidable to address.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 address the attack and risk surface, while NIST AI RMF and NIST CSF 2.0 set the governance and control requirements practitioners need to meet.

FrameworkControl / ReferenceRelevance
OWASP Agentic AI Top 10Agentic AI guidance addresses prompt influence, tool misuse, and policy-following failures.
NIST AI RMFAI RMF covers managing reliability and governance risks from changing model behaviour.
NIST CSF 2.0GV.RM-02Risk management governance supports continuous oversight of agent behaviour changes.

Assess drift as an AI risk, document monitoring, and respond when behaviour deviates from policy.

NHIMG Editorial Note
Reviewed and updated by the NHIMG editorial team on July 5, 2026.
NHI Mgmt Group — the #1 independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org