
What should organisations measure to detect drift in agent behaviour?

Organisations should measure whether the agent’s actions still match the user’s goal, the intended workflow, and the normal pattern for that agent or role. Changes in action sequence, tool use, or side effects are stronger governance signals than prompt content alone. That is how behavioural drift becomes visible.

Why This Matters for Security Teams

Behavioural drift in an agent is not a cosmetic issue. It is usually the first sign that the agent is no longer acting within the intended operational envelope, even if the prompt still looks normal. For security teams, the risk is that an autonomous system can change tool usage, sequence steps differently, or produce broader side effects without a corresponding policy change. That makes drift a governance problem, not just a model-quality problem.

Current guidance suggests treating the agent's output as an operating pattern, not only as text. That means comparing what the agent does against the intended workflow, the expected tool chain, and the scope of action approved for its role. This is consistent with the direction of the OWASP Agentic AI Top 10 and the NHI governance work in the Ultimate Guide to NHIs, both of which emphasise visibility into how non-human actors actually behave over time.

NHI Mgmt Group notes that only 5.7% of organisations have full visibility into their service accounts. That is a useful warning sign for agent oversight as well, because drift is hard to detect when the baseline is incomplete. In practice, many security teams discover harmful agent drift only after a tool chain has already expanded, not through deliberate monitoring of behaviour.

How It Works in Practice

Drift detection works best when organisations define a behavioural baseline for each agent, then measure deviation from that baseline at runtime. The baseline should describe the approved task, the expected sequence of actions, the normal tool set, the frequency of human escalation, and the typical side effects. For agentic systems, this is more useful than inspecting prompt text alone, because the same prompt can lead to very different outcomes depending on context, retrieved data, tool availability, or prior memory.
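As a minimal sketch, the baseline elements listed above could be captured as one immutable record per agent. The `AgentBaseline` class, its field names, and the example ticket-triage agent are hypothetical illustrations, not a standard schema; the sanity checks reflect the point that drift is hard to detect when the baseline is incomplete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentBaseline:
    """Hypothetical record of what 'normal' looks like for one agent or role."""
    agent_id: str
    approved_task: str                    # the task the business owner approved
    expected_sequence: tuple[str, ...]    # ordered steps the workflow should follow
    normal_tools: frozenset[str]          # tools the agent usually calls
    escalation_rate: float                # usual fraction of runs handed to a human
    typical_side_effects: frozenset[str]  # e.g. tickets assigned, drafts created

    def __post_init__(self) -> None:
        # Reject an incomplete or nonsensical baseline up front.
        if not self.expected_sequence:
            raise ValueError("baseline needs at least one expected step")
        if not 0.0 <= self.escalation_rate <= 1.0:
            raise ValueError("escalation_rate must be a fraction between 0 and 1")


# Illustrative baseline for a hypothetical ticket-triage agent.
triage_baseline = AgentBaseline(
    agent_id="triage-agent",
    approved_task="classify and route inbound support tickets",
    expected_sequence=("read_ticket", "lookup_customer", "classify", "route"),
    normal_tools=frozenset({"ticketing.read", "crm.lookup", "ticketing.assign"}),
    escalation_rate=0.10,
    typical_side_effects=frozenset({"ticket_assigned"}),
)
```

Making the record frozen means a baseline cannot be edited in place; changing it requires creating a new record, which fits the versioning practice described later in this section.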

Security teams should instrument several signals together:

  • Action sequence drift, such as the agent calling tools in a new order or skipping expected checkpoints.
  • Tool-use drift, such as access to APIs, repositories, or admin functions outside the usual pattern.
  • Scope drift, such as wider data reads, more records touched, or longer execution chains.
  • Side-effect drift, such as unexpected tickets, messages, commits, or configuration changes.
  • Authority drift, such as the agent requesting or using privileges it rarely needs.

This approach aligns with the NIST AI Risk Management Framework, whose Measure and Manage functions treat ongoing measurement and governance as being as important as initial design. It also fits the NHI lifecycle view in the NHI Lifecycle Management Guide, because the identity must remain observable across its full operational life. In practice, teams usually compare current behaviour against a per-agent or per-role baseline, then alert when deviations exceed a tolerance threshold approved by both the business owner and the security function.
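The threshold comparison could look like the sketch below, assuming each drift signal has already been reduced to a numeric score. `Tolerance`, `breaches`, and the approver labels are hypothetical; the check that both the business owner and security have approved a tolerance before it is applied mirrors the sign-off requirement described above.

```python
from dataclasses import dataclass

# Assumption: a tolerance only takes effect once both functions have signed off.
REQUIRED_APPROVERS = frozenset({"business_owner", "security"})


@dataclass
class Tolerance:
    """Hypothetical per-signal limits for one agent or role."""
    limits: dict[str, float]
    approved_by: frozenset[str]


def breaches(scores: dict[str, float], tol: Tolerance) -> list[str]:
    """Return the names of signals whose score exceeds the approved limit."""
    if not REQUIRED_APPROVERS <= tol.approved_by:
        missing = sorted(REQUIRED_APPROVERS - tol.approved_by)
        raise ValueError(f"tolerance not approved by: {', '.join(missing)}")
    # Signals without an agreed limit default to zero tolerance, so a newly
    # instrumented signal alerts rather than being silently ignored.
    return sorted(name for name, value in scores.items() if value > tol.limits.get(name, 0.0))


tol = Tolerance(limits={"action_sequence": 0.3, "scope": 0.5},
                approved_by=frozenset({"business_owner", "security"}))
print(breaches({"action_sequence": 0.25, "scope": 9.0, "authority": 1.0}, tol))
```

Defaulting unknown signals to zero tolerance is a deliberately conservative choice; a team worried about alert fatigue might instead require every signal to have an explicit limit.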

That baseline should be versioned. If the workflow changes, the baseline must change too; otherwise healthy operational evolution will look like attack activity. These controls tend to break down in highly dynamic environments, such as broad tool marketplaces or open-ended copilots, because the agent's legitimate action space expands faster than the monitoring model can be updated.
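One way to keep baselines versioned is an append-only history per agent, with each run judged against the version that was in force when it executed. `BaselineRegistry`, `publish`, and `active_at` are hypothetical names, and only the expected sequence is stored here for brevity.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class BaselineVersion:
    version: int
    effective_from: datetime
    expected_sequence: tuple[str, ...]
    change_note: str  # why the workflow changed, for later review


@dataclass
class BaselineRegistry:
    """Append-only history of baselines per agent (hypothetical)."""
    history: dict[str, list[BaselineVersion]] = field(default_factory=dict)

    def publish(self, agent_id: str, expected_sequence: list[str],
                effective_from: datetime, change_note: str) -> BaselineVersion:
        versions = self.history.setdefault(agent_id, [])
        if versions and effective_from <= versions[-1].effective_from:
            raise ValueError("a new baseline must take effect after the current one")
        new = BaselineVersion(len(versions) + 1, effective_from,
                              tuple(expected_sequence), change_note)
        versions.append(new)
        return new

    def active_at(self, agent_id: str, when: datetime) -> BaselineVersion:
        """Return the latest version already in force at `when`."""
        in_force = [v for v in self.history.get(agent_id, []) if v.effective_from <= when]
        if not in_force:
            raise LookupError(f"no baseline for {agent_id} at {when.isoformat()}")
        return in_force[-1]


registry = BaselineRegistry()
registry.publish("triage-agent", ["read_ticket", "classify", "route"],
                 datetime(2024, 1, 1, tzinfo=timezone.utc), "initial workflow")
registry.publish("triage-agent", ["read_ticket", "classify", "manager_approval", "route"],
                 datetime(2024, 6, 1, tzinfo=timezone.utc), "approval step added")
print(registry.active_at("triage-agent", datetime(2024, 3, 1, tzinfo=timezone.utc)).version)
```

Because the June change is published as a new version, a July run that includes `manager_approval` is compared against version 2 and does not register as drift, while earlier runs are still evaluated against version 1.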

Common Variations and Edge Cases

Tighter behavioural measurement often increases operational overhead, so organisations must balance detection depth against alert fatigue and engineering cost. That tradeoff is real, especially where agents support multiple business functions or where tool access is frequently reconfigured. Best practice is still evolving and there is no universal standard yet, so teams should start with their highest-risk workflows.

One common edge case is an agent that behaves differently because the task itself is legitimately variable. In that situation, measuring only exact sequence matches creates false positives; a better method is to score deviation by intent, data scope, and consequence.

Another edge case is multi-agent systems, where one agent's output becomes another agent's input. Drift can then propagate through the chain, so monitoring should cover both the local action and the downstream effect. The patterns seen in the AI LLM hijack breach and the OWASP NHI Top 10 show why behavioural visibility matters when autonomous systems can chain decisions faster than humans can review them.
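The sketch below scores a run by intent, data scope, and consequence rather than exact step order, and adds a helper for tracing drift in a multi-agent chain back to the hop where it first appeared. The consequence tiers, the 0.4/0.3/0.3 weights, and all names (`Action`, `deviation_score`, `first_drifting_hop`) are illustrative assumptions that a team would need to calibrate for its own workflows.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed consequence tiers: higher values are harder to undo.
CONSEQUENCE = {"read": 0.0, "write": 0.5, "external_or_irreversible": 1.0}


@dataclass(frozen=True)
class Action:
    serves_approved_intent: bool  # does this step advance the approved task?
    records: int                  # records read or modified by the step
    effect: str                   # key into CONSEQUENCE


def deviation_score(actions: list[Action], normal_records: int,
                    weights: tuple[float, float, float] = (0.4, 0.3, 0.3)) -> float:
    """Score in [0, 1] when weights sum to 1; step order is deliberately ignored."""
    if not actions:
        return 0.0
    w_intent, w_scope, w_consequence = weights
    off_intent = [a for a in actions if not a.serves_approved_intent]
    intent_part = len(off_intent) / len(actions)
    total_records = sum(a.records for a in actions)
    # Excess over the usual record count, capped at 1.0.
    scope_part = min(1.0, max(0.0, total_records / max(normal_records, 1) - 1.0))
    # Only off-intent actions count toward consequence: a legitimate write is not drift.
    consequence_part = max((CONSEQUENCE[a.effect] for a in off_intent), default=0.0)
    return w_intent * intent_part + w_scope * scope_part + w_consequence * consequence_part


def first_drifting_hop(chain: list[tuple[str, float]], threshold: float) -> Optional[str]:
    """Given (agent_id, score) pairs in execution order, return the first agent over threshold."""
    for agent_id, score in chain:
        if score > threshold:
            return agent_id
    return None


reordered = [Action(True, 20, "write"), Action(True, 20, "read")]
drifting = [Action(True, 40, "read"), Action(False, 160, "external_or_irreversible")]
print(deviation_score(reordered, normal_records=50))
print(deviation_score(drifting, normal_records=50))
```

Steps that all serve the approved task score 0.0 regardless of order, which is exactly the case exact sequence matching would flag as a false positive. In a chain, `first_drifting_hop` points the investigation at the upstream agent rather than at the downstream agent that merely acted on its output.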

For high-assurance environments, the drift signal should be paired with policy enforcement, not treated as a standalone detector. When the agent’s behaviour moves outside the approved envelope, the safest response is to reduce privilege, pause execution, or require explicit re-approval. Organisations that treat drift as a post-incident reporting metric usually discover the problem only after the agent has already crossed a trust boundary.
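Pairing the drift signal with enforcement might look like the sketch below. The response tiers, the rule that a score more than twice the tolerance requires re-approval, and the `AgentSession` shape are hypothetical choices for illustration; the point is that the detector's output changes what the agent is allowed to do instead of only being logged.

```python
from dataclasses import dataclass
from enum import Enum


class Response(Enum):
    CONTINUE = "continue"
    REDUCE_PRIVILEGE = "reduce_privilege"
    REQUIRE_REAPPROVAL = "require_reapproval"
    PAUSE = "pause"


@dataclass
class AgentSession:
    """Hypothetical runtime session for one agent."""
    privileges: set[str]
    baseline_privileges: frozenset[str]
    paused: bool = False
    awaiting_approval: bool = False


def decide(score: float, tolerance: float, next_step_crosses_trust_boundary: bool) -> Response:
    if score <= tolerance:
        return Response.CONTINUE
    if next_step_crosses_trust_boundary:
        return Response.PAUSE              # stop before the boundary, not after
    if score <= 2 * tolerance:
        return Response.REDUCE_PRIVILEGE   # modest drift: fall back to baseline rights
    return Response.REQUIRE_REAPPROVAL     # large drift: a human must re-approve


def enforce(session: AgentSession, response: Response) -> None:
    if response in (Response.REDUCE_PRIVILEGE, Response.REQUIRE_REAPPROVAL):
        # Drop anything the agent holds beyond its baseline privileges.
        session.privileges &= session.baseline_privileges
    if response is Response.REQUIRE_REAPPROVAL:
        session.awaiting_approval = True
    if response is Response.PAUSE:
        session.paused = True


session = AgentSession(privileges={"crm:read", "crm:admin"},
                       baseline_privileges=frozenset({"crm:read"}))
enforce(session, decide(0.5, 0.3, next_step_crosses_trust_boundary=False))
print(session.privileges)
```

Checking the trust boundary before the severity tiers means even modest drift halts the agent when its next step would cross that boundary, which reflects the concern that drift treated only as a post-incident metric is found too late.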

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

The OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while the NIST AI RMF sets the governance and control requirements practitioners need to meet.

Framework | Control / Reference | Relevance
OWASP Agentic AI Top 10 | A2 | Measures whether agent actions diverge from intended behaviour and tool scope.
CSA MAESTRO | GOV-2 | Governance needs continuous observation of autonomous agent behaviour over time.
NIST AI RMF | (not specified) | AI RMF supports ongoing measurement and monitoring of AI system behaviour.

Baseline agent workflows and alert on action, tool, or side-effect drift from approved patterns.