How should security teams detect risky AI agent behaviour in production?

Why This Matters for Security Teams

Risky agent behaviour is not the same as a stolen password or a failed login. An AI agent can stay authenticated and still become dangerous by chaining tools, widening scope, or pursuing an objective outside its approved purpose. That makes runtime behaviour the real security signal. The NIST AI Risk Management Framework and the OWASP Agentic AI Top 10 both point security teams toward observing what systems do, not just what they are allowed to access.

This matters because agent activity is often distributed across APIs, ticketing systems, code repositories, cloud consoles, and message queues. If monitoring only checks login success or token validity, it misses lateral tool use, abnormal request sequencing, and action bursts that indicate an agent has drifted from its intended use case. NHI Management Group’s guidance on the OWASP NHI Top 10 frames this as a behavioural-envelope problem, not a perimeter problem. In practice, many security teams encounter suspicious agent activity only after an upstream secret or connector has already been abused, rather than through intentional runtime detection.

How It Works in Practice

Effective detection starts by defining the agent’s approved behavioural envelope: the systems it may touch, the tools it may invoke, the sequence patterns that are normal, and the conditions that should trigger review or stop action. Security teams should instrument the agent’s decision path at runtime, including tool selection, prompt-to-action transitions, privilege requests, and cross-system hops. That telemetry is more useful than generic alerting because it shows whether the agent is still executing its intended task.
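As a rough illustration of what a behavioural envelope could look like in code, the sketch below models the allowed systems, tools, and tool-to-tool transitions for one agent and classifies each proposed action. The class, field names, and the example `ticket-triage` agent are all hypothetical and do not come from any framework cited here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BehaviouralEnvelope:
    """Hypothetical description of what one agent is approved to do."""
    agent_id: str
    allowed_systems: frozenset
    allowed_tools: frozenset
    allowed_transitions: frozenset  # (previous_tool, next_tool) pairs seen as normal
    review_tools: frozenset = frozenset()  # allowed, but always sent for review


def check_action(envelope: BehaviouralEnvelope, previous_tool: Optional[str],
                 tool: str, system: str) -> str:
    """Return 'stop', 'review', or 'allow' for a proposed action."""
    # Anything outside the approved systems or tools is stopped outright.
    if system not in envelope.allowed_systems or tool not in envelope.allowed_tools:
        return "stop"
    # An approved tool used in an unapproved order is flagged for review.
    if previous_tool is not None and (previous_tool, tool) not in envelope.allowed_transitions:
        return "review"
    if tool in envelope.review_tools:
        return "review"
    return "allow"


# Example envelope for an assumed ticket-triage agent.
TRIAGE_ENVELOPE = BehaviouralEnvelope(
    agent_id="ticket-triage",
    allowed_systems=frozenset({"ticketing", "chat"}),
    allowed_tools=frozenset({"read_ticket", "summarise", "post_comment", "close_ticket"}),
    allowed_transitions=frozenset({
        ("read_ticket", "summarise"),
        ("summarise", "post_comment"),
        ("post_comment", "close_ticket"),
    }),
    review_tools=frozenset({"close_ticket"}),
)
```

Separating "stop" from "review" is a design choice in this sketch: out-of-scope systems or tools are treated as hard violations, while an unusual sequence of approved tools is only escalated.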

Current best practice is moving toward combining policy enforcement with detection. Use policy-as-code to enforce the allowed context at request time, then watch for deviations in the resulting actions. The CSA MAESTRO agentic AI threat modeling framework and the NIST Cybersecurity Framework 2.0 both support this operational view: define expected behaviour, monitor for drift, and respond when an agent starts behaving outside that model.
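One way to picture policy and detection working together is a default-deny rule list evaluated at request time, with every decision written to an audit trail that detection logic can later inspect. This is a minimal sketch: the rule format, the `POLICY` contents, and the function names are assumptions made for illustration, not the syntax of any particular policy engine.

```python
# Hypothetical policy expressed as data: first matching rule wins, default is deny.
POLICY = [
    {"agent": "ticket-triage", "tool": "read_ticket", "purpose": "triage", "effect": "allow"},
    {"agent": "ticket-triage", "tool": "post_comment", "purpose": "triage", "effect": "allow"},
    {"agent": "ticket-triage", "tool": "delete_ticket", "purpose": "*", "effect": "deny"},
]


def evaluate(request, policy=POLICY):
    """Return 'allow' or 'deny' for a request dict with agent, tool, and purpose keys."""
    for rule in policy:
        if (rule["agent"] == request["agent"]
                and rule["tool"] == request["tool"]
                and rule["purpose"] in ("*", request["purpose"])):
            return rule["effect"]
    return "deny"  # nothing matched, so the request is outside the approved context


def enforce(request, audit_log):
    """Enforce the policy and record the decision so deviations can be reviewed later."""
    decision = evaluate(request)
    audit_log.append({**request, "decision": decision})
    return decision == "allow"
```

For example, calling `enforce({"agent": "ticket-triage", "tool": "read_ticket", "purpose": "billing"}, log)` is denied because the stated purpose does not match the rule, and the denial itself becomes a signal in `log`.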

Baseline normal tool chains for each agent, then alert on unusual order, frequency, or destination.

Correlate requests across identity, API, and workload telemetry to spot privilege escalation through chained actions.

Flag repeated retries, rapid branching, or new system targets as indicators of emergent or malicious behaviour.

Use step-up controls or task termination when the agent requests access beyond its approved purpose.
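The baselining and alerting steps above can be sketched roughly as follows. The example learns which tool-to-tool transitions and destinations appear in historical runs, then flags unusual order, new destinations, and action bursts in a new run. The data shape (lists of `(tool, destination)` tuples) and the `burst_factor` threshold are assumptions chosen for illustration; a production baseline would likely be statistical rather than a simple set lookup.

```python
def build_baseline(chains):
    """Learn a baseline from historical runs, each a list of (tool, destination) steps."""
    transitions, destinations, max_calls = set(), set(), 0
    for chain in chains:
        tools = [tool for tool, _ in chain]
        transitions.update(zip(tools, tools[1:]))  # consecutive tool pairs
        destinations.update(dest for _, dest in chain)
        max_calls = max(max_calls, len(chain))
    return {"transitions": transitions, "destinations": destinations, "max_calls": max_calls}


def find_deviations(chain, baseline, burst_factor=2.0):
    """Return a list of (alert_type, detail) tuples for one observed run."""
    alerts = []
    tools = [tool for tool, _ in chain]
    for pair in zip(tools, tools[1:]):
        if pair not in baseline["transitions"]:
            alerts.append(("unusual_order", pair))
    for dest in dict.fromkeys(dest for _, dest in chain):  # de-duplicate, keep order
        if dest not in baseline["destinations"]:
            alerts.append(("new_destination", dest))
    # A run far longer than anything seen before is treated as an action burst.
    if len(chain) > burst_factor * baseline["max_calls"]:
        alerts.append(("action_burst", len(chain)))
    return alerts
```

A run such as `[("read", "tickets"), ("export", "s3")]` checked against a baseline that only contains read, summarise, and comment steps on the ticketing system would raise both an `unusual_order` and a `new_destination` alert.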

The most useful signal often comes from joining behavioural telemetry with identity context, as highlighted in NHI Management Group’s The State of Non-Human Identity Security research. That report also notes that inadequate monitoring and logging is cited by 37% of organisations as a cause of NHI-related attacks. These controls tend to break down when agents operate across loosely coupled SaaS platforms because each platform sees only a fragment of the full action chain.
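To illustrate the fragmentation problem, the sketch below stitches events from separate telemetry sources into one ordered action chain per task. It assumes every source can stamp events with a shared `trace_id` and a comparable timestamp `ts`, which is exactly what loosely coupled SaaS platforms often do not provide; the field names are hypothetical.

```python
from collections import defaultdict


def reconstruct_chains(*sources):
    """Merge events from several telemetry sources into per-trace, time-ordered chains."""
    chains = defaultdict(list)
    for source in sources:
        for event in source:
            chains[event["trace_id"]].append(event)
    for events in chains.values():
        events.sort(key=lambda e: e["ts"])
    return dict(chains)


def systems_touched(chain):
    """Return the distinct systems in a chain, in the order they were first reached."""
    return list(dict.fromkeys(event["system"] for event in chain))
```

Once chains are reconstructed, a trace that touches more systems than any single platform's log would suggest is a candidate for the privilege-escalation review described earlier.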

Common Variations and Edge Cases

Tighter runtime detection often increases operational noise, requiring organisations to balance behavioural visibility against analyst workload and false positives. That tradeoff is especially sharp for agents that legitimately explore multiple paths, such as research assistants, code-fixing agents, or workflow orchestrators with adaptive planning.

There is not yet a universal standard for how much deviation is acceptable. Some teams set strict sequence rules, while others rely on anomaly scoring against a learned baseline. The right approach depends on whether the agent has deterministic tasks or open-ended goals. For highly autonomous systems, the safest pattern is to combine anomaly detection with hard guardrails, short-lived authorisation, and explicit task boundaries.
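A minimal sketch of that layered pattern might look like the following, where a hard guardrail is checked first, then the lifetime of the authorisation grant, and only then the anomaly score. The `HARD_DENY` list, the 0.7 review threshold, and the function signature are illustrative assumptions; the anomaly score is assumed to come from a separate model and to fall between 0 and 1.

```python
import time

# Hypothetical actions that are never permitted, regardless of anomaly score.
HARD_DENY = frozenset({"delete_repository", "rotate_root_key"})


def decide(tool, anomaly_score, grant_expires_at, now=None, review_threshold=0.7):
    """Return 'block', 'reauthorise', 'review', or 'allow' for one proposed action."""
    now = time.time() if now is None else now
    if tool in HARD_DENY:
        return "block"        # hard guardrail: not negotiable
    if now >= grant_expires_at:
        return "reauthorise"  # short-lived authorisation has lapsed
    if anomaly_score >= review_threshold:
        return "review"       # unusual but not forbidden
    return "allow"
```

Ordering the checks this way means a low anomaly score can never override a guardrail or an expired grant, which keeps the learned baseline from becoming the only line of defence.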

Edge cases include batch agents that run on a schedule, agents that hand off work to other agents, and systems that use shared service accounts. Those environments can look suspicious even when they are functioning correctly, so the detection model should include ownership, task context, and intended handoffs, not just identity. NHI Management Group’s AI LLM hijack breach coverage and the LLMjacking: How Attackers Hijack AI Using Compromised NHIs research both show why compromised credentials and abnormal execution paths must be detected together.
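As a hedged sketch of adding ownership, task context, and intended handoffs to the detection model, the function below classifies an event using more than the identity that produced it. The event fields (`agent`, `task_id`, `handoff_from`) and the three labels are hypothetical and would need to be mapped onto whatever metadata the environment actually records.

```python
def classify(event, declared_handoffs, owners):
    """Label an event 'expected', 'needs_context', or 'suspicious'.

    declared_handoffs: set of (from_agent, to_agent) pairs that are intended.
    owners: mapping of agent name to an accountable human or team.
    """
    agent = event["agent"]
    if agent not in owners:
        return "suspicious"      # no accountable owner on record
    if not event.get("task_id"):
        return "needs_context"   # cannot tie the action to an approved task
    source = event.get("handoff_from")
    if source is not None and (source, agent) not in declared_handoffs:
        return "suspicious"      # work arrived through an undeclared handoff
    return "expected"
```

With this extra context, a scheduled batch agent or a declared planner-to-fixer handoff can be recognised as normal, while the same identity acting without a task or through an undeclared handoff still stands out.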

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	A2	Covers unsafe agent behaviour and tool misuse that runtime detection must catch.
CSA MAESTRO	T1	Addresses threat modelling and runtime control of autonomous agent workflows.
NIST AI RMF	GOVERN	Requires accountability and oversight for AI system behaviour in production.

Assign owners, define behavioural limits, and review agent drift as an AI governance control.

