
How should security teams classify AI agent traffic in fraud prevention flows?

Security teams should classify AI agent traffic by intent and behaviour, not by whether automation is present. A useful model separates self-disclosing good agents, non-disclosing good agents, and malicious agents. That approach preserves legitimate automation while giving fraud controls a way to target abuse without blanket blocking.

Why This Matters for Security Teams

Fraud prevention systems are often tuned to spot automation, but AI agents are not all the same. Some are legitimate service workers, some are user-facing assistants, and some are hostile or compromised entities using tools at machine speed. Classifying by intent and behaviour is more useful than a simple bot-versus-human split, because the control goal is to separate helpful automation from abuse without blocking trusted workflows. That distinction is central to current guidance from the OWASP Agentic AI Top 10 and the NIST AI Risk Management Framework.

NHIMG research shows why this matters operationally: the AI Agents: The New Attack Surface report found that 80% of organisations say their AI agents have already performed actions beyond intended scope, including accessing unauthorised systems, sharing sensitive data, and revealing credentials. In fraud flows, those behaviours can look identical to legitimate high-volume automation unless the team inspects context, tool use, and escalation patterns. In practice, many security teams discover agent abuse only after an investigation starts, rather than through deliberate fraud classification design.

How It Works in Practice

A practical model separates AI agent traffic into three buckets: self-disclosing good agents, non-disclosing good agents, and malicious agents. Self-disclosing agents identify themselves clearly, present stable workload identity, and operate within expected rules. Non-disclosing good agents are still legitimate, but they may look like ordinary service traffic until they prove their purpose through runtime context, transaction history, or cryptographic identity. Malicious agents are the ones that exhibit suspicious intent, unusual chaining, or abuse patterns.
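A minimal sketch of the three-bucket model follows, assuming the platform can already answer a few yes/no questions about each agent. The field names, the AgentObservation type, and the extra UNPROVEN state (for traffic that has not yet shown its purpose, which the text treats as ambiguous) are hypothetical illustrations, not part of any named framework.

```python
from dataclasses import dataclass
from enum import Enum


class AgentClass(Enum):
    SELF_DISCLOSING_GOOD = "self_disclosing_good"
    NON_DISCLOSING_GOOD = "non_disclosing_good"
    MALICIOUS = "malicious"
    UNPROVEN = "unproven"  # hypothetical holding state: purpose not yet established


@dataclass
class AgentObservation:
    # Hypothetical fields; a real platform would derive these from its own telemetry.
    declares_agent: bool         # agent identifies itself (e.g. a registered client ID)
    identity_verified: bool      # stable workload identity validated cryptographically
    within_expected_rules: bool  # operating inside the policy registered for this agent
    purpose_established: bool    # runtime context or transaction history confirms purpose
    abuse_indicators: int        # count of suspicious-intent, unusual-chaining, or abuse signals


def classify(obs: AgentObservation) -> AgentClass:
    # Abuse evidence wins over disclosure: a self-identified agent can still be compromised.
    if obs.abuse_indicators > 0:
        return AgentClass.MALICIOUS
    if obs.declares_agent and obs.identity_verified and obs.within_expected_rules:
        return AgentClass.SELF_DISCLOSING_GOOD
    # Quiet but legitimate traffic earns the "good" label once it proves its purpose
    # through runtime context, transaction history, or cryptographic identity.
    if obs.purpose_established or obs.identity_verified:
        return AgentClass.NON_DISCLOSING_GOOD
    return AgentClass.UNPROVEN
```

The ordering is the design choice worth noting: abuse signals are checked first, so disclosure never acts as a free pass, and traffic that proves nothing lands in a holding state rather than being labelled malicious.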

Security teams should avoid using a single static label such as “bot” or “automation.” Instead, classify traffic using signals that support real-time decisions:

  • Workload identity and token provenance, not just IP address or user agent string.
  • Task intent and request context, such as whether the agent is performing an approved fraud check or mass enumeration.
  • Behavioural patterns, including rate, branching, retries, lateral movement, and tool chaining.
  • Credential scope and TTL, especially where just-in-time access and ephemeral secrets are available.
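The signals listed above can be combined into a single runtime risk score. The sketch below is illustrative only: the RequestSignals fields, the weights, and every threshold are hypothetical placeholders that a team would tune against its own traffic, and a production system might use a learned model instead of fixed weights.

```python
from dataclasses import dataclass


@dataclass
class RequestSignals:
    # Hypothetical signal set mirroring the list above; names and units are illustrative.
    token_provenance_verified: bool  # token issued by a trusted workload identity provider
    task_approved: bool              # request matches an approved task (e.g. a sanctioned fraud check)
    requests_per_minute: float
    distinct_resources_touched: int  # breadth of access, a rough proxy for enumeration
    retries: int
    tool_chain_depth: int            # number of chained tool calls behind this request
    credential_ttl_seconds: int


def risk_score(s: RequestSignals) -> float:
    """Return a score in [0, 1]; higher means more suspicious. Weights are placeholders."""
    score = 0.0
    if not s.token_provenance_verified:
        score += 0.25  # identity and provenance, not IP or user agent
    if not s.task_approved:
        score += 0.2   # request context does not match an approved task
    if s.requests_per_minute > 120:
        score += 0.15
    if s.distinct_resources_touched > 50:
        score += 0.15  # wide fan-out looks like mass enumeration
    if s.retries > 5:
        score += 0.05
    if s.tool_chain_depth > 4:
        score += 0.1
    if s.credential_ttl_seconds > 3600:
        score += 0.1   # long-lived credentials widen the blast radius
    return min(score, 1.0)
```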

This approach fits the direction of the CSA MAESTRO agentic AI threat modeling framework and aligns with the research captured in OWASP NHI Top 10, where identity abuse and over-privileged automation are recurring failure points. A fraud platform can then route self-disclosing good agents through low-friction paths, challenge ambiguous traffic, and isolate malicious or compromised agents for step-up review or blocking. These controls tend to break down in high-throughput environments where many third-party agents share infrastructure and the team cannot preserve per-agent identity or per-task context.
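The routing step described above might look like the following sketch. It assumes a classification label (as a plain string) and a 0 to 1 risk score are already available; the Route names, thresholds, and the per_agent_identity flag are hypothetical. The flag models the shared-infrastructure case, where per-agent identity is lost and even a self-disclosing agent falls back to a challenge.

```python
from enum import Enum


class Route(Enum):
    LOW_FRICTION = "low_friction"
    CHALLENGE = "challenge"
    STEP_UP_REVIEW = "step_up_review"
    BLOCK = "block"


def route(agent_class: str, risk: float, per_agent_identity: bool) -> Route:
    """Map a classification and risk score to a fraud-flow action.

    agent_class is expected to be "self_disclosing_good", "non_disclosing_good",
    or "malicious"; anything else is treated as ambiguous. Thresholds are placeholders.
    """
    if agent_class == "malicious":
        # Isolate: block at high risk, otherwise hold for step-up review.
        return Route.BLOCK if risk >= 0.8 else Route.STEP_UP_REVIEW
    if risk >= 0.8:
        return Route.STEP_UP_REVIEW  # high risk escalates regardless of the claimed class
    if agent_class == "self_disclosing_good" and per_agent_identity and risk < 0.3:
        return Route.LOW_FRICTION
    # Ambiguous traffic, including good agents on shared infrastructure without
    # per-agent identity, is challenged rather than trusted or denied outright.
    return Route.CHALLENGE
```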

Common Variations and Edge Cases

Tighter agent classification often increases operational overhead, requiring organisations to balance fraud precision against latency, maintenance, and false positives. That tradeoff is especially visible when external partners, embedded copilots, or legacy service accounts all generate similar API patterns.

Best practice is evolving, but current guidance suggests treating "non-disclosing" traffic as a visibility problem rather than a default threat. If an agent cannot identify itself, fraud teams should rely on compensating controls such as rate limits, anomaly scoring, step-up verification, and scoped entitlements rather than immediate blanket denial. This is also where the MITRE ATLAS adversarial AI threat matrix becomes useful, for mapping how compromised agents may adapt their behaviour after initial detection.
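As a sketch of those compensating controls, assuming traffic that cannot identify itself is keyed by some caller-chosen fingerprint, the example below pairs a standard token-bucket rate limiter with a hypothetical handle_unidentified decision. The anomaly threshold and the outcome strings are placeholders; the point is that each control narrows what the traffic can do instead of denying it wholesale.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TokenBucket:
    """Per-source rate limiter; the source key (e.g. a client fingerprint) is up to the caller."""
    capacity: float
    refill_per_second: float
    tokens: float = field(init=False)
    last: float = field(default_factory=time.monotonic)

    def __post_init__(self) -> None:
        self.tokens = self.capacity  # start full

    def take(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.last)
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def handle_unidentified(bucket: TokenBucket, anomaly_score: float,
                        within_scope: bool, now: Optional[float] = None) -> str:
    """Compensating controls for traffic that cannot identify itself (thresholds are placeholders)."""
    if not within_scope:
        return "deny"      # scoped entitlements: refuse only the out-of-scope action
    if not bucket.take(now):
        return "throttle"  # rate limit instead of a blanket block
    if anomaly_score >= 0.7:
        return "step_up"   # require step-up verification before proceeding
    return "allow"
```

Passing `now` explicitly keeps the limiter deterministic in tests; in production the default monotonic clock is used.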

Edge cases include shared model gateways, delegated workflows, and autonomous agents that act on behalf of humans but do not preserve a one-to-one session trail. In those cases, the safest pattern is to classify the transaction, not the tool alone, and to preserve audit evidence that ties each action back to a validated workload identity. Teams that fail to do this often learn about misclassification only after fraud rules start blocking legitimate agents or, worse, after a compromised agent has already blended into ordinary automation.
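One way to "classify the transaction, not the tool" while preserving audit evidence is a tamper-evident log where every entry is keyed by transaction and must carry a validated workload identity. The sketch below assumes identity validation has already happened upstream; the AuditRecord fields, the SPIFFE-style identifiers in the usage, and the hash-chain design are illustrative choices rather than a prescribed format.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import List, Optional

GENESIS = "0" * 64


@dataclass(frozen=True)
class AuditRecord:
    transaction_id: str
    workload_identity: str       # validated identity, e.g. a SPIFFE ID (assumption)
    on_behalf_of: Optional[str]  # human principal in a delegated workflow, if known
    tool: str                    # gateway or tool that carried the call: recorded, not the unit of classification
    classification: str          # decision made for this transaction
    prev_hash: str               # hash of the previous record, forming a tamper-evident chain


def record_hash(rec: AuditRecord) -> str:
    payload = json.dumps(asdict(rec), sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class AuditLog:
    def __init__(self) -> None:
        self.records: List[AuditRecord] = []

    def append(self, transaction_id: str, workload_identity: str,
               on_behalf_of: Optional[str], tool: str, classification: str) -> AuditRecord:
        if not workload_identity:
            raise ValueError("action has no validated workload identity")
        prev = record_hash(self.records[-1]) if self.records else GENESIS
        rec = AuditRecord(transaction_id, workload_identity, on_behalf_of,
                          tool, classification, prev)
        self.records.append(rec)
        return rec

    def verify(self) -> bool:
        # Recompute the chain; editing any record except the last breaks a later prev_hash.
        prev = GENESIS
        for rec in self.records:
            if rec.prev_hash != prev:
                return False
            prev = record_hash(rec)
        return True
```

Because a hash chain cannot detect edits to its newest entry on its own, a deployment would also need to anchor the latest hash somewhere the agents themselves cannot write.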

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while the NIST AI RMF sets the governance and control requirements practitioners need to meet.

Framework | Control / Reference | Relevance
OWASP Agentic AI Top 10 | A1 | Agentic traffic must be classified by intent and behaviour, not by simple automation labels.
CSA MAESTRO | (none listed) | MAESTRO addresses threat modeling for autonomous agents in mixed-trust flows.
NIST AI RMF | (none listed) | AI RMF supports governance of AI behaviour, risk, and accountability in fraud systems.

Score agent requests at runtime using context, tool use, and intent before allowing actions in fraud-sensitive flows.