
What do security teams get wrong about logging agent activity?

Teams often assume that detailed logs equal control. In reality, logs only prove what happened, not whether the action should have been allowed. For agentic workflows, the important question is whether policy evaluated the request before execution and whether the control point could narrow or stop the action in real time.

Why Security Teams Misread Agent Logs

Security teams often treat logging as evidence of control, but for autonomous agents the real security boundary is the decision point before execution. A log can show a tool call, a prompt, or a token use after the fact, yet it does not prove the action was authorised, bounded, or safe in context. That gap matters because agents can chain actions quickly and unpredictably, especially when they hold standing credentials or broad API access.

This is why NHI governance has to be evaluated alongside agent behaviour, not after it. NHIMG’s The State of Non-Human Identity Security notes that inadequate monitoring and logging is cited as a cause of NHI-related attacks by 37% of organisations, but monitoring alone still does not stop misuse. The same pattern appears in the OWASP NHI Top 10 and the external OWASP Agentic AI Top 10, which both emphasise runtime control, not just observation. In practice, many security teams find out only after an agent has overreached that their logs were complete but no policy check was in place to stop the action.

What Good Logging Looks Like in an Agentic Workflow

Useful logging for agents is not a giant event stream. It is a structured record of intent, policy decision, execution, and outcome. Security teams should be able to answer four questions at runtime: what the agent tried to do, what context it had, whether policy approved it, and what credentials or tool scopes were used. That means logs should include request context, policy verdicts, identity assertions, and revocation events, not just raw prompts and API responses.
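As a minimal sketch of such a record, assuming Python and a hypothetical schema (the class name, field names, and example values below are illustrative and do not come from any standard or product), one entry per attempted action could carry the answers to all four questions:

```python
import json
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class AgentActionRecord:
    """One structured log entry per attempted agent action (hypothetical schema)."""
    session_id: str            # ties the action to one agent task or session
    workload_identity: str     # e.g. a SPIFFE ID or OIDC subject (illustrative value)
    intent: str                # what the agent tried to do
    context: dict              # request context available at decision time
    policy_verdict: str        # "allow" or "deny", recorded before execution
    policy_reason: str         # why the policy reached that verdict
    credential_scopes: list    # credentials or tool scopes actually used
    outcome: Optional[str] = None  # set after execution; stays None if never run
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        # Sorted keys keep entries stable and easy to diff or parse downstream.
        return json.dumps(asdict(self), sort_keys=True)


# A denied request is recorded with the same care as an allowed one.
record = AgentActionRecord(
    session_id="sess-001",
    workload_identity="spiffe://example.org/agent/report-builder",
    intent="read:crm.contacts",
    context={"task": "weekly-report", "destination": "internal"},
    policy_verdict="deny",
    policy_reason="dataset not in task scope",
    credential_scopes=[],
)
print(record.to_json())
```

Keeping the verdict and the outcome as separate fields makes it visible when an action was decided on but never executed, which a raw prompt or API response log would not show.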

Best practice is evolving toward a combination of workload identity, short-lived credentials, and policy-as-code. For example, a task-specific agent session can use a cryptographic workload identity, such as SPIFFE or OIDC-backed tokens, then request just-in-time secrets for one action window. Policy engines can evaluate whether the requested tool, dataset, and destination are allowed at that moment. This approach aligns with the Ultimate Guide to Non-Human Identities and with the external NIST AI Risk Management Framework, which stresses governance, measurement, and operational controls around AI systems.
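The policy-as-code step can be illustrated with a small deny-by-default check. This is a sketch under stated assumptions, not a reference to any particular policy engine: the `Request`, `Verdict`, `POLICY`, and `evaluate` names are hypothetical, and the hard-coded allow-lists stand in for rules that would normally live in a reviewed policy repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    identity: str      # workload identity presented by the agent session
    tool: str
    dataset: str
    destination: str


@dataclass(frozen=True)
class Verdict:
    allowed: bool
    reason: str


# Hypothetical per-identity allow-lists (illustrative values only).
POLICY = {
    "spiffe://example.org/agent/report-builder": {
        "tools": {"sql.query"},
        "datasets": {"sales.weekly"},
        "destinations": {"internal"},
    },
}


def evaluate(req: Request) -> Verdict:
    """Decide before execution; anything not explicitly allowed is denied."""
    rules = POLICY.get(req.identity)
    if rules is None:
        return Verdict(False, "unknown identity")
    checks = (("tool", "tools"), ("dataset", "datasets"), ("destination", "destinations"))
    for attr, key in checks:
        value = getattr(req, attr)
        if value not in rules[key]:
            return Verdict(False, f"{attr} '{value}' not allowed")
    return Verdict(True, "all checks passed")


agent = "spiffe://example.org/agent/report-builder"
ok = evaluate(Request(agent, "sql.query", "sales.weekly", "internal"))
bad = evaluate(Request(agent, "sql.query", "crm.contacts", "internal"))
print(ok)
print(bad)
```

The verdict carries a reason string so that the log entry for a denial explains which attribute failed, not only that the request was refused.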

  • Log the policy decision, not just the action.
  • Bind each tool call to a workload identity and session context.
  • Use short TTL secrets and record when they are issued and revoked.
  • Capture denied requests as carefully as allowed ones, because denials reveal attempted misuse.
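The four practices above can be sketched together in a single guarded call. Everything here is an assumption made for illustration: `audit`, `short_lived_secret`, and `guarded_call` are hypothetical helpers, the in-memory `AUDIT_LOG` stands in for a real log sink, and clearing the secret value is a placeholder for whatever revocation call a real secrets manager provides.

```python
import json
import secrets
import time
from contextlib import contextmanager

AUDIT_LOG = []  # stand-in for a real log sink


def audit(event: str, **fields) -> None:
    """Append one structured event to the audit log."""
    AUDIT_LOG.append(json.dumps({"event": event, "ts": time.time(), **fields}))


@contextmanager
def short_lived_secret(identity: str, session_id: str, ttl_seconds: int = 60):
    """Issue a secret for one action window and always record its revocation."""
    secret = {"value": secrets.token_hex(16), "expires_at": time.time() + ttl_seconds}
    audit("secret_issued", identity=identity, session_id=session_id, ttl=ttl_seconds)
    try:
        yield secret
    finally:
        secret["value"] = None  # placeholder for a real revocation call
        audit("secret_revoked", identity=identity, session_id=session_id)


def guarded_call(identity, session_id, tool, allowed_tools, action):
    """Log the policy decision first; execute only on an allow verdict."""
    allowed = tool in allowed_tools
    audit("policy_decision", identity=identity, session_id=session_id,
          tool=tool, verdict="allow" if allowed else "deny")
    if not allowed:
        return None  # the denial is already on record
    with short_lived_secret(identity, session_id) as secret:
        result = action(secret)
    audit("tool_executed", identity=identity, session_id=session_id, tool=tool)
    return result


allowed_result = guarded_call("agent-a", "sess-1", "sql.query", {"sql.query"},
                              lambda secret: "rows")
denied_result = guarded_call("agent-a", "sess-1", "shell.exec", {"sql.query"},
                             lambda secret: "never runs")
events = [json.loads(line)["event"] for line in AUDIT_LOG]
print(events)
```

Because the decision is logged before the tool runs, a denied request leaves a `policy_decision` entry with no matching `secret_issued` or `tool_executed` entry, which is the signal of attempted misuse.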

These controls tend to break down when agents inherit broad service-account privileges in CI/CD pipelines, because the logs then describe privilege use after the fact rather than enforce a meaningful runtime boundary.

Where Logging Falls Short and What Teams Miss

Tighter logging often increases storage, parsing, and response overhead, so organisations have to balance forensic value against operational noise. That tradeoff becomes especially visible in multi-agent systems, where one agent may delegate to another, retry failed steps, or trigger chained tool calls across several platforms. In those environments, a single log line rarely explains the full decision path.
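One common way to recover the full decision path is to have every event record the event that caused it, then walk those links back to the root. The sketch below assumes hypothetical `id` and `parent` fields on each log event and invented example data; it is not tied to any specific tracing product or format.

```python
from typing import Dict, List, Optional

# Hypothetical events: each has its own id and the id of the event that triggered it.
EVENTS = [
    {"id": "e1", "parent": None, "agent": "planner", "action": "plan report"},
    {"id": "e2", "parent": "e1", "agent": "researcher", "action": "query sales.weekly"},
    {"id": "e3", "parent": "e2", "agent": "researcher", "action": "retry query sales.weekly"},
    {"id": "e4", "parent": "e1", "agent": "writer", "action": "draft summary"},
]


def decision_path(events: List[dict], event_id: str) -> List[dict]:
    """Walk parent links from one event back to the root, returned root-first."""
    by_id: Dict[str, dict] = {e["id"]: e for e in events}
    path: List[dict] = []
    seen = set()  # guards against accidental cycles in the parent links
    current: Optional[str] = event_id
    while current is not None and current not in seen:
        seen.add(current)
        event = by_id.get(current)
        if event is None:
            break  # parent was never logged; return the partial path
        path.append(event)
        current = event["parent"]
    return list(reversed(path))


path_ids = [e["id"] for e in decision_path(EVENTS, "e3")]
print(path_ids)
```

A path that stops short of a root event is itself useful evidence: it shows where a delegation or retry happened without a logged cause.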

Current guidance suggests treating logs as one control signal inside a broader runtime governance model. This is where security teams often miss the edge cases: long-lived secrets, poorly scoped service accounts, and third-party integrations can produce a misleading sense of traceability. The Moltbook AI agent keys breach shows how exposed agent credentials turn visibility into post-incident evidence rather than prevention, and the external CSA MAESTRO agentic AI threat modeling framework reinforces that autonomous behaviour needs explicit threat modeling at the workflow level.

There is no universal standard for this yet, but the practical direction is clear: log enough to reconstruct intent, enforce policy before execution, and revoke access as soon as the task ends. Anything less often leaves teams with excellent evidence and weak prevention.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.

Framework | Control / Reference | Relevance
OWASP Agentic AI Top 10 | A3 | Agent logs must reflect runtime policy checks, not just recorded actions.
CSA MAESTRO | M4 | MAESTRO focuses on governance and runtime controls for autonomous agent workflows.
NIST AI RMF | GOVERN | AI RMF governance requires accountability for AI system behaviour and oversight.

Instrument agent workflows so each tool call is tied to identity, context, and an enforceable policy decision.