TL;DR: Enterprises are already seeing prohibited genAI use, shadow AI, and agent activity that legacy DLP, CASB, SIEM, and firewall controls cannot fully inspect, according to WitnessAI and cited industry research. The governance gap is widening because security teams need prompt, response, data-flow, and agent-action visibility before AI adoption outpaces review cycles.
At a glance
What this is: AI observability is a governance and security approach for seeing how employees and agents use AI, what data enters those systems, and what actions or outputs follow.
Why it matters: IAM, GRC, and security teams need runtime visibility and control across NHI, autonomous, and human workflows when AI activity bypasses traditional monitoring.
By the numbers:
- 69% of organizations already suspect or have confirmed evidence of employees using prohibited generative AI tools, according to a Gartner GenAI blind spots survey of 302 cybersecurity leaders.
- 63% of breached organizations either lack an AI governance policy or are still developing one, according to IBM's Cost of a Data Breach Report.
Context
AI observability is the discipline of seeing how AI systems interact with enterprise data, users, and infrastructure so security teams can govern what happens in real time. The primary issue is not model performance. It is that employees and agents can move sensitive information and take actions through conversational interfaces that older monitoring stacks were never built to inspect.
That gap now spans NHI, autonomous workflows, and human use of AI tools. Security and GRC teams need discovery, content inspection, policy enforcement, and audit trails that cover sanctioned and unsanctioned AI activity, because browser-only and file-based controls miss too much of the actual risk surface.
For practitioners, the challenge is to make AI usage governable without pretending it behaves like a conventional application or endpoint. That means treating prompts, responses, agent actions, and data flows as security telemetry, not just productivity signals.
Key questions
Q: How should security teams govern AI use that happens outside approved tools?
A: Start with discovery, not enforcement. Teams need to identify sanctioned and shadow AI tools, then inspect prompts, responses, and agent actions so policy can be applied in context. If the programme only covers approved web apps, it will miss native copilots, IDE assistants, and agent API calls that often carry the highest governance risk.
Q: Why do legacy DLP and CASB controls fall short for AI observability?
A: They were built for structured files, email, and application access, not conversational prompts or model outputs. DLP misses freeform text entered into chat interfaces, and CASB often loses visibility once a service is sanctioned. AI observability needs content-level inspection, agent tracking, and policy enforcement at runtime.
Q: How do organisations know if AI observability is actually working?
A: They should be able to answer three questions quickly: what AI tools are in use, what data entered those systems, and what actions or outputs followed. If an investigation still depends on user confession or scattered manual logs, the observability layer is not yet providing usable governance evidence.
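As a rough illustration of those three questions, the sketch below summarises a small set of audit events per tool. The event fields (`tool`, `user`, `data_labels`, `actions`) and the sample values are hypothetical; a real observability layer would define its own schema.

```python
from collections import defaultdict

# Hypothetical audit events; field names and values are illustrative only.
EVENTS = [
    {"tool": "chat-assistant", "user": "alice", "data_labels": ["customer_pii"], "actions": []},
    {"tool": "ide-copilot", "user": "bob", "data_labels": ["source_code"], "actions": ["open_pull_request"]},
    {"tool": "chat-assistant", "user": "bob", "data_labels": [], "actions": []},
]


def summarise(events):
    """Group events by tool and collect the data labels and actions seen for each."""
    summary = defaultdict(lambda: {"users": set(), "data": set(), "actions": set()})
    for event in events:
        entry = summary[event["tool"]]
        entry["users"].add(event["user"])
        entry["data"].update(event["data_labels"])
        entry["actions"].update(event["actions"])
    return dict(summary)


summary = summarise(EVENTS)
for tool in sorted(summary):
    # One line per tool: who used it, what data entered, what followed.
    print(tool, summary[tool])
```

If a summary like this cannot be produced from existing logs without interviewing users, that is a reasonable sign the evidence is not yet usable.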
Q: How should teams handle autonomous agents that can take actions without human review?
A: Treat those agents as governed identities, not just models. Teams need ownership, policy, action logging, and pre-execution guardrails that cover tool use and downstream effects. Without that, the agent becomes an unaccountable actor that can move from analysis to action faster than a human review process can respond.
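One way to picture a governed agent identity is a thin wrapper that checks policy and writes an audit entry before any tool runs. This is a minimal sketch under stated assumptions: the `GovernedAgent` class, the tool names, and the allow-list policy are all hypothetical and stand in for whatever guardrail mechanism a programme actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class GovernedAgent:
    agent_id: str
    owner: str                      # accountable team or person
    allowed_tools: frozenset        # hypothetical allow-list policy
    audit_log: list = field(default_factory=list)

    def call_tool(self, tool_name, tool_fn, **kwargs):
        """Log the attempted action, then run the tool only if policy permits it."""
        allowed = tool_name in self.allowed_tools
        self.audit_log.append({
            "agent": self.agent_id,
            "owner": self.owner,
            "tool": tool_name,
            "args": kwargs,
            "allowed": allowed,
        })
        if not allowed:
            raise PermissionError(f"{self.agent_id} is not permitted to call {tool_name}")
        return tool_fn(**kwargs)


# Hypothetical tools used only for the demonstration.
def read_ticket(ticket_id):
    return f"ticket {ticket_id}: printer offline"


def delete_records(table):
    return f"deleted {table}"


agent = GovernedAgent("helpdesk-agent", "it-support", frozenset({"read_ticket"}))
result = agent.call_tool("read_ticket", read_ticket, ticket_id=42)

try:
    agent.call_tool("delete_records", delete_records, table="customers")
    blocked = False
except PermissionError:
    blocked = True  # the check ran before the tool could execute
```

The denied call is still logged, so the audit trail records attempts as well as completed actions.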
Technical breakdown
Bidirectional prompt and response inspection
AI observability starts by capturing both directions of the conversation. Prompt inspection shows what entered the model, while response inspection shows what came back and whether the output contains sensitive or unsafe content. That matters because risk can be introduced by the user, the model, or the interaction between them. In enterprise settings, this is closer to content-aware security than classic application logging. It gives investigators context for policy decisions, compliance evidence, and incident reconstruction.
Practical implication: inspect both prompts and outputs so policy decisions are based on the full AI interaction, not just user activity.
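A minimal sketch of bidirectional inspection might wrap the model call and scan both the prompt and the response. The two regular expressions, the `InteractionRecord` structure, and `fake_model` are assumptions made for illustration; production inspection would rely on far richer classifiers than pattern matching.

```python
import re
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical detectors, kept deliberately simple for the example.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


@dataclass
class InteractionRecord:
    user: str
    prompt: str
    response: str
    prompt_findings: list = field(default_factory=list)
    response_findings: list = field(default_factory=list)


def find_sensitive(text: str) -> list:
    """Return the sorted names of every pattern that matches the text."""
    return sorted(name for name, rx in PATTERNS.items() if rx.search(text))


def inspected_call(user: str, prompt: str, model: Callable[[str], str]) -> InteractionRecord:
    """Call the model and record findings for both directions of the exchange."""
    prompt_findings = find_sensitive(prompt)
    response = model(prompt)
    return InteractionRecord(user, prompt, response, prompt_findings, find_sensitive(response))


def fake_model(prompt: str) -> str:
    # Stand-in for a real model client; always returns the same text.
    return "Contact ops@example.com for access."


record = inspected_call(
    "alice",
    "Summarise the outage using key AKIAABCDEFGHIJKLMNOP",
    fake_model,
)
print(record.prompt_findings, record.response_findings)
```

Here the prompt carries a credential-shaped string and the response carries an email address, so inspecting only one direction would miss half of the findings.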
Shadow AI discovery and agent action tracking
Discovery has to find AI tools and agents that were never formally approved. Shadow AI discovery identifies unsanctioned chat tools, embedded copilots, IDE extensions, and agent frameworks, while agent action tracking follows tool calls, API requests, and MCP connections. This is a governance problem as much as a detection problem because an agent can act across systems even when no one has registered it in the inventory. Traditional monitoring often stops at the session or network record, leaving the agent's actual actions uncorrelated.
Practical implication: build discovery around the AI estate, then correlate every agent action back to an owner and business purpose.
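The correlation step could look something like the sketch below, which joins observed agent actions against an inventory and surfaces any agent with no registered owner. The inventory contents, agent names, and action kinds are hypothetical examples, not a real product schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentAction:
    agent_id: str
    kind: str      # e.g. "tool_call", "api_request", "mcp_connection"
    target: str


# Hypothetical inventory: agent id -> (owner, business purpose).
INVENTORY = {
    "invoice-bot": ("finance-ops", "invoice reconciliation"),
}


def correlate(actions, inventory):
    """Attach owner and purpose to known agents; collect ids of unregistered ones."""
    attributed, shadow = [], set()
    for action in actions:
        entry = inventory.get(action.agent_id)
        if entry is None:
            shadow.add(action.agent_id)
            continue
        owner, purpose = entry
        attributed.append({
            "agent": action.agent_id,
            "kind": action.kind,
            "target": action.target,
            "owner": owner,
            "purpose": purpose,
        })
    return attributed, sorted(shadow)


actions = [
    AgentAction("invoice-bot", "api_request", "erp.internal/invoices"),
    AgentAction("notes-helper", "mcp_connection", "crm-mcp-server"),
    AgentAction("notes-helper", "tool_call", "export_contacts"),
]
attributed, shadow = correlate(actions, INVENTORY)
print("shadow agents:", shadow)
```

Anything that lands in `shadow` is an agent acting across systems without an inventory entry, which is the case the section describes.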
Intent-based classification and risk-tiered policy
Keyword matching is too blunt for AI governance because the same prompt can be benign in one context and risky in another. Intent-based classification uses user identity, business context, and prompt content to decide whether to allow, warn, block, route, or tokenize. That moves AI security beyond binary controls. It also creates a policy layer that can support real-time governance, where sensitive queries are redirected to approved models and high-risk content is reduced before it reaches a third-party system.
Practical implication: classify AI requests by intent and business context, then enforce tiered responses instead of relying on block-or-allow rules.
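To make the tiered idea concrete, the sketch below maps a request's role, intent, data sensitivity, and destination to one of the five responses named above. The rule set, the `ROLE_INTENTS` table, and the sensitivity labels are invented for illustration, and the intent value is assumed to come from an upstream classifier that is not shown.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    WARN = "warn"
    BLOCK = "block"
    ROUTE = "route"        # redirect to an approved model
    TOKENIZE = "tokenize"  # reduce sensitive content before sending


@dataclass(frozen=True)
class Request:
    role: str
    intent: str        # assumed output of an upstream intent classifier
    sensitivity: str   # "public", "internal", "confidential", or "restricted"
    destination: str   # "approved" or "third_party"


# Hypothetical table of which intents each role has a business need for.
ROLE_INTENTS = {
    "finance_analyst": {"summarise_financials"},
    "engineer": {"debug_code"},
}


def decide(req: Request) -> Action:
    """Apply illustrative tiered rules; real policy would be organisation-specific."""
    in_role = req.intent in ROLE_INTENTS.get(req.role, set())
    if req.sensitivity == "public":
        return Action.ALLOW
    if req.sensitivity == "restricted":
        return Action.TOKENIZE if in_role else Action.BLOCK
    if req.destination == "third_party":
        return Action.ROUTE if req.sensitivity == "confidential" else Action.WARN
    return Action.ALLOW if in_role else Action.WARN


same_data_analyst = decide(Request("finance_analyst", "summarise_financials", "restricted", "third_party"))
same_data_engineer = decide(Request("engineer", "summarise_financials", "restricted", "third_party"))
print(same_data_analyst, same_data_engineer)
```

The last two calls submit the same intent and data sensitivity under different roles and receive different outcomes, which is the contextual behaviour that keyword matching cannot express.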
Breaches seen in the wild
- Moltbook AI agent keys breach: exposed 1.5M AI agent keys.
- AI LLM hijack breach: attackers used stolen AWS access keys to hijack Anthropic models on Bedrock.
Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.
NHI Mgmt Group analysis
AI observability is the missing governance layer between discovery and enforcement. Security teams cannot govern what they cannot see, and AI usage now spans browser chats, embedded copilots, IDE assistants, and autonomous agents. The control gap is not just visibility into the application layer. It is the absence of runtime context for prompts, outputs, and actions. Practitioners should treat observability as the control plane that makes policy enforceable across AI use cases.
Shadow AI discovery is now a lifecycle problem, not just a monitoring problem. Unsanctioned AI tools appear because employees choose the fastest path to finish work, then the usage spreads before security notices. That pattern applies across human identity and NHI-adjacent workflows because the real issue is unmanaged access to AI systems, not a single tool instance. The implication is that inventory and governance have to move together, or the programme will always lag the actual estate.
Agent action tracking changes the accountability model for non-human work. When an agent makes tool calls or initiates actions across systems, the relevant question is no longer only who used the model, but which identity, workflow, and business purpose authorized the action. That widens the governance frame from session inspection to lifecycle accountability. Practitioners should expect audit requirements to focus on provenance, not just content review.
Prompt-and-response security gives AI usage its own identity blast radius. The most useful concept here is the AI observability gap: the distance between AI activity and the security team's ability to explain it after the fact. When prompts, responses, and tool actions are not tied to policy and inventory, the blast radius grows from a single chat session into downstream workflows, records, and decisions. Teams should measure that gap as a governance risk, not a logging inconvenience.
From our research:
- 85% of organisations lack full visibility into third-party vendors connected via OAuth apps, according to The State of Non-Human Identity Security.
- That visibility gap is split between 38% with no or low visibility and 47% with only partial visibility, which is why runtime AI discovery matters for governance.
- For a broader lifecycle lens, see Ultimate Guide to NHIs, Lifecycle Processes for Managing NHIs for how inventory, ownership, and offboarding fit together.
What this signals
AI observability will become a required control layer wherever AI touches sensitive data or regulated workflows. The practical shift is from hoping approved tools are enough to proving that every AI interaction is visible, classifiable, and governable. That is especially true when prompts, outputs, and agent actions can no longer be separated from business process evidence.
With 85% of organisations lacking full visibility into third-party vendors connected via OAuth apps, according to The State of Non-Human Identity Security, the same visibility problem is already familiar in adjacent identity domains. AI security programmes will inherit that failure mode unless discovery and governance are designed together.
Practitioners should expect the next phase of AI governance to look more like identity and access control than model tuning. The programmes that succeed will be the ones that can tie AI activity to ownership, intent, and auditable action before exposure becomes a board-level incident.
For practitioners
- Build a complete AI inventory: Catalog sanctioned and unsanctioned AI tools, embedded copilots, IDE extensions, agent frameworks, and MCP-connected systems before you try to enforce policy.
- Inspect prompts, outputs, and tool calls: Capture bidirectional content plus agent actions so security teams can reconstruct what entered the model, what it returned, and what it triggered next.
- Classify requests by intent and business context: Use user role, purpose, and data sensitivity to distinguish low-risk productivity use from requests that need routing, tokenization, warning, or blocking.
- Extend governance beyond browser sessions: Include native applications, IDE copilots, and agent API calls in scope so browser-only controls do not leave major AI activity invisible.
Key takeaways
- AI observability closes the gap between AI usage and AI governance by making prompts, outputs, and agent actions visible to security teams.
- Legacy DLP, CASB, SIEM, and firewall controls do not reliably cover conversational AI or autonomous workflows, so blind spots remain unless observability extends to the runtime layer.
- Practitioners should pair discovery with intent-based policy and audit trails so AI activity can be governed in context rather than merely recorded.
Standards & Framework Alignment
This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.
OWASP Agentic AI Top 10 addresses the attack and risk surface, while NIST AI RMF and NIST Zero Trust (SP 800-207) set the governance and control requirements practitioners need to meet.
| Framework | Control / Reference | Relevance |
|---|---|---|
| OWASP Agentic AI Top 10 | | Covers prompt injection, excessive agency, and agent action risks in AI workflows. |
| NIST AI RMF | GOVERN 1.6 | Inventory and governance mechanisms are central to AI observability. |
| NIST Zero Trust (SP 800-207) | PR.AC-4 (a NIST CSF 1.1 subcategory) | Continuous verification and least privilege support runtime AI policy enforcement. |
Map agent observability to runtime guardrails and inspect tool use before autonomous actions execute.
Key terms
- AI observability: The ability to see how AI systems are being used, what information they process, and what actions they trigger. In security programmes, it extends beyond uptime or model quality to runtime visibility, policy enforcement, and audit evidence across human and agent-driven use cases.
- Shadow AI: The use of AI tools, assistants, or agents that security and governance teams have not approved or inventoried. It creates an identity and data-governance problem because activity can spread through the enterprise before ownership, policy, and monitoring are established.
- Intent-based classification: A policy method that evaluates AI usage by user role, purpose, and data context rather than by keywords alone. It lets security teams distinguish between low-risk work and interactions that need warning, routing, tokenization, or blocking, which is essential for real-time AI governance.
- Agent action tracking: The monitoring of the tool calls, API requests, and downstream operations performed by autonomous AI systems. It matters because the security risk often emerges after the prompt is answered, when the agent starts influencing other systems or moving data across workflows.
Deepen your knowledge
AI observability and runtime governance are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If your programme needs to cover shadow AI, agent actions, and runtime policy, it is worth exploring.
This post draws on content published by WitnessAI: AI observability is becoming the control plane for enterprise AI security. Read the original.
Published by the NHIMG editorial team on 2026-06-07.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org