TL;DR: AI model monitoring has shifted from tracking prediction quality to tracking runtime behaviour, because agents can fail through tool use, looping, scope drift and policy breaches even when model metrics look healthy, according to Collibra. The governance gap is that production assurance now depends on watching actions, not just outputs.
NHIMG editorial — based on content published by Collibra: AI model and agent monitoring: Metrics, drift detection, and runtime alerting in production
Questions worth separating out
Q: How should security teams monitor AI agents in production?
A: Security teams should monitor both outcome metrics and runtime behaviour.
Q: Why do AI agents need more than model dashboards?
A: AI agents can fail through action patterns that model dashboards do not expose.
Q: What breaks when agent monitoring only tracks accuracy and latency?
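A: It misses the control failures that matter most in production, such as tool misuse, looping, scope drift and policy breaches, all of which can occur while accuracy and latency still look healthy.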
Practitioner guidance
- Baseline agent behaviour as well as model performance. Capture normal tool-call patterns, step counts, context sources, and task duration at deployment so you can detect behavioural drift later.
- Wire alerts to execution owners with stop authority. Make every alert carry the model or agent owner, trace context, and severity so responders can pause the agent before another task completes.
- Track policy-trigger rates and scope adherence. Treat repeated policy hits, unusual tool access, and expanding context retrieval as governance signals.
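One way to picture the baselining and policy-trigger guidance above is a small drift check over per-task behaviour records. The sketch below is illustrative only and is not taken from Collibra's article: the `RunRecord` schema, the field names, and the z-score threshold of 3.0 are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from statistics import mean, pstdev

# Hypothetical signal names; a real deployment would choose its own.
SIGNALS = ("tool_calls", "steps", "duration_s", "policy_hits")


@dataclass
class RunRecord:
    """One completed agent task (assumed schema, for illustration)."""
    tool_calls: int
    steps: int
    duration_s: float
    policy_hits: int


def build_baseline(runs):
    """Summarise normal behaviour as (mean, std dev) per signal."""
    baseline = {}
    for name in SIGNALS:
        values = [getattr(r, name) for r in runs]
        baseline[name] = (mean(values), pstdev(values))
    return baseline


def drift_signals(baseline, run, z_threshold=3.0):
    """Return the signals on which this run deviates from the baseline."""
    flagged = {}
    for name, (mu, sigma) in baseline.items():
        value = getattr(run, name)
        if sigma > 0:
            z = (value - mu) / sigma
        else:
            # Baseline never varied: any change at all counts as a deviation.
            z = 0.0 if value == mu else float("inf")
        if abs(z) > z_threshold:
            flagged[name] = z
    return flagged


# Baseline captured at deployment, then one later run that loops
# (many steps) and trips policy checks the baseline never hit.
normal_runs = [
    RunRecord(4, 8, 30.0, 0),
    RunRecord(5, 9, 32.0, 0),
    RunRecord(6, 10, 31.0, 0),
    RunRecord(5, 9, 31.0, 0),
]
baseline = build_baseline(normal_runs)
suspect = RunRecord(tool_calls=5, steps=40, duration_s=31.0, policy_hits=3)
flagged = drift_signals(baseline, suspect)
```

In this example the suspect run is flagged on `steps` and `policy_hits` while tool-call count and duration stay within the baseline, which is the kind of behavioural drift that accuracy and latency metrics would not show.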
What's in the full article
Collibra's full blog post covers the operational detail this post intentionally leaves for the source:
- Specific metric families for production model and agent monitoring, including performance, data, output, and behaviour signals
- Practical alerting structure for runtime intervention, including thresholding, routing, and severity tiers
- Detailed comparison of monitoring versus observability for AI systems
- Examples of how a Command Center can function as a runtime monitoring plane
👉 Read Collibra's analysis of AI model and agent monitoring in production →
AI agent monitoring: what IAM teams need to watch in production
Explore further
Runtime monitoring is now an identity control problem, not just an AI operations problem. The article shows that once a system can act, drift becomes a governance issue because the system is exercising access in production, not merely producing predictions. That widens the control surface from model performance to action validity, which is exactly where NHI governance begins to matter. The practical conclusion is that production AI monitoring belongs inside identity and access oversight, not outside it.
A few things that frame the scale:
- Only 15% of organisations (1.5 in 10) are highly confident in their ability to secure NHIs, compared with nearly 1 in 4 for securing human identities, according to The State of Non-Human Identity Security.
- In the same research, 85% of organisations lack full visibility into third-party vendors connected via OAuth apps, showing that identity blind spots are already widespread across machine and delegated access.
A question worth separating out:
Q: Who should own AI agent runtime alerts?
A: Runtime alerts should go to the owner who can act on the behaviour, not just the team that built the model. In practice that means the system owner, identity owner, or platform operator must receive enough context to pause execution, inspect traces, and decide whether the issue is access, policy, or workflow.
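The ownership point can be made concrete with a minimal routing sketch. Everything named here is hypothetical (`RuntimeAlert`, `AgentRegistry`, `route_alert`, the severity tiers), and pausing automatically on a critical alert is a design assumption of this example rather than something the source prescribes.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Severity(IntEnum):
    INFO = 1
    WARNING = 2
    CRITICAL = 3


@dataclass
class RuntimeAlert:
    """Alert payload carrying what a responder needs to act (assumed shape)."""
    agent_id: str
    owner: str          # person or team with authority to pause the agent
    trace_id: str       # pointer to the execution trace for inspection
    severity: Severity
    category: str       # "access", "policy", or "workflow"


@dataclass
class AgentRegistry:
    """Stand-in for whatever system can actually stop an agent."""
    paused: set = field(default_factory=set)

    def pause(self, agent_id):
        self.paused.add(agent_id)


def route_alert(alert, registry, notify):
    """Send the alert to its owner; pause the agent first if it is critical."""
    if not alert.owner:
        # An alert nobody can act on is treated as a configuration error.
        raise ValueError("alert has no owner with stop authority")
    if alert.severity >= Severity.CRITICAL:
        registry.pause(alert.agent_id)
    notify(alert.owner, alert)
    return alert.agent_id in registry.paused


# Usage: collect notifications in a list instead of calling a real pager.
sent = []
registry = AgentRegistry()


def record(owner, alert):
    sent.append((owner, alert.trace_id))


critical = RuntimeAlert("agent-7", "platform-ops", "trace-123",
                        Severity.CRITICAL, "policy")
warning = RuntimeAlert("agent-9", "identity-team", "trace-456",
                       Severity.WARNING, "access")
critical_paused = route_alert(critical, registry, record)
warning_paused = route_alert(warning, registry, record)
```

The design choice worth noting is that the alert is rejected outright when no owner is attached, so an alert can never be raised without someone who is able to pause execution and inspect the trace.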
👉 Read our full editorial: AI agent monitoring exposes a runtime governance gap in production