Runtime governance for customer-facing AI chatbots and agent drift

By NHI Mgmt Group Editorial Team | Published 2026-05-16 | Domain: Agentic AI & NHIs | Source: WitnessAI

TL;DR: Customer-facing AI failures across Chipotle, Air Canada, DPD, Woolworths and Amazon show the same pattern: users can steer chatbots far beyond their intended purpose when operators lack real-time visibility and enforcement, according to WitnessAI. The governance failure is structural because policy and logging describe the interaction after the fact, while live conversation control is what prevents harmful outputs from reaching customers.

At a glance

What this is: This analysis shows that customer-facing AI chatbots fail less from model defects than from runtime governance gaps that let off-purpose answers reach users.

Why it matters: IAM and security teams will meet the same visibility and enforcement problem in human, NHI, and agentic AI programmes whenever runtime control is missing.

Context

Customer-facing AI chatbot governance is the control problem of making sure a bot stays within its intended role while conversations are still happening. The primary gap is not whether teams can log the exchange after the fact, but whether they can see intent and stop an off-purpose response before a customer receives it.

The article’s examples, including Chipotle, Air Canada, DPD, Woolworths, and reported Amazon assistant abuse, point to the same operational failure: policy definitions exist, but runtime enforcement does not. That is a familiar identity problem for teams managing NHI, because rules that look complete on paper often fail when the interaction must be controlled in motion.

Key questions

Q: How should security teams govern customer-facing AI chatbots at runtime?

A: Security teams should place a control between the model and the user that can inspect prompts, evaluate responses, and block or route unsafe output before delivery. Policies, logs, and acceptable-use statements are necessary, but they only describe behaviour after the fact. Runtime governance is the layer that prevents scope drift from becoming a customer-facing incident.

Q: Why do customer-facing chatbots drift beyond their intended purpose?

A: They drift because a system prompt is guidance, not an enforcement mechanism, and users can steer probabilistic models with natural language. If the organisation cannot validate the response against purpose in real time, the bot will often remain helpful even when it should refuse. That is a governance failure, not just a model failure.

Q: What breaks when organisations rely only on observability for AI governance?

A: Observability breaks at the point where action is needed, because it records the event after the response has already been generated or delivered. That is useful for investigation, but it does not stop hallucinated policy, off-brand content, or prompt manipulation. The result is visibility without control.

Q: Who is accountable when a customer-facing AI gives harmful or off-topic advice?

A: The organisation deploying the assistant remains accountable, because the bot is part of its service environment and customer experience. Governance cannot be delegated to the model provider once the assistant is exposed to users. Teams need clear ownership, escalation paths, and runtime controls that make accountability operational rather than theoretical.

Technical breakdown

Runtime enforcement vs observability in customer-facing AI

Observability records what happened, but runtime enforcement decides what is allowed to happen next. In customer-facing AI, that difference matters because a harmful or off-purpose response may already have been delivered by the time logs and alerts surface it. Runtime controls inspect prompts and responses in context, then apply allow, warn, block, or route decisions before the exchange completes. That makes the control plane part of the interaction path, not a separate monitoring layer. The article describes a governance problem in which post-incident visibility exists but the user-facing decision point is unprotected.

Practical implication: place a control between the model and the user that can act before output is delivered.
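As a rough illustration of a control that sits in the interaction path, the sketch below applies the allow, warn, route, and block decisions to a draft response before anything is delivered, and writes the audit record as a side effect of the decision. Every name here (Decision, Verdict, enforce, refund_claims_need_a_human), the severity ordering, and the canned customer messages are assumptions made for illustration; none of this comes from WitnessAI or any specific product.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List


class Decision(Enum):
    # Ordered from least to most restrictive (this ordering is an assumption).
    ALLOW = 0
    WARN = 1
    ROUTE = 2
    BLOCK = 3


@dataclass(frozen=True)
class Verdict:
    decision: Decision
    reason: str


# A check looks at the prompt and the draft response and returns a verdict.
Check = Callable[[str, str], Verdict]


def enforce(prompt: str, draft: str, checks: List[Check],
            audit_log: List[Dict[str, str]]) -> str:
    """Return the text that is actually delivered to the customer."""
    verdicts = [check(prompt, draft) for check in checks]
    # The most restrictive verdict wins.
    worst = max(verdicts, key=lambda v: v.decision.value,
                default=Verdict(Decision.ALLOW, "no checks configured"))
    # Logging still happens, but it records the decision instead of replacing it.
    audit_log.append({"prompt": prompt,
                      "decision": worst.decision.name,
                      "reason": worst.reason})
    if worst.decision is Decision.BLOCK:
        return "Sorry, I can't help with that here."
    if worst.decision is Decision.ROUTE:
        return "Let me connect you with a member of our team."
    if worst.decision is Decision.WARN:
        return draft + " Please confirm the details on our official policy page."
    return draft


def refund_claims_need_a_human(prompt: str, draft: str) -> Verdict:
    # Keyword stand-in for a context-aware classifier; illustrative only.
    if "refund" in draft.lower():
        return Verdict(Decision.ROUTE, "refund statements go to a human agent")
    return Verdict(Decision.ALLOW, "no policy claim detected")
```

The design point is only that enforce returns what the customer sees, so a restrictive verdict changes the outcome rather than just appearing in a dashboard afterwards.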

Why system prompts do not equal enforcement

A system prompt expresses intent, but it does not enforce it. General-purpose models can still follow conversational cues that pull them outside scope, especially when the prompt does not explicitly and consistently constrain the interaction. This is why customer-facing chatbots can drift into coding help, policy invention, or brand-unsafe responses even when they appear to have instructions. The technical failure is not the absence of rules. It is the mismatch between probabilistic generation and deterministic governance expectations. In practice, model behaviour remains permissive unless a separate runtime layer validates the response against purpose, context, and user intent.

Practical implication: treat system prompts as policy inputs, not as a control substitute.
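One way to picture "policy input, not control substitute" is to derive both the system prompt and a separate response check from the same purpose definition. The sketch below is a simplified assumption: Purpose, TOPIC_KEYWORDS, classify_topics, and within_purpose are hypothetical names, and the keyword table stands in for whatever context-aware classifier a real deployment would use.

```python
import re
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set


@dataclass(frozen=True)
class Purpose:
    name: str
    allowed_topics: FrozenSet[str]

    def system_prompt(self) -> str:
        # Guidance for the model: useful, but not enforcement.
        topics = ", ".join(sorted(self.allowed_topics))
        return f"You are the {self.name} assistant. Only discuss: {topics}."


# Hypothetical keyword table standing in for a real topic classifier.
TOPIC_KEYWORDS: Dict[str, Set[str]] = {
    "menu": {"burrito", "bowl", "menu"},
    "orders": {"order", "delivery", "pickup"},
    "coding": {"python", "function", "script"},
}


def classify_topics(text: str) -> Set[str]:
    words = set(re.findall(r"[a-z]+", text.lower()))
    return {topic for topic, keywords in TOPIC_KEYWORDS.items() if words & keywords}


def within_purpose(purpose: Purpose, response: str) -> bool:
    # Enforcement: every detected topic must be in the declared purpose.
    # Responses with no detected topic (for example a greeting) pass.
    return classify_topics(response) <= purpose.allowed_topics
```

The system prompt and the check share one source of truth, but only within_purpose can actually stop a response that the model produced despite its instructions.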

Bidirectional inspection for prompts and responses

Bidirectional inspection is the architectural pattern that checks both inbound user requests and outbound model output. Inbound inspection is aimed at manipulation attempts, such as prompt injection or boundary probing. Outbound inspection is aimed at hallucinated policies, off-brand statements, or responses that exceed the bot’s defined purpose. The article’s examples show that both directions matter because failure can begin with the prompt or with the generated answer. A single-sided control misses half the risk surface. For customer-facing AI, the inspection layer has to understand conversational context, not just keywords or fixed signatures.

Practical implication: inspect both directions of the conversation before deciding whether the exchange can proceed.
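A minimal sketch of the bidirectional pattern follows, assuming a model that can be called as a plain function. The regular expressions are hypothetical placeholders; as the section notes, a production inspection layer would need conversational context rather than fixed signatures. The names handle_turn, first_match, and REFUSAL are illustrative.

```python
import re
from typing import Callable, List, Optional, Pattern, Tuple

# Hypothetical signatures; real inspection would be context-aware.
INBOUND_PATTERNS: List[Pattern[str]] = [
    re.compile(r"ignore (all |your |previous )+(instructions|rules)", re.I),
    re.compile(r"reveal (your )?system prompt", re.I),
]
OUTBOUND_PATTERNS: List[Pattern[str]] = [
    re.compile(r"\bour policy (guarantees|entitles)\b", re.I),
]
REFUSAL = "I can only help with questions about your order."


def first_match(patterns: List[Pattern[str]], text: str) -> Optional[str]:
    for pattern in patterns:
        if pattern.search(text):
            return pattern.pattern
    return None


def handle_turn(message: str,
                model: Callable[[str], str]) -> Tuple[str, Optional[str]]:
    """Return (delivered_reply, finding); finding is None if nothing was flagged."""
    hit = first_match(INBOUND_PATTERNS, message)
    if hit is not None:
        return REFUSAL, f"inbound: {hit}"    # the model is never called
    draft = model(message)
    hit = first_match(OUTBOUND_PATTERNS, draft)
    if hit is not None:
        return REFUSAL, f"outbound: {hit}"   # the draft is never delivered
    return draft, None
```

Note that the second case, a legitimate question that produces an invented policy statement, would pass an inbound-only control untouched.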

Threat narrative

Attacker objective: The objective is to coerce the assistant into delivering out-of-scope, brand-unsafe, or misleading answers that the operator cannot stop in real time.

Entry occurs when a user begins a legitimate customer-facing conversation and then pushes the assistant beyond its intended role with natural language prompts.
Escalation occurs when the assistant accepts the off-scope request and produces a helpful but inappropriate response because no runtime control blocks the drift.
Impact occurs when the off-purpose answer reaches the customer, creating policy, brand, legal, and service accountability exposure.

Moltbook AI agent keys breach — Moltbook breach exposed 1.5M AI agent keys.
AI LLM hijack breach — attackers used stolen AWS access keys to hijack Anthropic LLM models on Bedrock.

Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.

NHI Mgmt Group analysis

Runtime governance is the missing control plane for customer-facing AI. These incidents are not evidence that language models are inherently unmanageable. They show that organisations are deploying conversational systems without a layer that can inspect and stop outputs while the interaction is still live. Policy documents and logs are necessary, but they are not enough when the customer is still waiting for an answer. The practitioner conclusion is straightforward: the control must sit in the path of execution, not beside it.

Scope drift is the named failure mode here, and it is a governance problem before it is a model problem. The Chipotle pattern is not unusual: users routinely test boundaries the moment a bot is public. What fails is the assumption that a system prompt defines the bot’s real operating envelope once customers can interact with it freely. The implication is that customer-facing AI needs runtime purpose enforcement, because declared intent and actual conversation behaviour are not the same thing.

Observability and written policy alone create a false sense of control. Logging and monitoring are retrospective instruments, and acceptable-use policies only state intent. None of them acts before the response has crossed the boundary, which means they cannot prevent public-facing harm. This is the same structural weakness identity teams see when a control is defined administratively but not enforced at the point of action. Practitioners should treat retrospective visibility as evidence collection, not as protection.

Customer-facing AI requires bidirectional trust checks, not single-sided model supervision. The inbound question can be legitimate while the outbound answer is unsafe, or the prompt can be malicious while the answer still appears plausible. That means the control model must evaluate both sides of the exchange in context. The broader identity lesson is that runtime behaviour, not declared policy, determines whether the system is trustworthy. Teams should design for enforced conversation boundaries, not assumed compliance.

Identity drift is the right named concept for assistants that stop behaving like the role they were assigned. A travel bot answering travel questions, a restaurant bot answering restaurant questions, and a support bot staying in support all sound obvious until a user can steer the interaction elsewhere. Once the bot’s purpose becomes negotiable at runtime, the organisation no longer controls the identity it exposed to the public. The practitioner conclusion is to define and enforce role boundaries as a runtime property, not a documentation exercise.

From our research:
Only 1.5 out of 10 organisations are highly confident in their ability to secure NHIs, compared to nearly 1 in 4 for securing human identities, according to The State of Non-Human Identity Security.
85% of organisations lack full visibility into third-party vendors connected via OAuth apps, with 38% reporting no or low visibility and 47% reporting only partial visibility.
That visibility gap is why lifecycle discipline matters, as described in Ultimate Guide to NHIs, Lifecycle Processes for Managing NHIs, when runtime authority must match declared access.

What this signals

Scope drift is becoming the governing concept for customer-facing AI, because the failure is not merely that the model answers badly, but that the organisation cannot enforce what the bot is for while the conversation is still live. That same pattern applies anywhere runtime identity behaviour is left to policy alone.

With only 1.5 out of 10 organisations highly confident in securing NHIs, the governance lesson is broader than chatbots. Teams that already struggle to govern machine identities will face the same pressure when customer-facing assistants, internal copilots, and agent workflows all depend on real-time control and not just post-event logs.

For teams formalising runtime controls, the most useful external baseline is NIST Cybersecurity Framework 2.0, because the problem spans govern, protect, detect, and respond functions at once. The practical question is whether the programme can stop unsafe action before it becomes customer-facing impact.

For practitioners

Define the bot’s permitted role in enforceable terms: Document what the assistant may and may not do, then map those limits to runtime policy checks that can block off-scope answers before delivery. A written scope statement without enforcement is only a record of intent.
Inspect both prompt and response paths: Deploy controls that evaluate incoming requests for manipulation and outgoing answers for scope drift, hallucinated policy, or brand-unsafe content. A single-sided control leaves half the interaction unprotected.
Treat launch readiness as production governance: Require security review, incident playbooks, rollback steps, and executive sign-off before customer exposure. Customer-facing AI should be governed like revenue-critical infrastructure, not like a prototype.
Plan for boundary testing on day one: Assume users will deliberately probe the assistant with off-topic, adversarial, or compliance-sensitive prompts immediately after launch. Build monitoring and escalation procedures around that expectation, not after a viral failure.
Separate observability from control: Use logs and dashboards for investigation, but do not confuse them with the ability to prevent a harmful response. Runtime enforcement must be the control that stops the interaction, while observability documents it.
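The boundary-testing recommendation above can be rehearsed before launch with a small probe suite. The sketch below assumes the guarded assistant exposes a function that returns the reply text together with a flag saying whether the runtime control intervened; that interface, and the names Probe and run_probes, are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Probe:
    prompt: str
    should_be_blocked: bool


# Assumed interface: the guarded assistant returns (reply_text, was_blocked).
GuardedAssistant = Callable[[str], Tuple[str, bool]]


def run_probes(assistant: GuardedAssistant, probes: List[Probe]) -> List[Probe]:
    """Return the probes whose outcome did not match the expectation."""
    failures = []
    for probe in probes:
        _, was_blocked = assistant(probe.prompt)
        if was_blocked != probe.should_be_blocked:
            failures.append(probe)
    return failures
```

Including in-scope probes with should_be_blocked set to False matters as much as the adversarial ones, since a control that blocks legitimate questions is also a launch blocker.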

Key takeaways

Customer-facing AI incidents are usually governance failures first and model failures second.
Real-time enforcement matters more than retrospective logging when a bot can reach customers directly.
Enterprises should define scope, inspect both directions of the exchange, and treat runtime control as production infrastructure.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 addresses the attack and risk surface, while NIST CSF 2.0 and NIST Zero Trust (SP 800-207) set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	LLM-06	Runtime output control addresses off-scope assistant behaviour.
NIST CSF 2.0	PR.PT	Protective technology must intervene during live AI interactions.
NIST Zero Trust (SP 800-207)	PR.AC-4	Continuous verification fits runtime access and purpose checks for AI interactions.

Inspect prompts and outputs in real time, then block or route responses that exceed the bot's intended role.

Key terms

Runtime enforcement: Runtime enforcement is the control layer that inspects an AI interaction while it is happening and decides whether the output can proceed. It is distinct from logging or policy documentation because it acts before the user receives the answer, which is essential when a model can be manipulated in conversation.
Scope drift: Scope drift is the failure mode where an assistant begins behaving outside the task or role it was assigned. In customer-facing AI, that can mean answering unrelated questions, inventing policy, or producing brand-unsafe content, which creates governance and accountability risk even when the underlying model is functioning normally.
Bidirectional inspection: Bidirectional inspection is the practice of evaluating both the incoming prompt and the outgoing response in the same interaction flow. It matters because threats can enter through manipulation on the way in or through unsafe content on the way out, and either direction can create a public incident.
Identity drift: Identity drift is the condition where an exposed AI assistant no longer behaves like the business role it was meant to represent. For customer-facing systems, the issue is not just correctness but whether the system still acts within the boundaries that define its operational identity.

Deepen your knowledge

Customer-facing AI runtime governance is covered in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are building controls for public-facing assistants or agent workflows, the course provides a useful governance foundation.

This post draws on content published by WitnessAI: customer-facing AI runtime governance and the Chipotle chatbot failure pattern. Read the original.
