TL;DR: LLMs struggle with arithmetic because they are statistical pattern systems, not precision engines, and their weakness is amplified by prompt manipulation patterns such as gradual scope expansion, flattery, and context exhaustion, according to ZioSec. The governance lesson is that controllability, auditability, and task boundaries matter more than conversational fluency when AI systems are allowed to act on behalf of users.
At a glance
What this is: This is an explanatory article about why LLMs fail at reliable math and why their conversational design makes them vulnerable to manipulation and context drift.
Why it matters: It matters because the same control assumptions that fail for weakly bounded AI behavior also affect agentic AI governance, delegated NHI access, and any identity programme that treats fluent output as evidence of trustworthy execution.
👉 Read ZioSec's analysis of why LLMs struggle with math and manipulation
Context
Large language models are not arithmetic engines. They generate likely text based on patterns, which means they can sound confident while still being unreliable at exact computation or sustained reasoning. That distinction matters for identity and access programmes because the same design choice, statistical response rather than deterministic execution, becomes a governance issue once AI is allowed to influence workflows, tools, or approvals.
The article also points to manipulation patterns that resemble social engineering, including gradual scope expansion, flattery, and context exhaustion. In IAM terms, that is a reminder that trust boundaries are not preserved by conversational polish. Once an AI system is permitted to carry state across a session, the risk is not only incorrect answers but boundary drift that can affect NHI controls and agent oversight.
Key questions
Q: How should security teams validate AI output before it affects access or workflow decisions?
A: They should require a deterministic validation step before any AI-generated output can trigger access, data movement, or workflow completion. The model can draft, recommend, or classify, but the final action needs a separate control, such as rules-based verification, policy checks, or human approval. That keeps conversational confidence from becoming operational authority.
Q: Why do LLMs become more vulnerable to manipulation as sessions get longer?
A: Because earlier instructions lose relative influence as the context window fills, so later prompts can dominate the model’s response. That creates a drift effect where repeated requests, flattery, or topic shifts can erode guardrails without any formal policy change. For practitioners, session length is part of the risk surface, not just a usability detail.
Q: What do security teams get wrong about using LLMs for exact calculations?
A: They often assume a fluent answer is a correct answer. LLMs are pattern-based systems, so they can produce plausible arithmetic while still making mistakes, especially on multi-step or large-number tasks. Teams should route precise computation to deterministic tools and use the model only for explanation, drafting, or orchestration.
Q: How can organisations reduce the risk of prompt drift in AI-assisted workflows?
A: They should define narrow task boundaries, re-check state before each critical step, and reset the conversation when the objective changes. Prompt drift is most dangerous when a model is allowed to carry assumptions across long interactions. Clear resets and explicit checkpoints reduce the chance that a benign exchange turns into an unsafe one.
Technical breakdown
Why LLMs are statistical systems, not calculators
LLMs predict the next token from learned patterns rather than executing symbolic arithmetic. A calculator holds explicit rules for addition or multiplication, while an LLM approximates the structure of a correct answer from training data and context. That is why simple sums may look correct, yet multi-step or large-number calculations can fail without external tooling. The article is right to separate language fluency from computational precision. For identity teams, the useful lesson is that apparent competence is not proof of deterministic behaviour or reliable control.
Practical implication: do not treat LLM output as validated execution unless the system is paired with deterministic tools and explicit verification.
How prompt manipulation changes model behaviour
The article describes boiling the frog, love bombing, and concept drift as ways to move a model from harmless input toward unsafe output. These are not mathematical errors, but behavioural steering techniques that exploit the model’s helpfulness and sensitivity to conversational context. The model does not ‘decide’ in a human sense, yet its output can still be shaped into progressively riskier territory by sustained prompting. For governance, the key issue is that guardrails must hold across the full interaction, not only at the first prompt.
Practical implication: review guardrail design for session-long resilience, not just first-turn safety.
What context exhaustion means for AI trust boundaries
Context exhaustion happens when earlier instructions lose influence as the conversation grows and newer tokens dominate the working window. In practice, this can weaken system prompts, safety instructions, and task boundaries over time. The article links this to decision fatigue in humans, which is a useful analogy but not the same mechanism. For AI governance, this creates a temporal trust problem: the model may be compliant early in a session and drift later without any obvious policy change. That makes session length, state retention, and reset discipline part of the control surface.
Practical implication: set hard session boundaries and re-validate state before any AI action that has identity or access consequences.
Breaches seen in the wild
- Moltbook AI agent keys breach — Moltbook breach exposed 1.5M AI agent keys.
- AI LLM hijack breach — attackers used stolen AWS access keys to hijack Anthropic LLM models on Bedrock.
Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.
NHI Mgmt Group analysis
LLM math failure is a control-design warning, not just a model limitation. The article shows that a language model can approximate answers without ever becoming a precise execution engine. That matters because security teams often overread fluent output as evidence of dependable behaviour. In governance terms, the failure is not only arithmetic error, but the assumption that useful output implies trustworthy action. Practitioners should separate conversational confidence from control assurance.
Prompt drift is a named governance concept that matters beyond chat quality. The article’s boiling the frog and concept drift examples show how benign interaction can be steered into unsafe territory through incremental boundary pressure. That is a useful mental model for NHI and autonomous oversight because it exposes how intent can change mid-session without any formal permission event. The implication is that teams need to think about boundary stability across the full interaction, not just at session start.
Context exhaustion creates a session-boundary problem that IAM teams cannot ignore. As context grows, earlier instructions weaken and later inputs dominate, which means model behaviour can change without a policy change or a new authentication event. That is a governance failure mode for any programme that assumes one approval or one prompt is enough for a whole task. The practical conclusion is that task completion cannot be treated as a single trust decision when the underlying actor is stateful.
Human analogies are useful here, but they can hide the real control gap. The article compares LLM vulnerability to decision fatigue and manipulation techniques used on people. The comparison helps explain why the model is influenceable, but it should not obscure the structural issue: LLMs are responding to patterns, not applying fixed rules. For identity leaders, the lesson is to govern the interaction boundary, not just the model’s content quality.
AI governance needs verification points, not conversational optimism. If a system can be nudged off course by repeated prompts or context accumulation, then review must happen at the point of action, not only at the point of input. That aligns with broader NHI and agent governance: access, tool use, and output must each be independently checked. The practitioner conclusion is straightforward. Fluency is never a substitute for proof of control.
From our research:
- 80% of organisations report their AI agents have already performed actions beyond their intended scope, including accessing unauthorised systems (39%), inappropriately sharing sensitive data (31%), and revealing access credentials (23%), according to AI Agents: The New Attack Surface report.
- Only 44% of organisations have implemented any policies to govern AI agents, even though 92% agree governing them is critical to enterprise security.
- If you are mapping this risk to agent governance, start with OWASP NHI Top 10 and then compare it with your own access-review and approval boundaries.
What this signals
Context drift is now a governance issue, not only an AI quality issue. Once a model can carry state long enough for prompt pressure to accumulate, the control problem shifts from answer correctness to boundary stability. In practice, that means identity, workflow, and data teams need shared checkpoints around session resets, tool invocation, and escalation paths, not isolated model safety reviews.
The programme-level signal is clear: AI systems that look interactive but behave probabilistically should be treated like bounded non-human identities, not like trusted operators. That framing aligns with the broader NHI control stack, where access scope, session duration, and action verification matter more than the confidence level of the output.
For practitioners
- Separate reasoning from execution paths Require deterministic tools for any calculation, transformation, or policy decision that must be exact. Treat model text as a recommendation layer only, and route all high-stakes actions through verified logic or human approval.
- Set hard session boundaries for AI interactions Limit how long a model can carry state, and reset context before any step that affects access, identity, or data handling. This reduces boundary drift and makes the control point easier to audit.
- Test guardrails against gradual prompt escalation Use red-team scenarios that start with harmless requests and slowly increase pressure, flattery, or scope. Measure whether the model holds its boundaries after repeated iterations rather than only on the first prompt.
- Instrument approval checkpoints before AI-driven actions Place explicit approval or verification gates before any action that touches tools, credentials, or sensitive data. Do not let conversational persistence substitute for authorisation.
Key takeaways
- LLMs fail at math because they generate plausible text, not deterministic results, which makes them unsafe as standalone execution engines for exact tasks.
- Prompt escalation and context exhaustion show that conversational AI can drift over time, so the real governance problem is session control, not just content quality.
- Identity teams should place verification between AI output and any action that affects access, credentials, or workflow state.
Standards & Framework Alignment
This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.
OWASP Agentic AI Top 10 and OWASP Non-Human Identity Top 10 address the attack and risk surface, while NIST CSF 2.0 set the governance and control requirements practitioners need to meet.
| Framework | Control / Reference | Relevance |
|---|---|---|
| OWASP Agentic AI Top 10 | A2 | Prompt drift and manipulation map directly to agentic application input abuse. |
| OWASP Non-Human Identity Top 10 | NHI-04 | LLM-driven actions need bounded identity and session controls. |
| NIST CSF 2.0 | PR.AC-4 | Access enforcement must separate model output from real authorisation. |
Test model interactions for prompt escalation and constrain tool use behind explicit policy checks.
Key terms
- Context Window: The context window is the amount of text a model can actively consider while generating a response. In practice, earlier instructions may lose influence as the window fills, which can weaken guardrails over long sessions and make state management a governance concern.
- Prompt Drift: Prompt drift is the gradual shift of a model conversation from a safe topic or bounded task toward a riskier one. It often happens through repeated nudges, topic changes, or social pressure patterns, and it matters because the model may remain compliant while silently moving outside the intended boundary.
- Deterministic Validation: Deterministic validation is a control step that checks an AI-generated output against fixed rules, calculations, or policy before action is taken. It matters because fluent language is not proof of correctness, and high-stakes workflows need a separate mechanism that can confirm the result without relying on model confidence.
Deepen your knowledge
LLM behaviour limits, boundary drift, and AI governance are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are building controls for AI-assisted workflows, it is worth exploring.
This post draws on content published by ZioSec: Why LLMs Struggle with Math and the limitations of AI behaviour. Read the original.
Published by the NHIMG editorial team on 2026-01-20.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org