TL;DR: LLMs struggle with arithmetic because they are statistical pattern systems, not precision engines, and their weakness is amplified by prompt manipulation patterns such as gradual scope expansion, flattery, and context exhaustion, according to ZioSec. The governance lesson is that controllability, auditability, and task boundaries matter more than conversational fluency when AI systems are allowed to act on behalf of users.
NHIMG editorial — based on content published by ZioSec: Why LLMs Struggle with Math and the limitations of AI behaviour
Questions worth separating out
Q: How should security teams validate AI output before it affects access or workflow decisions?
A: They should require a deterministic validation step before any AI-generated output can trigger access, data movement, or workflow completion.
Q: Why do LLMs become more vulnerable to manipulation as sessions get longer?
A: Earlier instructions lose relative influence as the context window fills, so later prompts can dominate the model’s response.
Q: What do security teams get wrong about using LLMs for exact calculations?
A: They often assume a fluent answer is a correct answer.
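To make the "deterministic validation step" concrete, here is a minimal sketch in Python. It is an illustration, not ZioSec's method: the function names `evaluate_exact` and `validate_model_answer` are hypothetical, and it assumes the model's output has already been reduced to a plain arithmetic expression and a claimed numeric answer. The expression is recomputed exactly with a small whitelist-based evaluator (no `eval()`), and the check fails closed on anything it cannot parse.

```python
import ast
import operator
from fractions import Fraction

# Whitelisted operators; anything else (calls, names, powers) is rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def evaluate_exact(expression: str) -> Fraction:
    """Evaluate an integer arithmetic expression exactly, without eval()."""

    def walk(node: ast.AST) -> Fraction:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        # Only integer literals are accepted in this sketch (bools excluded).
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return Fraction(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](walk(node.operand))
        raise ValueError("unsupported expression")

    return walk(ast.parse(expression, mode="eval"))


def validate_model_answer(expression: str, model_answer: str) -> bool:
    """Return True only if the claimed answer equals the exact result."""
    try:
        expected = evaluate_exact(expression)
        claimed = Fraction(model_answer.strip())
    except (ValueError, ZeroDivisionError, SyntaxError):
        return False  # fail closed: unparseable input never passes
    return claimed == expected
```

A workflow would call `validate_model_answer("1234 * 5678", answer_from_model)` and proceed only on `True`. The design choice is that the model's fluency plays no role in the decision: the gate depends only on a recomputation the team can audit.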
Practitioner guidance
- Separate reasoning from execution paths. Require deterministic tools for any calculation, transformation, or policy decision that must be exact.
- Set hard session boundaries for AI interactions. Limit how long a model can carry state, and reset context before any step that affects access, identity, or data handling.
- Test guardrails against gradual prompt escalation. Use red-team scenarios that start with harmless requests and slowly increase pressure, flattery, or scope.
What's in the full article
ZioSec's full article covers the explanatory detail this post intentionally leaves at the analytical level:
- The article’s analogy-driven walkthrough of why statistical models struggle with arithmetic precision
- Examples of prompt manipulation patterns such as boiling the frog, love bombing, and concept drift
- The discussion of decision fatigue and context exhaustion as a way to understand why guardrails erode over long sessions
- The source article’s broader commentary on how LLM limitations affect practical AI behaviour
👉 Read ZioSec's analysis of why LLMs struggle with math and manipulation →
LLM math limits and context drift: what IAM teams should watch?