TL;DR: Attackers rapidly adapted to early agent capabilities, with system-prompt extraction, subtle safety bypasses, exploratory probing, and indirect attacks through untrusted external content emerging as the dominant patterns, according to Lakera’s 30-day Q4 2025 snapshot. The lesson is that once models read documents, browse sources, or call tools, security must shift from prompt filtering to workflow-level control.
NHIMG editorial — based on content published by Lakera: The Year of the Agent: What Recent Attacks Revealed in Q4 2025 (and What It Means for 2026)
By the numbers:
- When AWS credentials are exposed publicly, attackers attempt access within an average of 17 minutes and as quickly as 9 minutes in some cases.
Questions worth separating out
Q: How should security teams handle untrusted content in AI agent workflows?
A: Security teams should treat every retrieved page, file, message, or feed as untrusted until it is validated by provenance and policy checks.
Q: Why do AI agents make prompt injection harder to contain?
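A: AI agents make prompt injection harder to contain because they do more than answer questions. They read documents, browse sources, and call tools, so injected text can influence actions rather than only the wording of a reply.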
Q: What do security teams get wrong about system prompt leakage?
A: They often treat it as only an information disclosure issue.
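The first answer above (treat retrieved content as untrusted until provenance and policy checks pass) could be sketched roughly as follows. This is an illustrative sketch, not Lakera's method: `TRUSTED_HOSTS`, `Content`, and `classify` are hypothetical names, and the policy here is assumed to be a simple HTTPS-plus-host allow-list.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical allow-list; a real deployment would load this from policy.
TRUSTED_HOSTS = {"docs.example.com", "wiki.example.com"}


@dataclass(frozen=True)
class Content:
    source_url: str
    text: str
    trusted: bool = False  # everything starts out untrusted


def classify(source_url: str, text: str) -> Content:
    """Mark content trusted only if its provenance passes the assumed policy."""
    parsed = urlparse(source_url)
    host = parsed.hostname or ""
    passes_policy = parsed.scheme == "https" and host in TRUSTED_HOSTS
    return Content(source_url, text, trusted=passes_policy)
```

The point of the design is the default: content that fails or skips the check stays untrusted, rather than being trusted until something flags it.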
Practitioner guidance
- Classify every agent input as trusted or untrusted: Apply explicit trust handling to retrieved web pages, documents, email, and other external sources before they reach model context.
- Constrain tool use to narrow, auditable permissions: Limit each agent to the smallest possible tool set and log every call with input provenance, output destination, and downstream effect.
- Separate instruction layers from content processing: Keep policy instructions, system prompts, and task content distinct so attackers cannot exploit a single mixed context to change behaviour.
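The second and third points above can be illustrated with a small sketch. It assumes a hypothetical `Agent` wrapper with a per-agent tool allow-list and an in-memory audit log, plus a hypothetical `build_context` helper that keeps policy, task, and external content in separate labelled layers. None of these names come from the source, and the layer labels are not any vendor's message format.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AuditRecord:
    tool: str
    input_provenance: str    # where the triggering input came from
    output_destination: str  # where the result is sent
    effect: str              # "executed" or "blocked"


@dataclass
class Agent:
    # Hypothetical allow-list: tool name -> callable. Keep it as small as possible.
    allowed_tools: dict[str, Callable[[str], str]]
    audit_log: list[AuditRecord] = field(default_factory=list)

    def call_tool(self, tool: str, arg: str, provenance: str, destination: str) -> str:
        allowed = tool in self.allowed_tools
        # Log every attempt, including blocked ones, before acting.
        self.audit_log.append(
            AuditRecord(tool, provenance, destination, "executed" if allowed else "blocked")
        )
        if not allowed:
            raise PermissionError(f"tool {tool!r} is not permitted for this agent")
        return self.allowed_tools[tool](arg)


def build_context(policy: str, task: str, external: str) -> list[dict[str, str]]:
    """Keep instruction layers and untrusted content in separate, labelled entries."""
    return [
        {"layer": "policy", "content": policy},
        {"layer": "task", "content": task},
        {"layer": "untrusted_content", "content": external},
    ]
```

Keeping the layers separate does not stop injection by itself; it only gives downstream checks a reliable way to tell which text is instruction and which is material to be processed.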
What's in the full article
Lakera’s full article covers the operational detail this post intentionally leaves for the source:
- 30-day Q4 2025 attack sampling methodology across Lakera Guard-protected systems and the Gandalf: Agent Breaker environment
- Technique-by-technique breakdown of system-prompt extraction attempts, including hypothetical scenarios and obfuscation patterns
- Examples of indirect prompt injection payloads hidden inside webpages, files, and structured content
- Observed attacker adaptation patterns across browsing, retrieval, and lightweight tool-use scenarios
👉 Read Lakera’s Q4 2025 analysis of agent attack patterns and prompt injection →
Agentic AI attack patterns in Q4 2025: what changed for teams?
Explore further
Indirect prompt injection is now an identity boundary problem, not just a content problem. Once an agent reads external content and can act on it, the question is no longer whether the text is harmful in isolation. The real issue is whether the system can distinguish trusted instructions from untrusted material at runtime. That makes tool invocation, retrieval, and rendering part of the identity control surface. Practitioners should treat external content as a delegated influence channel, not a passive input.
A few things that frame the scale:
- The average estimated time to remediate a leaked secret is 27 days, despite 75% of organisations expressing strong confidence in their secrets management capabilities, according to The State of Secrets in AppSec.
- 43% of security professionals are concerned about AI systems learning and reproducing sensitive information patterns from codebases, according to The State of Secrets in AppSec.
A question worth separating out:
Q: What should organisations do when agent safety checks are bypassed by role framing?
A: Organisations should move safety enforcement closer to the action path and not rely only on language-based checks. If analysis, simulation, or evaluation framing can change behaviour, then the governance model is too dependent on conversational intent and too weak at runtime authorisation.
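One way to picture enforcement on the action path is a check that runs at the moment an action is requested and ignores how the request was framed. The sketch below is an assumption-laden illustration: `HIGH_RISK_ACTIONS`, `ActionRequest`, and `authorise` are hypothetical, and the rule (high-risk actions need human approval and must not be triggered by untrusted content) is only one possible policy.

```python
from dataclasses import dataclass

# Hypothetical set of actions that need runtime authorisation.
HIGH_RISK_ACTIONS = {"send_email", "delete_file", "transfer_funds"}


@dataclass(frozen=True)
class ActionRequest:
    action: str
    triggered_by_untrusted: bool
    human_approved: bool = False
    stated_framing: str = ""  # e.g. "simulation"; deliberately never consulted


def authorise(req: ActionRequest) -> bool:
    """Decide on the action and its provenance, not on conversational framing."""
    if req.action not in HIGH_RISK_ACTIONS:
        return True
    if req.triggered_by_untrusted:
        return False
    return req.human_approved
```

Because `stated_framing` never enters the decision, describing a request as analysis, simulation, or evaluation cannot change the outcome.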
👉 Read our full editorial: Q4 2025 agent attacks show why AI security must cover workflows