TL;DR: Prompt engineering now spans formatting, role assignment, reasoning scaffolds, and adversarial exploits, with Lakera arguing that clear structure and context matter more than clever wording and that guardrails can be bypassed by reframing questions. The security implication is that prompt quality is now a governance issue, not just a usability trick.
At a glance
What this is: This is Lakera’s guide to prompt engineering in 2026, showing that prompt structure, context, and adversarial testing now matter as much as output quality.
Why it matters: It matters because teams building AI assistants and agentic workflows need governance patterns that address prompt abuse, guardrails, and execution risk across both machine and human-operated programmes.
By the numbers:
- Only 44% of developers are reported to follow security best practices for secrets management, exposing a significant developer behaviour gap.
👉 Read Lakera's guide to prompt engineering and prompt injection risks
Context
Prompt engineering is the practice of shaping model instructions so a large language model returns the right format, tone, and answer quality. The security gap is that the same instruction layer can also be abused to bypass guardrails, extract restricted output, or steer a model into unsafe behaviour.
For identity and AI governance teams, the relevant question is not whether prompts are clever, but whether the surrounding control plane can constrain what the system can do when instructions are reframed. That makes prompt design, output filtering, and adversarial testing part of the same operating model.
The article treats this as a practical discipline rather than a theory exercise, and that is the right starting point for enterprise use. Once prompts influence real workflows, they become part of the control surface for AI systems, not just a developer convenience.
Key questions
Q: How should security teams implement prompt engineering for production AI systems?
A: Treat prompt engineering as a controlled part of the application design, not an informal writing exercise. Use standard templates, explicit output formats, and validation layers so prompts improve consistency without becoming the only safeguard. Production systems should assume prompts can be manipulated, so policy enforcement and access control must sit outside the model prompt itself.
Q: Why do prompt-based guardrails fail in real-world AI applications?
A: They fail because prompts are interpreted, not enforced, and attackers can often reframe a request until the model treats it as legitimate. That means roleplay, translation, and partial extraction can bypass weak guardrails. Teams should assume that any boundary embedded only in natural language can be probed and manipulated.
Q: How do you know if prompt engineering is actually improving AI safety?
A: Look for fewer malformed outputs, fewer policy violations, and less variation across repeated runs of the same task. If the system still produces unsafe responses when instructions are slightly reframed, the prompt is helping usability but not providing reliable safety. Measurement should include adversarial testing, not just user satisfaction.
Q: What should teams do when prompt injection affects connected tools?
A: They should isolate the tool layer from direct model authority, then validate every action before execution. Once a model can retrieve data or call APIs, prompt injection is no longer a text problem alone. The control point moves to authorization, approval, and downstream policy checks before any action is taken.
Technical breakdown
Prompt structure and instruction hierarchy
Prompt engineering works because LLMs are highly sensitive to instruction order, context, and formatting cues. A model usually responds better when the task, constraints, examples, and output format are made explicit, because the prompt defines the working boundary for interpretation. In practice, prompts are not only requests but also lightweight policy statements. That is why vague instructions produce inconsistent outputs, while layered prompts can reliably shape classification, summarisation, or structured generation across different models.
Practical implication: standardise prompt templates for repeatable business tasks and treat prompt structure as part of AI control design.
Reasoning scaffolds and output constraints
Reasoning scaffolds, role assignments, and format constraints help a model stay inside a chosen task boundary. They can improve consistency, but they do not create hard security controls because the model still interprets the prompt rather than enforcing policy in the way an access engine would. That distinction matters. A prompt can influence how a system behaves, but it cannot guarantee that the system will refuse unsafe requests when the instruction set is manipulated or when the surrounding application leaks context.
Practical implication: pair prompt scaffolding with downstream validation, policy checks, and explicit output filtering.
Adversarial prompts and guardrail bypass
Adversarial prompting exploits the fact that models often treat reframed requests as legitimate variations of the same task. Attackers can use roleplay, translation, partial extraction, or hypothetical framing to cross a guardrail without changing the underlying intent. This is why prompt injection and jailbreaking are control-plane issues, not just model-quality issues. The security problem is not merely that the model answers poorly. It is that the model may follow instructions that were never meant to be authoritative in the first place.
Practical implication: test guardrails with adversarial prompt sets and assume that instruction framing can be manipulated.
Threat narrative
Attacker objective: The attacker wants the model to disclose protected information or behave outside its intended safety boundaries.
- Entry occurs when a user submits a benign-looking prompt that is later reframed into a malicious instruction path.
- Escalation happens when the model accepts roleplay, translation, or partial extraction as a valid way to continue the exchange.
- Impact follows when the model reveals restricted content, bypasses guardrails, or produces unsafe actions inside a connected workflow.
Breaches seen in the wild
- DeepSeek breach — DeepSeek breach exposed 1M+ log lines and sensitive secret keys.
- JetBrains GitHub plugin token exposure — CVE-2024-37051 in JetBrains IntelliJ GitHub plugin exposed GitHub access tokens.
Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.
NHI Mgmt Group analysis
Prompt engineering is now a control problem, not a writing trick. The article correctly shows that instruction quality shapes model behaviour, but the deeper issue is governance. Once prompts determine tone, format, and action selection, they become part of the AI control surface. That means AI teams need to assess prompt design as an operational boundary, not a style exercise.
Clear structure reduces ambiguity, but it does not create trust. The model still interprets the instruction rather than enforcing it, which is why prompt quality can improve reliability without solving abuse. In security terms, this is a soft control that can fail under adversarial framing. Practitioners should treat prompt scaffolds as helpful, but never as the only barrier between user intent and model output.
Prompt injection exposes a wider execution-layer gap. The article’s real lesson is that once a model can retrieve data or invoke tools, a bad prompt can become an action path rather than a text issue. That is where AI governance starts to overlap with identity, access, and policy enforcement. Teams should recognise that model input is now part of enterprise attack surface management.
Named concept: instruction boundary drift. Prompt-based systems fail when the line between task instruction and adversarial instruction becomes too thin to defend consistently. That assumption was designed for relatively stable, human-paced prompting. It fails when attackers can continuously reframe requests until the model accepts them, which means practitioners must rethink how authority is assigned inside AI workflows.
Security teams should stop treating prompt engineering as isolated from broader AI risk management. The same workflow that improves output quality can also become the channel through which unsafe data access or policy bypass occurs. The field now needs governance that spans prompt design, retrieval, tools, and output validation. Practitioners should align prompt practice with formal AI control frameworks, not ad hoc experimentation.
From our research:
- The average estimated time to remediate a leaked secret is 27 days, despite 75% of organisations expressing strong confidence in their secrets management capabilities, according to The State of Secrets in AppSec.
- Only 44% of developers are reported to follow security best practices for secrets management, exposing a significant developer behaviour gap, according to The State of Secrets in AppSec.
- For teams building AI workflows, the next step is to separate prompt quality from identity and secrets governance by reviewing the NHI Lifecycle Management Guide alongside prompt controls.
What this signals
Prompt engineering is becoming part of the governance stack for AI-enabled workflows, which means teams need to think about instruction design and access control together. The same control environment that tolerates weak secrets hygiene will struggle even more when prompts can be reframed into unsafe execution paths.
Instruction boundary drift: when the same natural-language layer is used for tasking, safety, and tool access, the boundary between helpful and harmful intent becomes unstable. That instability makes prompt review, red-teaming, and downstream policy enforcement operationally important rather than optional.
With 44% of developers following secrets best practices, the broader lesson is that human behaviour remains a weak link even in technically sophisticated AI programmes, according to The State of Secrets in AppSec. Teams should expect prompt misuse and secret exposure to appear together unless governance is designed across both layers.
For practitioners
- Standardise prompt templates for production use Define approved structures for role, task, examples, and output format so teams do not improvise prompts for critical workflows.
- Test guardrails with adversarial prompt suites Use translation, roleplay, partial extraction, and reframing tests to find where the model accepts unsafe instructions.
- Separate prompt quality from access control Ensure prompt design, retrieval permissions, and tool authorization are governed independently so a good prompt cannot widen access on its own.
- Add output validation before downstream actions Check structure, sensitivity, and policy compliance before model output is passed to systems that can send messages, write records, or trigger tools.
- Review connected workflows for prompt injection risk Map where user input reaches retrieval, memory, or tool execution so the prompt layer is not treated as a safe boundary.
Key takeaways
- Prompt engineering now influences security outcomes as much as usability outcomes, because the instruction layer can shape both safe and unsafe model behaviour.
- Adversarial reframing is the core failure mode, since models often treat manipulated prompts as legitimate variations of the original task.
- Practitioners should govern prompts, retrieval, tool access, and output validation as one control plane rather than as separate design problems.
Standards & Framework Alignment
This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.
OWASP Agentic AI Top 10 address the attack and risk surface, while NIST AI RMF and NIST CSF 2.0 set the governance and control requirements practitioners need to meet.
| Framework | Control / Reference | Relevance |
|---|---|---|
| OWASP Agentic AI Top 10 | A1 | Prompt injection and instruction abuse are central to the article's risk model. |
| NIST AI RMF | The article links prompt behaviour to AI risk governance and operational controls. | |
| NIST CSF 2.0 | PR.AA-01 | The piece focuses on controlling how AI systems accept and act on instructions. |
Document prompt governance, testing, and accountability under AI risk management processes.
Key terms
- Prompt Engineering: Prompt engineering is the practice of shaping the input to a language model so the output is more useful, consistent, and safe. In enterprise settings it is a control technique as much as a writing technique, because the prompt can influence structure, tone, and whether the model stays inside policy boundaries.
- Prompt Injection: Prompt injection is an attack where malicious instructions are inserted into a model interaction to override the intended task. The attacker may use reframing, embedded text, or indirect commands to steer the model into revealing information, ignoring rules, or taking unsafe actions in a connected workflow.
- Guardrail: A guardrail is any constraint intended to keep an AI system from producing unsafe or out-of-scope behaviour. It can include content filters, policy rules, approval steps, or output validation, but it is only effective when the surrounding application also controls what the model can access and execute.
- Instruction Boundary: An instruction boundary is the line between what the model should treat as task context and what should be treated as untrusted input. When that boundary is weak or unclear, attackers can reframe normal prompts into commands that the model follows, which is why boundary design matters in AI governance.
What's in the full article
Lakera's full article covers the operational detail this post intentionally leaves for the source:
- Concrete examples of prompt patterns that improve consistency across different model families
- Step-by-step comparisons of role-based, few-shot, and anchored prompting strategies
- Red-team style examples showing how adversaries reframe prompts to bypass guardrails
- Practical guidance on combining prompt design with output moderation and model evaluation
👉 Lakera's full article includes the examples, attack patterns, and prompting techniques in detail
Deepen your knowledge
NHI governance, agentic AI identity, and machine identity security are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are responsible for identity security strategy or NHI governance in your organisation, it is worth exploring.
Published by the NHIMG editorial team on 2026-04-20.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org