TL;DR: Static prompt testing misses how LLMs behave under adversarial pressure, and AI Model Risk Index scores models across direct and indirect attacks in realistic deployment scenarios, according to Lakera. The practical signal is that GenAI governance now needs measurable behavior under attack, not just policy or content filters.
NHIMG editorial — based on content published by Lakera: Measuring What Matters: How the Lakera AI Model Risk Index Redefines GenAI Security
Questions worth separating out
Q: How should security teams evaluate GenAI models before production?
A: Security teams should test models with realistic adversarial scenarios, including direct prompt attacks and indirect instruction injection through retrieved content.
Q: Why do static prompt benchmarks fail for enterprise LLM governance?
A: Static prompt benchmarks usually measure responses to a fixed set of inputs, which misses how attackers exploit context, hidden instructions, and workflow-specific assumptions.
Q: What should teams do about indirect prompt injection in RAG systems?
A: Teams should treat retrieved content as untrusted until it is filtered, validated, or constrained.
Practitioner guidance
- Adopt adversarial model testing Evaluate LLMs with direct prompt abuse, indirect instruction injection, and task-specific guardrails before production approval.
- Separate controls for retrieved content and user prompts Treat documents, tickets, search results, and other processed content as a distinct trust domain.
- Tie approvals to measurable risk scores Require a repeatable risk score or equivalent evidence package for model selection, exception handling, and periodic reassessment.
What's in the full article
Lakera's full article covers the operational detail this post intentionally leaves for the source:
- The benchmark categories and scoring logic used to compare model behavior across attack types
- Examples of how direct and indirect attacks produce different failure patterns in practice
- The applied use cases behind the evaluation, including RAG and code generation contexts
- How security teams can use the index to support deployment decisions and governance reviews
👉 Read Lakera's analysis of the AI Model Risk Index for GenAI security →
AI model risk scoring: what it means for GenAI governance?
Explore further
Static benchmark culture is the wrong control model for GenAI. Lakera’s core point is that models must be evaluated under adversarial pressure, not only against fixed prompt lists. That shift matters because real attackers do not behave like test suites, and enterprise models do not operate in isolated lab conditions. The practical conclusion is that governance has to measure behavioral resilience, not only content safety.
A few things that frame the scale:
- The average estimated time to remediate a leaked secret is 27 days, according to The State of Secrets in AppSec.
- Only 44% of developers are reported to follow security best practices for secrets management, according to The State of Secrets in AppSec.
A question worth separating out:
Q: How do organizations use AI risk scores in governance decisions?
A: Organizations should use AI risk scores as one input to model approval, exception handling, and periodic reassessment. The score becomes valuable when it is repeatable and tied to deployment context. It should sit alongside security review evidence, not replace it, because different models fail in different ways.
👉 Read our full editorial: AI model risk indices expose gaps in GenAI security testing