TL;DR: LLMmap’s open-set LLM fingerprinting reached about 95% top-1 accuracy on raw model APIs, but recognition fell sharply in agentic deployments, dropping to 17.95% under restrictive prompts and to 38.46% under German output, according to Lasso Security. The practical lesson is that model identification is no longer a clean lab exercise once tools, prompts, and language shaping enter the response path.
NHIMG editorial — based on content published by Lasso Security: From Lab to Wild, how robust is LLM fingerprinting in the agentic era?
Questions worth separating out
Q: How should security teams test LLM fingerprinting in production AI agents?
A: Test the fingerprinting method against the exact production agent stack, not just the raw model API.
Q: Why does agentic AI make model identification less reliable?
A: Agentic AI adds tools, memory, retrieval, and formatting rules between the user and the model, so the observable response no longer reflects the model alone.
Q: What do security teams get wrong about fingerprinting hardened AI systems?
A: Teams often assume that a successful defensive prompt makes the model unidentifiable.
Practitioner guidance
- Benchmark fingerprinting against the deployed agent, not the base model. Run identification tests on the exact production stack, including system prompt, tools, retrieval, language settings, and formatting rules.
- Treat prompt hardening as a fingerprinting control boundary. Review which defensive instructions reduce observable model variance, such as forced refusals, rigid output schemas, or mandatory tool calls.
- Require reliability checks before operational use of model IDs. Do not act on a top-1 model guess alone.
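As a rough illustration of the first point, the sketch below tallies top-1 recognition per deployment configuration, so a drop between the raw API and the wrapped agent shows up as a number rather than an impression. The `Trial` record, the `recognition_by_config` helper, and the sample data are hypothetical and are not taken from LLMmap or the Lasso Security report. The configuration labels only echo the pure, default, german, and restrictive settings the report describes.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class Trial(NamedTuple):
    """One identification attempt (hypothetical record layout)."""
    config: str      # deployment configuration, e.g. "pure" or "restrictive"
    true_model: str  # model actually running behind the stack
    predicted: str   # fingerprinting tool's top-1 guess


def recognition_by_config(trials: Iterable[Trial]) -> Dict[str, float]:
    """Return the top-1 recognition rate for each configuration."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for t in trials:
        totals[t.config] += 1
        if t.predicted == t.true_model:
            hits[t.config] += 1
    return {cfg: hits[cfg] / totals[cfg] for cfg in totals}


# Illustrative data only; these are not figures from the report.
trials = [
    Trial("pure", "model-a", "model-a"),
    Trial("pure", "model-b", "model-b"),
    Trial("restrictive", "model-a", "model-c"),
    Trial("restrictive", "model-b", "model-b"),
]
rates = recognition_by_config(trials)
for config, rate in rates.items():
    print(f"{config}: {rate:.2%}")
```

Running the same tally against the production stack and against the raw model API gives a side-by-side view of how much recognition the wrapper removes.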
What's in the full report
Lasso Security's full research covers the operational detail this post intentionally leaves for the source:
- The per-configuration recognition tables that show how pure, default, german, and restrictive settings affect exact identification.
- The underlying statistical tests, including pairwise comparisons and corrected significance thresholds, for practitioners who need the evidence trail.
- The full description of the four application scenarios, including the customer-service chatbot, email assistant, and research assistant with RAG.
- The paper's future-work discussion on agent-shaped templates, multilingual retraining, and reliability-aware fingerprinting design.
👉 Read Lasso Security's research on LLM fingerprinting in agentic deployments →
LLM fingerprinting in agentic apps: where does it fail?
Explore further
Agentic wrappers turn model identification into a governance problem, not just a tooling problem. Once the model sits behind tools, retrieved context, and output constraints, the visible response signal is no longer the model alone. That means security teams are not evaluating a stable identity surface, but a composite behavioural layer that changes with deployment choices. The practitioner conclusion is that model inventory controls must account for the wrapper, not just the foundation model.
A few things that frame the scale:
- 98% of companies plan to deploy even more AI agents within the next 12 months, despite documented rogue behaviour in 80% of current deployments, according to the AI Agents: The New Attack Surface report.
- Only 52% of companies can track and audit the data their AI agents access, leaving 48% with a complete blind spot for compliance and breach investigation.
A question worth separating out:
Q: How can organisations tell if model identification results are trustworthy?
A: Use a confidence check that includes a distance metric, a second validation method, and an out-of-distribution flag for wrapped or multilingual deployments. If the tool cannot explain when its own answer becomes unreliable, treat the result as advisory rather than authoritative.
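One way to read that answer is as a simple gate, sketched below under stated assumptions. The `Identification` fields, the `classify_result` function, and the `max_distance` default of 0.3 are all hypothetical placeholders. A real threshold would need to be calibrated against the fingerprinting tool actually in use.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Identification:
    """Result of one identification attempt (hypothetical structure)."""
    top1: str                      # primary tool's top-1 model guess
    distance: float                # distance to nearest known fingerprint; lower is closer
    second_opinion: Optional[str]  # guess from an independent validation method, if any
    wrapped_or_multilingual: bool  # out-of-distribution flag for the deployment


def classify_result(result: Identification, max_distance: float = 0.3) -> str:
    """Return "authoritative" only if all three checks pass, else "advisory".

    max_distance is an assumed placeholder, not a value from the source.
    """
    close_enough = result.distance <= max_distance
    methods_agree = result.second_opinion == result.top1  # None never matches a guess
    in_distribution = not result.wrapped_or_multilingual
    if close_enough and methods_agree and in_distribution:
        return "authoritative"
    return "advisory"


# Raw API, close match, both methods agree.
clean = Identification("model-a", 0.1, "model-a", False)
# Same guess, but the deployment is wrapped, so the flag downgrades it.
wrapped = Identification("model-a", 0.1, "model-a", True)
print(classify_result(clean), classify_result(wrapped))
```

The gate is deliberately conservative: a single failed check downgrades the result, which matches the guidance to treat unexplained answers as advisory.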
👉 Read our full editorial: LLM fingerprinting weakens fast in agentic deployments