Hugging Face assistant attacks expose the limits of prompt trust

By NHI Mgmt Group Editorial TeamPublished 2026-03-16Domain: Breaches & IncidentsSource: Lasso Security

TL;DR: A deceptive Hugging Face assistant used Sleepy Agent behaviour and image markdown rendering to exfiltrate user email addresses through an attacker-controlled URL, showing how prompt-based trust can be turned into a covert data path, according to Lasso Security. The control gap is not just model safety but identity governance for assistants that can leak data through ordinary conversation flow.

At a glance

What this is: The post shows how a malicious Hugging Face assistant can exfiltrate user data by hiding a trigger-based payload inside normal responses.

Why it matters: It matters because IAM teams now have to govern assistant behaviour, prompt changes, and data flow boundaries with the same seriousness they apply to secrets and workload identities.

👉 Read Lasso Security's analysis of Hugging Face assistant data exfiltration

Context

An AI assistant can look harmless while still behaving like a covert data-exfiltration path. In this case, the security issue is not only model output quality but the identity and trust boundary around the assistant itself, especially when user input can be carried into an attacker-controlled request.

For identity programmes, the lesson is simple: prompt visibility does not equal governance. If an assistant can be altered, triggered, or used to render external content, then the programme has to treat it as a non-human identity with a data-access path, not just a chatbot interface.

Key questions

Q: How should security teams handle AI assistants that can leak user data through rendering features?

A: Security teams should treat rendering features as part of the attack surface, not just the user interface. If an assistant can turn user input into an outbound request, it needs content filtering, output sanitisation, and explicit review of any markdown, image, or link expansion path that can carry sensitive data out of the session.

Q: Why do AI assistants complicate traditional IAM and governance models?

A: They complicate IAM because behaviour can change at runtime through prompts, triggers, and hidden conditions, even when the visible interface looks stable. That means the governed object is not only the user account or platform access, but the assistant’s ongoing behaviour, data handling, and change history.

Q: What do security teams get wrong about prompt transparency in AI assistants?

A: They often assume that if a prompt is visible, the risk is controlled. In reality, prompt transparency does not stop trigger-based logic, hidden exfiltration paths, or post-review prompt changes. Governance must focus on runtime behaviour and outbound data flow, not just whether instructions can be read.

Q: What is the difference between a safe-looking assistant and a governed assistant?

A: A safe-looking assistant answers normally in testing, while a governed assistant has reviewable ownership, change control, output restrictions, and tests for malicious triggers. The distinction matters because benign conversations do not prove that the assistant is safe when its instructions or rendering behaviour change later.

Technical breakdown

Sleepy Agent triggers in AI assistants

Sleepy Agent is a behaviour pattern where an assistant appears normal until a specific trigger is seen, then executes a hidden action. The trigger can be a keyword, a user input pattern, or a condition embedded in instructions. In practice, this matters because the malicious behaviour is not obvious from casual testing. The assistant may answer questions correctly while waiting for the condition that activates the exfiltration step. That makes static review of one prompt or one conversation insufficient when the assistant can be repurposed after deployment.

Practical implication: inspect assistants for trigger-based behaviour, not just obvious unsafe outputs.

Image markdown rendering as a covert data path

Image markdown rendering becomes dangerous when assistant output includes a URL that can carry user data in a parameter. The assistant can append a crafted image reference to its response, causing the browser to request attacker-controlled content and leak whatever was inserted into the URL. This is not a model hallucination problem. It is a data-flow problem created by combining model instructions, rendering behaviour, and browser requests. Once that path exists, the assistant can exfiltrate data without overtly revealing it in the visible response.

Practical implication: block or tightly constrain external rendering paths that can carry user-controlled data.

Why assistant marketplace trust breaks down

A marketplace assistant is a dynamic identity surface, not a one-time software install. If the system prompt can change after a user has started trusting the assistant, then the trust decision ages immediately. That is especially problematic when users cannot reliably tell whether the assistant’s instructions still match what they reviewed earlier. In identity terms, the assistant’s effective privileges and behaviour can drift without a clean lifecycle event, which is why prompt transparency alone is not a control boundary.

Practical implication: treat prompt changes as governance events and re-review assistant behaviour after every material update.

Threat narrative

Attacker objective: The attacker aims to capture user-entered data through a hidden rendering channel while keeping the assistant behaviour looking benign.

Entry via a malicious assistant prompt that looks normal until an email-like input pattern activates hidden behaviour.
Credential access occurs when the assistant appends user data to an attacker-controlled image URL and the browser requests it.
Impact is data exfiltration of user email addresses without visible warning in the assistant response.

Cisco DevHub NHI breach — IntelBroker exploited exposed Cisco credentials, API tokens and keys in DevHub.
Hugging Face Spaces breach — Hugging Face Spaces breach exposed API keys and authentication tokens.

Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.

NHI Mgmt Group analysis

Prompt visibility is not the same as behavioural trust. The article shows that users may be able to inspect an assistant prompt and still remain exposed if the prompt contains trigger logic or hidden exfiltration behaviour. That breaks the assumption that disclosure alone creates control. The governance problem is not whether the prompt can be read, but whether the assistant can still change what it does when a condition is met. Practitioners should treat the assistant as a governed identity surface, not a transparent object.

Image markdown rendering creates an identity-to-browser trust bridge that many AI programmes have not modelled. The malicious assistant did not need to visibly disclose the theft because the browser became the delivery mechanism. That means the security boundary is not just the model, but the combination of assistant instructions, rendering rules, and user context. The implication is that data-exfiltration risk sits in the interaction layer, where conventional chatbot reviews often stop too early.

Assistant lifecycle governance is the missing control plane here. The article highlights a practical failure mode: users can review an assistant once, but they have no reliable lifecycle signal when its instructions change later. That is a service-account-style governance gap applied to AI assistants. The result is prompt drift without accountability, which is why access review thinking needs to extend to assistant behaviour over time.

Hidden trigger logic is a named concept security teams should track. Sleepy Agent behaviour shows that an assistant can be safe in ordinary testing and still activate malicious logic when a specific input pattern appears. That makes the failure mode more precise than generic prompt injection risk. Practitioners should assume that benign conversation samples do not prove safe behaviour under runtime triggers.

External content rendering must be governed as a data path, not a cosmetic feature. The attack works because rendered content can leak values into outbound requests. Once that path exists, the assistant can move data without a visible theft event in the transcript. The practical conclusion is that browser-side rendering, link expansion, and image fetching need identity-grade governance when they are reachable from AI responses.

From our research:
85% of organisations lack full visibility into third-party vendors connected via OAuth apps, according to The State of Non-Human Identity Security.
Only 1.5 out of 10 organisations are highly confident in their ability to secure NHIs, compared to nearly 1 in 4 for securing human identities.
The governance response is to treat assistant and workload exposure as lifecycle problems, as discussed in Ultimate Guide to NHIs , Key Challenges and Risks.

What this signals

Hidden trigger logic is now a mainstream governance concern for AI assistants. Once runtime behaviour can switch from benign to malicious on a specific input pattern, security teams need to evaluate assistants the way they evaluate other non-human identities: by ownership, change control, and revocation path. The boundary is behavioural, not cosmetic, and that means one-off review is not enough.

Assistant governance needs a control model that spans prompt, rendering, and data egress. If any one layer is unmanaged, the whole chain can become an exfiltration route. That is why identity teams should map AI assistant behaviour to their NHI programme rather than leaving it inside a pure application-security review.

Security teams that already struggle with third-party OAuth visibility are likely to find assistant governance even harder, because the trust surface is more dynamic and less observable. The practical next step is to inventory which assistants can fetch external content, change prompts, or carry user values into outbound requests, then tie that inventory to ownership and review cycles.

For practitioners

Review assistant prompts after every material change Treat prompt updates as change-managed governance events. Re-assess whether new instructions introduce trigger words, hidden conditions, or response paths that can leak user data through external requests.
Disable or constrain external rendering paths Block image markdown, remote content fetches, and any response feature that can place user-controlled values into an outbound URL. If rendering must exist, enforce allowlists and strip sensitive parameters before output.
Test assistants with trigger-based abuse cases Build red-team tests for benign-to-malicious switching conditions, especially email-like inputs, keyword triggers, and pattern-based activation. Do not rely on a single safe conversation transcript as evidence of control.
Track assistants as governed identities Assign ownership, review cadence, and retirement criteria to each assistant. If the system prompt, toolchain, or rendering behaviour changes, require re-approval before further use in production workflows.

Key takeaways

A seemingly normal AI assistant can hide trigger-based exfiltration logic and leak user data without obvious warning.
The attack succeeded because prompt trust and browser rendering were treated as safe by default, even though they created a covert outbound data path.
Assistant governance now needs lifecycle review, output restrictions, and behavioural testing, not just prompt inspection.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and OWASP Non-Human Identity Top 10 address the attack and risk surface, while NIST AI RMF set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	A1	Covers malicious prompt behaviour and hidden trigger logic in assistants.
OWASP Non-Human Identity Top 10	NHI-03	Applies because assistant behaviour and access must be governed like other non-human identities.
NIST AI RMF		AI RMF governance is relevant where assistant behaviour and data flow can change at runtime.

Review assistant prompts for trigger-based abuse and restrict runtime behaviours that can exfiltrate data.

Key terms

Sleepy Agent: A Sleepy Agent is an assistant or model that behaves normally until a specific trigger appears, then activates hidden or harmful instructions. In governance terms, the risk is not visible misuse during ordinary testing, but conditional behaviour that only emerges under runtime conditions.
Image Markdown Rendering: Image markdown rendering is the process by which an assistant output causes a browser to fetch an image or external resource. When the rendered URL contains user data, it becomes a covert outbound channel. For AI governance, the issue is data egress disguised as harmless formatting.
Assistant Lifecycle Governance: Assistant lifecycle governance is the discipline of owning, reviewing, changing, and retiring AI assistants as managed identities. It covers prompt updates, tool access, output constraints, and approval history. Without it, an assistant can drift from reviewed behaviour into an untrusted runtime state.
Hidden Trigger Logic: Hidden trigger logic is code or prompt content that keeps a malicious action dormant until a specific keyword, pattern, or user behaviour appears. It matters because safe-looking interactions can conceal unsafe runtime paths, especially in assistants that are reused across many sessions.

Deepen your knowledge

AI assistant governance and non-human identity risk are covered in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are trying to govern assistants that can change behaviour at runtime, this course is a strong fit.

This post draws on content published by Lasso Security: Exploiting HuggingFace’s Assistants to Extract Users’ Data. Read the original.

NHIMG Editorial Note
Published by the NHIMG editorial team on 2026-03-16.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org