Who is accountable when an AI summary leads a user to click a malicious link?

Why This Matters for Security Teams

An AI summary that pushes a user toward a malicious link is not just a phishing problem. It exposes a governance gap across the assistant, the underlying permissions, and the human decision path. The key issue is that AI-generated output can change what a user trusts at the exact moment a security control is needed. The NIST Cybersecurity Framework 2.0 treats governance and response as core security functions, which fits this problem better than a narrow email-only view. NHI Management Group’s research, LLMjacking: How Attackers Hijack AI Using Compromised NHIs, also shows that attackers increasingly target AI systems through compromised identities and credentials, not just inbox content.

In practice, the accountabilities that matter most are the ones that can prevent unsafe output, constrain what the model can see, and define who approves AI use in security-sensitive workflows. If those ownership lines are vague, teams often discover the failure only after a user has already clicked, submitted credentials, or triggered a secondary compromise. That is why this question belongs in identity governance, security operations, and AI policy at the same time.

How It Works in Practice

Accountability is usually shared, but the operational owner should be explicit. The assistant owner is responsible for output controls, the identity team is responsible for access boundaries, the email or web security team is responsible for inbound threat filtering, and the business owner is responsible for how the summary is used. No single layer can be treated as sufficient on its own, because AI output can reframe malicious content in a way that bypasses user suspicion.
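One way to make that ownership explicit is to keep it in a small register that can be checked automatically. The sketch below is illustrative only: the layer names, the team names, and the `unowned_layers` helper are assumptions made for this example, not part of any framework or product.

```python
# Hypothetical layer names, taken from the four responsibilities described above.
REQUIRED_LAYERS = (
    "output_controls",           # assistant owner
    "access_boundaries",         # identity team
    "inbound_threat_filtering",  # email or web security team
    "summary_usage",             # business owner
)


def unowned_layers(register: dict) -> list:
    """Return the layers that have no named owner in the register."""
    return [
        layer
        for layer in REQUIRED_LAYERS
        if not str(register.get(layer, "")).strip()
    ]


if __name__ == "__main__":
    # Example register with one gap: nobody owns how the summary is used.
    register = {
        "output_controls": "assistant-owner",
        "access_boundaries": "identity-team",
        "inbound_threat_filtering": "email-web-security",
        "summary_usage": "",
    }
    print(unowned_layers(register))
```

A check like this could run whenever an assistant is enabled in a new workflow, so that a missing owner is found before an incident rather than after it.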

A practical control model usually includes:

Policy on whether the assistant may summarize external content, and under what conditions.

Prompt and output filtering to reduce link amplification and unsafe instructions.

Least-privilege access so the model cannot retrieve or expose data it does not need.

Logging that ties the generated summary to the source content and the requesting user.

Incident ownership that spans the AI platform, identity, and security operations teams.
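Two of the controls above, output filtering and logging, can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: the allowlist contents, the function names, and the audit record fields are all hypothetical, and a real deployment would load policy from a managed source and handle URL edge cases (such as trailing punctuation) more carefully.

```python
import hashlib
import re
from urllib.parse import urlparse

# Hypothetical allowlist; in practice this would come from policy, not code.
ALLOWED_DOMAINS = {"example.com"}

# Simple URL matcher for illustration; it is not a complete URL grammar.
URL_PATTERN = re.compile(r'https?://[^\s<>"]+')


def is_allowed(url: str) -> bool:
    """True if the URL's host is an allowlisted domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in ALLOWED_DOMAINS)


def defang(url: str) -> str:
    """Rewrite a URL so it is readable but no longer clickable."""
    return url.replace("http", "hxxp", 1).replace(".", "[.]")


def filter_summary(summary: str):
    """Defang links that are not allowlisted; return (filtered_text, blocked_urls)."""
    blocked = []

    def _replace(match):
        url = match.group(0)
        if is_allowed(url):
            return url
        blocked.append(url)
        return defang(url)

    return URL_PATTERN.sub(_replace, summary), blocked


def audit_record(source_text: str, summary: str, user_id: str, blocked: list) -> dict:
    """Tie the generated summary to its source content and the requesting user."""
    return {
        "user_id": user_id,
        "source_sha256": hashlib.sha256(source_text.encode("utf-8")).hexdigest(),
        "summary_sha256": hashlib.sha256(summary.encode("utf-8")).hexdigest(),
        "blocked_links": list(blocked),
    }
```

Hashing the source and the summary, rather than storing them in the log, is one design choice among several; it lets investigators confirm which content produced which summary without copying potentially sensitive text into the audit trail.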

This is where identity discipline matters. If the assistant uses broad delegated access, the blast radius of a misleading summary grows quickly. If the system uses scoped credentials and context-aware policy checks, the organisation can reduce both exposure and ambiguity. NIST’s CSF 2.0 supports this kind of coordinated control structure, and The State of Secrets in AppSec research shows why credential sprawl and weak secrets discipline often compound the issue.
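As a rough illustration of scoped credentials combined with a context-aware check, the sketch below denies any request outside the granted scopes and also blocks internal data retrieval while the assistant is summarizing external content. The class names, the scope strings, and the "internal:" prefix convention are assumptions invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssistantCredential:
    """Hypothetical scoped credential issued to the assistant."""
    owner_team: str
    scopes: frozenset


@dataclass(frozen=True)
class RequestContext:
    """What the assistant is trying to do, and where the input came from."""
    required_scope: str
    source_is_external: bool


def authorize(cred: AssistantCredential, ctx: RequestContext):
    """Return (allowed, reason) for a retrieval request."""
    # Least privilege: the scope must have been granted explicitly.
    if ctx.required_scope not in cred.scopes:
        return False, "scope not granted"
    # Context-aware rule (an assumption for this sketch): do not mix
    # internal data into a summary of untrusted external content.
    if ctx.source_is_external and ctx.required_scope.startswith("internal:"):
        return False, "internal data blocked for external content"
    return True, "allowed"


if __name__ == "__main__":
    cred = AssistantCredential(
        owner_team="identity-team",
        scopes=frozenset({"mail:read", "internal:kb.read"}),
    )
    print(authorize(cred, RequestContext("internal:kb.read", source_is_external=True)))
    print(authorize(cred, RequestContext("mail:read", source_is_external=True)))
```

Returning a reason alongside the decision keeps the denial attributable to a specific rule, which helps when ownership of that rule needs to be traced during an incident.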

These controls tend to break down when the assistant is embedded into email, chat, and browser workflows without a single accountable owner for the output layer.

Common Variations and Edge Cases

Tighter AI output controls often increase friction, requiring organisations to balance user convenience against stronger review and approval steps. That tradeoff is especially visible when the assistant is used for executive email triage, customer support, or developer tooling, where speed is valued and risky links can still appear legitimate.

There is no universal standard for liability allocation yet, so organisations should treat this as an operational accountability problem first and a legal question second. In some environments, the email security team may own the first line of defense, but the AI platform team still owns unsafe summarisation behaviour. In others, a regulated business unit may need to approve all AI-generated security-sensitive output before it reaches end users. The important distinction is whether the assistant is acting as a convenience layer or a decision-shaping layer.

The edge cases are usually the hardest ones: forwarded messages, internal phishing that appears trusted, multilingual summaries, and assistants with access to internal knowledge bases. Those scenarios often require both policy exceptions and human review thresholds. NHI Management Group’s research on compromised AI identities, including the DeepSeek breach, reinforces a simple point: when AI output and identity exposure intersect, accountability must be written down before an incident forces the answer.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	A2	Covers unsafe agent output that can mislead users into malicious actions.
CSA MAESTRO	GOV-02	Governance is needed to assign ownership for AI-generated security-sensitive output.
NIST AI RMF	GOVERN	AI governance requires accountability for model behaviour and downstream harm.

Define output-safety checks and human review for AI messages that can alter security decisions.
