AI email summaries create a new phishing surface in Copilot

By NHI Mgmt Group Editorial TeamPublished 2026-03-12Domain: Breaches & IncidentsSource: Permiso Security

TL;DR: AI email summarisation can turn attacker-supplied text into trusted-looking “security alert” content inside Copilot workflows, with behaviour varying across Outlook and Teams surfaces, according to Permiso Security. The risk is trust transfer, because users often treat assistant output as system-generated even when it is attacker-shaped, and that breaks existing email security assumptions.

At a glance

What this is: This analysis shows how attacker-controlled text inside email can influence Copilot summaries and create a new phishing surface inside trusted AI output.

Why it matters: It matters because IAM, email security, and identity programmes now have to account for trust being transferred from raw content to assistant-generated output across human, NHI, and agent-adjacent workflows.

By the numbers:

Microsoft confirmed completion of patch rollout to all affected surfaces on March 11, 2026.
When AWS credentials are exposed publicly, attackers attempt access within an average of 17 minutes.

👉 Read Permiso Security's analysis of AI email summary phishing and Copilot XPIA

Context

Copilot email summaries turn untrusted inbox content into a high-trust output surface, which means the security boundary is no longer just the message body. The primary issue is AI email summarisation acting as a phishing amplifier, because the model can inherit the appearance of authority even when the underlying text was attacker-written.

For IAM and security teams, the important question is not whether the assistant can summarise text, but whether users can tell the difference between user-authored content and AI-generated output shaped by hostile instructions. That boundary matters across human identity, NHI access, and any agentic workflow that can read from multiple Microsoft 365 sources.

The article's starting point is typical for modern enterprise AI adoption: convenience arrives first, and the governance model follows later. That lag creates a predictable gap between productivity use cases and the controls needed to keep them safe.

Key questions

Q: How should security teams govern AI email summaries that can be influenced by attacker text?

A: Treat AI email summaries as a governed attack surface, not a convenience feature. Apply content inspection before summarisation, restrict which data sources the assistant can retrieve, and make sure users can tell the difference between raw email and assistant-generated output. The goal is to prevent attacker-written text from acquiring the authority of a trusted system panel.

Q: Why do AI-generated security alerts make phishing more effective?

A: They borrow credibility from the assistant interface. Users are much more likely to trust a clean, system-like summary panel than a suspicious paragraph inside an email, so a false alert in that panel can trigger faster action and less scrutiny. That trust transfer is what turns familiar phishing into a stronger social engineering path.

Q: What breaks when Copilot can retrieve from multiple Microsoft 365 sources?

A: The blast radius expands from one message to the broader collaboration workspace. If the assistant can search Teams, OneDrive, or SharePoint while summarising email, an attacker can shape not only the summary text but also the context it pulls in. That can increase lure quality and widen exposure beyond the original email.

Q: Who is accountable when an AI summary leads a user to click a malicious link?

A: Accountability is shared across email security, identity governance, and AI policy owners. The email gateway may miss the malicious instruction, but the assistant, the data-access permissions, and the user training model all contributed to the outcome. Organisations need a clear ownership model for AI-generated output in security-sensitive workflows.

Technical breakdown

Cross prompt injection in email summaries

Cross prompt injection, or XPIA, happens when malicious instructions are embedded in content the model is asked to process, such as an email body or appended HTML. The model does not need to be compromised. It only needs to treat attacker-supplied text as higher priority than the user intended. In this case, the danger is amplified by summarisation, because the assistant produces a polished output that can hide the original manipulation while preserving the attacker’s desired framing. That makes the summary surface itself part of the attack path.

Practical implication: treat AI summarisation inputs as untrusted content channels and apply prompt-injection controls before the model renders any output.

Trust transfer from raw email to assistant output

The core security problem is trust transfer. Users are trained to distrust suspicious email bodies, but they often trust the assistant summary panel because it looks system-generated, consistent, and authoritative. That changes the social engineering model: the attacker no longer needs the email to persuade the user directly. Instead, the attacker only needs the assistant to generate a credible banner, warning, or call to action. Once that happens, the human decision point moves from content inspection to reflexive response in a trusted interface.

Practical implication: train users and configure UI controls so that assistant-generated content is not treated as implicit system authority.

Cross-app retrieval expands the phishing blast radius

The more an assistant can retrieve from adjacent sources such as Teams, OneDrive, or SharePoint, the more dangerous summary manipulation becomes. A summary that blends external email text with internal context can produce a believable lure and potentially expose sensitive references indirectly. This is not just a chat safety issue. It is an identity and access issue, because the assistant is operating inside permission boundaries that were designed for human-driven retrieval, not adversarial prompt shaping across multiple workloads.

Practical implication: review which Microsoft 365 sources Copilot can reach and constrain retrieval paths that increase the value of a manipulated summary.

Threat narrative

Attacker objective: The attacker wants the user to trust a manipulated AI summary enough to click a malicious link or act on a false security alert.

Entry occurs through a benign-looking email that contains appended instruction-like text designed to influence the assistant when the user triggers summarisation.
Escalation happens when Copilot generates a polished security-themed summary that echoes the attacker’s instructions and blends them with trusted UI elements.
Impact is achieved when the user clicks the attacker-shaped call to action, turning the assistant summary into a phishing and possible context-exposure vector.

DeepSeek breach — DeepSeek breach exposed 1M+ log lines and sensitive secret keys.
Schneider Electric credentials breach — exposed credentials gave attackers access to Schneider Electric Jira, exfiltrating 40GB.

Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.

NHI Mgmt Group analysis

AI email summarisation is now a phishing surface, not just a productivity feature. The article shows that untrusted content can shape what users see in a trusted assistant panel, which turns summarisation into a security boundary. That boundary is relevant to identity programmes because the output inherits credibility the input never earned. Practitioners should treat summary UIs as governed attack surfaces, not neutral productivity overlays.

Trust transfer is the named failure mode here. Users do not evaluate assistant output the same way they evaluate raw email, so attacker-shaped content gains legitimacy through the interface itself. This is a human identity control problem first, but it also affects NHI governance because the assistant is acting as a privileged intermediary across messaging, collaboration, and file systems. The implication is that identity controls must account for how authority is perceived, not only how access is granted.

Cross prompt injection exposes a governance gap between content filtering and identity authorisation. Traditional email security can filter known malicious payloads, but it does not automatically prevent a model from turning benign-looking text into trusted guidance. OWASP-AGENTIC and NIST-AIRMF both point to the need for runtime governance of AI behaviour, while NIST-CSF remains relevant for controlling exposure and response. Practitioners should reassess where content handling ends and decision support begins.

Summary interfaces need separate policy from the systems they summarise. The article demonstrates that Outlook, Teams, and the Copilot pane can behave differently even when users think they are using one assistant. That makes interface-specific governance mandatory. Security teams should not assume a single control model will cover all summarisation surfaces, because the attacker will target the least resistant path.

Model-mediated phishing will keep expanding as retrieval gets broader. Once summarisation can reach across email and collaboration data, the attack is no longer limited to a fake banner. The broader the retrieval scope, the more likely the assistant can be used to build convincing lures from legitimate internal context. Practitioners need to treat access scope, not only model safety, as part of phishing defence.

From our research:
Only 19.6% of security professionals express strong confidence in their organisation's ability to securely manage non-human workload identities, according to the 2024 Non-Human Identity Security Report.
Another finding from that report shows 35.6% of organisations cite managing consistent access across hybrid and multi-cloud environments as their top NHI security challenge.
That gap is why practitioners should also review Ultimate Guide to NHIs , Why NHI Security Matters Now for the broader governance context around identity growth and exposure.

What this signals

Trust transfer is becoming a programme-level issue, not a user-training issue alone. If your security stack cannot distinguish human-authored content from assistant-shaped output, then Copilot-style summaries can become an endorsed phishing surface. That is why summary UIs, retrieval permissions, and identity policy need to be reviewed together rather than as separate teams' problems.

AI summary risk sits at the intersection of human IAM and NHI control scope. The same organisation that struggles to govern workload identities cleanly will usually struggle to govern assistant access to collaboration data with equal precision. That is a sign that content-layer controls are not enough without tighter identity and retrieval governance.

One useful concept here is model-mediated phishing: an attacker uses the assistant to voice the lure. Once that pattern appears, the response has to include more than email filtering. Practitioners should think in terms of retrieval scope, output trust marking, and user confirmation before actions triggered from AI-generated alerts.

For practitioners

Separate assistant trust from message trust Mark Copilot-generated summaries as assisted content in user guidance and UI design so recipients do not confuse polished output with authenticated system notifications.
Constrain cross-app retrieval paths Review which Teams, OneDrive, and SharePoint sources the assistant can access during summarisation and reduce retrieval scope where it increases the impact of manipulated prompts.
Detect instruction-like text in summarised content Add prompt-injection detection and content inspection before summarisation so appended instructions are identified before the model renders them into a trusted panel.
Harden user response paths in the summary UI Require explicit confirmation before any click-through action from AI-generated security alerts, especially when the summary contains urgent account or sign-in language.

Key takeaways

AI email summarisation can turn attacker-supplied text into a trusted-looking phishing lure inside the assistant UI.
The evidence points to a trust problem more than a pure filter failure, because users are primed to believe polished system output.
Security teams need to govern retrieval scope, output trust, and user response paths together if they want to reduce this risk.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 address the attack and risk surface, while NIST AI RMF and NIST CSF 2.0 set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10		Covers prompt injection and unsafe assistant behaviour in AI summary workflows.
NIST AI RMF		Addresses governance and oversight for AI systems that shape security decisions.
NIST CSF 2.0	PR.AC-4	Access control scope matters when assistants can retrieve email and collaboration data.

Apply agentic AI controls to constrain prompt influence and review assistant outputs before user action.

Key terms

Cross Prompt Injection: Cross prompt injection is an attack where hostile instructions are hidden inside content an AI system is asked to process, such as an email, document, or chat message. The model treats attacker text as input and may follow it during summarisation, retrieval, or response generation, even though the user never intended that content to become an instruction.
Trust Transfer: Trust transfer is the security failure that occurs when users give AI-generated output more credibility than the raw content it was based on. In practice, the assistant’s polished tone, layout, or system-like framing can make attacker-shaped content feel legitimate, which increases the chance of unsafe user action.
Model-Mediated Phishing: Model-mediated phishing is a social engineering pattern where an attacker uses an AI assistant to deliver the lure instead of sending the lure directly. The assistant becomes the voice, formatting layer, or authority cue, which can make malicious instructions seem more trustworthy than the original email or message.
Retrieval Scope: Retrieval scope is the set of documents, messages, files, and systems an AI assistant can access while responding to a prompt. When the scope is broad, a manipulated prompt can pull in more context than intended, increasing both the quality of the lure and the potential for data exposure.

Deepen your knowledge

NHI governance, agentic AI identity, and machine identity lifecycle are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are responsible for identity security strategy or NHI governance in your organisation, it is worth exploring.

This post draws on content published by Permiso Security: Co-Pilot, Disengage Autophish, the new phishing surface hiding inside AI email summaries. Read the original.

NHIMG Editorial Note
Published by the NHIMG editorial team on 2026-03-12.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org