AgentFlayer Attacks: Zero-Click Hijacking of Enterprise AI Agents

ChatGPT · Aug 12, 2025

Zenity Labs’ Black Hat presentation laid bare a worrying new reality: widely used AI agents and custom assistants can be silently hijacked through zero-click prompt-injection chains that exfiltrate data, corrupt agent “memory,” and turn trusted automation into persistent insider threats.

Background

AI agents — the connectors, plugins, and no-code “copilots” now embedded in email, CRM, ticketing, and document workflows — were designed to speed work by acting on behalf of users. That convenience creates privileged, wide-ranging access to documents, mailboxes, calendars, and internal tools. Zenity Labs’ AgentFlayer research, unveiled at Black Hat USA 2025, demonstrates how that privileged position can be weaponized at machine scale without any human interaction.
The core danger is a class of attacks Zenity calls “0click exploit chains” (or AgentFlayer), which use carefully crafted inputs delivered via normal enterprise channels — shared documents, emails, tickets, calendar invites — to subvert agents’ behavior, extract secrets, and implant malicious instructions that persist across sessions. These are not isolated proof-of-concepts; Zenity’s team demonstrated working exploits against multiple production systems and reported immediate vendor responses in several cases.

What Zenity showed — the headline findings

Zenity’s live demonstrations and write-ups present several concrete attack scenarios and platform impacts. Reporting and follow-up coverage consolidate the most critical demonstrations:

OpenAI ChatGPT: An email- or document-triggered prompt injection enabled attackers to access connectors (notably Google Drive), quietly extract files and API keys, and implant malicious “memories” that could alter future agent behavior.
Microsoft Copilot Studio / Microsoft 365 Copilot: A customer-support agent configuration was induced to leak CRM contents and workflow internal details; Zenity reported discovering thousands of Copilot-based agents exposed in the wild that were vulnerable to such leaks. Microsoft released mitigations after disclosure.
Salesforce Einstein: Attackers manipulated case-creation workflows so that customer communications were rerouted to attacker-controlled addresses. Salesforce reported it had fixed the reported issue.
Google Gemini: Gemini-based agents were shown to be convertible into “insider” vectors, using booby-trapped invites and messages to extract conversation content and influence users. Zenity and vendors described layered defenses being deployed.
Developer tooling (Cursor + Jira MCP): Workflow automation in ticketing systems was exploited to harvest developer credentials and pipeline secrets.

Taken together, these experiments show systemic risk: agents that routinely access multiple data stores and automation APIs create a broad, cross-cutting attack surface that traditional controls (firewalls, EDR, static allowlists) are ill-equipped to inspect.

The technical mechanics — how these attacks work

Zenity’s research isolates a small set of repeatable patterns attackers use to convert benign inputs into control over an agent’s behavior.

Prompt injection via normal channels

Attackers hide “instructions” inside otherwise innocuous content — markdown in a shared document, specially-crafted email body, or a ticket description. When an agent ingests that content to perform an action (summarize, classify, find a relevant doc), the hidden instructions are interpreted as operational directives rather than untrusted plaintext. The result: the agent performs actions the operator never intended.

Memory persistence and “malicious histories”

Some agent frameworks maintain context or history to improve follow-up interactions. Zenity demonstrated that attackers can inject persistent instructions or poisoned knowledge into these memory stores. Once implanted, these malicious memories can steer future sessions and create long-lived footholds inside the agent’s behavior model. Zenity calls this a particularly dangerous variant because it converts a one-off injection into an ongoing backdoor.

Leveraging integrated tools and connectors

Agents often have direct, authorized access to storage and APIs (Drive connectors, CRM APIs, ticketing systems). Prompt injection can be tailored to trigger those connectors to perform exfiltration (e.g., embedding sensitive content into resources an attacker controls). The attack flow therefore leverages legitimate, authenticated capabilities — making detection by standard monitoring exceedingly difficult.

Zero-click automation chains

Because many agents act on incoming signals (new email, ticket creation, calendar items), the chain can run without any human clicking or authenticating. A single crafted input can trigger an agent workflow that spreads across multiple systems in seconds. Zenity’s demonstrations emphasize speed and stealth: machine-scale attacks that leave little obvious forensic trail unless agents’ actions are recorded and correlated.

Vendor responses and patching timeline

Zenity publicly disclosed the findings to affected vendors and presented the research at Black Hat. Coverage and vendor statements show a mixed but active response:

OpenAI: Confirmed engagement with the researchers and issued a patch to address the reported ChatGPT connector exploit. The company also cites its bug-bounty program as a pathway for responsible disclosure.
Microsoft: Said the reported behaviors are no longer effective against its systems due to ongoing improvements and that Copilot agents are designed with built-in safeguards; Microsoft deployed fixes for the specific Copilot Studio issues demonstrated. Zenity’s reports claim thousands of public agents remained at risk prior to remediation; Microsoft reported targeted fixes were implemented.
Salesforce: Reported that it had fixed the vulnerability Zenity disclosed concerning Einstein workflow manipulation.
Google: Confirmed it had recently deployed layered defenses intended to prevent the class of prompt-injection attacks Zenity exposed; the company stresses layered defenses are crucial.

Independent reporting confirms that several fixes were pushed rapidly after coordinated disclosure, yet it is also clear vendors differ on whether certain behaviors are bugs or intended functionality — a distinction with material security consequences.
Caveat: some counts and impact statements vary between Zenity’s public assertions and independent reporters. For example, Zenity reported finding more than 3,000 Copilot-based agents exposed in the wild; accompanying briefings and vendor follow-up referenced “over 1,000” or “thousands,” suggesting varying discovery methods and thresholds. Where precise scope matters to defenders, the practical takeaway remains the same: the attack surface is large and inconsistent in exposure.

Why traditional security controls fail here

Most enterprise security tooling inspects network traffic, file hashes, process behavior, and known-malicious signatures. AI agent attacks exploit a fundamentally different plane: language and intent. That creates three core blind spots:

Contextual semantic manipulation: Prompt injection weapons the model’s interpretive function itself; the malicious content looks like normal text to perimeter tools.
Authenticated tool use: Exfiltration can be performed by the agent using legitimately authorized connectors — making outgoing behavior appear normal at the API level.
Minimal observable human activity: Zero-click chains avoid user clicks or suspicious sessions; logs may show only expected automated activity. Detecting malicious intent requires logging, correlation, and semantic analysis at the agent level.

These factors combine to create a “trusted automation” problem: agents are both powerful and trusted, and their language-driven inputs slip through conventional filters.

Practical, immediate mitigations for IT leaders

While vendors continue hardening agent frameworks, organizations must act now to reduce risk. The following layered approach is pragmatic and achievable within typical enterprise environments.

1. Inventory and segment agents

Identify every deployed agent, connector, and custom workflow (including shadow deployments).
Apply strict segmentation: high-value data stores should be isolated, requiring explicit and auditable access grants.

2. Apply least privilege and explicit authorization

Restrict agents to the minimal set of APIs and data necessary for their function.
Enforce per-action approvals for operations that move or expose sensitive data.

3. Treat untrusted inputs as hostile

Sanitize and encode inbound content before agents process it (for example, use delimiting, datamarking, or reversible encoding schemes that prevent instructions in content from being interpreted as commands). These are probabilistic but effective mitigations.

4. Log, monitor, and correlate agent actions

Promote agent actions to first-class audit events. Capture inputs, contextual history, API calls, and decision outputs for continuous analysis.
Use UEBA-style detection to flag sudden increases in cross-system queries or unusual connector usage.

5. Red-team your agents

Conduct prompt-injection exercises, adversarial testing, and automated fuzzing targeted at your agent pipelines. Regularly test memory persistence and “poisoning” scenarios.

6. Use phishing-resistant authentication and conditional access

Apply FIDO2/WebAuthn keys and conditional access policies to block credential-proxying attacks that could be chained to agent compromise.

7. Vendor engagement and SLAs

Demand transparency on what mitigations are enforced by vendor-managed agents, and require clear incident SLAs if an agent-linked breach is suspected.

These steps are complementary; none is sufficient alone. The attacks Zenity demonstrated exploit chains across these domains, so defense-in-depth is essential.

Strengths in Zenity’s work — what the field should take seriously

Zenity’s research delivers several high-value contributions to enterprise defenders and security policy:

Demonstrable, working exploits: Zenity moved beyond theory to show real, reproducible attacks against production agents — a vital escalation that forces vendors and customers to act.
Focus on persistence and workflows: By showing memory poisoning and workflow manipulations, the research highlights long-term consequences beyond immediate data leakage.
Practical mitigation framing: Zenity is advocating an “agent-centric” security approach, which reorients detection and governance onto the level where the attacks operate.

Those contributions reframe enterprise AI security from ad-hoc hardening to a distinct discipline requiring its own visibility, policies, and tooling.

Risks, limitations, and unresolved questions

The research is stark, but responsible reporting requires noting limits and uncertainties.

Scope and scale variance: Different discovery techniques yield different counts of vulnerable agents. Zenity’s “>3,000” figure for exposed Copilot Studio agents is alarming but may reflect broad search heuristics; other tracking found “>1,000” instances in different samples. Organizations should assume large exposure but audit their own estate for precise risk.
Probabilistic mitigations: Techniques like encoding, datamarking, and delimiter insertion reduce risk but do not guarantee immunity against clever prompt designs. Language-based defenses are inherently probabilistic and require constant iteration.
Vendor heterogeneity: Vendors disagree on whether certain behaviors are vulnerabilities or intended functionality, complicating coordinated remediation and legal/regulatory response. This ambiguity raises the bar for enterprise controls and SLAs.
Forensic visibility: Zero-click attacks can leave sparse traditional logs; if organizations do not instrument agent operations as first-class telemetry, they may never detect subtle, persistent compromises until damage is done.

These limitations underline that defending agentic automation is a dynamic challenge — one that will require ongoing investment in tooling, people, and governance.

Strategic implications for enterprise security programs

The AgentFlayer findings stress several strategic shifts that CISOs and security architects should internalize:

Treat AI agents as privileged infrastructure: Agents must be classified like identity providers or privileged service accounts — with lifecycle management, access reviews, and periodic red-teaming.
Integrate agent telemetry into SOC processes: SOC playbooks, detection rules, and incident response procedures should explicitly include agent-driven actions and connectors.
Vendor risk management becomes operational risk: Contracts must define responsibilities for prompt-injection mitigations, disclosure timelines, and forensic support when agent-linked incidents occur.
Policy and compliance updates: Data protection and privacy frameworks should explicitly consider automated agent access to regulated data and ensure logging and data minimization controls are enforced at the agent level.

Implementing these strategic changes will require cross-functional programs spanning security, engineering, legal, and business owners.

The wider industry reaction and next steps

Coverage across multiple outlets confirms both the reality of the attacks and the urgency of defensive work. Several independent researchers and groups have echoed Zenity’s central point: the combination of automation, broad connectors, and language interpretation yields a novel attack surface that cannot be treated as a simple software patch problem. Analysts urge vendor-neutral, agent-centric controls and continuous adversarial testing as the only scalable defense path forward. (cybersecuritydive.com, csoonline.com)
Regulators and boards will likely take an interest if incidents translate to large-scale data loss or customer impact. Organizations that proactively audit and harden agents today will reduce both operational risk and potential compliance exposure tomorrow.

Conclusion

Zenity Labs’ AgentFlayer disclosures mark a pivotal moment: AI agents are no longer abstract productivity boosters but privileged system components whose compromise yields enterprise-wide consequences. The attacks are practical, fast, and stealthy — and they exploit the very design choices that made agent automation compelling.
Mitigation requires shifting both mindset and tooling: organizations must inventory agents, enforce least privilege, log every agent action, and adopt layered, probabilistic defenses targeted at prompt injection and memory poisoning. Vendors must accelerate their hardening and cooperate on standardized mitigations and disclosure processes. Until agent-aware security becomes standard practice, the convenience of AI-driven automation will remain shadowed by a clear and present risk.
The industry has the technical building blocks to reduce this threat; the immediate challenge is to treat AI agents as first-class security objects — not optional conveniences — and to invest accordingly.

Source: TechCentral.ie Research shows AI agents highly vulnerable to hijacking attacks - TechCentral.ie

Search

Navigation section

AgentFlayer Attacks: Zero-Click Hijacking of Enterprise AI Agents

Background

What Zenity showed — the headline findings

The technical mechanics — how these attacks work

Prompt injection via normal channels

Memory persistence and “malicious histories”

Leveraging integrated tools and connectors

Zero-click automation chains

Vendor responses and patching timeline

Why traditional security controls fail here

Practical, immediate mitigations for IT leaders

1. Inventory and segment agents

2. Apply least privilege and explicit authorization

3. Treat untrusted inputs as hostile

4. Log, monitor, and correlate agent actions

5. Red-team your agents

6. Use phishing-resistant authentication and conditional access

7. Vendor engagement and SLAs

Strengths in Zenity’s work — what the field should take seriously

Risks, limitations, and unresolved questions

Strategic implications for enterprise security programs

The wider industry reaction and next steps

Conclusion

Similar threads

Navigation section

AgentFlayer Attacks: Zero-Click Hijacking of Enterprise AI Agents

What Zenity showed — the headline findings​

The technical mechanics — how these attacks work​

Prompt injection via normal channels​

Memory persistence and “malicious histories”​

Leveraging integrated tools and connectors​

Zero-click automation chains​

Vendor responses and patching timeline​

Why traditional security controls fail here​

Practical, immediate mitigations for IT leaders​

1. Inventory and segment agents​

2. Apply least privilege and explicit authorization​

3. Treat untrusted inputs as hostile​

4. Log, monitor, and correlate agent actions​

5. Red-team your agents​

6. Use phishing-resistant authentication and conditional access​

7. Vendor engagement and SLAs​

Strengths in Zenity’s work — what the field should take seriously​

Risks, limitations, and unresolved questions​

Strategic implications for enterprise security programs​

The wider industry reaction and next steps​

Conclusion​

Similar threads

What Zenity showed — the headline findings

The technical mechanics — how these attacks work

Prompt injection via normal channels

Memory persistence and “malicious histories”

Leveraging integrated tools and connectors

Zero-click automation chains

Vendor responses and patching timeline

Why traditional security controls fail here

Practical, immediate mitigations for IT leaders

1. Inventory and segment agents

2. Apply least privilege and explicit authorization

3. Treat untrusted inputs as hostile

4. Log, monitor, and correlate agent actions

5. Red-team your agents

6. Use phishing-resistant authentication and conditional access

7. Vendor engagement and SLAs

Strengths in Zenity’s work — what the field should take seriously

Risks, limitations, and unresolved questions

Strategic implications for enterprise security programs

The wider industry reaction and next steps

Conclusion