Pwn2Own Berlin 2026: Exchange, Edge, Windows 11 and AI Tools Under Exploit Chains

Pwn2Own Berlin 2026, held this week at OffensiveCon in Berlin, saw researchers compromise fully patched Microsoft Exchange, Microsoft Edge, Windows 11, Red Hat Enterprise Linux, Nvidia tooling, and multiple AI platforms, with Zero Day Initiative confirming $908,750 paid for 39 unique zero-days before final Day 3 totals. The headline is not that Windows was “hacked,” because Pwn2Own exists to make that happen under controlled rules. The headline is that Microsoft’s desktop, browser, and server stack all appeared in the blast radius at the same contest, while AI developer tools joined them as first-class targets. For Windows users and administrators, Berlin is less a panic siren than a preview of the patching, hardening, and threat-model updates coming over the next 90 days.

Berlin Turned Microsoft’s Patch Tuesday Assumptions Inside Out

Pwn2Own is often misread as a scoreboard of shame. That framing is too easy and not especially useful. The contest rewards researchers for demonstrating real exploit chains against current, supported products, then routes the vulnerability details to vendors under a disclosure clock.
That matters because the targets are not museum pieces. The Exchange Server exploit demonstrated by Orange Tsai of DEVCORE reportedly achieved remote code execution as SYSTEM on a fully patched Microsoft Exchange Server by chaining three bugs. Over the contest’s first two days, Windows 11 was successfully exploited four times, an Edge exploit chain escaped the browser sandbox, and Microsoft’s platform story took hits at the endpoint, browser, and messaging layers.
For Microsoft, this is familiar territory but uncomfortable timing. The company has spent the past two years telling enterprises that security is now a top engineering priority, not merely a compliance posture. Pwn2Own Berlin does not disprove that claim, but it does expose the scale of the job: modern Windows security is a stack of mitigations, dependencies, legacy compatibility choices, and cloud-era assumptions that researchers can still pry apart.
The important distinction is that a chain working on stage does not mean it is being exploited in the wild. But defenders should not treat that distinction as comfort. Pwn2Own demonstrates exploitability before criminals and state actors necessarily hold the same chain, and that is exactly why the event is valuable.

Exchange Remains the Server Microsoft Cannot Afford to Get Wrong

The $200,000 Exchange result was the most consequential Microsoft moment of the contest because Exchange is not just another enterprise workload. It is an identity-adjacent, internet-facing, historically targeted communications platform that often sits at the crossroads of mail flow, directory integration, compliance retention, and executive correspondence. When Exchange falls, defenders do not just worry about a server; they worry about the organization’s nervous system.
Orange Tsai’s name also carries weight in this category. DEVCORE’s prior Exchange research helped define the modern era of Exchange exploitation, and the security community remembers how quickly Exchange flaws can move from advisory text to mass exploitation. Even without public technical details, the combination of remote code execution, SYSTEM privileges, and a fully patched target is enough to make administrators sit straighter.
The contest rules mean Microsoft gets the details privately and a 90-day disclosure window begins. That process is designed to prevent copycat exploitation while still ensuring vendors cannot bury the issue indefinitely. For Exchange admins, the practical response is not to hunt for proof-of-concept code; it is to make sure patching, monitoring, segmentation, and incident response are ready before the advisory arrives.
Exchange’s long tail remains the problem. Many organizations have moved mailboxes to Exchange Online but still maintain hybrid Exchange servers for management, relay, or legacy workflows. Those systems can become forgotten infrastructure: too important to remove, too awkward to modernize, and too exposed to ignore.
Microsoft has pushed customers toward cloud-hosted mail partly because defending internet-facing collaboration servers at global scale is hard. Pwn2Own Berlin reinforces that argument without turning it into a simple cloud-versus-on-prem morality play. Cloud services have their own risks, but unmanaged or under-managed Exchange servers continue to impose a security tax that many organizations underestimate.

Windows 11 Took Repeated Local Hits, Not a Single Knockout Blow

Windows 11 being hacked four times sounds dramatic, and it is, but the details matter. The confirmed Windows 11 entries were privilege-escalation demonstrations rather than remote drive-by compromises. That means an attacker would generally need some initial foothold before using the flaw to climb from a lower-privileged context into a more powerful one.
That should not minimize the risk. Local privilege escalation is a critical ingredient in real intrusions because attackers rarely stop at the first compromised process. Malware, phishing payloads, malicious documents, compromised developer tools, and browser escapes all become more dangerous when paired with a reliable path to elevated privileges.
The repeated Windows 11 wins also illustrate an old truth about operating system security. Microsoft can add virtualization-based security, memory protections, code integrity, sandboxing, and kernel hardening, but the platform must still support a vast driver ecosystem and decades of application behavior. Every compatibility promise becomes part of the attack surface.
For desktop administrators, this is where security architecture becomes more important than patch speed alone. Least privilege, application control, driver hygiene, endpoint detection, attack surface reduction rules, and rapid rollback plans all matter because Windows exploitation is usually a chain. Breaking one link is often enough to turn a flashy stage demo into a failed intrusion.
The Day 3 Windows result, in which a Viettel Cyber Security team used an integer overflow to escalate privileges on Windows 11, added another reminder that these bugs are rarely exotic from a defender’s perspective. The vulnerability class may be technical, but the operational lesson is plain: fully patched does not mean fully safe.

Edge’s Sandbox Escape Shows Why Browser Security Is Never Finished

Orange Tsai’s Day 1 Microsoft Edge demonstration chained four logic bugs to escape the browser sandbox and earned $175,000. That result is notable because sandbox escapes are exactly what modern browsers are built to resist. The browser is supposed to assume hostile content and contain the damage.
A sandbox escape does not automatically mean every Edge user is in immediate danger. The exploit chain, target configuration, and contest conditions all matter. But as a class of vulnerability, browser sandbox escapes remain prized because the browser is the endpoint’s most exposed application and often the easiest way to reach a user without needing prior access.
Edge also sits in a complicated identity and enterprise-management position. It is a browser, a PDF viewer, a Microsoft 365 access layer, a policy-controlled enterprise app, and increasingly a front end for AI-infused workflows. That makes its security boundary more valuable and more stressed.
The more Microsoft integrates Edge with Windows, Entra ID, Defender, Copilot, and Microsoft 365, the more the browser becomes part of the operating environment rather than merely an application installed on top of it. Pwn2Own’s Edge result is therefore not just a browser story. It is a reminder that the browser remains the place where consumer convenience, enterprise policy, web complexity, and attacker creativity collide.

AI Tools Became Targets Because They Became Infrastructure

The most revealing shift in Berlin may not be any single Microsoft exploit. It is that AI platforms and developer tools were not sideshow targets. LiteLLM, OpenAI Codex, NVIDIA Megatron Bridge, Chroma, LM Studio, Cursor, and Anthropic Claude Code all appeared in the contest results or schedule, with multiple successful demonstrations over the course of the contest.
That is a striking change from the way security teams talked about AI just a few years ago. Back then, the dominant concern was model output: hallucinations, prompt injection, data leakage, and unsafe recommendations. Those still matter, but Pwn2Own Berlin focused attention on something more concrete: AI tooling is software infrastructure, and software infrastructure has bugs.
Developer-focused AI tools are especially sensitive because they sit close to source code, credentials, repositories, local files, build systems, and deployment workflows. A coding agent that can read a project, modify files, invoke tools, or interact with a host environment is not merely a chatbot. It is a privileged automation layer wrapped in a conversational interface.
That creates new attack paths for Windows shops. A compromised AI coding assistant on a developer workstation may be more valuable than a compromised consumer app because it can bridge local files, cloud tokens, package managers, and production code. The endpoint becomes not only a target but a launchpad into the software supply chain.
The contest’s AI category also hints at why vulnerability volume may rise. Researchers are using AI to find bugs, generate harnesses, triage crashes, and prepare submissions faster. If that productivity increase holds, vendors should expect more reports, not fewer, and security teams should expect the disclosure pipeline to become busier.

The 90-Day Clock Is a Gift, Not a Guarantee

Pwn2Own’s disclosure model gives vendors time to patch before technical details are made public. That 90-day window is one of the contest’s most important safety valves. It turns an on-stage demonstration into coordinated remediation rather than instant weaponization.
But the window is not magic. Vendors still have to reproduce the bug, understand the root cause, engineer the fix, test it across supported versions, avoid regressions, and ship updates through channels that enterprises can actually consume. Administrators then have to deploy those updates into real environments full of maintenance windows, change boards, brittle dependencies, and business owners who hate downtime.
Exchange makes that especially painful. Applying Exchange updates can be operationally delicate, and many organizations have learned the hard way that being behind on cumulative updates can complicate emergency patching. If Berlin’s Exchange chain results in a security update, the organizations best positioned to respond will be the ones already current enough to install it quickly.
Windows 11 fixes are usually easier to distribute at scale, but they are not free of friction. Endpoint fleets include remote workers, sleeping laptops, pinned application versions, VPN dependencies, and security tools that can misbehave after kernel or subsystem changes. The best patch program is not the one that promises instant deployment everywhere; it is the one that knows which systems are exposed, which controls can compensate, and which failures are acceptable.
The practical lesson is to use the disclosure clock to prepare before the bulletin drops. Review Exchange exposure. Confirm inventory. Check whether hybrid servers are still necessary. Validate endpoint privilege controls. Make sure security teams can detect suspicious privilege escalation, web shell behavior, abnormal Exchange child processes, and unexpected developer-tool activity.

Microsoft’s Security Story Now Has to Survive Its Own Complexity

Microsoft is not uniquely vulnerable because it appeared repeatedly at Pwn2Own. It appeared repeatedly because its products are everywhere, valuable, and complex enough to reward deep research. The same logic explains why Red Hat, Nvidia, VMware, AI vendors, and browser makers also attract attention.
Still, Microsoft faces a special burden. Windows remains the default enterprise endpoint, Exchange remains entrenched in hybrid and on-prem environments, Edge is increasingly woven into Microsoft’s productivity ecosystem, and Microsoft’s AI push is turning developer and user workflows into new integration surfaces. No other vendor owns quite the same span from kernel to inbox to browser to cloud identity to AI assistant.
That breadth is commercially powerful and defensively awkward. A security improvement in one layer can be undermined by a legacy dependency in another. A hardened desktop can still be exposed by an overprivileged user, a vulnerable driver, a misconfigured server, or a developer tool with broad filesystem access.
The company’s Secure Future Initiative has publicly acknowledged that Microsoft must change engineering culture, identity practices, and vulnerability response. Pwn2Own Berlin should be read through that lens. The contest is not a verdict on whether Microsoft has changed; it is a stress test of whether the changes are enough against researchers who are paid to find the seams.
For customers, the answer cannot be vendor faith or vendor fatalism. Microsoft will patch. Researchers will keep finding bugs. Attackers will keep building chains. The winning enterprise posture is to assume each layer can fail and design the environment so that failure is observable, contained, and recoverable.

The Contest Also Exposed a Capacity Problem in Vulnerability Research

One of the more telling details from Berlin is that the event reportedly hit capacity for the first time in its 19-year history, with more than 150 researchers turned away because the schedule could not accommodate them. That is not a trivia point. It suggests that the market for high-end vulnerability research is expanding faster than the institutions built to process it.
Capacity limits create strange incentives. If researchers cannot get on a stage, they may hold bugs, sell them elsewhere, disclose them independently, or publish details in frustration. A well-run contest channels vulnerability discovery into a responsible pipeline; a bottlenecked contest reveals how much discovery is happening outside that pipeline.
The AI angle likely compounds this. Even if AI does not magically invent elite exploit chains on demand, it can reduce the cost of repetitive work: fuzzing setup, crash deduplication, code navigation, documentation, and exploit scaffolding. That means more researchers can reach the threshold where a submission is credible.
Vendors should treat that as an early warning. The future is not one in which AI only helps attackers write phishing emails. It is one in which AI accelerates the professional bug-hunting economy, increases report volume, and compresses the time between product release and vulnerability discovery.
For defenders, this means public vulnerability counts may rise even if software quality improves. More bugs found can be a sign of better discovery, not necessarily worse code. The hard question is whether vendors and customers can turn that discovery into faster risk reduction.

Windows Admins Should Read Berlin as a Change-Management Drill

The most useful response to Pwn2Own Berlin is not outrage. It is rehearsal. The contest has effectively told administrators that important Microsoft fixes are likely coming, that some may involve high-value enterprise components, and that the supporting cast now includes AI and developer infrastructure many asset inventories barely track.
Exchange should be the first inventory item. Organizations should know which servers are internet-facing, which are hybrid-only, which versions and cumulative updates are installed, and whether any legacy server is still present because nobody wanted to break a workflow. If a server is not needed, retirement is better than heroic hardening.
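The inventory questions in that paragraph can be turned into a simple triage pass. The sketch below is only one way to order them. It assumes the answers have already been collected by hand or from an asset database, and every field name and action label is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExchangeServer:
    # Hypothetical inventory columns; populate them from your own records.
    name: str
    internet_facing: bool
    hybrid_only: bool      # recorded for reporting; not used by triage below
    on_current_cu: bool    # recent enough to take the next security update directly
    still_needed: bool     # an owner has confirmed a live workflow depends on it


def triage(server: ExchangeServer) -> str:
    """Suggest one next step per server, asking the cheapest question first."""
    if not server.still_needed:
        return "retire"
    if not server.on_current_cu:
        return "bring cumulative update current"
    if server.internet_facing:
        return "review exposure and monitoring"
    return "keep patched and monitored"


def plan(servers):
    """Map each server name to its suggested next step."""
    return {s.name: triage(s) for s in servers}
```

The ordering reflects the article's argument: retirement comes before hardening, and cumulative-update currency comes before exposure review because a server that cannot accept an emergency patch is a problem regardless of where it sits.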
Windows 11 fleets deserve a second look at privilege boundaries. Local administrator sprawl remains one of the easiest ways to turn a contained endpoint incident into a domain problem. Pwn2Own’s privilege-escalation results are a reminder that the operating system’s security model works best when users and applications do not already have more rights than they need.
Browser policy matters too. Edge’s sandbox exists for a reason, but enterprises should still enforce sensible controls around extensions, downloads, credential handling, and isolation for risky browsing. A browser exploit chain becomes harder to operationalize when the surrounding environment is less permissive.
AI tooling now belongs in the same conversation. If developers are installing coding agents, local model runners, vector databases, or AI workflow tools without security review, the organization has an asset-management blind spot. The question is not whether AI tools are allowed; it is whether they are governed like the powerful software they have become.
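A first pass at that blind spot can be as modest as checking which AI-related commands resolve on a developer workstation. The sketch below uses Python's standard `shutil.which` to look names up on `PATH`. The command names are assumptions chosen for illustration and should be verified against what each tool actually installs. A `PATH` lookup will also miss GUI-only applications, editor extensions, and containers.

```python
import shutil

# Assumption: illustrative command names only; verify them for your environment.
CANDIDATE_COMMANDS = ("claude", "codex", "cursor", "lms", "litellm", "chroma")


def find_installed(commands=CANDIDATE_COMMANDS, resolve=shutil.which):
    """Return {command: resolved path} for each command found on PATH.

    `resolve` is injectable so the function can be tested without touching
    the real filesystem; it must return a path string or None.
    """
    found = {}
    for command in commands:
        path = resolve(command)
        if path:
            found[command] = path
    return found
```

Running `find_installed()` across a fleet produces a starting list, not a verdict. The governance question is what each discovered tool can read, write, and execute.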

The Real Story Is the Chain, Not the Single Bug

Pwn2Own results often get summarized as product X was hacked, product Y was hacked, vendor Z was embarrassed. That language is convenient, but it hides the method. The most valuable demonstrations are chains, not isolated mistakes.
The Exchange result reportedly required three bugs. The Edge sandbox escape required four logic bugs. Other AI and operating system entries involved combinations of vulnerability classes, misconfigurations, or trust-boundary failures. That is how modern exploitation works: one bug opens a door, another crosses a boundary, another raises privilege, and another turns access into control.
This is why single-control security narratives age badly. “We patch quickly” is good but incomplete. “We use EDR” is good but incomplete. “We moved to Windows 11” is good but incomplete. “We use cloud mail” is good but incomplete. Attackers do not need one perfect weakness if they can assemble several ordinary ones.
The defender’s equivalent is layered disruption. Remove local admin rights, and a privilege-escalation bug has less room to run. Segment Exchange, and remote code execution has fewer places to pivot. Restrict scripting and child processes, and post-exploitation gets noisier. Govern developer tools, and AI-assisted workflows become less of a shadow IT playground.
Berlin’s lesson is therefore architectural. Secure products matter, but secure environments matter more because no product survives contact with every researcher, attacker, and edge case forever.
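The arithmetic behind layered disruption is simple enough to sketch. If a chain needs every link to work, and the links are treated as independent (a simplifying assumption), the chain's success probability is the product of the per-link probabilities. The numbers below are made up purely for illustration and are not measurements of any real exploit.

```python
from math import prod


def chain_success(link_probabilities):
    """Probability that every link works, assuming the links are independent."""
    probabilities = list(link_probabilities)
    for p in probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"not a probability: {p}")
    return prod(probabilities)


# Made-up numbers for illustration only.
baseline = chain_success([0.9, 0.9, 0.9])  # three fairly reliable links
hardened = chain_success([0.9, 0.9, 0.1])  # one link disrupted by a control
```

With these invented inputs the chain drops from roughly 0.73 to roughly 0.08 when a single link is disrupted, which is the point of the section: a defender does not have to defeat every bug, only make one step unreliable.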

The Numbers From Berlin Point to a Busier Patch Season

The practical facts are simple enough, even if the implications are not. Pwn2Own Berlin’s first two confirmed days produced $908,750 in awards for 39 unique zero-days, with Day 3 adding more results as the contest moved through Windows 11, Red Hat Enterprise Linux, OpenAI Codex, Anthropic Claude Code, and VMware ESXi attempts. The early total was already large enough to make this one of the more consequential recent Pwn2Own events for enterprise defenders.
The most concrete takeaways are the ones administrators can act on before exploit details become public:
  • Microsoft Exchange administrators should verify exposure, update readiness, hybrid-server necessity, and monitoring before any related advisory lands.
  • Windows 11 administrators should treat local privilege escalation as a serious link in intrusion chains, not as a low-priority desktop nuisance.
  • Edge policy should be reviewed with the assumption that browser isolation can fail and surrounding controls must reduce the blast radius.
  • AI coding agents, local inference tools, and developer platforms should be inventoried and governed as privileged software, not experimental toys.
  • Security teams should expect a busier vulnerability pipeline as AI-assisted research increases the speed and volume of credible bug discovery.
  • The 90-day disclosure window should be used for preparation, because waiting for public technical details puts defenders on the wrong side of the timeline.
Pwn2Own Berlin 2026 did not prove that Windows 11, Exchange, Edge, or AI development tools are uniquely doomed; it proved that the modern enterprise stack is too interconnected for any vendor’s assurances to stand alone. The next phase belongs to Microsoft and the other affected vendors as they turn private reports into patches, but the more important work starts now inside customer environments: inventory what exists, reduce what is exposed, constrain what is privileged, and assume the next exploit chain is already being assembled.

Source: Notebookcheck Pwn2Own Berlin 2026 - Windows 11 and Microsoft Exchange hacked
 
