KB5096135 Update: Qualcomm QNN Execution Provider for Windows 11 Arm AI

Microsoft published KB5096135 on May 26, 2026, as an automatic Windows Update package that updates the Qualcomm QNN Execution Provider AI component to version 2.2605.2.0 for Windows 11 version 24H2 and Windows 11 version 25H2. The update is narrow, quiet, and easy to miss, but it is also a useful marker for where Windows on Arm is heading. Microsoft is increasingly treating local AI acceleration not as a flashy app feature, but as serviced operating-system plumbing.

Microsoft Turns the NPU Into a Serviced Windows Dependency

KB5096135 is not the kind of Windows update that will make most users stop what they are doing. It does not promise a new Start menu, a visible Copilot redesign, or a performance boost that can be captured in a single benchmark headline. Microsoft’s support note describes it simply as an update to the Qualcomm QNN Execution Provider, an ONNX Runtime execution provider used by Windows machine learning scenarios on Qualcomm chipsets.
That phrasing is dry, but the architecture matters. ONNX Runtime is one of the layers Microsoft uses to run machine-learning models across different hardware backends. An execution provider is the adapter that lets ONNX Runtime send work to a particular acceleration path rather than treating every model as a CPU job. In this case, the target is Qualcomm’s AI stack, exposed through the Qualcomm AI Engine Direct SDK and its QNN graph execution model.
The update applies to Windows 11 version 24H2 and Windows 11 version 25H2, and Microsoft says it will be downloaded and installed automatically through Windows Update. It requires the latest cumulative update for those Windows versions first, which means Microsoft is tying the AI component cadence to the broader servicing baseline rather than leaving it entirely to app installers, SDK packages, or OEM utilities.
That is the real story. The Copilot+ PC pitch depends on NPUs being available, predictable, and usable by system components and third-party applications. KB5096135 is one of the small gears in that machine: a component update that keeps Qualcomm’s NPU acceleration path in step with Windows as the platform underneath it changes.

The Quiet Update Says More Than the Support Page Admits

Microsoft’s support article does not list individual fixes, new operators, performance claims, or known issues. It says the package includes improvements to the Qualcomm QNN Execution Provider AI component, replaces the previously released KB5089617, and appears in update history as “Windows ML Runtime Qualcomm QNN Execution Provider Update (KB5096135).” That is not much for administrators hoping to assess regression risk.
But the absence of detail is itself instructive. Microsoft is presenting this as component servicing, not as a feature release. The QNN Execution Provider is being handled more like a graphics runtime, media codec, or Defender intelligence component than a conventional application update. It arrives when eligible devices check Windows Update; users can verify it under Settings > Windows Update > Update history.
The choice to split this out as a named KB also matters. AI components are no longer invisible blobs inside a monolithic OS image. Microsoft has been building separate release information for AI components, and KB5096135 fits that model: specific enough to track, automatic enough that most people never need to think about it, and dependent enough on cumulative updates that it remains anchored to the Windows servicing train.
For Windows enthusiasts, this is a subtle but important shift. The NPU is becoming part of the Windows compatibility surface. If an app depends on ONNX Runtime and expects Qualcomm acceleration, it is no longer depending only on the silicon vendor’s marketing sheet; it is depending on whether the right provider, runtime, and OS update level are present.

Qualcomm’s Execution Provider Is the Translation Layer Windows Needs

The Qualcomm QNN Execution Provider exists because “run this AI model on the NPU” is not a single operation. An ONNX model must be mapped into a graph that Qualcomm’s backend understands, and that graph must be executed through the appropriate accelerator library. Microsoft’s description says the provider uses Qualcomm’s QNN SDK to construct a QNN graph from an ONNX model, which is then executed by a supported accelerator backend.
In practical terms, this is the layer that lets a Windows machine-learning workload use Qualcomm hardware acceleration without every application developer hand-writing against the lowest-level Qualcomm interfaces. That does not make performance automatic. Models still need to be compatible, quantized appropriately for certain backends, and shaped in ways the provider can handle.
The ONNX Runtime documentation for QNN makes that clear. QNN can target different Qualcomm backends, including HTP for NPU offload, GPU, and CPU-style reference paths, and the HTP route has model requirements that developers cannot ignore. Quantized models, fixed shapes, supported operators, and careful fallback behavior remain part of the engineering burden.
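A minimal sketch of what backend selection can look like in practice, assuming an application that calls ONNX Runtime's Python API directly with a QNN-enabled build rather than going through Windows ML (which handles provider registration on its own). The helper name `qnn_providers` is hypothetical. The provider name `QNNExecutionProvider`, the `backend_path` option, and the `QnnHtp.dll` and `QnnCpu.dll` library names follow the ONNX Runtime QNN documentation; the GPU backend is left out here.

```python
from typing import Dict, List, Tuple, Union

ProviderEntry = Union[str, Tuple[str, Dict[str, str]]]

# Backend library names documented for the ONNX Runtime QNN provider on Windows.
QNN_BACKEND_LIBS = {"htp": "QnnHtp.dll", "cpu": "QnnCpu.dll"}


def qnn_providers(backend: str = "htp", allow_cpu_fallback: bool = True) -> List[ProviderEntry]:
    """Build a providers list for onnxruntime.InferenceSession (hypothetical helper)."""
    if backend not in QNN_BACKEND_LIBS:
        raise ValueError(f"unknown QNN backend: {backend!r}")
    providers: List[ProviderEntry] = [
        ("QNNExecutionProvider", {"backend_path": QNN_BACKEND_LIBS[backend]})
    ]
    if allow_cpu_fallback:
        # Providers are listed in priority order, so the CPU provider goes last.
        providers.append("CPUExecutionProvider")
    return providers


if __name__ == "__main__":
    print(qnn_providers())
```

The resulting list would be passed as `onnxruntime.InferenceSession(model_path, providers=qnn_providers())`. Whether a given model actually lands on the HTP backend still depends on the quantization, shape, and operator constraints described above.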
This is why a servicing update is not just routine maintenance. As Microsoft, Qualcomm, and developers expand the set of models expected to run locally, the execution provider becomes a moving compatibility layer. Improvements may mean better operator support, updated backend behavior, reliability changes, performance tuning, or alignment with newer Qualcomm runtime pieces. Microsoft has not specified which of those are in KB5096135, so we should not pretend to know. What we can say is that this is the place such improvements would land.

Windows on Arm Needs Boring Infrastructure More Than Big Promises

The Snapdragon X generation gave Windows on Arm its first truly mainstream hardware moment. The marketing hook was battery life and AI acceleration, but the success of the platform depends on far less glamorous work. Drivers need to be stable, emulation needs to be good enough, native apps need to appear, and local AI runtimes need to behave consistently across OEM images.
KB5096135 sits squarely in that last bucket. Qualcomm’s NPU is only useful to Windows users if the software above it knows how to target it. For Microsoft’s own features, that means Windows ML and related runtime paths need to stay current. For developers, it means the ONNX Runtime provider must be reliable enough that using the NPU does not turn into a support matrix nightmare.
This is where the platform argument becomes sharper. Microsoft is not just competing with Intel and AMD PCs on performance per watt; it is competing with Apple’s control over the full hardware-software stack. Apple can evolve Core ML, Neural Engine support, and OS frameworks as a single platform story. Windows has to coordinate Microsoft, Qualcomm, OEMs, app developers, and enterprise deployment tools.
Servicing the QNN Execution Provider through Windows Update is one way Microsoft narrows that gap. It gives the company a route to improve the AI acceleration layer after a device ships. It also reduces the chance that every OEM image freezes a different version of a key AI component in place.

Automatic Installation Helps Consumers and Complicates Change Control

For consumers, automatic installation is the right default. Nobody wants to hunt down a runtime package to make a photo effect, voice feature, local model, or future Copilot workload use the NPU properly. If the device is supported and up to date, the acceleration layer should be there.
For enterprise IT, automatic installation is more complicated. AI component updates can affect workloads that are difficult to test with traditional application compatibility suites. A regression in a runtime provider might not break Windows boot or Office launch; it might change latency, accuracy, fallback behavior, power consumption, or memory pressure in an application that depends on local inference.
Microsoft’s prerequisite requirement at least offers a clear baseline: devices need the latest cumulative update for Windows 11 version 24H2 or version 25H2. That helps administrators reason about eligibility. The update history entry gives them a way to verify presence after deployment.
Still, the support note’s lack of a detailed changelog leaves a gap. If an organization is piloting Windows on Arm devices for field workers, developers, executives, or AI-assisted workflows, it needs to understand what changed in the acceleration stack. “Includes improvements” may be true, but it is not sufficient for environments where reproducibility matters.

The 24H2 and 25H2 Targeting Shows the AI Baseline Moving Forward

KB5096135 applies to Windows 11 version 24H2 and Windows 11 version 25H2, not older Windows 11 releases. That targeting is unsurprising, but it matters. Microsoft’s modern AI plumbing is concentrated around the newer Windows 11 codebase, particularly the builds aligned with Copilot+ PC hardware.
Windows 11 24H2 was the release that brought the first wave of Copilot+ PC infrastructure into the mainstream channel. Windows 11 25H2, referenced in the KB as supported, continues that servicing trajectory. By limiting this QNN provider update to those branches, Microsoft is implicitly defining where the supported AI component stack lives.
That has consequences for device fleets. A Qualcomm-powered Windows PC that is not on a supported release branch may not receive the same AI runtime servicing path. Conversely, a device on 24H2 or 25H2 with current cumulative updates should be positioned to receive these component-level updates automatically.
This is increasingly how Windows will draw the line between “runs Windows” and “runs the current Windows AI platform.” The OS version, cumulative update level, AI component version, and silicon vendor provider all become part of the same compatibility story. The old habit of asking only which Windows build is installed will not be enough.
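One way to picture that combined compatibility check is the sketch below. It assumes a hypothetical inventory record (`DeviceState`) that an organization fills in from its own management tooling; only the release names and the 2.2605.2.0 component version come from the KB. Versions are compared as integer tuples because plain string comparison would rank "2.999.0.0" above "2.2605.2.0".

```python
from dataclasses import dataclass
from typing import Tuple

SUPPORTED_RELEASES = {"24H2", "25H2"}   # releases named in the KB
BASELINE_QNN_EP = "2.2605.2.0"          # version delivered by KB5096135


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '2.2605.2.0' into (2, 2605, 2, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


@dataclass(frozen=True)
class DeviceState:          # hypothetical inventory record
    release: str            # e.g. "24H2"
    has_latest_cu: bool     # is the latest cumulative update installed?
    qnn_ep_version: str     # reported QNN Execution Provider version


def meets_ai_baseline(device: DeviceState, baseline: str = BASELINE_QNN_EP) -> bool:
    """True only if release, cumulative update, and component version all line up."""
    return (
        device.release in SUPPORTED_RELEASES
        and device.has_latest_cu
        and parse_version(device.qnn_ep_version) >= parse_version(baseline)
    )
```

The point of the sketch is that no single field answers the question: a device on a supported release can still fall short on the cumulative update or the component version.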

The Replacement of KB5089617 Signals a Cadence, Not a One-Off

Microsoft says KB5096135 replaces KB5089617. That replacement detail is easy to skip, but it is one of the more revealing facts in the support article. The QNN provider update stream is not a one-off patch; it is a sequence.
A replacement chain suggests Microsoft expects this component to evolve independently enough that each update needs its own identity. That is sensible. AI runtimes are changing quickly, and the underlying models, operator sets, quantization strategies, profiling behavior, and acceleration APIs are not static. Qualcomm’s own stack evolves, ONNX Runtime evolves, and Windows’ use of local inference evolves.
The challenge is that component servicing at this pace can strain the documentation model. If Microsoft ships frequent AI runtime updates but describes each only as “improvements,” administrators and developers will have to infer too much. They will test, compare, and reverse-engineer behavior that should ideally be described in release notes.
The upside is agility. Windows on Arm needs fast iteration if it is to mature as an AI client platform. The downside is opacity. KB5096135 shows both sides of that bargain.

Developers Get a Better Target, but Not a Free Pass

For developers building on ONNX Runtime, the QNN Execution Provider offers a path to Qualcomm acceleration without abandoning cross-platform model workflows. The same high-level runtime can target different execution providers, with QNN handling the Qualcomm path. That is attractive if your application needs to run on a mix of Intel, AMD, Nvidia, and Qualcomm systems.
But acceleration remains conditional. The QNN HTP backend, which is the interesting NPU path, has real constraints. Models may need quantization, dynamic shapes may need to be fixed, unsupported operators may fall back to CPU unless developers disable fallback for validation, and provider options can affect performance and behavior.
KB5096135 does not change that fundamental contract. It may improve the provider, but it does not turn every ONNX model into an NPU-ready workload. Developers still need to test on actual Snapdragon Windows hardware and check whether their model runs where they think it runs.
That last point is crucial. Silent CPU fallback can make an application appear compatible while quietly losing the power and latency benefits the developer expected. For serious local AI workloads, validation should include provider placement, performance counters, power behavior, and error handling, not just successful inference.
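A hedged sketch of a stricter validation run, assuming direct use of ONNX Runtime's Python API with a QNN-enabled build. The helper names are hypothetical; `session.disable_cpu_ep_fallback` is an ONNX Runtime session configuration key, and `add_session_config_entry` and `get_providers` are existing ONNX Runtime calls. The `onnxruntime` import is deferred so the small helpers can be used on machines without the package.

```python
from typing import Dict, Iterable, List


def validation_session_config(strict: bool) -> Dict[str, str]:
    """Session config entries for a validation run (hypothetical helper)."""
    # With "1", session creation is expected to fail rather than quietly
    # assigning unsupported nodes to the CPU provider.
    return {"session.disable_cpu_ep_fallback": "1" if strict else "0"}


def uses_npu_path(active_providers: Iterable[str]) -> bool:
    """True if QNN is among the providers a session reports as registered."""
    return "QNNExecutionProvider" in set(active_providers)


def create_validation_session(model_path: str, providers: List, strict: bool = True):
    """Create a session; needs a QNN-enabled onnxruntime build at call time."""
    import onnxruntime as ort  # imported lazily so the helpers above work without it

    options = ort.SessionOptions()
    for key, value in validation_session_config(strict).items():
        options.add_session_config_entry(key, value)
    session = ort.InferenceSession(model_path, sess_options=options, providers=providers)
    if not uses_npu_path(session.get_providers()):
        raise RuntimeError("QNN provider is not active; the model would run on CPU")
    return session
```

In strict mode the providers list should contain only the QNN entry, since listing the CPU provider explicitly would defeat the purpose and may be rejected by the runtime. Note that `get_providers` only confirms registration; per-node placement still needs profiling or verbose logs.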

Users Will See the Effects Indirectly, If They See Them at All

Most users will never knowingly interact with the Qualcomm QNN Execution Provider. They will not launch it, configure it, or recognize its update history entry unless they are looking for it. The effects, if any are visible, will appear through applications and Windows features that use local machine learning.
That could mean faster response times, lower CPU usage, better battery behavior, improved reliability, or support for workloads that previously failed to use the NPU properly. It could also mean no visible change at all. Microsoft has not claimed user-facing performance improvements for KB5096135, and we should not invent them.
This is the nature of platform work. The more successful the provider is, the less users should need to know it exists. A camera feature, accessibility feature, creative application, or local assistant should simply choose the right acceleration path and run.
The danger is that invisible infrastructure is hard to troubleshoot. If a feature behaves differently after an update, the relevant change may be buried under AI component servicing rather than in the app itself. Power users should know where to look: Settings > Windows Update > Update history, under the entry for the Windows ML Runtime Qualcomm QNN Execution Provider Update.
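For anyone checking more than a handful of machines, the same lookup can be scripted once the update history titles are available as text. The sketch below assumes the titles have already been exported by some other tool (how they are collected is outside its scope), and the helper name `find_qnn_update` is hypothetical. The title prefix is the one the KB says appears in update history.

```python
import re
from typing import Iterable, Optional

KB_PATTERN = re.compile(r"\(KB(\d+)\)")
QNN_TITLE_PREFIX = "Windows ML Runtime Qualcomm QNN Execution Provider Update"


def find_qnn_update(titles: Iterable[str]) -> Optional[str]:
    """Return the KB id of the first QNN provider entry found, else None."""
    for title in titles:
        if title.startswith(QNN_TITLE_PREFIX):
            match = KB_PATTERN.search(title)
            if match:
                return "KB" + match.group(1)
    return None


if __name__ == "__main__":
    history = [
        "Example cumulative update (KB0000000)",  # placeholder entry, not a real KB
        "Windows ML Runtime Qualcomm QNN Execution Provider Update (KB5096135)",
    ]
    print(find_qnn_update(history))
```

Because the function returns the first match, the caller decides what "first" means by ordering the exported titles, for example newest first.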

IT Pros Should Treat AI Components Like Drivers With Runtime Semantics

Administrators already understand driver updates as both necessary and risky. AI component updates deserve similar treatment, but with a twist. They are not just hardware enablement packages; they are runtime behavior packages that can affect application execution.
A graphics driver regression is often visible quickly: crashes, flicker, broken rendering, poor frame rates. An AI runtime regression may be subtler. A local inference workload may become slower, consume more power, fall back to CPU, or produce different numerical behavior. In regulated or highly controlled environments, even small changes in model execution paths can matter.
That does not mean organizations should block KB5096135 by reflex. The update is part of Microsoft’s supported servicing path for the relevant Windows versions. Avoiding it indefinitely may leave devices behind the platform baseline that Microsoft and app vendors expect.
It does mean pilots matter. Qualcomm-based Windows 11 fleets should include representative AI workloads in update validation, even if those workloads are currently modest. The future risk is not only today’s Copilot feature; it is tomorrow’s line-of-business app that quietly depends on local inference.
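A pilot check for the subtler regressions described above can be quite small. The sketch below is an illustration only: the function names and the default thresholds (10 percent on median latency, 1e-3 relative tolerance on outputs) are assumptions to be tuned per workload, and the measurements are expected to come from the organization's own test harness run before and after the update.

```python
import math
from statistics import median
from typing import Sequence


def latency_regressed(before_ms: Sequence[float], after_ms: Sequence[float],
                      tolerance: float = 0.10) -> bool:
    """True if median latency grew by more than `tolerance` (10% by default)."""
    return median(after_ms) > median(before_ms) * (1.0 + tolerance)


def outputs_drifted(before: Sequence[float], after: Sequence[float],
                    rel_tol: float = 1e-3, abs_tol: float = 1e-5) -> bool:
    """True if any paired output value differs beyond the given tolerances."""
    if len(before) != len(after):
        return True
    return any(not math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
               for a, b in zip(before, after))
```

Medians are used rather than means so that a single slow run does not trigger a false alarm. Power draw and provider placement would need separate checks.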

Microsoft Is Building an AI Update Channel in Plain Sight

The most interesting part of KB5096135 is not the version number. It is the emerging pattern around AI component release information, update history entries, and Windows Update delivery. Microsoft is building a servicing channel for AI plumbing that sits somewhere between OS feature updates and app-store-delivered experiences.
This makes strategic sense. AI features are moving faster than Windows feature releases. Hardware providers are iterating quickly. Developers need runtime fixes without waiting for annual OS milestones. Users, meanwhile, expect their expensive NPU-equipped laptops to get better over time.
The question is whether Microsoft can make this channel transparent enough. Windows Update has long been criticized when it changes too much with too little explanation. AI servicing raises the stakes because the stack is less familiar to many administrators and users. A KB page that says “improvements” may be acceptable for a minor component, but it becomes less acceptable as more critical workloads depend on that component.
Microsoft’s opportunity is to make AI component servicing boring in the best sense: predictable, documented, reversible where appropriate, and visible in management tools. KB5096135 is a step toward the predictable part. The documentation part still has room to grow.

The Small Qualcomm Patch Carries a Bigger Windows Lesson

KB5096135 is best understood as a platform-maintenance update, not a feature drop. Its importance is less about what Microsoft explicitly says it changes and more about what its delivery model reveals.
  • KB5096135 updates the Qualcomm QNN Execution Provider AI component to version 2.2605.2.0 for Windows 11 version 24H2 and Windows 11 version 25H2.
  • The update installs automatically through Windows Update after the latest cumulative update for the supported Windows version is present.
  • The update replaces KB5089617, showing that Microsoft is maintaining a sequence of AI component updates rather than shipping a one-time package.
  • Users can verify installation in Windows Update history under the Windows ML Runtime Qualcomm QNN Execution Provider Update entry.
  • Developers and IT teams should treat the QNN provider as a serviced compatibility layer for local AI workloads on Qualcomm hardware, not as a static OEM component.
  • Microsoft has not published a detailed fix list for this KB, so performance or behavior changes should be validated rather than assumed.
KB5096135 will not sell anyone a Copilot+ PC by itself, and it will not settle the Windows on Arm argument overnight. But it shows Microsoft doing the unglamorous work required for that argument to become credible: turning AI acceleration into updateable Windows infrastructure. The next phase of the PC will depend less on whether an NPU exists on the spec sheet and more on whether Windows can keep the layers above it current, observable, and dependable.

References

  1. Primary source: Microsoft Support
    Published: Tue, 26 May 2026 21:02:30 Z
 
