KB5096139 for Windows 11 26H1 Updates NVIDIA TensorRT-RTX Execution Provider

Microsoft has published KB5096139, an automatic Windows Update package for Windows 11 version 26H1 that updates the NVIDIA TensorRT-RTX Execution Provider to version 2.2605.1.0 on supported devices with the latest cumulative update installed. The package looks small, but it sits inside a much larger shift in how Windows will deliver local AI acceleration. Microsoft is turning GPU inference into a serviced platform component, not an app-by-app dependency hunt. For Windows users and administrators, that is both the promise and the risk.

[Figure: AI inference pipeline on an RTX GPU, from input to ONNX execution output on Windows.]

Microsoft Moves AI Acceleration Into the Plumbing

KB5096139 is not a flashy feature drop. There is no new Copilot button, no consumer-facing app, and no benchmark table in the support note. The update is described simply as an improvement to the NVIDIA TensorRT-RTX Execution Provider component for Windows 11 version 26H1.
That understatement is the point. Windows ML execution providers are becoming part of the operating system’s maintenance layer, much like graphics drivers, codec packs, WebView runtimes, and servicing stack components. The user may never launch something called “TensorRT-RTX,” but an app running an ONNX model locally may depend on it to make inference fast enough to feel native.
The execution provider is the piece that lets ONNX Runtime and Windows ML route compatible machine-learning workloads to NVIDIA RTX hardware. Instead of each application shipping its own NVIDIA-specific runtime stack, Microsoft’s model increasingly asks Windows to keep a shared provider present, patched, and ready.
That is a platform bet. Microsoft is not merely adding AI features to Windows; it is trying to make Windows the broker between app developers and a fragmented landscape of NPUs, GPUs, runtimes, drivers, and model formats.

The Quiet KB Article Is Louder Than It Looks

The official KB language is sparse: the NVIDIA TensorRT-RTX Execution Provider accelerates ONNX model inference on NVIDIA RTX GPUs, uses NVIDIA’s TensorRT for RTX runtime, and is aimed at client-centric end-user PC scenarios. The update downloads and installs automatically through Windows Update. To verify installation, users can check Windows Update history for “Windows ML Runtime Nvidia TensorRT-RTX Execution Provider Update (KB5096139).”
There are two important constraints in that dry prose. First, KB5096139 applies to Windows 11 version 26H1, all editions. Second, Microsoft says the device must already have the latest cumulative update for Windows 11 version 26H1 installed.
That makes this a servicing dependency rather than a standalone installer story. If a machine is not current on the base OS, it should not expect this AI component update to arrive independently. Microsoft is keeping the AI runtime layer tied to the monthly health of Windows itself.
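The update-history check described above can be sketched as a small title search. This is a minimal illustration, not Microsoft's tooling: the function name `find_update` and the sample list are hypothetical, and the sketch assumes the history titles have already been collected by some other means (for example, copied out of the Windows Update history page). Only the quoted entry title comes from the KB article.

```python
import re

# KB identifier taken from the support article.
EXPECTED_KB = "KB5096139"


def find_update(history_titles, kb_id=EXPECTED_KB):
    """Return the first update-history title that mentions kb_id, or None.

    history_titles is a hypothetical input: a list of title strings
    gathered from Windows Update history by whatever means is available.
    """
    # Match the KB number as a whole token so a longer number such as
    # KB50961390 is not mistaken for KB5096139.
    pattern = re.compile(r"\b" + re.escape(kb_id) + r"\b", re.IGNORECASE)
    for title in history_titles:
        if pattern.search(title):
            return title
    return None


if __name__ == "__main__":
    # The first title is the one quoted in the KB article; the second is a
    # made-up placeholder entry.
    sample_history = [
        "Windows ML Runtime Nvidia TensorRT-RTX Execution Provider Update (KB5096139)",
        "Placeholder update entry (KB0000000)",
    ]
    print(find_update(sample_history))
```

A match only confirms that the package appears in the history. It says nothing about whether any application is actually using the provider.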
KB5096139 also replaces KB5089174, the prior NVIDIA TensorRT-RTX Execution Provider update, which carried version 2.2604.1.0. The step from 2.2604.1.0 to 2.2605.1.0 suggests, though Microsoft does not say so, that the second field encodes a year and month (2604 for April 2026, 2605 for May 2026) and that releases follow the calendar, package to package, with Windows Update doing the distribution work.
That cadence matters because AI runtimes are not static. Model operators change, hardware support matures, driver assumptions shift, and vendors chase performance regressions as aggressively as they chase headline gains. Microsoft appears to be preparing Windows for a world in which AI components update more like browser engines than classic OS features.
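The version comparison above can be made concrete with a short sketch. It parses the two version strings from the KB articles and compares them field by field. The YYMM reading of the second field is an assumption drawn from the two known values, not something Microsoft documents, and the names `ProviderVersion`, `parse_version`, and `guess_year_month` are hypothetical.

```python
from typing import NamedTuple, Optional, Tuple


class ProviderVersion(NamedTuple):
    major: int
    stamp: int      # second field, e.g. 2605
    build: int
    revision: int


def parse_version(text: str) -> ProviderVersion:
    """Parse a four-part dotted version such as '2.2605.1.0'."""
    parts = text.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dotted fields, got {text!r}")
    return ProviderVersion(*(int(p) for p in parts))


def guess_year_month(version: ProviderVersion) -> Optional[Tuple[int, int]]:
    """ASSUMPTION: the second field is YYMM. Return (year, month) or None."""
    yy, mm = divmod(version.stamp, 100)
    if 1 <= mm <= 12:
        return 2000 + yy, mm
    return None  # the field does not look like a calendar stamp


if __name__ == "__main__":
    old = parse_version("2.2604.1.0")   # KB5089174
    new = parse_version("2.2605.1.0")   # KB5096139
    # NamedTuples compare field by field, left to right.
    print(new > old, guess_year_month(old), guess_year_month(new))
```

Comparing parsed integers rather than raw strings avoids ordering mistakes once a field gains a digit.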

Windows ML Is Becoming a Vendor-Neutral Front Door

The larger Windows ML strategy is straightforward: developers bring ONNX models, Windows discovers available acceleration hardware, and execution providers handle the vendor-specific path. In practice, that means NVIDIA, AMD, Intel, and Qualcomm each need their own optimized route through the same Windows-facing framework.
For years, Windows machine-learning deployment was a choose-your-own-adventure mess. A developer might target DirectML for broad GPU support, CUDA or TensorRT for NVIDIA performance, OpenVINO for Intel, QNN for Qualcomm, or vendor SDKs for the latest hardware-specific capabilities. That could work for specialists, but it was painful for mainstream app developers trying to ship a feature across consumer PCs.
Execution providers are Microsoft’s abstraction layer over that chaos. They do not erase the differences between an RTX GPU, a Ryzen AI NPU, a Core Ultra NPU, and a Snapdragon NPU. They try to make those differences consumable through a manageable API surface and a serviced distribution channel.
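One way to picture that abstraction is a preference list resolved against whatever providers a machine reports. The sketch below is illustrative only and does not use the Windows ML catalog APIs. The function `choose_providers` is hypothetical. `CPUExecutionProvider` and `DmlExecutionProvider` are ONNX Runtime provider names, while the TensorRT-RTX string in the demo is an assumed name that should be checked against the runtime in use. In a real application the `available` list might come from `onnxruntime.get_available_providers()`.

```python
CPU_PROVIDER = "CPUExecutionProvider"


def choose_providers(available, preferred):
    """Return the preferred providers that are available, in preference
    order, with the CPU provider appended last as the fallback.

    ASSUMPTION: the CPU provider can always be used, so it is appended
    even if the caller did not list it as available.
    """
    available_set = set(available)
    chosen = [
        name for name in preferred
        if name in available_set and name != CPU_PROVIDER
    ]
    chosen.append(CPU_PROVIDER)
    return chosen


if __name__ == "__main__":
    # Assumed provider name for TensorRT-RTX; verify before relying on it.
    preferred = ["NvTensorRTRTXExecutionProvider", "DmlExecutionProvider"]
    # Hypothetical machine without the NVIDIA provider registered.
    available = ["DmlExecutionProvider", "CPUExecutionProvider"]
    print(choose_providers(available, preferred))
```

Keeping the vendor-specific names in data rather than in control flow is what lets the same application code run across RTX, Ryzen AI, Core Ultra, and Snapdragon machines.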
That is why KB5096139 should interest more than AI hobbyists. If Windows ML succeeds, the update history page becomes part of the AI compatibility story. A user’s ability to run local inference efficiently may depend not only on whether they bought the right silicon, but whether Windows has the right provider version installed.
This is also why Microsoft’s “all editions” language is notable. The company is not presenting TensorRT-RTX as a workstation-only or developer-only capability. It is being treated as an operating-system component for client PCs that happen to have the right NVIDIA hardware.

NVIDIA Gets a First-Class Lane on Windows

NVIDIA does not need Microsoft to make AI matter on PCs. CUDA, TensorRT, GeForce drivers, Studio drivers, and the broader RTX software stack already give NVIDIA a deep software moat. But Windows ML gives NVIDIA something different: a first-class route into ordinary Windows applications without requiring every app vendor to become an NVIDIA runtime distributor.
TensorRT for RTX is designed for optimized inference on RTX-class hardware. The Windows execution provider wraps that advantage in a system-managed component. For users, the best version of this future is boring: install Windows updates, install GPU drivers, run an app, and the model accelerates locally without a trip through dependency hell.
For developers, it changes packaging decisions. Instead of bundling a large vendor stack, an app can query Windows ML for compatible execution providers and let Windows acquire or register them. That is not just a convenience; it reduces app size, update burden, and the risk that five different applications ship five different builds of the same acceleration layer.
There is still a boundary here. This update does not magically make every ONNX model faster, and it does not turn older non-RTX GPUs into modern AI accelerators. Hardware support, driver versions, model structure, operator coverage, precision choices, and memory limits all remain relevant.
But NVIDIA benefits from having its optimized path visible inside Microsoft’s sanctioned framework. In a Windows ecosystem where Copilot+ PCs have often been discussed through the lens of NPUs, KB5096139 is a reminder that discrete GPUs remain a huge part of the local AI story.

The NPU Narrative Was Always Too Narrow

Microsoft’s Copilot+ PC messaging has made NPUs the fashionable unit of AI capability. TOPS figures became the new spec-sheet trophy, and Qualcomm’s early Copilot+ push made neural processors feel like the defining hardware of the AI PC era. That framing was useful for marketing, but technically incomplete.
A Windows PC is not one accelerator. It is a collection of compute options, each with trade-offs. CPUs remain useful for compatibility and smaller workloads. NPUs are attractive for power-efficient, always-on tasks. GPUs are still formidable for throughput-heavy inference, creative workloads, and models that benefit from the mature CUDA/TensorRT ecosystem.
KB5096139 sits in that broader reality. It is not an NPU update. It is a GPU execution-provider update for NVIDIA RTX hardware. That places high-end laptops, desktops, creator PCs, and gaming rigs back in the center of Windows AI deployment.
That matters because the installed base of RTX GPUs is enormous compared with the first waves of NPU-equipped Copilot+ PCs. If Microsoft wants developers to adopt local inference, it cannot afford to define the opportunity only around new machines bought in the last hardware cycle. RTX systems are already on desks, in dorm rooms, under monitors, and inside small studios.
The practical implication is that the “AI PC” is less a single certification badge than a moving software target. A machine’s AI usefulness will depend on the intersection of hardware, drivers, Windows version, runtime components, and application support. KB5096139 updates one lane in that intersection.

Automatic Updates Are a Feature Until They Break Something

Microsoft says KB5096139 will be downloaded and installed automatically from Windows Update. For consumers, that is almost certainly the right default. AI acceleration should not require a scavenger hunt through vendor pages, GitHub releases, and redistributable packages.
For administrators, automatic servicing cuts both ways. Shared runtime components reduce drift when everything works. When something misbehaves, they also create a new class of change to monitor: not quite a driver, not quite an application, not quite a cumulative update, but still capable of changing application behavior.
Inference runtimes can affect performance, memory usage, model compatibility, startup latency, and crash patterns. A provider update that improves one model family may expose assumptions in another. An application that was tested against last month’s provider might behave differently after Windows Update advances the shared component.
That does not mean enterprises should fear KB5096139 specifically. Microsoft’s note does not list known issues, security fixes, or dramatic behavior changes. The concern is structural: Windows AI components are becoming active pieces of the endpoint stack, and endpoint teams will need to inventory them with the same seriousness they apply to graphics drivers and WebView runtimes.
The update history check Microsoft provides is useful but minimal. It tells a user whether the package is present. It does not tell an administrator whether a particular app has registered the provider, whether a given ONNX workload is using it, or whether fallback to DirectML or CPU occurred silently.
That gap will matter as local AI moves from demos to line-of-business workflows. When a user says “the AI feature got slower after Patch Tuesday,” the answer may be buried somewhere between Windows Update, GPU driver release notes, app telemetry, and execution-provider selection logic.
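A first diagnostic for that kind of report is to compare the provider an application asked for with the one it actually got. The sketch below assumes the application can expose both values. The function `check_fallback` is hypothetical. In ONNX Runtime's Python API the active list could come from `InferenceSession.get_providers()`, which reports the providers registered for a session, not which graph nodes ran where.

```python
def check_fallback(requested, active):
    """Compare the requested provider with the highest-priority active one.

    requested: provider name the application asked for.
    active: ordered list of providers in effect for the session,
            highest priority first (hypothetical input).
    """
    top = active[0] if active else None
    return {
        "requested": requested,
        "active": top,
        "fell_back": top != requested,
    }


if __name__ == "__main__":
    # Hypothetical case: the app asked for a GPU provider but only the
    # CPU provider ended up registered.
    report = check_fallback("DmlExecutionProvider", ["CPUExecutionProvider"])
    print(report)
```

Logging a record like this at session creation gives support staff something to correlate with Windows Update history when a slowdown is reported.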

Developers Get Less Packaging Pain and More Platform Dependence

The developer story is the most compelling part of Microsoft’s approach. Windows ML lets apps use execution-provider catalog APIs to discover, install, and register compatible providers. In theory, that gives developers a clean path to hardware acceleration without shipping every vendor’s SDK.
That is the right abstraction for a mass-market OS. Developers should not need to become experts in every accelerator stack just to add local image processing, text analysis, semantic search, or small-model inference. ONNX gives them a model format, ONNX Runtime gives them an execution engine, and Windows ML offers a bridge to hardware.
But abstraction is never free. The more developers depend on Windows-managed execution providers, the more they depend on Microsoft’s release cadence, certification process, and compatibility promises. A vendor SDK bundled with an app gives the developer more control. A Windows-serviced provider gives the developer less packaging work but more reliance on the platform state of the user’s PC.
That trade-off will be acceptable for many applications, especially consumer software where smooth installation matters more than absolute runtime determinism. It may be less acceptable for regulated, validated, or production-critical workflows where a runtime change must be tested before deployment.
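For those validated workflows, one possible safeguard is a version gate inside the application: accelerated inference proceeds only when the installed provider version is on a tested allowlist. This is a sketch of the idea, not a Windows ML feature. The names `VALIDATED_VERSIONS` and `gate` are hypothetical, and how the application learns the installed version is left out.

```python
# Hypothetical allowlist of provider versions the team has tested.
VALIDATED_VERSIONS = frozenset({"2.2604.1.0"})


def gate(installed_version, validated=VALIDATED_VERSIONS):
    """Return 'run' if the installed provider version was validated,
    otherwise a message explaining why the workload is held."""
    normalized = installed_version.strip()
    if normalized in validated:
        return "run"
    return f"hold: provider version {normalized} has not been validated"


if __name__ == "__main__":
    print(gate("2.2604.1.0"))   # version from KB5089174, on the allowlist
    print(gate("2.2605.1.0"))   # version from KB5096139, not yet tested
```

The gate does not stop Windows Update from advancing the shared component. It only keeps the application from silently relying on a version nobody has tested.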
The likely outcome is a split market. Mainstream Windows apps will lean on Windows ML’s shared providers. Specialist tools, research stacks, and enterprise-controlled deployments may continue to bundle or pin their own acceleration libraries. Microsoft is not eliminating the old path; it is making the default path more attractive.
KB5096139 is one brick in that path. It tells developers that NVIDIA’s Windows ML lane is being serviced, versioned, and replaced through Microsoft’s normal update machinery.

Version 26H1 Makes the Timing More Interesting

The KB applies to Windows 11 version 26H1, which places it in Microsoft’s next feature-release era rather than the broadly deployed Windows 11 builds most users are running today. That matters because Windows AI infrastructure is advancing alongside OS version boundaries.
Microsoft has increasingly separated parts of Windows into independently serviced components, but major framework changes still cluster around platform releases. If 26H1 is where this TensorRT-RTX provider update lands, it suggests Microsoft is using the upcoming Windows branch to refine the AI runtime architecture before it becomes mundane.
The version number 2.2605.1.0 also hints at a maturing monthly rhythm. KB5089174 carried 2.2604.1.0. KB5096139 carries 2.2605.1.0. That is not proof of a guaranteed monthly release schedule, but it is evidence of a component that can move faster than old-school Windows feature delivery.
For WindowsForum readers, the practical question is not whether this update will transform a PC overnight. It will not. The better question is whether Windows is becoming a rolling AI substrate underneath the familiar desktop.
The answer increasingly looks like yes. Microsoft’s AI work is no longer confined to visible features like Recall, Cocreator, Copilot, or semantic search. It is showing up in support articles for execution providers, model components, and runtime updates most users will never knowingly interact with.

Local AI Needs Trust, Not Just Throughput

The local AI pitch has three big selling points: lower latency, reduced cloud cost, and better privacy because data can stay on the device. Windows ML aligns neatly with all three. If an app can run an ONNX model locally on an RTX GPU, it may avoid sending data to a remote inference service.
But local execution does not automatically create trust. Users and administrators still need to know what model is running, what data it consumes, whether outputs are stored, and how the runtime is updated. A serviced execution provider solves acceleration, not governance.
That distinction will become more important as AI features spread into productivity suites, creative tools, search utilities, developer environments, and enterprise apps. A local model running quickly is still a model whose behavior must be understood. A GPU-accelerated inference path is still software that can have bugs, vulnerabilities, and compatibility quirks.
Microsoft’s componentized update strategy also creates a documentation burden. If AI runtime components are updated through KB articles, those articles need to mature beyond “includes improvements.” That phrase may be acceptable for a small internal component, but it is thin gruel for administrators trying to assess deployment risk.
To be fair, Microsoft is not alone here. GPU vendors and ML framework maintainers often bury critical behavioral differences in release notes that only specialists read. The difference is that Windows Update reaches a much broader population. When Microsoft becomes the distributor, Microsoft inherits the expectation of Windows-grade transparency.

The Real Competition Is the Default Path

The strategic fight here is not just NVIDIA versus AMD versus Intel versus Qualcomm. It is also Windows-managed AI versus app-managed AI. Whoever owns the default path for local inference will shape developer habits.
If Windows ML works well, developers will reach for it because it reduces friction. If it is inconsistent, opaque, or slow to support new model patterns, developers will keep bundling vendor-specific stacks. Microsoft’s job is to make the boring path good enough that most applications do not need a bespoke one.
NVIDIA has an advantage because its developer ecosystem is already strong. TensorRT is familiar in inference circles, and RTX branding gives consumers an intuitive sense that their GPU should help with AI workloads. A Windows execution provider lets NVIDIA’s strengths show up without requiring the user to install a separate AI runtime by hand.
AMD, Intel, and Qualcomm will push their own acceleration stories through the same Windows ML framework. That is healthy if the abstraction holds. It is dangerous if “works on Windows ML” fragments into a matrix of partially supported models, provider-specific caveats, and silent fallbacks.
The burden falls on Microsoft to make provider selection understandable. Developers need diagnostics. Administrators need inventory. Users need confidence that an app is not merely claiming local acceleration while quietly running on the CPU because the correct provider was missing, outdated, or incompatible.
KB5096139 does not answer all of that. It does show that Microsoft is investing in the mechanism that would make those answers possible.

The Small Update That Shows Where Windows Is Going

KB5096139 is not a marquee Windows release, but it gives a clear read on Microsoft’s direction for local AI on PCs.
  • Windows 11 version 26H1 is getting an automatic NVIDIA TensorRT-RTX Execution Provider update through Windows Update.
  • The package updates the provider to version 2.2605.1.0 and replaces the earlier KB5089174 release.
  • The update requires the latest cumulative update for Windows 11 version 26H1 before installation.
  • Users can confirm installation in Windows Update history under “Windows ML Runtime Nvidia TensorRT-RTX Execution Provider Update (KB5096139).”
  • The broader significance is that Microsoft is treating AI acceleration runtimes as serviced Windows components rather than optional developer baggage.
  • The practical risk is that administrators will need better visibility into AI component versions, provider selection, and app behavior as local inference becomes common.
KB5096139 will not make every RTX PC feel suddenly transformed, and most users will never know it arrived. But updates like this are how platforms change: first as obscure runtime packages, then as developer assumptions, and eventually as the invisible layer beneath everyday features. Microsoft is laying the tracks for Windows to become the default broker of local AI acceleration, and the next test is whether it can make that layer reliable, transparent, and boring enough for everyone to stop thinking about it.

References

  1. Primary source: Microsoft Support
    Published: Tue, 26 May 2026 21:02:40 Z
  2. Official source: learn.microsoft.com
  3. Related coverage: docs.nvidia.com
  4. Related coverage: windowsforum.com
  5. Related coverage: docs.nvidia.cn
 
