Microsoft has published KB5089168, a Windows Update-delivered refresh for the NVIDIA TensorRT-RTX Execution Provider, moving the component to version 2.2604.1.0 for Windows 11 version 24H2 and Windows 11 version 25H2 systems. The update targets one of the most important but least visible layers in Microsoft’s on-device AI stack: the execution provider that lets ONNX Runtime and Windows ML accelerate supported AI models on NVIDIA RTX GPUs. For users, it should arrive automatically through Windows Update; for developers, it signals Microsoft’s continued shift toward modular, hardware-aware AI components that can be updated outside full app releases or major Windows feature upgrades.
Background
The NVIDIA TensorRT-RTX Execution Provider sits at the intersection of Windows ML, ONNX Runtime, and NVIDIA’s RTX GPU ecosystem. In practical terms, it allows Windows applications to run supported ONNX machine-learning models on local NVIDIA RTX hardware rather than relying only on the CPU, a generic GPU path, or cloud inference.
That distinction matters because Windows is increasingly becoming an AI runtime platform, not merely an operating system that hosts AI applications. Microsoft’s newer Windows ML strategy uses execution providers as plug-in acceleration layers, allowing apps to target a common inference framework while the system selects an appropriate backend for the installed hardware.
Historically, Windows AI acceleration has leaned on DirectML, a broad hardware abstraction layer that works across GPU vendors. DirectML remains important, but vendor-tuned execution providers can expose more specific optimizations, especially when they are built around mature inference engines such as TensorRT.
NVIDIA’s TensorRT heritage comes from datacenter and workstation inference, where graph optimization, kernel selection, quantization, and hardware-specific scheduling have long been central to performance. TensorRT for RTX adapts that idea to client PCs, focusing on smaller deployment footprint, faster engine creation, and portability across consumer RTX systems.
KB5089168 is therefore more than a routine component bump. It is another sign that Windows 11’s AI plumbing is being serviced like graphics drivers, codecs, and security components: quietly, incrementally, and through the same update mechanisms users already know.
What KB5089168 Actually Delivers
KB5089168 updates the Windows ML Runtime NVIDIA TensorRT-RTX Execution Provider component for Windows 11 24H2 and 25H2. Microsoft describes the release as containing improvements to the execution provider component, rather than as a feature update with a detailed consumer-facing changelog.
Installation and visibility
The update is distributed automatically through Windows Update. Microsoft says the device must already have the latest cumulative update for Windows 11 version 24H2 or 25H2 installed before KB5089168 applies.
Users who want to confirm installation can check Settings > Windows Update > Update history. After successful installation, the relevant entry should appear as Windows ML Runtime Nvidia TensorRT-RTX Execution Provider Update (KB5089168).
Key practical details include:
- Applies to Windows 11 version 24H2 and 25H2
- Updates the NVIDIA TensorRT-RTX Execution Provider to version 2.2604.1.0
- Requires the latest cumulative update for the installed Windows release
- Installs automatically through Windows Update
- Replaces the earlier KB5083460 package
- Appears in Windows Update history after installation
Why Execution Providers Matter
An execution provider is the part of ONNX Runtime that maps portions of a model graph to a specific compute backend. Instead of asking an app developer to write separate inference code for every GPU, NPU, or CPU target, ONNX Runtime can partition work and delegate supported operations to the most appropriate provider.
The abstraction layer beneath Windows AI
This approach gives Windows a cleaner separation between application logic and hardware acceleration. A developer can build against ONNX Runtime or Windows ML APIs, while the execution provider handles the lower-level realities of supported operators, device memory, graph compilation, and kernel execution.
That model is becoming essential because the PC ecosystem is no longer a simple CPU-and-GPU story. Modern Windows devices may include NPUs, integrated GPUs, discrete GPUs, and specialized tensor hardware, each with different strengths.
For Windows users, the benefit is indirect but meaningful. When the execution provider improves, compatible apps can see better inference behavior without needing a full application rewrite.
The basic execution-provider workflow can be understood in four steps:
- The app loads an ONNX model through Windows ML or ONNX Runtime.
- The runtime identifies available execution providers on the device.
- Supported parts of the model graph are assigned to the best provider.
- Unsupported operations fall back to another compatible backend where possible.
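To make that flow concrete, here is a minimal sketch using the onnxruntime Python package. The model path is a placeholder, and the TensorRT-RTX provider string is an assumption rather than a documented identifier, so it is worth verifying against the output of get_available_providers() on a real system:

```python
import onnxruntime as ort

MODEL_PATH = "model.onnx"  # hypothetical model file

# Step 2: enumerate the execution providers this ONNX Runtime build exposes.
available = ort.get_available_providers()
print("Available providers:", available)

# Steps 3-4: list providers in priority order; ONNX Runtime assigns supported
# graph nodes to the first provider that can run them and falls back down the
# list (ending at the CPU) for the rest.
# NOTE: the TensorRT-RTX provider name below is an assumption; check
# get_available_providers() on your system for the real identifier.
preferred = [p for p in ("NvTensorRTRTXExecutionProvider",
                         "CUDAExecutionProvider",
                         "CPUExecutionProvider") if p in available]

# Step 1: load the model; the runtime partitions the graph across `preferred`.
session = ort.InferenceSession(MODEL_PATH, providers=preferred)
print("Session is using:", session.get_providers())
```

Windows ML applications reach broadly similar behavior through provider discovery and registration rather than a hard-coded priority list, but the partition-and-fall-back principle is the same.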
The RTX Angle: Why NVIDIA Gets a Specialized Path
NVIDIA RTX GPUs include Tensor Cores, driver-level AI optimizations, and a mature CUDA-adjacent software ecosystem. TensorRT-RTX is designed to exploit those capabilities for local inference workloads, especially where latency and throughput matter.
Client AI instead of datacenter AI
Traditional TensorRT has been associated with servers, edge appliances, and managed inference deployments. TensorRT-RTX narrows the focus to client-centric scenarios, where installation size, fast startup, and variable consumer hardware are just as important as peak performance.
That shift is important because a Windows PC is not a controlled datacenter node. It may be a gaming desktop, creator workstation, laptop, or compact PC, and the AI workload may run alongside games, video editing, browser tabs, and background services.
The client focus creates different priorities:
- Fast model engine creation matters because users will not tolerate long first-run delays.
- Small package footprint matters because apps already compete for storage and bandwidth.
- Portable cached engines matter because users may update drivers or move between RTX systems.
- Low latency matters because local AI features often sit inside interactive workflows.
- Resource awareness matters because the GPU may already be busy rendering graphics.
Windows 11 24H2 and 25H2: The Servicing Context
KB5089168 is specifically aimed at Windows 11 version 24H2 and Windows 11 version 25H2, two releases that share much of the same modern Windows platform foundation. That matters because Microsoft has increasingly tied new AI infrastructure to Windows 11’s newer servicing baseline.
Why the prerequisite matters
The requirement for the latest cumulative update is not a throwaway line. AI execution providers depend on Windows ML plumbing, package deployment infrastructure, runtime registration, and compatibility checks that may themselves be updated through monthly Windows servicing.
If a system is behind on cumulative updates, it may lack the runtime behavior expected by the newer TensorRT-RTX package. That is why Microsoft gates the component behind the latest cumulative update rather than treating it like an isolated driver download.
For consumers, the practical message is simple: if the update does not appear, first verify that Windows Update is fully current. For managed enterprise devices, the answer may be more complicated because update rings, deferrals, and policy controls can delay component delivery.
There is also a broader servicing story here. Windows ML execution providers are increasingly treated as dynamic system components, not static pieces of a single Windows image.
That creates several operational consequences:
- Feature improvements can arrive between annual Windows releases
- AI acceleration can improve without app redeployment
- Hardware vendors can align runtime updates with driver and SDK progress
- IT administrators need visibility into AI component versions
- Developers must account for devices having different provider versions
- Troubleshooting now includes Windows Update state, not only app state
Developer Impact: Less Bundling, More Runtime Dependency
For developers building Windows AI applications, KB5089168 reinforces Microsoft’s preferred model: rely on system-provided, Windows-certified execution providers where practical. Instead of bundling every vendor backend into an app, developers can use Windows ML’s catalog-based approach to acquire and register compatible providers.
The tradeoff developers must manage
The advantage is obvious. App packages can be smaller, deployment is cleaner, and apps can benefit from execution-provider updates without shipping a new version.
The tradeoff is that developers now depend more heavily on the user’s Windows servicing state. If Windows Update is paused, blocked by policy, or waiting for a reboot, an app’s best acceleration path may be unavailable or outdated.
Developers should treat this as a design constraint, not an edge case. A well-built Windows AI app should detect available providers, communicate acceleration status clearly, and fall back gracefully.
Recommended developer practices include:
- Check provider availability at runtime
- Avoid assuming the TensorRT-RTX provider is present on every RTX PC
- Expose useful diagnostics for support teams
- Cache compiled assets where supported and appropriate
- Test both accelerated and fallback paths
- Avoid hard failures when a provider update is pending
- Document expected Windows and driver requirements for users
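A hedged sketch of several of these practices with the onnxruntime Python API follows. The priority order and logging choices are illustrative, and the TensorRT-RTX provider name is again assumed rather than confirmed by the KB:

```python
import logging
import onnxruntime as ort

log = logging.getLogger("inference")

# Hypothetical priority order; adjust to match get_available_providers().
PROVIDER_PRIORITY = [
    "NvTensorRTRTXExecutionProvider",  # assumed name for the RTX provider
    "DmlExecutionProvider",            # DirectML, vendor-neutral GPU path
    "CPUExecutionProvider",            # always present
]

def create_session(model_path: str) -> ort.InferenceSession:
    """Create a session on the best available provider, logging the
    acceleration status so support teams can diagnose slow paths."""
    available = set(ort.get_available_providers())
    providers = [p for p in PROVIDER_PRIORITY if p in available]
    if providers[0] == "CPUExecutionProvider":
        log.warning("No GPU execution provider found; running on CPU. "
                    "Check Windows Update and GPU driver state.")
    try:
        session = ort.InferenceSession(model_path, providers=providers)
    except Exception:
        # Avoid a hard failure if the accelerated path is broken or pending
        # an update; fall back to the CPU provider instead.
        log.exception("Accelerated session creation failed; using CPU.")
        session = ort.InferenceSession(model_path,
                                       providers=["CPUExecutionProvider"])
    log.info("Model %s assigned to providers: %s",
             model_path, session.get_providers())
    return session
```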
That is the kind of abstraction Windows needs if local AI is to become a mainstream app capability rather than a fragmented vendor-specific feature.
Consumer Impact: Faster Local AI, Mostly Invisible Updates
Most Windows users will never manually install an execution provider, and that is probably the point. KB5089168 is designed to appear through Windows Update, sit quietly in update history, and improve the local AI path for apps that know how to use it.
What users may notice
The most likely user-facing benefits are indirect. A compatible photo editor, video tool, communication app, local assistant, or creative utility may load an AI feature faster, run it with lower latency, or use the GPU more efficiently after the component is updated.
Users should not expect KB5089168 to change gaming frame rates or general desktop performance. This is not a GeForce display driver, a DirectX update, or a broad graphics fix; it is an inference-runtime component for supported machine-learning workloads.
The update is most relevant to users with:
- NVIDIA GeForce RTX 30-series or newer hardware
- RTX workstation-class GPUs
- Windows 11 24H2 or 25H2
- Apps that use ONNX Runtime or Windows ML
- Local AI features that support GPU acceleration
- Current NVIDIA drivers and current Windows cumulative updates
For everyday users, the most important point is that this update does not require manual action unless something goes wrong. If Windows Update is healthy, the component should arrive automatically on eligible systems.
Enterprise Impact: Control, Compliance, and Fleet Visibility
In enterprise environments, KB5089168 raises a different set of questions. IT teams care less about whether an AI demo runs slightly faster and more about whether new runtime components are controlled, supportable, and visible across managed fleets.
AI components become operational assets
The execution provider model means AI acceleration is no longer only an application dependency. It is also a Windows component dependency, delivered through the same infrastructure that handles many other platform updates.
That can be a strength. Enterprises already understand Windows Update for Business, update rings, deferral policies, compliance reporting, and phased deployment.
It can also be a concern. If a business application depends on a particular inference path, an execution-provider update could theoretically change performance behavior, compatibility, or troubleshooting patterns.
Enterprise administrators should evaluate KB5089168 through several lenses:
- Which managed devices are eligible for the update
- Whether Windows Update policy allows execution-provider downloads
- Whether affected apps use Windows ML or ONNX Runtime acceleration
- How to identify installed provider versions across the fleet
- Whether pilot rings should validate AI workloads before broad rollout
- How support desks should interpret Windows ML update history entries
The enterprise advantage is that Microsoft’s model can reduce app-level packaging complexity. A software vendor can rely on Windows-certified components instead of bundling large hardware-specific packages, which may simplify deployment and reduce duplicated dependencies.
Competitive Implications: Microsoft’s AI Hardware Neutrality Test
KB5089168 is an NVIDIA-specific update, but the bigger story is Microsoft’s multi-vendor AI platform strategy. Windows ML is positioned as a layer that can work across CPUs, GPUs, and NPUs from multiple silicon vendors.
NVIDIA’s advantage and Microsoft’s balancing act
NVIDIA enters this model with a strong software advantage. TensorRT is well known in inference circles, RTX GPUs are common in gaming and creator PCs, and NVIDIA has spent years building developer mindshare around accelerated AI workloads.
At the same time, Microsoft cannot afford to make Windows AI feel like an NVIDIA-only story. Copilot+ PCs, Qualcomm NPUs, AMD Ryzen AI systems, Intel Core Ultra devices, and future accelerators all need a credible path into the Windows ML ecosystem.
That creates a delicate balance:
- NVIDIA benefits from a high-performance RTX-specific provider
- Microsoft benefits from making Windows the common AI runtime layer
- AMD and Intel need equally reliable provider servicing
- Qualcomm needs strong NPU support for Arm-based Windows devices
- Developers need one coherent model rather than vendor-specific chaos
If Microsoft gets this right, Windows becomes a more attractive platform for local AI development. If it gets it wrong, developers may retreat to vendor SDKs, browser-based inference, or cloud APIs.
Technical Considerations: Compilation, Caching, and Fallback
TensorRT-RTX is not just a switch that says “use GPU.” It involves compiling model graphs into optimized inference engines, selecting kernels, managing shapes, and sometimes caching provider-specific representations for faster subsequent startup.
Why first-run behavior matters
Local AI workloads often fail the user-experience test before they fail the benchmark test. If an app takes too long to initialize an AI model, users may assume the feature is broken even if later inference would be fast.
TensorRT-RTX’s client-oriented design tries to reduce that pain by emphasizing faster build times and smaller deployment size. The Windows ML integration extends that idea by keeping the provider outside the app package and letting Windows handle compatible updates.
Still, developers need to plan for variability. Model size, supported operators, GPU architecture, driver state, and system load can all influence the experience.
Important technical concerns include:
- Dynamic input shapes may complicate optimization
- Unsupported operators may force partial fallback
- Engine caches may need invalidation after runtime or driver changes
- GPU memory pressure can affect reliability
- First-run compilation may still be visible for larger models
- Background AI tasks must coexist with graphics workloads
- Provider updates may alter performance characteristics
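The KB does not document the TensorRT-RTX provider's option surface, so as an illustration of the caching pattern, here is how engine caching is enabled on ONNX Runtime's classic TensorRT execution provider; the RTX variant may expose different option names, so treat these keys as an example of the pattern rather than a reference:

```python
import onnxruntime as ort

# Illustration using ONNX Runtime's classic TensorRT execution provider,
# which assumes a build where that provider is available. The TensorRT-RTX
# provider may use different option names.
providers = [
    ("TensorrtExecutionProvider", {
        "trt_engine_cache_enable": True,         # reuse compiled engines
        "trt_engine_cache_path": "./trt_cache",  # clear after driver/runtime updates
    }),
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
]

# The first session creation pays the engine-build cost; later runs with an
# unchanged model, driver, and runtime can load the cached engine instead.
session = ort.InferenceSession("model.onnx", providers=providers)
```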
For power users, this also means update history is now part of AI troubleshooting. A local inference issue may involve the app, the model, the NVIDIA driver, Windows ML, ONNX Runtime, or the execution provider package.
How to Check and Troubleshoot KB5089168
For most users, KB5089168 should be automatic. Still, enthusiasts and administrators may want to confirm whether the update is installed, especially if they are testing Windows ML applications or comparing inference behavior before and after the update.
Practical verification steps
The easiest method is through Windows Settings. Microsoft specifically points users to Windows Update history, where the component should appear after installation.
A sensible troubleshooting sequence looks like this:
- Open Settings.
- Go to Windows Update.
- Install any pending cumulative updates.
- Restart the PC if Windows asks for it.
- Return to Windows Update and check again.
- Open Update history.
- Look for Windows ML Runtime Nvidia TensorRT-RTX Execution Provider Update (KB5089168).
Users should also verify the basics:
- Windows 11 is on version 24H2 or 25H2
- The latest cumulative update is installed
- Windows Update is not paused
- A restart is not pending
- The PC has compatible NVIDIA RTX hardware
- The NVIDIA display driver is current enough for the workload
- The app actually uses Windows ML or ONNX Runtime acceleration
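For scripted checks, the Windows Update Agent COM interface can search the local update history for the KB number. A minimal sketch using the pywin32 package, assuming the entry is recorded in standard update history as Microsoft's guidance suggests:

```python
import win32com.client  # pip install pywin32

# Query the Windows Update Agent's local history for the KB entry.
session = win32com.client.Dispatch("Microsoft.Update.Session")
searcher = session.CreateUpdateSearcher()
total = searcher.GetTotalHistoryCount()

if total:
    history = searcher.QueryHistory(0, total)
    for i in range(history.Count):
        entry = history.Item(i)
        if "KB5089168" in (entry.Title or ""):
            # ResultCode 2 means the installation succeeded.
            print(entry.Date, entry.Title, "ResultCode:", entry.ResultCode)
```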
Strengths and Opportunities
KB5089168’s biggest strength is that it demonstrates Microsoft’s ability to service AI acceleration as a modular Windows platform capability rather than as a static feature frozen inside a yearly OS release. That model gives Microsoft, NVIDIA, developers, and users a faster path to incremental improvements, provided that update delivery and diagnostics remain transparent enough.
- Better local AI performance potential for compatible ONNX Runtime and Windows ML workloads on RTX hardware.
- Reduced app packaging burden because developers can rely on Windows-managed execution providers instead of bundling large vendor libraries.
- Improved servicing velocity through Windows Update rather than waiting for major app or OS feature releases.
- Cleaner multi-vendor architecture if Microsoft maintains similar quality across NVIDIA, AMD, Intel, and Qualcomm providers.
- Stronger RTX PC value proposition for creators, developers, and enthusiasts running local AI workflows.
- More consistent deployment model for users who expect Windows Update to handle platform components automatically.
- A clearer foundation for future Copilot+ and local AI features that need low-latency inference on available hardware.
Risks and Concerns
The main risk is opacity. KB5089168 updates a critical AI runtime component, but Microsoft’s public description is limited to general improvements, which leaves developers, administrators, and advanced users with little detail about operator changes, performance fixes, regressions, or known issues.
- Sparse release notes make it difficult to assess what changed beyond the version number.
- Windows Update dependency can complicate first-run app behavior when updates are paused, deferred, or blocked.
- Enterprise policy conflicts may prevent execution-provider downloads on managed devices.
- Version fragmentation can appear if different PCs receive different provider builds at different times.
- Troubleshooting complexity increases because issues may involve the app, model, driver, runtime, provider, or Windows servicing state.
- Fallback behavior may hide performance problems if apps silently revert to CPU or generic GPU paths.
- User expectations may be inflated if people mistake this for a general NVIDIA graphics or gaming performance update.
Looking Ahead
KB5089168 is part of a larger transition in Windows: AI acceleration is becoming a maintained platform layer, much like graphics, security, and media infrastructure. The update may be small in visible surface area, but it reinforces Microsoft’s plan to make local inference available through standardized APIs and dynamically serviced components.
What to watch next
The next important signal will be whether Microsoft improves transparency around execution-provider updates. Developers and IT administrators need more than a KB entry; they need practical release notes that explain compatibility changes, performance fixes, and any known limitations.
Things to monitor include:
- Whether future TensorRT-RTX updates include more detailed changelogs
- How quickly Windows 24H2 and 25H2 devices receive provider updates
- Whether Windows ML 2.x provider packaging stabilizes for mainstream developers
- How AMD, Intel, and Qualcomm provider updates compare in cadence and quality
- Whether major creative and productivity apps adopt Windows ML provider discovery
If that happens, updates like KB5089168 will become routine but strategically important. They will not always make headlines, yet they will shape whether local AI on Windows feels fast, dependable, and broadly available.
KB5089168 is a quiet update with outsized implications: it updates the NVIDIA TensorRT-RTX Execution Provider to version 2.2604.1.0, reinforces Windows Update as the delivery channel for AI acceleration components, and gives RTX-equipped Windows 11 24H2 and 25H2 systems a newer foundation for local ONNX inference. The short-term impact will depend on compatible apps and hardware, but the long-term message is clear. Microsoft is building Windows into a modular AI runtime platform, and the execution provider layer is where much of that future will either come together or fragment.
Source: Microsoft Support KB5089168: Nvidia TensorRT-RTX Execution Provider update (version 2.2604.1.0) - Microsoft Support