Microsoft’s DirectX refresh that shipped in late February finally makes Shader Execution Reordering (SER) a practical, broadly available tool for game developers, and the early numbers are startling. With the retail release of Shader Model 6.9 and DirectX Raytracing (DXR) 1.2 exposed via the DirectX Agility SDK 1.619, Microsoft has formalized SER and paired it with Opacity Micromaps (OMMs) in a package designed to reduce the worst performance pathologies of heavy ray tracing and path tracing workloads. In Microsoft’s own demos, enabling SER produced roughly +40% framerate on an NVIDIA RTX 4090 and, in specific Intel Arc B‑Series test configurations, up to +90%. Those headline figures demand careful unpacking: SER is real, powerful, and game-changing in particular circumstances, but it’s not a universal “make everything twice as fast” button. This feature levels the API playing field and gives developers a standardized, cross-vendor way to improve ray tracing coherence, if and when engine teams take on the work of adopting it.
Background / Overview
Ray tracing in real time has always been a battle between visual fidelity and hardware efficiency. Modern GPUs are tremendously parallel devices, but ray workloads are intrinsically stochastic: rays bounce, scatter, and end up invoking wildly different shading code depending on materials, alpha-tested geometry, and scene content. That divergence breaks the GPU’s occupancy and cache behavior, and the result is a lot of idle lanes in otherwise capable execution units.
NVIDIA introduced a pragmatic countermeasure in 2022 with its RTX 40‑series hardware and accompanying developer tooling: Shader Execution Reordering (SER). SER lets an application (or engine) describe the expected cost or category of shading work per ray so that the driver or hardware can reorder threads after ray traversal such that threads doing similar work run together. This reduces divergence, improves cache locality, and increases utilization of shading hardware.
Microsoft’s work over the past year has been to move SER from vendor-specific extensions and previews into the official DXR ecosystem. The recent Agility SDK 1.619 release (and the Shader Model 6.9 retail announcement made on February 26, 2026) formalizes SER as part of DXR 1.2 and makes it available to developers through standardized HLSL primitives and the new HitObject abstraction. That means engine authors can write a single code path that expresses reordering intent and rely on drivers to perform the actual reordering where supported.
What Shader Execution Reordering (SER) actually does
The problem: divergence and "idle lanes"
GPUs execute threads in fixed-size groups (warps or wavefronts) and perform best when those threads follow the same execution path and access contiguous memory. In ray traced scenes, however, a single group of rays may hit wildly different materials, textures, and shader code paths. Some rays hit simple diffuse surfaces; others trigger expensive path-traced bounces, procedural textures, or heavy texture fetches and math. When this happens, the GPU serializes or stalls parts of the group while waiting for heavy threads to finish, and effective throughput collapses.
SER addresses that by enabling reordering of the thread execution order after traversal but before expensive shading happens. The ray generation shader can provide a small “sort key” or instruct the system to use properties of the hit (a HitObject) as a key. The driver/hardware then groups threads by key, so batches of threads with similar expected work execute together. The result is more coherent shader execution and fewer idle cycles.
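The effect of grouping threads by key can be illustrated with a toy CPU-side model. This is only a sketch under stated assumptions, not the DXR API or any vendor's scheduler: the wave size of 32, the two cost levels, and the function names (`wave_cost`, `total_cost`, `simulate`) are all hypothetical, and a wave is assumed to take as long as its slowest lane.

```python
import random

WAVE_SIZE = 32  # assumed lanes per wave; real hardware varies


def wave_cost(costs):
    """A wave runs in lockstep, so it is modeled as taking as long as its slowest lane."""
    return max(costs)


def total_cost(thread_costs, wave_size=WAVE_SIZE):
    """Sum the modeled cost of consecutive waves of `wave_size` threads."""
    return sum(
        wave_cost(thread_costs[i:i + wave_size])
        for i in range(0, len(thread_costs), wave_size)
    )


def simulate(num_threads=4096, heavy_fraction=0.25, heavy=100, light=10, seed=0):
    """Return (unsorted_cost, sorted_cost) for a random mix of heavy and light rays."""
    rng = random.Random(seed)
    costs = [heavy if rng.random() < heavy_fraction else light
             for _ in range(num_threads)]
    unsorted_cost = total_cost(costs)
    # Sorting stands in for reordering by a sort key: similar work ends up
    # in the same wave, so fewer waves are held up by a single heavy lane.
    sorted_cost = total_cost(sorted(costs))
    return unsorted_cost, sorted_cost


if __name__ == "__main__":
    before, after = simulate()
    print(f"modeled cost without reordering: {before}")
    print(f"modeled cost with reordering:    {after}")
```

In this model the sorted total is never higher than the unsorted one, and the gap grows when heavy rays are scattered thinly across many waves. Real hardware also pays a cost for the reorder itself, which the sketch ignores.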
Key components Microsoft exposed in DXR 1.2 and SM 6.9
- MaybeReorderThread() / sorting hints: HLSL primitives let a RayGen shader provide a sort key (or leave it to the HitObject) so the runtime can reorder threads.
- HitObject abstraction: Decouples traversal (ray-scene intersection) from ClosestHit/Miss shader invocation. A RayGen can inspect hit metadata and decide whether to invoke ClosestHit or depend on hit attributes directly — an important flexibility point for reordering and data locality strategies.
- Opacity Micromaps (OMMs): Hardware-accelerated metadata that marks the micro-triangles of alpha-tested geometry as opaque, transparent, or unknown, so traversal can resolve most hits without running a shader. OMMs reduce unnecessary AnyHit invocations for alpha-tested geometry like foliage, chain-link fences, and leaves.
- Shader Model 6.9 primitives: The new shader model formalizes required support for certain int/float formats and wave ops that make SER safer and more portable across drivers.
These primitives together give developers the hooks to both describe coherence (via sort keys and hit objects) and to exploit it (grouped shader execution, skipping unnecessary shading work with OMMs).
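To make the Opacity Micromap idea from the list above concrete, here is a simplified sketch of how micro-triangles might be classified at asset-import time. It assumes three states (the real API distinguishes more), an alpha cutoff of 0.5, and hypothetical names (`MicroState`, `classify`, `anyhit_invocations`). It is a model of the concept, not Microsoft's or any vendor's baking tool.

```python
from enum import Enum


class MicroState(Enum):
    OPAQUE = "opaque"            # hit accepted without running an alpha test
    TRANSPARENT = "transparent"  # hit ignored without running an alpha test
    UNKNOWN = "unknown"          # mixed coverage: fall back to the AnyHit alpha test


def classify(alpha_samples, cutoff=0.5):
    """Classify one micro-triangle from the alpha values sampled inside it."""
    if all(a >= cutoff for a in alpha_samples):
        return MicroState.OPAQUE
    if all(a < cutoff for a in alpha_samples):
        return MicroState.TRANSPARENT
    return MicroState.UNKNOWN


def anyhit_invocations(states):
    """Only micro-triangles in the UNKNOWN state still need a shader call."""
    return sum(1 for s in states if s is MicroState.UNKNOWN)


if __name__ == "__main__":
    # Hypothetical leaf: mostly solid interior, empty surroundings, a mixed edge.
    micro_triangles = [
        [1.0, 0.9, 0.95],  # interior
        [0.0, 0.0, 0.1],   # empty space around the leaf
        [0.9, 0.2, 0.6],   # edge with mixed coverage
    ]
    states = [classify(m) for m in micro_triangles]
    print([s.value for s in states])
    print("AnyHit calls still needed:", anyhit_invocations(states))
```

The finer the micromap subdivision, the fewer micro-triangles straddle an alpha edge, at the price of more metadata to store.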
Why Microsoft standardizing SER matters
- Engine and cross‑vendor portability
Before DXR 1.2/SM 6.9, SER was a useful but vendor-specific optimization: NVIDIA (and some Vulkan extensions) exposed reordering, Intel implemented SER acceleration on Arc B‑Series silicon, and other vendors experimented. With Microsoft shipping SER as part of the Agility SDK retail release, engine teams can encode reordering semantics once in HLSL and rely on drivers to use them where possible. That drastically reduces engineering branches and makes broader adoption realistic.
- No new hardware required — but drivers matter
One of the most important practical points is that SER can be enabled without users buying new GPUs. On older hardware that doesn’t implement hardware reordering, enabling SER is typically a no‑op for the reorder step (the API and code still compile and run), so there’s no downside to shipping SER-capable shaders. When matched with a driver/runtime that actually performs reordering, you get the performance lift.
- Complementary with OMMs and denoisers
Opacity Micromaps reduce shader invocations for alpha-tested geometry, and when paired with SER’s ability to group similar work, the two features compound. That’s a particularly potent combination for path tracing and complex foliage scenes that were previously performance-prohibitive.
- Toolchain and profiling support
Microsoft notes day‑one PIX support for SER profiling and debugging; NVIDIA, Intel, and other tooling vendors have also extended their profilers (e.g., Nsight) to make SER-driven hotspots visible. That tooling is essential for developers to get the gains without introducing hard-to-diagnose regressions.
What Microsoft’s benchmarks show — and what they don’t
Microsoft provided demonstration runs using an updated D3D12 raytracing sample (the D3D12RaytracingHelloShaderExecutionReordering sample in the DirectX Graphics Samples repo). The sample is intentionally synthetic: it draws a screen-sized set of rays where a configurable fraction of rays perform an artificially heavy amount of work while others are light. That engineered divergence is a perfect stress test for SER and demonstrates how sorting similar threads together yields dramatic utilization improvement.
Reported results:
- NVIDIA RTX 4090: roughly +40% framerate with SER enabled in the demo.
- Intel Arc B‑Series (select configurations): up to +90% framerate in Microsoft’s synthetic demo.
Important context and caveats:
- These numbers come from controlled demonstration workloads designed to highlight SER’s strengths. They are not a middle-of-the-road real-world game benchmark. Actual gains in released AAA titles will vary significantly depending on scene characteristics, materials mix, engine implementation, denoiser interplay, and driver maturity.
- The Intel Arc numbers, while impressive, were shown for a couple of configurations — Microsoft did not disclose full details on which Arc B SKUs or whether the test used integrated Xe compute in certain Core Ultra processors or discrete Battlemage dies. That means the headline 90% figure should be interpreted as a best-case demonstration rather than a guarantee for all Arc B hardware.
- Vendor differences remain. NVIDIA’s Ada/Blackwell-era hardware has had SER acceleration for some time and therefore tends to show strong gains when engines are optimized for it. AMD’s approach has been more conservative: API exposure is present, but hardware-level thread reordering behavior and driver optimizations for some Radeon products may trail competitors initially.
In short: the demo numbers indicate the potential of SER more than an average expected uplift across real-world titles.
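It can help to translate the headline framerate figures into frame time, and to see how the uplift shrinks when ray-traced shading is only part of a frame. The sketch below is plain arithmetic. The 50% pass fraction in the last line is a hypothetical value chosen for illustration, not a figure from Microsoft's demo, and the function names are made up for this example.

```python
def frame_time_reduction(fps_uplift):
    """Fractional frame-time saving implied by a fractional framerate uplift."""
    return 1.0 - 1.0 / (1.0 + fps_uplift)


def overall_uplift(pass_fraction, pass_speedup):
    """Whole-frame framerate uplift when only `pass_fraction` of the frame
    time is sped up by a factor of `pass_speedup` (an Amdahl's-law estimate)."""
    new_time = (1.0 - pass_fraction) + pass_fraction / pass_speedup
    return 1.0 / new_time - 1.0


if __name__ == "__main__":
    print(f"{frame_time_reduction(0.40):.1%}")  # 28.6% less frame time for +40% fps
    print(f"{frame_time_reduction(0.90):.1%}")  # 47.4% less frame time for +90% fps
    # Hypothetical: a 1.9x speedup applied to a pass that is half of the frame.
    print(f"{overall_uplift(0.5, 1.9):.1%}")    # 31.0% overall fps uplift
```

Even a best-case speedup of the ray-traced pass therefore translates into a smaller whole-frame gain once rasterization, denoising, and post-processing are counted.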
How developers should approach adoption
Adoption is a multi-step engineering process. SER provides primitives, but to extract real-world gains developers will need to integrate, test, and tune.
- Start with the prerequisites:
  - Target Shader Model 6.9 and ship (or include) Agility SDK 1.619 if you want to carry the runtime updates with your title.
  - Ensure you have a modern DXC (DirectX shader compiler) that supports the new HLSL primitives.
  - Update engine shader compilers and build pipelines to emit the new constructs and to optionally generate SER-enabled permutations at runtime.
- Profile first, then add SER:
  - Use GPU profiling tools that understand SER (PIX on Windows, Nsight for NVIDIA). Measure where ray shading divergence is the bottleneck and where grouping by material/coherence might help.
  - Identify heavy vs. light shader categories. Common sort keys include material type, shader complexity banding, and whether alpha-tested geometry is involved.
- Integrate HitObject patterns:
  - Replace some ClosestHit/Miss invocation paths by using HitObject attributes where appropriate. That lets RayGen inspect hit metadata and decide whether to invoke full shading or do a lightweight shading step inline.
  - Use HitObject judiciously; it’s powerful but changes execution flow and can complicate integration with existing material systems.
- Combine SER + OMM:
  - For foliage, chain-link fences, and other alpha-tested geometry, OMMs can cut away shading invocations dramatically. Follow OMM best practices for packing and generation during asset import.
- Validate across drivers:
  - Test on NVIDIA, Intel, and AMD hardware with up-to-date drivers. Some vendors will perform hardware reordering; others may treat SER as a no‑op. Ensure your fallback logic (where you rely on no assumptions about ordering) is correct.
- Watch out for denoiser interactions:
  - SER changes the order of execution and therefore the pattern of GPU memory accesses and caching. Your denoiser’s assumptions, especially those of temporal algorithms, need to be validated against reordered shading patterns.
- Design automated testing:
  - Regression tests should include SER-enabled and SER-disabled shader variants; this guards against subtle differences across vendors and driver versions.
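The sort keys suggested above (material type, shader complexity band, alpha-tested or not) can be packed into a small integer. The sketch below shows one possible layout. The bit widths, field order, and the name `make_sort_key` are assumptions made for illustration. A real engine would build the key in HLSL and pick widths to suit its own material system.

```python
# Hypothetical field widths; a real engine would size these to its content.
MATERIAL_BITS = 4    # up to 16 material types
COMPLEXITY_BITS = 2  # up to 4 cost bands
ALPHA_BITS = 1       # alpha-tested or not
KEY_BITS = MATERIAL_BITS + COMPLEXITY_BITS + ALPHA_BITS


def make_sort_key(material_type, complexity_band, alpha_tested):
    """Pack the three hints into one integer, cost band in the top bits so
    that threads with similar expected cost sort next to each other."""
    if not 0 <= material_type < (1 << MATERIAL_BITS):
        raise ValueError("material_type out of range")
    if not 0 <= complexity_band < (1 << COMPLEXITY_BITS):
        raise ValueError("complexity_band out of range")
    return (
        (complexity_band << (MATERIAL_BITS + ALPHA_BITS))
        | (material_type << ALPHA_BITS)
        | int(bool(alpha_tested))
    )


if __name__ == "__main__":
    key = make_sort_key(material_type=3, complexity_band=2, alpha_tested=True)
    print(key, f"(fits in {KEY_BITS} bits)")  # 71 (fits in 7 bits)
```

Keeping the key narrow is a deliberate choice in this sketch: a few coarse buckets are usually enough to separate heavy from light work, and profiling should decide which fields actually earn their bits.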
Hardware and vendor readiness: where things stand
- NVIDIA: Early adopter. Ada/Blackwell‑era GPUs have hardware acceleration for SER, and the company has published tooling and sample integrations showing end-to-end benefits. Vulkan's VK_NV_ray_tracing_invocation_reorder extension has been present for a time, and NVIDIA’s path-tracing samples were updated to demonstrate SER usage.
- Intel: Arc B‑Series (Battlemage) silicon is reported to have SER acceleration; Microsoft’s demos specifically highlight Arc B configurations showing very large gains. Intel has published developer preview drivers that enable SER paths.
- AMD: Microsoft and public reporting indicate API exposure for AMD platforms, but as of the initial Agility SDK retail rollout, some Radeon drivers may not yet perform hardware reordering: they expose the API but implement the reorder as a no‑op. That means games targeting AMD hardware can still ship SER-aware shaders and be ready for performance wins when future drivers/hardware enable reordering.
- WARP / software rasterizers: Microsoft has preview support so developers can run and test without specialized hardware.
This vendor landscape means developers can adopt SER in a forward-compatible fashion: code once, benefit where the driver/hardware can do the work, and degrade gracefully elsewhere.
Risks, unknowns, and practical limitations
SER is compelling, but it also introduces new complexity and some risks developers must consider.
- Synthetic vs. real content: Demonstration scenes often cluster heavy and light rays in extreme ways to make gains obvious. Most games present a mix; gains will be meaningful where divergence is a problem, but small or irregular scenes may see little benefit.
- Driver and hardware variance: The API lets developers express intent, but the reordering and its effectiveness are implementation-dependent. Differences in how drivers implement reordering and how hardware handles memory and caches mean that per-vendor tuning may still be necessary.
- Increased shader complexity: Using HitObjects and branching more logic into RayGen shaders can complicate material systems and shader permutations. That has maintenance costs and may increase build/test surface.
- Debugging and determinism: Reordering changes execution order and timing. Some classes of bugs and race conditions may be harder to reproduce or to reason about. Robust tooling like PIX and Nsight is essential, but not every regression is trivial to track down.
- Potential to hide poorly designed shaders: SER can mask inefficiencies in shader code by making execution more coherent. That’s useful in the short term, but it risks reducing pressure to simplify or optimize expensive shaders that could still be a problem on other platforms or in other scenes.
- Denoiser and postprocess interactions: Since SER alters shading order and memory access patterns, it may reveal or exacerbate subtle denoiser artifacts or temporal instability unless carefully profiled.
Because of these considerations, SER is best seen as a tool in the optimization toolkit — powerful when used correctly, but not a substitute for careful shading architecture.
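One way to reason about the determinism risk listed above is to separate work that is independent of execution order from work that is not. The sketch below is a CPU-side illustration with hypothetical names (`shade`, `render`, `ordered_sum`), not a GPU test harness. Per-ray results written to their own slot survive any reordering, while floating-point accumulation in execution order may not.

```python
import random


def shade(ray_id):
    """Stand-in for per-ray shading: a pure function of the ray alone."""
    return ((ray_id * 2654435761) % 256) / 255.0


def render(num_rays, order):
    """Shade rays in the given execution order, each writing its own pixel."""
    image = [None] * num_rays
    for ray_id in order:
        image[ray_id] = shade(ray_id)
    return image


def ordered_sum(values):
    """Accumulate floats sequentially, as a shared accumulator would."""
    total = 0.0
    for v in values:
        total += v
    return total


if __name__ == "__main__":
    n = 1000
    shuffled = list(range(n))
    random.Random(1).shuffle(shuffled)
    # Per-pixel output does not depend on the order rays were shaded in.
    print(render(n, range(n)) == render(n, shuffled))  # True
    # Float addition is not associative, so accumulation order can matter.
    print(ordered_sum([0.1, 0.2, 0.3]) == ordered_sum([0.3, 0.2, 0.1]))  # False
```

A regression suite built on this idea would compare SER-enabled and SER-disabled output exactly where results are per-ray, and with a tolerance wherever shaders accumulate into shared memory.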
What this means for gamers and system builders
- You don’t necessarily need a new GPU: If you buy a game that ships SER-capable shaders, older hardware that lacks hardware reordering will still run the same shaders, often with the reorder step acting as a no-op. That means developers can roll SER into shipped shaders without penalizing older hardware.
- Driver updates matter: To see SER’s benefits you’ll need the latest GPU drivers from NVIDIA, Intel, or AMD. Keep an eye on vendor release notes. On Windows, titles that include the Agility SDK with updated runtimes can also bring features to players without waiting for a Windows system update.
- Expect variability: If your system has an Intel Arc B‑Series GPU or a recent NVIDIA RTX part, you might see bigger gains in ray-traced path tracing or reflection passes, but exact uplift varies by title, scene, and the driver shipped with your GPU.
- Look for engine support: Early adopters include sample integrations and some engine teams. Wider availability in mainstream engines (e.g., Unreal Engine, custom engines) will determine how quickly games start to use SER in production.
Practical checklist for engine teams (quick reference)
- Update toolchain: adopt Agility SDK 1.619 and DXC with SM 6.9 support.
- Add SER-capable shader permutations guarded by runtime checks.
- Instrument hotspots with PIX and Nsight’s ray tracing live-state tools.
- Generate and ship Opacity Micromaps for alpha-tested assets where relevant.
- Test on NVIDIA, Intel, and AMD hardware/driver combos; automate regression tests.
- Validate denoiser stability and temporal artifacts after reordering is enabled.
- Monitor driver changelogs and vendor SDK samples for best practices.
Conclusion
Microsoft’s formalization of Shader Execution Reordering in DXR 1.2 and Shader Model 6.9, delivered through Agility SDK 1.619, marks an important shift in the ray tracing landscape. By standardizing the primitives that let engines express coherence and by pairing SER with Opacity Micromaps, Microsoft has provided a practical, vendor-agnostic path for engines to wring significantly better utilization from existing GPUs. Microsoft’s demo numbers (including the eye‑catching +90% in certain Intel Arc B configurations) highlight what’s possible in the best-case scenarios. But the real story is more nuanced: SER is a high-leverage optimization that will matter most in scenes with pronounced divergence and where engines invest engineering time to integrate and tune it.
For developers, SER is worth taking seriously now: profile, prototype with the provided samples, and design tolerant shader fallbacks so your title benefits on supported hardware while remaining stable elsewhere. For gamers, the promise is appealing: better ray-tracing performance without mandatory GPU upgrades. For the industry, this is another step toward making path tracing and richer ray-traced effects practical across a broader install base — but it will take months of driver maturity and careful engine integration before SER’s full potential is reflected in mainstream AAA releases.
The immediate takeaway: SER isn’t a magic bullet, but it is a potent, standardized lever that lets software and drivers cooperate to turn messy ray workloads into orderly, high-performance worksets. If engine teams take it up — and the early tooling and SDK support make that easier than ever — SER will quietly but meaningfully raise the practical ceiling for real‑time ray tracing on Windows.
Source: Windows Central
Microsoft just unlocked massive ray‑tracing speed boosts on Windows