The Linux kernel's VT-d IOMMU driver received a targeted upstream patch that closes a race condition and use-after-free exposure in the I/O page-fault (IOPF) reporting path. The patch switches to an rbtree lookup for probed devices and introduces a synchronization mutex, correcting a fragile device-lookup sequence that could be triggered by device teardown and lead to kernel instability or denial of service.
Source: MSRC Security Update Guide - Microsoft Security Response Center
Background / Overview
The defect is tracked as CVE-2024-35843 and lives in the Intel VT-d IOMMU driver within the Linux kernel. At a high level, the vulnerability arises because the IOPF handler used a slow list-based search (pci_get_domain_bus_and_slot()) to map a source ID reported by hardware back to the kernel’s PCI device structure. Upstream maintainers replaced that list walk with a device rbtree lookup (device_rbtree_find()) to make fault-path lookups deterministic and fast, and they added a mutex to prevent a window where the device could be released between lookup and subsequent use. That window could produce a use-after-free or cause the fault handler to operate on a device that is no longer valid.
This change is part of a small but consequential set of VT-d improvements that aim to make fault reporting both faster and safer. The maintainers explicitly call out the rare nature of the conflict (an I/O page fault racing with device teardown) and say the added mutex does not meaningfully affect performance in practice.
What exactly was wrong?
The lookup and the race
The IOPF reporting path receives a hardware fault with a source identifier that the kernel must map back to the corresponding PCI device to collect fault context and apply recovery. Historically, the VT-d code used pci_get_domain_bus_and_slot(), which iterates the global PCI device list until a match is found. That search is O(n) and inefficient for real-time fault handling. To improve lookup speed, the driver began maintaining a per-IOMMU rbtree keyed by the device source ID; this allows O(log n) lookups with device_rbtree_find().
However, the device that raised the I/O page fault is not synchronized with the kernel’s teardown path. A device could be removed and freed between the rbtree lookup and the code that extracts fault parameters, which created a theoretical use-after-free condition. To close that window, the patch introduces a mutex that serializes IOPF handling with the IOMMU’s device release path. The mutex ensures the device cannot be released while the fault handler is collecting parameters and acting on the device pointer.
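The difference between the two lookup strategies can be sketched in userspace. The following is a minimal illustration, not kernel code: `ListRegistry` and `SortedRegistry` are hypothetical names, and a sorted list searched with `bisect` stands in for the rbtree (both give O(log n) search over ordered keys, though insertion into a Python list is O(n), unlike a real rbtree). The source ID packing follows the standard PCI bus/device/function layout.

```python
import bisect


def source_id(bus: int, dev: int, fn: int) -> int:
    """Pack PCI bus/device/function into a 16-bit requester (source) ID."""
    return (bus << 8) | (dev << 3) | fn


class ListRegistry:
    """Models the old behaviour: a linear walk over every known device."""

    def __init__(self):
        self.devices = []  # unsorted list of (source_id, name)

    def add(self, sid, name):
        self.devices.append((sid, name))

    def find(self, sid):
        for known_sid, name in self.devices:  # O(n) scan
            if known_sid == sid:
                return name
        return None


class SortedRegistry:
    """Stand-in for the per-IOMMU rbtree: ordered keys, O(log n) search."""

    def __init__(self):
        self.keys = []   # sorted source IDs
        self.names = []  # device names, parallel to self.keys

    def add(self, sid, name):
        i = bisect.bisect_left(self.keys, sid)
        self.keys.insert(i, sid)
        self.names.insert(i, name)

    def find(self, sid):
        i = bisect.bisect_left(self.keys, sid)  # binary search on the key
        if i < len(self.keys) and self.keys[i] == sid:
            return self.names[i]
        return None
```

Both registries return the same answer for any source ID; only the number of comparisons needed to reach it differs, which is the property that matters in a fault path.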
Why an rbtree?
The move to an rbtree is a performance and correctness play. Fault reporting occurs in low-latency contexts and can be invoked by device-generated events. An rbtree indexing devices by source ID makes lookups deterministic and far faster than a list walk under realistic device counts, reducing the time spent in the fault path and improving the kernel’s ability to recover from device-driven errors. The rbtree also enables other VT-d optimizations tied to ATS/PRI features.
Severity, exploitability and real-world impact
Security databases classify CVE-2024-35843 as an availability-impacting flaw with medium-to-high practical impact: while the underlying problem is a kernel race condition that can lead to a use-after-free, the most realistic attacker-controlled effect is denial of service, typically a kernel oops, hang, or system crash. Several vendors and tracking services assign a CVSS v3 base score in the mid-to-high range, reflecting low attack complexity but high availability impact where exploitation is successful.
To be concrete:
- The upstream description states that a device could, in theory, be released between the rbtree lookup and the subsequent parameter extraction, leaving the fault handler operating on a freed device structure (a use-after-free). Those scenarios can, and in similar historical cases have, resulted in kernel crashes or persistent hangs.
- Distribution advisories include CVE-2024-35843 in lists of high-priority kernel fixes where the availability impact (DoS) is the main concern, and several enterprise distributions have prioritized backports and updates.
What the upstream patch does (technical breakdown)
Key code changes
- Replace pci_get_domain_bus_and_slot() in the IOPF handler with device_rbtree_find() to query the per-IOMMU rbtree of probed devices. This speeds up lookup and reduces the chance of missing the device under concurrent workloads.
- Add a mutex that surrounds:
  - the rbtree lookup and
  - the subsequent call to iopf_get_dev_fault_param() and other device-dependent operations.
  This prevents the device from being removed by the IOMMU release path in the tiny window between lookup and parameter extraction. The lock is scoped narrowly to the IOPF reporting path to minimize contention.
- Adjust release paths so that devices removed from the rbtree are treated as already-removed for ATS invalidations and other late-arriving operations, which removes the need to send ATS invalidation requests to devices that are no longer tracked. That avoids additional fault triggers or endless retries.
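The locking pattern described above can be modelled in a few lines. This is a hedged userspace sketch, not the driver's code: `FaultPath`, `probe_device`, `release_device`, and `report_fault` are hypothetical names, a `threading.Lock` plays the role of the added mutex, and a plain dict stands in for the rbtree. The point it illustrates is that lookup and use happen under the same lock that the release path must take.

```python
import threading


class FaultPath:
    """Toy model: fault reporting and device release share one lock."""

    def __init__(self):
        self._lock = threading.Lock()  # plays the role of the added mutex
        self._probed = {}              # source ID -> device record

    def probe_device(self, sid, fault_param):
        with self._lock:
            self._probed[sid] = {"fault_param": fault_param, "released": False}

    def release_device(self, sid):
        with self._lock:
            dev = self._probed.pop(sid, None)
            if dev is not None:
                dev["released"] = True  # models the structure being freed

    def report_fault(self, sid):
        with self._lock:
            dev = self._probed.get(sid)  # lookup
            if dev is None:
                return None              # device already gone: drop the fault
            # Release cannot run until we leave this block, so the record
            # looked up above is still valid when it is used here.
            assert not dev["released"]
            return dev["fault_param"]
```

Without the shared lock, a release could run between the lookup and the use, which is the window the patch closes. With it, a fault for a released device simply finds nothing and is dropped.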
Why maintainers believe this is safe
Upstream commentary notes the conflict being closed is rare: an I/O page fault is emitted by device hardware, and a separate IOMMU device release is an asynchronous lifecycle event that seldom races with a fault for the same device. Because the mutex serializes a path that rarely overlaps with device teardown, maintainers argue there should be no measurable performance penalty for normal workloads. The tradeoff favors correctness in a high-risk path (fault handling) over micro-optimizations that left a small but real use-after-free window open.
How likely is exploitation in the wild?
This CVE is not a typical remote code-execution vector; its practical impact is on availability. Exploitation requires either a way to make a device raise I/O page faults while concurrently provoking that device's release, or access to hardware/firmware that can repeatedly induce the fault condition. That narrows the threat model:
- Cloud and multi-tenant virtualization environments that expose PCI passthrough devices (VFIO, SR-IOV, passthrough GPUs, or NVMe controllers) present the most plausible large-scale risk because tenant-controlled devices or misbehaving guest devices can interact with the host’s IOMMU layer.
- Local attackers with the ability to manipulate hot-pluggable devices, or supply crafted devices to a target, could produce repeated fault conditions, but such attack vectors are harder to scale without some physical or privileged access.
Vendor and distribution response
Multiple enterprise distributions and tracking services have flagged CVE-2024-35843 and prepared fixes or advisories. Notable signals:
- Amazon’s ALAS entries list CVE-2024-35843 with a CVSSv3 base score of 6.8 and track which packages are affected across Amazon Linux variants. Distribution-level fixes were marked as pending or in-progress for several kernel branches.
- SUSE included CVE-2024-35843 in a security-update rollout addressing several kernel CVEs, noting the VT-d rbtree change specifically in their kernel update announcements. Enterprise subscribers should see vendor-supplied kernel patches.
- The upstream kernel and the Linux kernel mailing list discussion contain the original patch series and rationale, which is the canonical technical source for the change. Administrators and integrators should treat the LKML patch and upstream git commit as the definitive description of the fix.
Practical mitigation and hardening guidance
If you operate systems that may be affected, particularly hypervisors, cloud hosts, or systems that accept untrusted or tenant-controlled PCI devices, take these steps.
- Patch promptly
  - Apply vendor-supplied kernel updates that include the upstream rbtree/mutex fix. This is the recommended and permanent remediation.
  - Where a distribution does not yet provide an official backport, consider using a distribution-provided long-term stable kernel that includes the upstream commit or ask your vendor for a security backport.
- Audit device exposure
  - Inventory which systems expose PCI devices via passthrough (VFIO, SR-IOV, direct PCI passthrough to guests) and prioritize those for early patching.
  - If possible, remove or restrict passthrough of untrusted devices; require vetted device firmware and drivers.
- Consider temporary mitigations only when patching is not immediately feasible
  - Reboot sequences that reset device state can help clear indeterminate conditions, but this is not a fix.
  - For lab or low-risk devices, consider disabling IOMMU/VT-d in firmware temporarily to remove the IOMMU code path, but be aware this disables critical isolation and breaks PCI passthrough and some virtualization features. This is a blunt instrument and should be treated as a last resort. Upstream maintainers did not recommend disabling VT-d as a substitute for the patch.
- Test kernel updates in a staging environment
  - Because the patch touches low-level device and IOMMU pathways, run representative workloads that exercise PCIe device teardown, hotplug, and passthrough to ensure no regressions surface in your specific hardware configuration.
- Monitor for signs of exploitation or instability
  - Kernel oops logs, repeated IOPF traces, or unexplained device hangs after hot-unplug activity are indicators the race might be triggered in production. Coordinate with your kernel vendor support channel if you observe repeatable traces.
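For the monitoring step, a small log filter can help surface candidate lines for review. The sketch below is an assumption-laden starting point rather than a detector for this specific CVE: `suspicious_lines` and `SUSPECT_PATTERNS` are hypothetical names, and the patterns are broad illustrative keywords (oops and BUG reports, KASAN reports, and fault messages carrying the `DMAR:` prefix used by the VT-d driver) that should be tuned to what your kernels actually print.

```python
import re

# Illustrative keyword patterns only; adjust them to your own kernel logs.
SUSPECT_PATTERNS = [
    re.compile(r"\bOops\b"),
    re.compile(r"\bBUG:"),
    re.compile(r"\bKASAN:"),
    re.compile(r"\bDMAR:.*\bfault\b", re.IGNORECASE),
]


def suspicious_lines(log_text):
    """Return (line_number, line) pairs that match any suspect pattern."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in SUSPECT_PATTERNS):
            hits.append((number, line.strip()))
    return hits
```

The function takes plain text, so it can be fed the captured output of `dmesg` or `journalctl -k`. A match is only a prompt to look closer; none of these keywords is specific to the IOPF race.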
Risk analysis: strengths of the fix and residual concerns
Strengths
- The upstream fix is surgical and well-scoped: it replaces a linear search with a scalable rbtree lookup and closes the small race window with a narrowly-scoped mutex. That approach addresses both performance and correctness in the fault path without redesigning the subsystem.
- The maintainers explicitly call out the expected performance behavior: because the conflicting operations are rare, the lock is not expected to introduce runtime overhead for normal workloads. This reduces the operational cost of adopting the patch.
- Distributors and enterprise vendors have taken the issue seriously and included the fix in kernel update streams, which simplifies remediation for most production operators.
Residual concerns and caveats
- The kernel’s interaction with hardware remains complex. The fix closes a clear race condition, but devices and firmware can still produce pathological states that stress IOMMU behavior (for example, repeated ATS invalidation failures or malformed device responses). Similar past VT-d issues have produced hard lockups before comprehensive fixes were developed, so operators should remain vigilant.
- Cloud operators exposing PCIe devices to tenants should treat this CVE as operationally significant even if the exploit requires device-level interaction; an attacker controlling a guest device may be able to craft sequences that hit these race windows. Patching is necessary but not sufficient; careful device governance and hardware vetting remain important.
- As with any kernel fix in a high-surface area component, there is a risk of regressions on exotic hardware. That is why vendors are pushing the change through distribution testing channels and advising staged rollouts. Administrators must test in their own environments before wide deployment.
How this fits with recent VT-d/IOMMU fixes and the larger trend
This patch is not an isolated event; it’s part of a continuing set of improvements and hardening efforts to the VT-d IOMMU code to handle ATS (Address Translation Services), PRI (Page Request Interface), and device lifecycle edge cases more robustly. Earlier fixes in this area included logic to stop sending ATS invalidation requests to devices that have been released and to use data structures (rbtree) that make device lookups more deterministic. Many of those efforts arose after field reports of ATS invalidation loops and hard lockups that were difficult to reproduce. The current CVE fix combines those lessons into more robust lookup and synchronization practices.
The trend is straightforward: fault-reporting paths are high-value, low-tolerance code paths where performance matters and correctness is critical. Replacing list scans with rbtree lookups and narrowing lock scopes are common kernel approaches to reconcile both requirements.
Recommended action plan for WindowsForum readers (practical checklist)
- If you run hypervisors, cloud hosts, or systems that perform PCI passthrough:
  - Check your distribution’s security advisory feed for CVE-2024-35843 and obtain the vendor-supplied kernel package that includes the fix. Prioritize patching on hosts that allow guest device passthrough.
  - Test the updated kernel in a staging environment, exercising hot-unplug, hot-plug, and passthrough workflows.
  - Roll the kernel update into production with standard change-control procedures.
- If you operate general-purpose servers or desktops:
  - Apply routine kernel security updates when your vendor releases them. The practical exploit risk is lower for systems that do not expose untrusted devices, but updates are still recommended.
  - Avoid relying on disabling IOMMU as a mitigation unless you understand the functional and security trade-offs.
- For integrators and appliance vendors:
  - Validate that firmware and device drivers interacting with the IOMMU follow expected lifecycles and do not leave devices in ambiguous states after removal.
  - Consider stress-testing device teardown and ATS flows as part of your QA acceptance criteria.
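To decide which hosts to patch first, it helps to know which ones currently have devices handed to passthrough. The sketch below covers one narrow case under stated assumptions: it lists PCI addresses bound to the `vfio-pci` driver by reading the driver's sysfs directory. The function name `vfio_bound_devices` and the `sysfs_root` parameter are hypothetical, and the check does not cover SR-IOV virtual functions or other passthrough mechanisms.

```python
import os
import re

# PCI addresses in sysfs look like 0000:3b:00.0 (domain:bus:device.function).
PCI_ADDRESS = re.compile(
    r"^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$"
)


def vfio_bound_devices(sysfs_root="/sys"):
    """List PCI addresses currently bound to vfio-pci under sysfs_root."""
    driver_dir = os.path.join(sysfs_root, "bus", "pci", "drivers", "vfio-pci")
    if not os.path.isdir(driver_dir):
        return []  # vfio-pci not loaded, or no sysfs at this root
    # The directory also holds control files (bind, unbind, new_id, ...),
    # so keep only entries that look like PCI addresses.
    return sorted(e for e in os.listdir(driver_dir) if PCI_ADDRESS.match(e))
```

An empty result means only that nothing is bound to `vfio-pci` right now; a host that attaches devices to guests on demand may still belong in the high-priority group.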
Conclusion
CVE-2024-35843 is a classic kernel hardening case: a small race in an uncommon but critical code path created the potential for use-after-free and availability failures, and maintainers responded with a targeted, low-overhead fix that replaces a linear lookup with a per-IOMMU rbtree and adds a narrowly-scoped mutex to serialize IOPF reporting with device release. The change improves both performance and correctness in the VT-d fault path, and distributors are rolling the fix through standard kernel-update channels. Administrators, cloud operators, and anyone exposing PCI devices to untrusted workloads should prioritize the vendor-published kernels and follow standard staging and testing practices before deploying updates.
Security in the hardware-software boundary is rarely glamorous, but small synchronization fixes like this one buy stability and reliability for systems that rely on IOMMU isolation. Apply the patches when your vendor publishes them, test, and maintain good device governance; those are the practical steps that close the window this CVE exposed and keep production systems running.