The Linux kernel received a targeted fix for a subtle but consequential deadlock in the DRM scheduler:
"drm/sched: Fix deadlock in drm_sched_entity_kill_jobs_cb", tracked as CVE‑2025‑40329. The patch restructures how the scheduler handles fence callbacks and dependency re‑arming to avoid an interrupt-context locking inversion that could reliably hang the kernel in some workloads. Operators who run kernels with DRM scheduler code (desktop GPUs, GPU‑accelerated VMs, and vendor kernels that enable DRM/SCHED) should treat this as an availability‑first vulnerability and apply vendor updates or backports as soon as practical.
Background / Overview
The vulnerability concerns the kernel’s DRM scheduler subsystem (drm/sched), which coordinates job submission, dependency tracking, and fence handling for GPU drivers. In particular, the function drm_sched_entity_kill_jobs_cb — used as a callback when a fence signals — could be invoked in contexts that are not safe for the locking APIs the rest of the code uses. That mismatch created a realistic deadlock: one CPU path could hold an XArray (xa) lock while disabling interrupts, while an interrupt or callback on another CPU attempted to grab the same xa lock and a fence lock in an incompatible order. The resulting lock inversion is deterministic and can produce a hard kernel hang. This scenario was reported in Mesa bug discussions and is reflected in the kernel and distribution advisories. The public vulnerability records (Ubuntu, SUSE, OSV, CVE listings) label CVE‑2025‑40329 as a
Medium‑severity availability issue; some vendor trackers report a CVSS‑style impact in the mid‑5.x range. The operational consequence is a
local denial of service: a hung kernel, driver oops, or system instability requiring a module reload or reboot. The paths are reachable from unprivileged user space in many real-world desktop and container configurations where DRM device nodes are exposed.
Technical anatomy: what went wrong
The actors: fences, xa_lock and interrupt contexts
At the heart of the problem are three interacting pieces:
- dma_fence callbacks and signalling: fences are the kernel primitive that GPU drivers use to track job completion and dependencies. The kernel exposes callbacks (dma_fence_add_callback) that may be invoked when a fence is signalled (dma_fence_signal).
- XArray (xa) locking: many scheduler internals use XArray helpers (xa_* functions) to traverse per‑entity dependency lists. Some xa_* variants assume they are not called from IRQ context or with interrupts disabled; the API provides irq‑safe variants (the xa_*_irq helpers) for those cases.
- Lock ordering and interrupt semantics: in one execution path, code that traverses job dependencies uses standard xa_* APIs without disabling interrupts. Meanwhile, a fence signal callback can run in interrupt context (or under conditions where local_irq_disable has been called) and then call into the same dependency traversal code. If the two contexts take different locks in different orders, a deadlock is possible.
A representative kernel log excerpt used in advisories shows the deadlocking sequence: one CPU holds xa_lock and then disables interrupts before taking fence->lock; another CPU (in callback context) takes fence->lock and then attempts to take xa_lock — producing the circular lock dependency and a
DEADLOCK trace. The upstream description and vendor advisories include near‑identical call stacks and analysis.
Why a simple replacement of xa_* would not be sufficient
One obvious remediation would be to replace the non‑IRQ XArray calls with their irq‑safe counterparts (xa_*_irq), but that alone does not fully address the problem. The commit message and maintainer discussion (recorded in the advisories and the kernel patch notes) identify a second, related race: dma_fence_signal itself takes fence->lock, and dma_fence_add_callback also grabs a fence lock. If code executed while holding one fence lock loops through dependencies and re‑arms callbacks on other fences that share locks, a second form of lock inversion can occur. In practice, the safe fix required re‑architecting the iteration and re‑arming logic so that fence callbacks run in a context where those nested locking issues cannot produce deadlocks.
The fix: what changed in the kernel
Maintainers moved the dependency iteration and
re‑arming of dependent fences out of the direct fence callback and into a deferred work context (a worker function named drm_sched_entity_kill_jobs_work in the patch set referenced by advisories). By performing the potentially lock‑sensitive traversal and callback re‑arm under a workqueue context — where interrupts are not disabled and lock ordering can be controlled — the patch avoids the mixed IRQ/non‑IRQ lock sequences that caused the deadlock.
Key properties of the fix:
- The change is surgical and localized to the scheduler callback handling: the callback itself remains minimal and posts a work item rather than doing heavy iteration inline.
- The deferred work runs in process context where long waits, re‑arming, and other operations are safe.
- The patch preserves normal semantics for successful job completion and dependency resolution while removing the deadlock window.
Kernel commit references were added to public trackers, and the fix is identified consistently across the stable backport commits noted by distribution trackers. Where vendors publish stable backports, they typically map the fix into their long‑term kernels. Several public CVE trackers list the kernel patch references used for mapping.
Affected systems and distribution status
- Affected component: Linux kernel — drm/sched (DRM scheduler and related driver code that uses fence dependency lists).
- Typical exposure: Desktop and workstation systems with GPU drivers that use the DRM scheduler (Intel, AMD, and other drivers that rely on drm/sched), and servers/VMs that expose /dev/dri/* to user space or guests.
- Attack vector and privileges: Local — generally low privilege required in common desktop/container setups where DRM device nodes are accessible.
- Severity/score: Vendors and trackers classify the issue as Medium and availability‑first (reported CVSS base scores vary; some vendor pages reference 5.8 or a similar mid‑5.x score).
Distribution status varies:
- Ubuntu and SUSE have published advisories and marked the flaw fixed in kernel package revisions that include the upstream commits. Operators should consult their distribution security advisories for the exact package names and the patched kernel versions for their release.
- OSV, NVD, and other vulnerability mirrors list the CVE and the upstream commit references; some registries (NVD) were in the process of enrichment when the advisories were published.
- For vendor kernels and embedded / OEM devices, patch availability depends on the vendor’s backport schedule. Many small, surgical scheduler fixes are easy to backport but vendors must still choose to publish and distribute updated kernel images.
Note on Microsoft/Microsoft‑supplied images: public vendor attestations (CSAF/VEX style disclosures) for some upstream kernel CVEs often start with a limited product set (for example, Azure Linux). Operators running Microsoft images should consult the product‑specific VEX/attestation for the definitive mapping of impacted artifacts. Treat a vendor’s product attestation as authoritative only for the products explicitly listed; absence of an attestation for other products is not proof they are unaffected.
Exploitability, real‑world risk, and detection
Exploitability
This is an
availability (denial‑of‑service) defect, not a memory‑corruption or privilege‑escalation bug. The deadlock is deterministic when the triggering code path and concurrency align, which makes it attractive for local operators or tenants trying to disrupt a shared host.
Practical attack surface:
- Local processes with access to DRM device nodes (unprivileged compositors, sandboxed GPU helper processes, containers that mount /dev/dri) can often exercise these code paths.
- Multi‑tenant hosts, CI runners, GPU‑accelerated build nodes, or cloud VMs with GPU passthrough present the highest operational risk because untrusted tenants may be able to mount a denial of service against other tenants or the host.
There were no widely‑reported public exploit campaigns turning this into remote code execution; public notices treat it as a local DoS. However, the deterministic nature of the failure means it’s straightforward to weaponize as a disruption tactic if an attacker reaches the device interfaces.
Detection and telemetry
Add the following signals to kernel log monitoring and alerting rules:
- Kernel oops/panic traces that explicitly show DEADLOCK or call stacks referencing xa_lock, fence->lock, or drm_sched job dependency functions are canonical indicators.
- Repeated compositor crashes, GPU resets, or host instability correlated with container or untrusted user activity that touches /dev/dri.
- Journal or dmesg lines that show the interrupt‑unsafe locking trace (examples are included in vendor advisories and public CVE descriptions).
If a hang occurs, collect vmcore or kdump output before rebooting; these artifacts hold the kernel trace that confirms the deadlock and helps map whether the running kernel included the upstream fix.
Remediation and operational guidance
- Inventory: Identify which hosts run kernels that include drm/sched. Useful checks:
- uname -r to collect kernel versions.
- lsmod | grep drm, and zgrep CONFIG_DRM_SCHED /proc/config.gz to check the kernel configuration (note that /proc/config.gz is compressed, so plain grep will not work; the file is only present if the kernel was built with CONFIG_IKCONFIG_PROC).
- Inspect device node permissions: ls -l /dev/dri/* to see whether unprivileged processes or containers have access.
- Patch: Apply your distribution or vendor kernel updates that explicitly mention CVE‑2025‑40329 or the upstream commits. Kernel patches require reboots (unless a vendor-supplied livepatch exists and is offered for your kernel). Validate patched package versions in staging before fleet rollout.
- Short‑term mitigations (if immediate patching is impossible):
- Restrict access to DRM device nodes for untrusted users or containers (use udev rules, mount namespaces, or container runtime device policies).
- Avoid exposing /dev/dri to multi‑tenant or untrusted workloads and limit GPU passthrough to trusted tenants.
- Increase kernel logging and ensure OOPS traces are immediately collected to central logging for triage.
- Validate: After patching, run workloads that previously reproduced the issue in a test environment to confirm the deadlock no longer occurs and monitor logs for at least one maintenance cycle.
- Vendor coordination: For embedded devices and appliances, request an explicit backport from the vendor and ask for confirmation in writing that the specific fix (or equivalent) has been included. Many vendor kernels differ in configuration and backport strategy, so per‑artifact verification is essential.
Critical analysis: strengths of the fix and residual risks
Strengths
- The fix is narrowly scoped and conceptually simple — moving lock‑sensitive work out of interrupt/callback context into a worker avoids the core class of lock inversions without wholesale redesign.
- Minimal surface area means the patch is easier to review, test, and backport; that translates into quicker vendor distribution of stable kernel updates. Vendor advisories and CVE mirrors identified the upstream commits and mapped them into stable trees.
- The change addresses both the xa_lock irq‑safety issue and the nested fence lock ordering by deferring re‑arm logic to a context where locking order can be controlled, reducing the chance of regressions.
Residual risks and operational caveats
- Long tail and vendor lag: Embedded devices, vendor‑forked kernels, and OEM images often lag upstream. A surgical fix is easy to backport, but vendors may delay updates for stability testing, leaving devices exposed. Inventory and vendor follow‑up remain critical.
- Misconfiguration: The vulnerability’s exploitability often depends on device node exposure. Environments that inadvertently expose /dev/dri to containers or unprivileged processes will remain higher risk until patched and reconfigured.
- Incomplete mapping by vendors: Vendor attestations (for example product VEX files) sometimes cover only a subset of product families; absence of an attestation is not proof of absence. Customers must verify per‑artifact.
- Detection dependence: Many environments don’t collect kernel OOPS or vmcore artifacts centrally; without those traces, diagnosing a deadlock after a reboot can be difficult. Improve crash collection and retention policies.
Recommended prioritization for operations teams
- Priority 1 (high): Multi‑tenant hosts, GPU‑enabled cloud VMs with device passthrough, CI runners, and build servers that allow untrusted workloads access to GPU device nodes. These environments should be patched immediately or have DRM access restricted.
- Priority 2 (medium): Developer workstations and single‑tenant desktops that expose device nodes to multiple user accounts. Patch at the next maintenance window; consider interim access restrictions.
- Priority 3 (lower): Appliances and embedded devices where GPU drivers aren’t present or the kernel config omits drm/sched. Still verify vendor status and request backports if uncertain.
Conclusion
CVE‑2025‑40329 is a pragmatic, availability‑focused kernel vulnerability arising from a classic lock ordering and IRQ/context mismatch in the DRM scheduler. The upstream fix — shifting dependency iteration and re‑arming out of fence callbacks and into deferred work — is minimal, robust, and well suited to stable backports. For operators, the practical actions are clear: inventory affected kernels, apply vendor patches or backports promptly, and restrict untrusted access to DRM device nodes as an interim mitigation. Because the vulnerability is deterministic and easy to weaponize for denial‑of‑service in multi‑tenant contexts, prioritized remediation is warranted even though the flaw does not expose confidentiality or integrity vectors. If you manage systems that expose GPUs to untrusted workloads, treat this fix as operationally urgent: patch, validate, and harden access to device interfaces to remove the opportunity for local actors to induce kernel hangs.
Source: MSRC Security Update Guide - Microsoft Security Response Center