CVE-2025-38371: Linux v3d interrupt race fix in kernel

A small but important low-level kernel fix landed in mid-2025 that patches a subtle race in the Linux DRM v3d driver: before resetting the GPU, the driver must disable interrupts and ensure any in-flight interrupt handlers have completed. The vulnerability, cataloged as CVE-2025-38371, describes a scenario where an interrupt can be serviced while the GPU is in the middle of a reset, producing GPU hangs and a kernel NULL-pointer dereference in interrupt context, which is a fatal kernel oops on affected systems. The issue has been fixed upstream and backported into multiple stable kernel trees. Operators running platforms that include the Broadcom VideoCore v3d driver (notably Raspberry Pi images and other boards using the v3d stack) should verify whether their kernels include the upstream commit and apply vendor patches where necessary.

Background / Overview

The v3d DRM driver supports Broadcom's VideoCore GPU family (v3d) used in Raspberry Pi boards and certain single-board computer designs. Unlike many DRM drivers written for discrete PCIe GPUs, v3d is an SoC GPU driver integrated into ARM platform kernels; its interrupt and reset sequences interact closely with platform firmware and hardware microcontrollers. The bug fixed by CVE-2025-38371 arises from a timing window during a GPU reset: if an interrupt is serviced while driver state is being torn down or reinitialized, the interrupt handler can dereference now-invalid pointers (or otherwise access partially reset structures), triggering a kernel crash. The kernel oops posted in the public vulnerability record shows a NULL pointer dereference in v3d_irq followed by a full kernel panic on a Raspberry Pi 4 test system.

Why this matters operationally
  • The vulnerability is local in nature: it is triggered by kernel activity and device interrupts; it is not a remote network exploit.
  • The primary impact is availability: a GPU hang or a kernel oops in interrupt context, which escalates to a panic and forces a reboot. Because the trigger is a race, the crash is timing-dependent rather than deterministic.
  • The practical exposure is highest where v3d is present and loaded (Raspberry Pi OS images, custom Debian/Ubuntu kernels built for Pi boards, vendor images that include the v3d module).

What went wrong: technical anatomy

At a high level the problem is an interrupt-in-reset race. The crash trace in the published advisories is consistent with the following sequence:
  • The driver initiates a GPU reset, typically as part of recovery when a submitted job times out and the GPU is presumed hung.
  • During that reset window the GPU can still raise interrupts. The v3d interrupt handler (v3d_irq) may then run while reset code is modifying or freeing structures that the handler expects to exist.
  • The handler dereferences a pointer that has been cleared, producing a NULL pointer dereference in interrupt context. The kernel cannot recover from an oops inside an interrupt handler, so it panics, as the published dump shows.
Two subtle but important kernel realities make this class of bug severe:
  • Interrupt handlers run in a strict context: they cannot sleep and must be resilient to state changes elsewhere in the driver. If they touch data that another path can free or reinitialize, race conditions and oopses can follow.
  • GPU resets are heavy operations that might involve halting microcontrollers, clearing DMA mappings, or reinitializing ring buffers and state; if the interrupt path continues to run while that happens, its assumptions break. The fix is therefore to make the reset path synchronous with respect to IRQ handling: disable the device interrupts, then reset the GPU, and finally re‑enable interrupts only once handlers cannot be executing against torn‑down state.
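
The ordering problem can be illustrated with a deliberately simplified, single-threaded Python model. Nothing here is kernel code: `ToyGpu`, `JobState`, and the `interrupt_mid_reset` hook are hypothetical stand-ins that let an "interrupt" be injected at the exact point where reset has torn down driver state, and setting an attribute on `None` plays the role of the NULL pointer dereference.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class JobState:
    """Hypothetical per-job record that the interrupt handler expects to exist."""
    job_id: int
    completed: bool = False


class ToyGpu:
    """Toy driver model: the IRQ handler reads state that reset tears down."""

    def __init__(self) -> None:
        self.irq_enabled = True
        self.active_job: Optional[JobState] = JobState(job_id=1)

    def irq_handler(self) -> bool:
        """Models one interrupt; returns False when the line is masked."""
        if not self.irq_enabled:
            return False  # masked: the interrupt is never delivered
        # Like the handler described above, this assumes the job record is
        # valid; None stands in for a NULL pointer.
        self.active_job.completed = True
        return True

    def reset(self, disable_irqs_first: bool,
              interrupt_mid_reset: Callable[[], object] = lambda: None) -> None:
        if disable_irqs_first:
            self.irq_enabled = False          # the fix: mask before touching state
        self.active_job = None                # teardown: job record is gone
        interrupt_mid_reset()                 # an IRQ lands inside the reset window
        self.active_job = JobState(job_id=2)  # reinitialize
        self.irq_enabled = True               # unmask once state is consistent again


def run(disable_irqs_first: bool) -> str:
    gpu = ToyGpu()
    try:
        gpu.reset(disable_irqs_first, interrupt_mid_reset=gpu.irq_handler)
    except AttributeError:  # Python's analogue of the NULL dereference oops
        return "crash: handler dereferenced torn-down state"
    return "ok"


print(run(disable_irqs_first=False))  # crash: handler dereferenced torn-down state
print(run(disable_irqs_first=True))   # ok
```

The model captures only the masking half of the fix. A real kernel must also wait for a handler that is already executing on another CPU when reset begins, which a single-threaded model cannot show.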

The fix: what changed in the kernel

The upstream commit (identified as 226862f50a7a88e4e4de9abbf36c64d19acd6fd0) hardens the reset routine by ensuring interrupts are disabled before the reset sequence begins and that any interrupt handler already in progress has finished before driver state is torn down. In practical terms the patch does the following:
  • Calls the platform/driver IRQ disable path before beginning the reset sequence (so new IRQs cannot begin while state is being reinitialized).
  • Ensures any in‑flight interrupt work is accounted for (for example by using IRQ disable variants and synchronizing with the handler where appropriate) so that handlers will not touch freed or invalid objects.
  • Reorders reset teardown and reinitialization so the interrupt path has a consistent view of driver state during the entire sequence.
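
The "disable, then wait for in-flight handlers" contract resembles what the kernel's `disable_irq()` (which waits for running handlers to return) or `synchronize_irq()` provides; which primitives the v3d patch actually uses is not restated here. The Python sketch below models that contract with a `threading.Condition`. `IrqLine`, `try_enter`, `leave`, and `disable_and_sync` are hypothetical names, and the model is an analogy rather than a reproduction of the patch.

```python
import threading
import time
from typing import List


class IrqLine:
    """Hypothetical model of one interrupt line: a mask bit plus an in-flight count."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._enabled = True
        self._in_flight = 0

    def try_enter(self) -> bool:
        """Called when an interrupt fires; refuses delivery while masked."""
        with self._cond:
            if not self._enabled:
                return False
            self._in_flight += 1
            return True

    def leave(self) -> None:
        """Called when a handler returns; wakes any thread waiting in disable_and_sync."""
        with self._cond:
            self._in_flight -= 1
            self._cond.notify_all()

    def disable_and_sync(self) -> None:
        """Mask new interrupts, then block until running handlers have returned."""
        with self._cond:
            self._enabled = False
            self._cond.wait_for(lambda: self._in_flight == 0)

    def enable(self) -> None:
        with self._cond:
            self._enabled = True


def reset_while_handler_runs() -> List[str]:
    log: List[str] = []
    line = IrqLine()
    handler_started = threading.Event()

    def handler() -> None:
        if line.try_enter():
            log.append("handler: start")
            handler_started.set()
            time.sleep(0.05)  # handler is still using job state when reset begins
            log.append("handler: done")
            line.leave()

    irq_thread = threading.Thread(target=handler)
    irq_thread.start()
    handler_started.wait()
    line.disable_and_sync()        # returns only after the handler calls leave()
    log.append("reset: teardown")  # no handler can be touching state at this point
    log.append(f"late irq delivered: {line.try_enter()}")
    line.enable()                  # re-enable after reinitialization
    irq_thread.join()
    return log


print(reset_while_handler_runs())
# ['handler: start', 'handler: done', 'reset: teardown', 'late irq delivered: False']
```

Guarding the mask check and the in-flight count with the same lock means an interrupt is either refused by the mask or counted and waited for; there is no window in which it slips in between.
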
The change is intentionally small and surgical — a canonical kernel maintenance approach: make the minimum necessary change to close the window for a crash without redesigning large driver subsystems. That approach aids backporting to multiple stable kernel branches and reduces regression risk. The patch has been merged into upstream and added to several stable trees (5.4, 5.10, 5.15, 6.6, 6.15, and others), which indicates wide recognition of the fix and the desire to propagate it to long‑term kernels used by embedded images.

Affected systems and exposure profile

Who is affected
  • Systems running kernels that include the v3d DRM driver and that have that driver active at runtime. Notably: Raspberry Pi OS images and other distributions packaging kernels for Raspberry Pi hardware. The public example in the advisory demonstrates the crash on a Raspberry Pi 4 Model B.
Where the patch landed
  • The upstream commit was accepted and backported to multiple stable kernel trees, so vendor kernels based on those branches can pick up the fix. Distribution vendors such as SUSE have mapped the CVE to package updates and errata in their advisories. Operators should consult their distribution's security tracker to find the exact package version that includes the upstream commit.
Exploitability and real‑world risk
  • The vulnerability is exploitable locally: an unprivileged user who can execute code on an affected host and submit work to the GPU through /dev/dri could provoke the crash. That exposure is significant in multi-tenant systems, CI runners, or any environment that allows untrusted workloads to access GPU devices.
  • There was no authoritative public report of widespread in‑the‑wild exploitation at disclosure; however, local crash primitives are easily weaponized for denial‑of‑service against shared systems. Treat the absence of public exploit reports as absence of evidence, not evidence of absence.

Verification: how to confirm whether you're patched

Operators should use an artifact‑level, reproducible verification workflow rather than trusting a vague statement that a distribution is “patched.” The recommended steps:
  • Identify kernel and module presence: confirm the running kernel with uname -r, then check whether the v3d driver is active with lsmod | grep v3d or by inspecting /sys/class/drm and /dev/dri. Note that lsmod lists only loadable modules; a v3d driver built into the kernel will not appear there.
  • Map your kernel to upstream commits: check your distribution's kernel changelog or package metadata for the upstream commit ID (226862f50a7a88e4e4de9abbf36c64d19acd6fd0) or the corresponding stable-queue patch references. Vendor package changelogs commonly list backported commit IDs.
  • Consult vendor advisories and OSV/NVD entries: use your distribution's security tracker (or the SUSE, RHEL, or Ubuntu security pages) to map the CVE to package versions. OSV and NVD entries include references and downstream advisories that help tie package releases to the fix.
  • Validate behavior after patching: boot into the patched kernel in a staging environment, reproduce representative GPU workloads that previously caused instability (3D rendering jobs, plus any display, hot-plug, or video playback paths your deployment uses), and monitor dmesg and journalctl -k for v3d/DRM oops and interrupt failure traces.
If you cannot confirm the upstream commit is present in your kernel, treat the system as potentially vulnerable and apply compensating controls (see next section).
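
The first two checks lend themselves to a script. The Python sketch below reads /proc/modules (the file lsmod formats) and searches a kernel package changelog for the fix. Two assumptions are worth stating: `FIX_SUBJECT` is taken to be the upstream title of the commit, and a changelog may quote either the full mainline hash or an abbreviation of it. Verify both against the upstream commit before relying on the result; the helper names are hypothetical.

```python
import re
from pathlib import Path

# Mainline commit ID quoted in the advisory.
FIX_COMMIT = "226862f50a7a88e4e4de9abbf36c64d19acd6fd0"
# Assumption: upstream patch subject; confirm against the commit itself.
FIX_SUBJECT = "drm/v3d: Disable interrupts before resetting the GPU"


def v3d_module_loaded(proc_modules: str) -> bool:
    """True if /proc/modules lists a loaded v3d module (first field is the name).

    A v3d driver built into the kernel never appears in /proc/modules.
    """
    return any(line.split()[:1] == ["v3d"] for line in proc_modules.splitlines())


def changelog_mentions_fix(changelog: str, min_abbrev: int = 12) -> bool:
    """Heuristic: does a package changelog cite the fix by subject or by hash?

    False means "unconfirmed", not "vulnerable": some vendors do not list
    every backported commit.
    """
    text = changelog.lower()
    if FIX_SUBJECT.lower() in text:
        return True
    # Accept the full hash or an abbreviation at least min_abbrev characters long.
    for token in re.findall(r"\b[0-9a-f]{7,40}\b", text):
        if len(token) >= min_abbrev and FIX_COMMIT.startswith(token):
            return True
    return False


if __name__ == "__main__":
    modules = Path("/proc/modules")
    if modules.exists():
        print("v3d module loaded:", v3d_module_loaded(modules.read_text()))
```

Feed `changelog_mentions_fix` the text of your kernel package's changelog, for example from `apt changelog <package>` on Debian-based systems or `rpm -q --changelog <package>` on RPM-based ones. The 12-character minimum keeps short hex strings elsewhere in a changelog from producing false matches.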

Mitigations and remediation guidance

Definitive remediation
  • Install your distribution’s kernel update that includes the upstream stable commit addressing CVE‑2025‑38371 and reboot into the updated kernel. Kernel changes take full effect only after reboot. Upstream and stable‑queue messages list the commit and multiple stable branches where the patch is present; vendors should publish package-level mappings for their kernels.
Short‑term compensating controls (if immediate patching is impossible)
  • Restrict access to DRM device nodes: use udev rules and group permissions to remove /dev/dri access from untrusted users and containers. Prevent untrusted containers from binding /dev/dri into guest namespaces.
  • Avoid exposing GPU devices to multi‑tenant workloads unless necessary. For CI runners and shared hosts, remove device passthrough or use strict device allocation and quota policies.
  • Harden host monitoring: alert on kernel oops traces, repeated v3d or drm warnings, pageflip timeouts, or repeated watchdog resets. Collect persistent kernel logs for forensic analysis.
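
Alerting on kernel oops traces can start with simple pattern matching over kernel log text. The Python sketch below is a heuristic, not a detector for this CVE specifically: the patterns are standard ARM/arm64 kernel oops and panic messages plus the `v3d_irq` symbol named in the advisory, and the alert fires only when a crash marker and that symbol appear in the same log. The function names are hypothetical.

```python
import re
import sys
from typing import Iterable, List, Tuple

# Heuristic patterns (assumptions; tune for your platform).
PATTERNS = {
    "null_deref": re.compile(r"Unable to handle kernel NULL pointer dereference"),
    "oops": re.compile(r"Internal error: Oops"),
    "irq_panic": re.compile(r"Fatal exception in interrupt"),
    # \b stops this matching longer symbol names that start with "v3d_irq_".
    "v3d_irq": re.compile(r"\bv3d_irq\b"),
}
CRASH_MARKERS = {"null_deref", "oops", "irq_panic"}


def scan_kernel_log(lines: Iterable[str]) -> List[Tuple[int, str, str]]:
    """Return (line number, pattern name, text) for each matching log line."""
    hits: List[Tuple[int, str, str]] = []
    for number, line in enumerate(lines, start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((number, name, line.rstrip("\n")))
    return hits


def v3d_oops_suspected(hits: List[Tuple[int, str, str]]) -> bool:
    """Alert only when a crash marker and the v3d_irq symbol both appear."""
    names = {name for _, name, _ in hits}
    return "v3d_irq" in names and bool(names & CRASH_MARKERS)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log_file:
        found = scan_kernel_log(log_file)
    for number, name, text in found:
        print(f"{number}: [{name}] {text}")
    print("possible v3d IRQ crash" if v3d_oops_suspected(found)
          else "no v3d IRQ crash signature")
```

Save the output of `journalctl -k --no-pager` to a file and pass its path as the first argument. In a central logging system the same patterns translate naturally into alert rules.
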
Operational rollout best practices
  • Inventory: enumerate which hosts have v3d loaded and which users/containers have /dev/dri access.
  • Staged rollout: apply updates to a test ring first; run representative display, DRM, and GPU workloads for at least 24–72 hours.
  • Monitor: collect dmesg/journalctl and enable central logging of kernel oops/panic events.
  • Vendor follow‑up: for devices using vendor‑supplied or OEM kernels (for example Raspberry Pi OS images or vendor appliances), request a vendor confirmation or specific package that includes the upstream fix and backports. Long‑tail kernel images often lag upstream and require explicit vendor action.
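
Answering the inventory question of who has /dev/dri access starts with the device nodes themselves. The Python sketch below (the `audit_dri` helper is hypothetical) flags DRM nodes whose permission bits let any local user open them. It does not inspect POSIX ACLs, such as the per-session grants systemd-logind applies, or the membership of the owning group, and containers need a separate review of their device mappings.

```python
import grp
import stat
from pathlib import Path
from typing import List, Tuple


def world_accessible(mode: int) -> bool:
    """True if the 'other' permission class may read or write the node."""
    return bool(mode & (stat.S_IROTH | stat.S_IWOTH))


def group_name(gid: int) -> str:
    try:
        return grp.getgrgid(gid).gr_name
    except KeyError:
        return str(gid)  # gid with no entry in the group database


def audit_dri(dri_dir: str = "/dev/dri") -> List[Tuple[str, str, str]]:
    """Return (path, owning group, mode string) for nodes any local user can open."""
    findings: List[Tuple[str, str, str]] = []
    root = Path(dri_dir)
    if not root.is_dir():
        return findings
    for node in sorted(root.iterdir()):
        if node.is_symlink() or node.is_dir():
            continue  # skip by-path/ and other aliases of the same nodes
        st = node.stat()
        if world_accessible(st.st_mode):
            findings.append((str(node), group_name(st.st_gid), stat.filemode(st.st_mode)))
    return findings


if __name__ == "__main__":
    for path, group, mode in audit_dri():
        print(f"{mode} {group:<10} {path}  <- open to all local users")
```

Nodes reported here are the first candidates for the udev and group-permission tightening described under the compensating controls.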

Why this patch is the right surgical fix, and its limitations

Strengths
  • Small, targeted, and low‑risk: the change is a focused ordering/hardening fix that removes the crash window without redesigning driver infrastructure. That makes it easy to review and backport into stable LTS kernels.
  • Upstreamed widely: the patch has been accepted into multiple stable trees, increasing the chance downstream vendors will adopt it in their kernel packages.
Residual risks and caveats
  • Vendor lag / long tail: embedded appliances, vendor forks, and OEM kernels may remain vulnerable if maintainers don’t backport or ship updated kernel images. This is the most persistent operational risk.
  • Detection dependence: if hosts don’t centralize kernel oops logs or perform post‑mortem core collection, a kernel crash that forced a reboot may be hard to map back to this CVE later. Improve crash collection policy to close that gap.
  • Unverified escalation claims: public advisories frame the issue as availability‑first. Any claims that the bug enables privilege escalation require demonstrable PoCs. Until such evidence appears, treat those escalation claims as unverified.

Practical forensics: what to collect if you see a hit

If a host exhibits symptoms (kernel oops, v3d IRQ trace, sudden panics), collect:
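  • Persistent kernel logs: journalctl -k --no-pager (add -b -1 to read the previous boot after a crash-induced reboot, which requires persistent journal storage) and /var/log/kern.log where the system still writes one.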
  • dmesg output and any serial console logs (these are crucial for kernel oops stacks).
  • uname -a and the kernel package version/changelog of the running kernel.
  • lsmod | grep v3d and the output of ls -l /dev/dri/* to show device exposure.
  • Any reproducer steps or workload patterns that produced the crash (hot‑plugging, specific media playback, container workloads). Preservation of logs before reboot is vital; rebooting without saving logs can lose the stack trace needed to map the crash to the upstream commit.
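
Collecting these artifacts by hand during an incident is error-prone, so a small script can bundle them into one directory. The Python sketch below is an illustration under assumptions: the `collect` helper and the output file names are hypothetical, the commands mirror the list above (with `ls -l /dev/dri` in place of the shell glob), and it adds `journalctl -k -b -1`, which reads kernel messages from the previous boot and only returns data when journald keeps a persistent journal. Run it as root, since `dmesg` and `/var/log/kern.log` are often restricted.

```python
import shutil
import subprocess
import sys
from pathlib import Path
from typing import Dict, List, Sequence

# Output file name -> command. "-b -1" reads the previous boot (persistent journal only).
COMMANDS: Dict[str, List[str]] = {
    "journal_kernel.txt": ["journalctl", "-k", "--no-pager"],
    "journal_kernel_prev_boot.txt": ["journalctl", "-k", "-b", "-1", "--no-pager"],
    "dmesg.txt": ["dmesg"],
    "uname.txt": ["uname", "-a"],
    "lsmod.txt": ["lsmod"],
    "dev_dri.txt": ["ls", "-l", "/dev/dri"],
}
LOG_FILES = ("/var/log/kern.log",)


def collect(out_dir: Path,
            commands: Dict[str, List[str]] = COMMANDS,
            files: Sequence[str] = LOG_FILES) -> Dict[str, str]:
    """Save each command's output and copy each log file; return a status per artifact."""
    out_dir.mkdir(parents=True, exist_ok=True)
    status: Dict[str, str] = {}
    for name, argv in commands.items():
        try:
            proc = subprocess.run(argv, capture_output=True, text=True, timeout=60)
        except (OSError, subprocess.TimeoutExpired) as exc:
            status[name] = f"not collected: {type(exc).__name__}"
            continue
        (out_dir / name).write_text(proc.stdout + proc.stderr)
        status[name] = f"exit {proc.returncode}"
    for path in map(Path, files):
        try:
            shutil.copy2(path, out_dir / path.name)
            status[path.name] = "copied"
        except OSError as exc:  # missing file or insufficient permissions
            status[path.name] = f"not copied: {type(exc).__name__}"
    return status


if __name__ == "__main__" and len(sys.argv) > 1:
    for artifact, result in collect(Path(sys.argv[1])).items():
        print(f"{artifact}: {result}")
```

Failures are recorded rather than raised, so a partially locked-down host still yields whatever could be gathered, and the status list shows which artifacts are missing.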

Cross-checks and independent confirmation

The public kernel announcement and stable-queue postings provide the authoritative technical commit and reasoning; the NVD/OSV entries carry the CVE record and references back to the kernel commit. Independent stable patches and distribution advisories confirm the mapping and the presence of the fix in multiple kernel branches. Use at least two independent sources (for example the upstream stable commit announcements and the NVD/OSV CVE entries) to verify whether a given kernel package includes the change.

Caveat about vendor attestations
  • Some vendor VEX/CSAF attestations may initially list a specific image (for example, Azure Linux or a particular distribution image) as affected while other vendor images are still under investigation. Treat vendor attestation lists as a helpful starting point, but validate per artifact and package.

Executive summary and prioritized action checklist

Security impact summary
  • CVE‑2025‑38371 is a local, availability‑first kernel vulnerability in the Linux drm/v3d driver that can lead to a fatal kernel oops during interrupt handling if an IRQ fires while the GPU is being reset. A small, surgical commit disables interrupts before reset and ensures in‑flight handlers don’t race with reset teardown.
Immediate (hours)
  • Inventory hosts running v3d and identify which ones expose /dev/dri to untrusted users or containers.
  • Apply temporary access controls: restrict /dev/dri to trusted groups; remove device passthrough from untrusted containers.
Short term (days)
  • Obtain and deploy vendor/distribution kernel updates that include the upstream commit; schedule reboots to activate the fix.
  • Validate in staging: exercise GPU rendering workloads, plus any display, modeset, or hot-plug scenarios your deployment relies on, and monitor kernel logs for recurrence.
Medium term (weeks)
  • Ensure long‑tail devices (embedded appliances, vendor builds) are tracked and that vendors provide backports or updated firmware/kernel images.
  • Centralize kernel oops/log collection and retention policies so future incidents can be triaged effectively.
Long term
  • Maintain a clear pipeline for tracking upstream commits for kernel drivers on embedded platforms and automate package change detection that maps upstream commit IDs to distribution kernel packages.

Conclusion

CVE‑2025‑38371 is a pragmatic example of a kernel robustness fix: a narrow, well‑scoped change that prevents a subtle interrupt/reset race in the v3d DRM driver. The patch’s surgical nature makes it straightforward to backport, and it has already been merged into multiple stable kernel trees, but operational risk remains for environments that use vendor‑supplied kernels or images that lag upstream. Administrators running Raspberry Pi systems or other SoC platforms with v3d should prioritize confirming the presence of the upstream fix in their kernel packages, apply vendor updates, restrict GPU device exposure for untrusted workloads, and centralize crash logging to speed detection and triage if incidents occur.
Source: MSRC Security Update Guide - Microsoft Security Response Center
 
