Linux AMDGPU PF Passthrough Resume Fix Prevents GPU Page Faults

A narrowly targeted fix in the Linux AMDGPU DRM driver closes a deterministic kernel crash that could occur after hibernation on systems using PF (physical function) passthrough. The change demands prompt attention from operators who run GPU‑exposed hosts, VMs, or containers: the remediation ensures compute‑partition state is correctly restored on resume, preventing out‑of‑bounds MQD buffer accesses and the resulting GPU page faults.

Background​

The Linux kernel’s DRM/amdgpu driver coordinates complex hardware state across suspend/resume, GPU resets and virtualization boundaries. In PF passthrough environments — where a host’s physical GPU functions are passed through to guests, or where managed shared tenancy is in use — a Mode1 reset that occurs during hibernation could leave partition state inconsistent after resume. When that partition state is wrong, the command processor (CP) may use an incorrect stride size while accessing an MQD (memory queue descriptor) BO, leading to out‑of‑bounds accesses and deterministic GPU page faults. The canonical fix calls the compute‑partition switch routine on resume so partition registers are restored to the expected state. This issue was published as CVE‑2025‑68230 on December 16, 2025, and the upstream change was cherry‑picked from a kernel commit identified in the CVE metadata. Multiple independent trackers and distribution advisories recorded the same technical narrative and recommended kernel updates or vendor backports.

Why this matters: impact and exposure​

  • Primary impact — Availability. The bug produces a deterministic GPU page fault after resuming from hibernation in PF passthrough scenarios, which in practice manifests as driver resets, compositor failures, persistent GPU errors or even host instability requiring reboot. These failures are costly in multi‑tenant services or production VDI/CI environments.
  • Attack surface — local and configuration dependent. The vector is local: an attacker or misbehaving workload must be able to exercise GPU operations on the host or guest that touch compute MQD state. On typical desktops the bar is low because compositors and sandboxed GPU processes routinely go through the DRM path; on servers, exposure is highest where /dev/dri device nodes are mounted into containers or where GPU passthrough gives guests direct hardware interaction. Shared infrastructure such as CI runners, VDI hosts, and cloud GPU instances is the highest‑priority target for remediation.
  • Privilege escalation: unverified. There is no authoritative public evidence that CVE‑2025‑68230 by itself leads to privilege escalation or information disclosure. The practical risk is denial‑of‑service, but kernel memory/access faults are frequently included in multi‑bug exploit chains; treat deterministic kernel primitives seriously in hardening strategies.

Technical anatomy: what went wrong​

On PF passthrough systems, a Mode1 reset occurs during hibernation, but the driver did not reliably restore the GPU’s partition mode on resume. Specific CP‑related registers — notably mmCP_HYP_XCP_CTL and mmCP_PSP_XCP_CTL — could remain in an incorrect state after resume. When the compute CP later accessed an MQD BO, it used the wrong stride calculation and read or wrote outside the MQD’s allocated bounds, triggering page faults or corrupted state. The upstream fix invokes the compute‑partition switch function gfx_v9_4_3_switch_compute_partition during the resume path so that registers and partition mode match runtime expectations. Two practical details operators should note:
  • Why only PF passthrough environments? Passthrough and virtualization alter the lifecycle of resets and resume sequences. Mode resets performed by the hypervisor or host during hibernation can leave the device hardware in a state the guest or host driver doesn’t anticipate unless the driver re‑applies partitioning logic on resume. That gap is what this fix closes.
  • Separation from KFD resume. Kernel Fusion Driver (KFD) resume logic runs independently in some sequences (reset recovery or normal suspend/resume); the upstream change clarified that the partition switch should happen on resume rather than being folded into a separate KFD resume path, preventing redundant or misplaced calls. This distinction helped produce a surgical patch that minimizes regressions.

What the upstream fix does​

  • Calls gfx_v9_4_3_switch_compute_partition during resume from hibernation to restore partition mode and register state used by the CP.
  • Leaves KFD resume semantics intact; KFD still performs its own resume tasks when appropriate.
  • The change was small and targeted, enabling easier backports to stable kernel branches and vendor kernels while reducing regression risk. The fix was discussed on kernel mailing lists and delivered as a cherry‑picked commit into the stable tree.

Who should prioritize patching​

  • Multi‑tenant and shared hosts that expose GPUs to untrusted workloads (GPU‑enabled CI runners, VDI servers, GPU clouds).
  • Systems using PF passthrough or device assignment where guests perform direct GPU compute.
  • Developer workstations and lab machines that run untrusted workloads or multiple user sessions with /dev/dri access.
  • Embedded appliances and vendor images that ship custom kernels; these often constitute the long‑tail risk because backports may be delayed.
If a host does not load the amdgpu driver or does not expose GPU device nodes to untrusted code, the immediate risk is lower — but confirm kernel versions and vendor mappings before assuming safety. Container images do not isolate kernel vulnerabilities; a container inherits the host kernel, so an unpatched host remains exposed regardless of container contents.
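The exposure test above can be scripted for fleet inventory. A minimal sketch, written against captured lsmod output and a DRM node directory so the logic can be exercised offline; the function name and output labels are illustrative, not part of any vendor tooling:

```shell
#!/bin/sh
# Sketch: classify one host's exposure from captured inventory data.
# $1 = file containing `lsmod` output, $2 = path to the host's /dev/dri.
classify_host() {
  if grep -q '^amdgpu' "$1" && [ -n "$(ls -A "$2" 2>/dev/null)" ]; then
    echo "exposed: amdgpu loaded and DRM nodes present"
  else
    echo "lower-risk: amdgpu absent or no DRM nodes"
  fi
}

# Live usage on a host:
#   lsmod > /tmp/modules.txt
#   classify_host /tmp/modules.txt /dev/dri
```

Running this against captured inventory files, rather than the live system, lets a central fleet tool classify hosts without shell access at audit time.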

Immediate detection checklist (fast triage)​

  • Check the running kernel and module:
  • uname -r
  • lsmod | grep amdgpu
  • List DRM device nodes and permissions:
  • ls -l /dev/dri/*
  • Search kernel logs for GPU faults or register/partition messages:
  • journalctl -k --no-pager | grep -i amdgpu
  • dmesg | grep -E "CP|MQD|page fault|mmCP|XCP|amdgpu"
  • Look for reproducible symptoms on resume: coralgemm or compute workloads failing after resume/hybrid sleep, repeated driver resets, pageflip timeouts, or explicit page fault traces referencing amdgpu/CP register names. Preserve full journal/dmesg output and kdump serial console logs for vendor triage.
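The log searches above can be wrapped into a single reusable filter. A minimal sketch that scans saved kernel-log text for the keywords in this checklist; the pattern set is a starting point, not an exhaustive signature:

```shell
#!/bin/sh
# Sketch: filter saved kernel-log text for amdgpu fault indicators.
# The regex mirrors the checklist keywords; extend it for your fleet.
scan_amdgpu_faults() {
  grep -E -i 'amdgpu|mmCP|XCP|MQD|page fault|pageflip timeout' "$1"
}

# Live usage (capture first, so the evidence is preserved for triage):
#   journalctl -k --no-pager > /var/tmp/klog.txt
#   scan_amdgpu_faults /var/tmp/klog.txt
```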

Recommended remediation: patching and verification​

  • Definitive fix: install a vendor or distribution kernel that contains the upstream commit(s) addressing CVE‑2025‑68230 and reboot into that kernel. Kernel fixes require a reboot to take effect. Upstream commit metadata indicates the change was included in stable branches; consult distro changelogs for the exact package versions.
  • If you build custom kernels: cherry‑pick the referenced upstream commit (the CVE record lists the commit hash metadata) into your branch, rebuild and run full smoke tests across representative hardware (multi‑display, docks, passthrough scenarios) before broad deployment.
  • Post‑patch verification:
  • Reboot the system into the patched kernel.
  • Reproduce representative hibernate/resume cycles with compute workloads such as coralgemm or other GPU compute tests that previously triggered faults.
  • Monitor kernel logs over a 24–72 hour validation window, focusing on amdgpu resets, page‑fault messages, and CP register warnings. If no recurrence occurs within that window, mark the host mitigated.
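The verification loop can be partially automated. A hedged sketch: the pass/fail check is factored into a function that reads captured log text so it can be exercised offline, while the hibernate and workload steps are placeholders to adapt to your environment:

```shell
#!/bin/sh
# Sketch: pass/fail check applied after each hibernate/resume cycle.
# $1 = file of kernel-log text captured after resume. Returns nonzero
# (fail) if any amdgpu fault indicator appears.
cycle_is_clean() {
  ! grep -E -qi 'amdgpu.*(page fault|MQD|mmCP|XCP)' "$1"
}

# Soak-test outline for a staging host (placeholders, not turnkey):
#   for i in 1 2 3 4 5; do
#     systemctl hibernate              # resume manually or via RTC wake
#     # ...run a representative compute workload here...
#     dmesg > /var/tmp/cycle-$i.log
#     cycle_is_clean /var/tmp/cycle-$i.log || { echo "fault on cycle $i"; exit 1; }
#   done
```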

Short‑term compensations when patching is delayed​

If you cannot immediately deploy patched kernels across your fleet, apply mitigations to reduce exposure:
  • Restrict access to DRM device nodes:
  • Use udev rules to set ownership to a trusted group and remove world access.
  • Remove untrusted users and automated CI accounts from video/render groups.
  • Avoid mounting /dev/dri into untrusted containers or CI runners.
  • Remove or disable PF passthrough for untrusted guests until the host kernel is patched. Where passthrough is essential, schedule patch windows and avoid hibernate cycles while hosts are exposed.
  • Harden container capabilities; drop CAP_SYS_ADMIN and similar privileges when running GPU workloads in containers, and prefer vendor‑supported GPU plugins that implement fine‑grained device access models rather than bind‑mounting /dev/dri.
  • Increase logging and SIEM detection:
  • Add alerts for repeated amdgpu resets, pageflip timeouts, or kernel oops lines containing amdgpu/CP/MQD keywords.
  • Preserve full oops traces and kdump output for vendor escalation.
These mitigations reduce the attack surface but do not remove the underlying bug; the only complete remediation is the patched kernel.
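The device-node restriction can be expressed as a udev rule. A hedged configuration sketch, assuming a dedicated gpu-trusted group; the group name and rule filename are illustrative, and you should verify group membership of legitimate service accounts before deploying fleet-wide:

```shell
# Create a trusted group and a udev rule limiting DRM nodes to it.
# /etc/udev/rules.d/70-dri-restrict.rules is an example path.
sudo groupadd -f gpu-trusted
cat <<'EOF' | sudo tee /etc/udev/rules.d/70-dri-restrict.rules
SUBSYSTEM=="drm", KERNEL=="card*", GROUP="gpu-trusted", MODE="0660"
SUBSYSTEM=="drm", KERNEL=="renderD*", GROUP="gpu-trusted", MODE="0660"
EOF
sudo udevadm control --reload-rules
sudo udevadm trigger --subsystem-match=drm
```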

Practical triage and test recipes for operators​

  • Inventory:
  • Enumerate hosts that load the amdgpu module: lsmod | grep amdgpu.
  • Note which hosts expose /dev/dri to containers or untrusted users: grep -Rn "/dev/dri" /etc/docker /etc/containerd /etc/systemd.
  • Controlled repro (staging only):
  • Use representative hardware configured with PF passthrough or similar virtualization that reproduces the lifecycle (hibernate → resume).
  • Run coralgemm or equivalent compute jobs before and after hibernate and watch for pagefaults or driver resets.
  • Log capture:
  • Enable persistent journal logging and serial console captures for nodes where crashes may occur.
  • If an oops occurs, capture dmesg, journalctl -k, and a kdump capture if configured. These artifacts are essential for vendor mapping and for backport confirmation.
  • Validate patches:
  • Confirm vendor package changelogs explicitly reference the upstream commit or CVE identifier before declaring remediation complete.
  • On custom kernels, include commit ID in build metadata and cross‑check git logs to ensure the patch is present.
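The git cross-check in the last bullet can be made mechanical with git merge-base --is-ancestor. A minimal sketch; the commit hash to pass in is whatever the CVE record lists, deliberately not reproduced here:

```shell
#!/bin/sh
# Sketch: return success iff commit $1 is an ancestor of HEAD in repo $2,
# i.e. the fix is present in the branch you are building.
fix_is_present() {
  git -C "$2" merge-base --is-ancestor "$1" HEAD
}

# Usage:
#   fix_is_present <commit-hash-from-CVE-record> /path/to/kernel/tree \
#     && echo "patch present" || echo "patch missing"
```

Note that this only proves ancestry in the tree you are building from; for vendor packages, the changelog mapping described above remains the authoritative check.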

Vendor and distribution considerations​

Major distribution trackers and vendors mirrored the CVE and flagged the upstream commit references; however, package versions and timelines differ across distros and vendor kernels. Embedded and OEM kernels are the most likely to lag; device vendors or appliance vendors may need to be contacted for backports or firmware images incorporating the kernel fix. Operators should treat upstream stable commits as canonical but rely on vendor package mappings when applying production updates. For organizations that consume vendor images (marketplace images, cloud images), check the actual kernel artifact shipped in those images rather than assuming that the absence of a CVE note in a vendor portal means the code is not present — some vendor attestations cover only a subset of artifacts. Preserve an audit trail of kernel build IDs and vendor advisories for compliance.

Critical analysis: strengths, residual risks and caveats​

  • Strengths of the upstream approach. The remediation is small, surgical and behavior‑preserving: invoking the existing compute partition switch routine on resume corrects hardware state without heavy refactors. Small, focused fixes lower regression risk and are easier to backport. Kernel maintainers favored this approach, which increases the chance of quick distribution backports.
  • Residual risks. The long tail of vendor kernels and embedded devices poses the largest operational exposure. Systems where maintainers do not promptly backport stable commits will remain vulnerable. Operational misconfigurations — such as permissive /dev/dri exposure, widespread passthrough use in multi‑tenant CI, or insufficient log retention — substantially increase real‑world risk.
  • Exploitability nuance. While the CVE’s primary impact is availability (DoS), kernel faults in privileged drivers are valuable primitives for attackers. Although no public evidence indicates that this particular bug escalates privileges on its own, assume that deterministic kernel errors can be chained into more serious exploitation in the presence of other vulnerabilities or weak kernel hardening. Flag any claims of escalation without proof as unverified.
  • Operational trade‑offs. Short‑term mitigations (restricting /dev/dri, disabling passthrough) are practical but may disrupt legitimate workloads. Prioritize patches for hosts that present the greatest exposure and schedule maintenance windows for broader updates.

Recommended playbook (concise, prioritized)​

  • Inventory: Identify all hosts that load amdgpu and expose /dev/dri.
  • Patch: Apply vendor/distribution kernel updates that include the upstream fix and reboot. Confirm package changelog contains CVE‑2025‑68230 or the upstream commit ID.
  • Validate: Reproduce hibernate/resume workflows with representative compute workloads; monitor logs for 24–72 hours post‑patch.
  • Mitigate if delayed: Restrict /dev/dri access, remove passthrough from untrusted guests/containers, and increase SIEM alerts for amdgpu oops messages.
  • For vendor kernels: open support cases requesting backports or updated images and preserve crash artifacts for vendor triage.

Conclusion​

CVE‑2025‑68230 addresses a concrete, deterministic GPU page‑fault condition that arises in PF passthrough environments after hibernation because compute partition registers were not reliably restored on resume. The upstream remediation is small and safe: ensure the compute partition switch runs on resume to restore mmCP_* registers so MQD stride calculations and CP accesses are correct. For operators, the risk profile is straightforward: patch the kernel where possible, restrict GPU device exposure where patching is delayed, and validate resume cycles on representative hardware. Given the ease of triggering availability faults in shared GPU environments, prioritize updates for multi‑tenant hosts, GPU clouds, CI runners and any environment that exposes /dev/dri or uses passthrough. Multiple independent trackers and the upstream kernel commit history corroborate this diagnosis and remediation; follow vendor advisories and distribution changelogs for authoritative package versions and backport status.
Source: MSRC Security Update Guide - Microsoft Security Response Center
 
