A small but critical defensive check landed in the AMDGPU Atom interpreter this month: a missing allocation check in amdgpu_atom_execute_table_locked could leave the driver with a NULL "WS" workspace pointer while its size field was non-zero, creating a deterministic NULL‑dereference that can crash the kernel — the issue is tracked as CVE‑2025‑68190 and has been patched in the upstream stable trees.
Source: MSRC Security Update Guide - Microsoft Security Response Center
Background / Overview
The AMDGPU driver includes an AtomBIOS interpreter used to execute small firmware tables (mode-setting, clocks, and other hardware scripts). One of the interpreter’s helper functions, amdgpu_atom_execute_table_locked, allocates a temporary workspace (WS) when executing certain tables. If that allocation fails and the code does not handle the failure correctly, the interpreter later attempts to access WS entries and dereferences a NULL pointer inside atom_get_src_int, producing a kernel oops. This specific defect — failing to check the return of kcalloc and leaving ectx.ws NULL while ectx.ws_size is non-zero — was assigned CVE‑2025‑68190 and fixed with a minimal defensive change: return -ENOMEM on allocation failure rather than continuing with inconsistent state.

Why this matters: kernel drivers run in privileged context, and even simple allocation-handling mistakes that in user space produce only a process crash can lead to system instability, repeated driver resets, or complete host reboots in kernel space. Graphics drivers are especially sensitive because they are exercised constantly by compositors, video players, and GPU-accelerated applications; a local process that can trigger the Atom interpreter under memory pressure can reliably cause a denial of service. Multiple downstream trackers have indexed the CVE and the upstream stable commits that remedied the bug.

Technical anatomy: what went wrong
The vulnerable pattern
- The interpreter receives a WS size (number of workspace entries).
- When WS is non-zero, the code allocates ectx.ws = kcalloc(4, ws, GFP_KERNEL) (4 bytes * ws entries).
- The old code set ectx.ws_size = ws even if the allocation failed.
- Later, atom_get_src_int or equivalent interpreter routines index ectx.ws[idx] unconditionally when ectx.ws_size indicates entries exist.
- If ectx.ws is NULL, the indexed access becomes a NULL pointer dereference in kernel space.
The patch in practice
The upstream commit (stable trees) inserts an explicit check immediately after kcalloc:
- If kcalloc returns NULL, set ret = -ENOMEM and jump to cleanup (do not set ectx.ws_size to a non-zero value).
- If allocation succeeds, set ectx.ws_size = ws and proceed normally.
Affected scope and exploitability
Who is at risk
- Desktop and laptop systems running Linux kernels that include the affected AMDGPU Atom code and that load the amdgpu driver.
- Shared or multi-tenant hosts (CI runners, VDI, kiosk machines, cloud images) that expose GPU device nodes (/dev/dri/*) or allow untrusted workloads to interact with the GPU.
- Virtualized hosts that use GPU passthrough or that grant guest VMs direct access to host GPU devices.
- Embedded appliances and vendor kernels that include AMDGPU and lag upstream backports — the "long tail" of unpatched embedded images is often the riskiest group.
Attack vector and privileges
- Attack vector: Local. An attacker must be able to execute code on the target host or cause an existing process (for example, a compositor or media player) to execute the interpreter path.
- Privileges: Often low in common desktop configurations where /dev/dri device nodes are accessible to user sessions or where compositors indirectly exercise the driver.
- Complexity: Low. The crash is deterministic when the code path is exercised under allocation-failure conditions; forcing an allocation failure can sometimes require memory pressure or careful sequencing, but on many systems the path can be reached by normal workloads combined with resource exhaustion.
Timeline and upstream coordination
- Patch submission and discussion appeared on the amdgpu/dri mailing lists and public patch archives in September 2025; upstream merge and stable-tree propagation occurred in the weeks that followed. The stable commit identifiers referenced in public CVE records include multiple hashes (upstream and stable merges) documenting the change.
- The CVE entry was published to NVD and mirrored by OSV and other vulnerability catalogs on 16 December 2025. The NVD description precisely mirrors the upstream commit message and notes the correct remediation (return -ENOMEM on allocation failure).
- Stable kernel branches (6.12, 6.17 and others) received the backported fix via the usual stable-review and autosel pipelines; distribution maintainers are expected to incorporate those stable commits into distribution kernel packages according to their release and backporting policies.
Operational detection and triage
When triaging whether a host has been affected or triggered by this bug, operational teams should:
- Confirm whether the amdgpu driver is loaded: lsmod | grep amdgpu.
- Check device-node exposure and permissions: ls -l /dev/dri/*. If device nodes are world-accessible or assigned to non-trusted groups, exposure is greater.
- Search kernel logs for relevant oops traces and messages: journalctl -k --no-pager | grep -i amdgpu and dmesg | tail -n 200. Kernel oopses referencing Atom interpreter symbols, NULL pointer dereferences, or messages tied to the Atom execution path are the primary forensic indicators. Preserve full kernel logs for vendor triage.
- Reproduce in a controlled staging environment: exercise display reconfiguration, full‑screen video transitions, or workloads that trigger Atom tables under memory pressure to see if older kernels produce reproducible oopses. Use staged hibernation/resume or memory‑constrained VMs to provoke allocation failures when safe to do so.
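The checks above can be collected into a small read-only triage script. This is a sketch: the grep patterns and line counts are illustrative and should be tuned per environment.

```shell
#!/bin/sh
# Triage sketch for CVE-2025-68190 exposure. Read-only; safe on production
# hosts. Log-search patterns are illustrative, not exhaustive.

KVER=$(uname -r)
echo "kernel: $KVER"

# Is the amdgpu driver loaded at all?
if lsmod 2>/dev/null | grep -qi '^amdgpu'; then
    AMDGPU_LOADED=yes
else
    AMDGPU_LOADED=no
fi
echo "amdgpu loaded: $AMDGPU_LOADED"

# World-accessible /dev/dri nodes widen local exposure.
echo "== DRM device nodes =="
ls -l /dev/dri/ 2>/dev/null || echo "no /dev/dri nodes present"

# NULL-pointer oopses referencing Atom interpreter symbols are the primary
# forensic indicator; preserve the full logs for vendor triage.
echo "== recent amdgpu / Atom kernel messages =="
{ journalctl -k --no-pager 2>/dev/null; dmesg 2>/dev/null; } \
    | grep -iE 'amdgpu|atom_(execute|get_src)|NULL pointer' \
    | tail -n 50
```

Run it across a fleet (for example via your configuration-management tool) and flag any host where amdgpu is loaded and the log search returns hits.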
Mitigations and remediation
Definitive remediation
- Install a kernel package that includes the upstream stable commit(s) fixing CVE‑2025‑68190 and reboot into that kernel. Kernel-space fixes take effect only after booting the updated kernel image. The upstream commit hashes are available in the public stable git history; distributions should list the CVE or commit in their package changelogs.
- For custom kernels, cherry‑pick the upstream commit (the Atom patch authored by Guangshuo Li and merged into the stable branches) into your kernel branch, rebuild, test on representative hardware, and deploy. Validate the fix against previously reproducible crash scenarios.
Short‑term compensating controls (when patching is delayed)
- Restrict access to DRM device nodes: use udev rules to bind /dev/dri/* to a trusted group and remove world access. Remove untrusted users and service accounts from the video/render groups. This reduces the ability for local untrusted processes to exercise the Atom interpreter.
- Avoid granting GPU device passthrough to untrusted containers or VMs. If device passthrough is required, schedule patches and reboots, and limit the window of exposure.
- Harden container runtimes: drop unnecessary capabilities, avoid bind-mounting /dev/dri into untrusted containers, and use vendor-supported GPU plugins that implement finer-grained access controls rather than exposing raw device nodes.
- Increase telemetry and alerting: watch for amdgpu oopses, repeated pageflip timeouts, or watchdog reset messages in your SIEM; add alerts for repeated kernel oops lines containing amdgpu and preserve kdump output for analysis.

These mitigations buy time before an orchestrated rolling patch and reboot.
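A udev rule of the following shape implements the device-node restriction. The rule file name and the "gpu-trusted" group are illustrative choices, not a standard; the sketch writes the rule to a temporary path so it can be reviewed before being installed under /etc/udev/rules.d/.

```shell
#!/bin/sh
# Sketch: generate a udev rule binding DRM nodes to a trusted group.
# "gpu-trusted" and the rule file name are illustrative; adapt to your site.
RULE_FILE=$(mktemp /tmp/70-dri-restrict.rules.XXXXXX)

cat > "$RULE_FILE" <<'EOF'
# Bind DRM card/render nodes to a dedicated group; drop world access.
SUBSYSTEM=="drm", KERNEL=="card*",    GROUP="gpu-trusted", MODE="0660"
SUBSYSTEM=="drm", KERNEL=="renderD*", GROUP="gpu-trusted", MODE="0660"
EOF

echo "review, then install under /etc/udev/rules.d/ and reload with:"
echo "  udevadm control --reload-rules && udevadm trigger"
cat "$RULE_FILE"
```

Pair the rule with group-membership auditing (for example, getent group video render) so only trusted accounts retain access to the GPU device nodes.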
Risk analysis: strengths of the fix, remaining concerns
Strengths
- The fix is minimal and defensive: return -ENOMEM on a failed kcalloc rather than leaving an inconsistent internal state. Minimal, surgical fixes in device drivers tend to have low regression risk and are straightforward to backport to stable trees and vendor kernels. The upstream maintainers explicitly chose this approach to avoid introducing broader changes that could destabilize diverse hardware topologies.
- The issue is local-only and tied to a predictable allocation-failure scenario; that reduces the likelihood of immediate, large-scale remote exploitation. Public trackers and vendors classify the vulnerability as an availability-focused robustness bug rather than an information-disclosure or immediate elevation-of-privilege vector.
Remaining concerns and caveats
- Local DoS in shared environments is a real operational threat: multi-tenant GPU hosts, CI runners, and virtual desktops are attractive targets for denial-of-service because a local process can reliably crash the driver under the right conditions. Even if the numerical CVSS is moderate, the operational impact in shared infrastructure can be severe.
- Vendor and distribution lag: embedded devices and OEM kernel trees may not receive backports promptly. System integrators and appliance vendors must explicitly track the upstream commits and produce vendor kernel updates — otherwise fleets of devices can remain exposed for months or years. The usual "long tail" problem applies.
- Chaining risk: while there is no public proof-of-concept showing privilege escalation based solely on this defect, kernel memory-handling issues are a common starting point for multi-bug exploit chains. Treat deterministic kernel-oops primitives as high-priority to remove from exposed systems even when the immediate impact seems limited to availability. Mark any claims of escalation without validated PoCs as unverified.
Practical checklist for administrators (concrete steps)
- Inventory: uname -r and lsmod | grep amdgpu to find hosts that load the driver.
- Package mapping: consult your distribution's security tracker and kernel package changelog for CVE‑2025‑68190 or the upstream stable commit hashes (check for the stable commits merged in 6.12/6.17 autosel updates). If a kernel package lists the commit hash or the CVE, schedule an update.
- Short-term hardening: enforce udev rules to restrict /dev/dri/*, remove device passthrough from untrusted containers/GCE images, and limit access to GPU device groups.
- Patch and reboot: apply the vendor kernel update, reboot hosts, and verify patched status. Reproduce representative GPU workloads (modesets, multi-monitor hotplug, video playback) while monitoring journalctl -k for recurrence.
- For custom kernels: cherry‑pick the upstream commit referenced in public stable git (see the commit IDs in the CVE/OSV entries), run kernel builds through hardware smoke tests, and stage rollout.
Conclusion
CVE‑2025‑68190 is a reminder that even small allocation-handling oversights in kernel subsystems can translate to impactful, deterministic crash primitives in privileged code. The good news is that the defect was identified and corrected with a straightforward defensive check upstream; the fix is already present in the kernel’s stable merges and should make its way into distribution kernel packages quickly. The operational focus now is standard: inventory affected systems, prioritize patching and rebooting exposed hosts (especially multi‑tenant GPU servers and CI/VDI hosts), and apply short‑term device-access hardening where immediate patching is impractical. Given the minimal and non-invasive nature of the patch, there is low risk in backporting or applying vendor updates; the real operational cost is the reboot window required to complete remediation.