Linux zswap UAF CVE-2025-21693: CPU hotplug fix with mutex

The Linux kernel has a newly cataloged use‑after‑free in the zswap compression path—tracked as CVE‑2025‑21693—that can be triggered when a CPU is hot‑unplugged while compression or decompression is still using per‑CPU resources, allowing those resources to be freed under active use and producing kernel oopses or other availability failures.

[Image: schematic of per‑CPU buffers and zswap compression, with CPU hotplug and UAF symbols]

Background

What is zswap and why per‑CPU state matters​

zswap is a compressed cache for swap pages that lives in memory to avoid going to slower swap devices: it compresses pages that are being swapped out and keeps the compressed copies in RAM, reducing I/O and improving responsiveness under memory pressure. Per‑CPU contexts and hardware acceleration hooks are used to optimize compression and decompression throughput: the kernel keeps a small per‑CPU acomp_ctx structure holding temporary buffers, request handles and acceleration state so that hot, parallel workloads don’t contend on a single global state object.
When kernel code switches from an accessor that disables preemption (get_cpu_ptr) to an API that requires a sleepable context (crypto_acomp), lifetime and migration semantics change: the code can no longer rely on preemption and migration being disabled to guarantee that a given per‑CPU context stays valid for the duration of an operation. That change is the practical root of the CVE discussed below.

Timeline and discovery​

The vulnerability and its remedy were discussed on the kernel mailing lists in January 2025 and were assigned CVE‑2025‑21693 after upstream maintainers reviewed the zswap code changes that introduced the incorrect assumption. Public vulnerability databases (NVD, Debian, Ubuntu and vendor trackers) have recorded the issue and cataloged the fix description and affected code paths.

Technical analysis: what went wrong​

Root cause — a subtle lifetime race with hot unplug​

  • In zswap_compress and zswap_decompress, zswap retrieves the per‑CPU acomp_ctx for the CPU that started the operation and uses that context through the operation.
  • Because the crypto_acomp API requires a sleepable context, zswap no longer uses get_cpu_ptr (which disables preemption) to pin the per‑CPU context. That means the compression/decompression thread may migrate to another CPU while still referring to the original acomp_ctx.
  • If the original CPU is hot‑unplugged while acomp_ctx is still in use, the CPU hotunplug callback frees resources attached to that acomp_ctx—specifically acomp_ctx.buffer, acomp_ctx.req and acomp_ctx.acomp—via zswap_cpu_comp_dead, producing a use‑after‑free (UAF) when the still‑running compression/decompression code accesses them afterward.
This is a classic concurrency-lifetime bug that arises when migration or CPU hotplug operations can outlive the lifetime assumptions of per‑CPU transient resources.
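
The ordering of events in the list above can be replayed in user space. The following Python sketch is a deterministic toy model, not kernel code: AcompCtx, PerCpuContexts, the freed flag and the UseAfterFree exception are hypothetical stand‑ins for the per‑CPU acomp_ctx and for memory that has been released. It only illustrates the sequence lookup, migration, hot‑unplug, use.

```python
from dataclasses import dataclass


@dataclass
class AcompCtx:
    """Toy stand-in for the per-CPU acomp_ctx (hypothetical model, not kernel code)."""
    cpu: int
    buffer: object = None
    req: object = None
    freed: bool = False


class UseAfterFree(Exception):
    """Raised by the model where real kernel code would touch freed memory."""


class PerCpuContexts:
    def __init__(self, ncpus):
        self.ctx = {cpu: AcompCtx(cpu, buffer=bytearray(16), req=object())
                    for cpu in range(ncpus)}

    def lookup(self, cpu):
        # Models fetching the per-CPU pointer without pinning the task to that CPU.
        return self.ctx[cpu]

    def cpu_dead(self, cpu):
        # Models the hot-unplug callback releasing the context's resources.
        c = self.ctx[cpu]
        c.buffer = None
        c.req = None
        c.freed = True


def compress_unpatched(ctx, page):
    # Models the compression path using a context it looked up earlier.
    if ctx.freed:
        raise UseAfterFree(f"context of CPU {ctx.cpu} used after hot-unplug")
    ctx.buffer[: len(page)] = page
    return bytes(ctx.buffer[: len(page)])


def replay_race():
    pool = PerCpuContexts(ncpus=2)
    ctx = pool.lookup(cpu=1)        # 1. the operation starts on CPU 1
    # 2. the task sleeps and migrates to CPU 0 (nothing pins it)
    pool.cpu_dead(cpu=1)            # 3. CPU 1 is hot-unplugged, resources freed
    try:
        compress_unpatched(ctx, b"page")   # 4. the stale context is used
    except UseAfterFree as exc:
        return str(exc)
    return "no fault"
```

In the model the stale access is caught and reported; in the kernel there is no such check, which is why the same ordering can surface as an oops or a KASAN report.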

Which commit introduced the behavior​

The regression is traced to the commit 1ec3b5fe6eec, which switched zswap to use the crypto_acomp API for hardware acceleration. That change removed the implicit lifetime guarantee that get_cpu_ptr previously provided, and a later change (commit 8ba2f844f050) made per‑CPU buffers dynamic, increasing the surface area for the race. The mailing list discussion and CVE summaries document that lineage.

Why common fixes were rejected or problematic​

Multiple fix strategies were considered and discussed:
  • Disabling migration via migrate_disable around compression/decompression would reintroduce the old invariant but is heavyweight and can introduce performance problems or deadlocks in complex kernel paths.
  • Holding cpus_read_lock during compression was tried but can deadlock: code paths that already hold that lock may call into reclaim and re-enter zswap, creating a potential circular wait.
  • Using SRCU looked promising (SRCU lets readers sleep while allowing safe CPU offlining), but SRCU has restrictions when used in CPU hotplug callbacks.
After weighing the tradeoffs, maintainers agreed on a synchronization approach that is safe in the hotplug context and minimal in scope. The chosen fix adds explicit mutual exclusion and lifecycle checks around the per‑CPU acomp_ctx resources, and relocates mutex initialization to a place tied to the pool allocation so the mutex cannot be reinitialized in racy hotplug scenarios.

The upstream fix — what changed​

The stable fix implemented the following concrete measures:
  • Protect resources freed on CPU hotunplug with acomp_ctx.mutex: CPU hotunplug callbacks acquire the mutex before freeing acomp_ctx.buffer, acomp_ctx.req or acomp_ctx.acomp, and set acomp_ctx.req to NULL to mark the resource as gone.
  • Compression/decompression code acquires acomp_ctx.mutex before using acomp_ctx.*, and after acquiring the lock checks whether acomp_ctx.req is NULL (indicating the CPU was offlined and its resources freed). If acomp_ctx.req is NULL the code retries using the acomp_ctx for the new/current CPU.
  • The initialization of acomp_ctx.mutex is moved from the CPU hotplug callback into the pool initialization code (zswap_pool_create), ensuring the mutex exists and is initialized for the entire lifetime of the per‑CPU context and cannot be reinitialized by hotplug handlers while someone else holds it.
This is a surgical fix: it preserves the performance model of per‑CPU contexts while adding correct synchronization to prevent freeing resources that are still in use.
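
The three measures above (free under the mutex, check req after locking and retry, initialize the mutex once with the pool) can be sketched as a user‑space model. This is a minimal Python illustration under stated assumptions, not the kernel patch: the AcompCtx class, acquire_ctx, compress and the current_cpu callback are hypothetical names, and threading.Lock stands in for the kernel mutex.

```python
import threading


class AcompCtx:
    """Toy stand-in for a per-CPU acomp_ctx (hypothetical model, not kernel code)."""

    def __init__(self, cpu):
        self.cpu = cpu
        # The lock is created once, with the pool, and never re-created on hotplug.
        self.mutex = threading.Lock()
        self.buffer = None
        self.req = None

    def cpu_prepare(self):
        # Hotplug "online" step: allocate resources under the lock.
        with self.mutex:
            self.buffer = bytearray(16)
            self.req = object()

    def cpu_dead(self):
        # Hotplug "offline" step: free under the lock and mark req as gone.
        with self.mutex:
            self.buffer = None
            self.req = None


def acquire_ctx(pool, current_cpu):
    """Lock the context of the CPU reported by current_cpu(), retrying if it was offlined.

    Returns the context with its mutex held; the caller must release it.
    """
    while True:
        ctx = pool[current_cpu()]
        ctx.mutex.acquire()
        if ctx.req is not None:
            return ctx
        # The CPU went away between lookup and lock: drop the lock and retry.
        ctx.mutex.release()


def compress(pool, current_cpu, page):
    ctx = acquire_ctx(pool, current_cpu)
    try:
        ctx.buffer[: len(page)] = page
        return ctx.cpu, bytes(ctx.buffer[: len(page)])
    finally:
        ctx.mutex.release()
```

Because the offline step takes the same lock, it cannot free the buffer while a compression holds the context, and a caller that locks an already‑freed context sees req set to None and moves on to the context of the CPU it is now running on.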

Impact and exploitability​

Attack surface and prerequisites​

  • Vector: Local-only. An attacker needs to be able to run code on the host (or as a privileged guest or container) that exercises zswap compression/decompression while orchestrating or triggering CPU hotunplug events.
  • Preconditions: zswap must be active, and the kernel must contain the commit range that introduced the crypto_acomp change without the synchronization fix. Systems with dynamic CPU management (power‑saving servers, some virtualized hosts) and setups that enable hardware compression acceleration are most likely to exercise the affected code paths.
Multiple vendor advisories and NVD classify the impact as an availability issue: repeated triggering can produce kernel oopses, faults and host instability. Ubuntu’s CVSS assessment rates the availability impact as high; other vendor ratings vary by distribution and kernel packaging.

What an attacker can do​

This is principally an availability primitive:
  • Repeatedly triggering the race may crash the kernel or force host reboots, producing a denial‑of‑service for local services or multi‑tenant workloads.
  • There is no authoritative public proof that this UAF leads to privilege escalation or remote code execution; turning such a UAF into a reliable memory-corruption primitive that yields arbitrary code execution would require additional, platform-specific conditions and is speculative at disclosure. Treat escalation claims as unverified until a PoC or vendor incident report shows otherwise.

Real‑world exposure and vendor responses​

  • Most mainstream distributions and upstream kernel trees received the patch as a small, low‑risk backport; operators should expect vendor kernel packages to include the fix in their normal stable updates.
  • Some vendors may classify the operational risk differently. For example, Amazon Linux evaluated the issue, judged exploitation unlikely, and did not plan to backport the fix for certain kernel channels, citing the stability risk of the backport. That highlights the necessity of cross‑checking your distribution’s advisory rather than presuming uniform patch availability.

Detection, telemetry and incident response​

Indicators to hunt for​

  • Kernel oops traces that include zswap functions or stack frames referencing zswap_cpu_comp_dead, zswap_compress, zswap_decompress or per‑CPU acomp_ctx usage.
  • KASAN reports or slab-use-after-free traces that reference zswap buffers or crypto_acomp contexts.
  • Unexpected host reboots, soft lockups, or increased watchdog messages coincident with CPU online/offline operations or power management events.
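
As a starting point for hunting, the indicators above can be turned into a simple log filter. The sketch below is a heuristic, not a detection rule from any vendor: the symbol names come from the list above, while the fault marker phrases (KASAN, BUG, Oops) and the 40‑line window are assumptions about typical kernel log wording and should be tuned for your environment.

```python
import re

# Symbols named in the indicators above.
ZSWAP_SYMBOLS = re.compile(
    r"\b(zswap_cpu_comp_dead|zswap_compress|zswap_decompress|acomp_ctx)\b"
)
# Assumed fault wording; adjust to match the logs you actually see.
FAULT_MARKERS = re.compile(
    r"(KASAN: .*use-after-free|BUG: |Oops|general protection fault)",
    re.IGNORECASE,
)


def find_zswap_faults(lines, window=40):
    """Return (line_number, text) for fault markers that are followed,
    within `window` lines (marker line included), by a zswap symbol."""
    lines = list(lines)
    hits = []
    for i, line in enumerate(lines):
        if not FAULT_MARKERS.search(line):
            continue
        if any(ZSWAP_SYMBOLS.search(l) for l in lines[i:i + window]):
            hits.append((i + 1, line.rstrip()))
    return hits
```

The function takes any iterable of log lines, for example the saved output of journalctl -k, and is intended to narrow down candidates for manual review rather than to confirm the CVE.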

Forensic steps after an event​

  • Preserve dmesg and /var/log/kern.log or journalctl -k output before rebooting.
  • Collect crashdump (kdump) images and correlate stack traces to identify whether the root is a freed acomp_ctx resource.
  • Identify recent CPU hotplug or power management events (systemd operations, echo commands to /sys/devices/system/cpu/*/online) and correlate timing.
  • If the host is multi‑tenant, try to correlate the offending workload to a specific container or VM and isolate the tenant until the host is patched.
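
The timing correlation in the steps above can be roughed out with a small script over a preserved kernel log. This is a sketch under assumptions: it expects dmesg‑style bracketed timestamps, and the "CPU N is now offline" message is the wording x86 kernels commonly log; other architectures and log formats may differ. The correlate function and its max_gap parameter are hypothetical names for illustration.

```python
import re

# dmesg-style prefix such as "[   10.250000] message".
TS = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")
# Assumed offline message wording (common on x86).
OFFLINE = re.compile(r"CPU\s*(\d+) is now offline")
# zswap functions named in this article.
FAULT = re.compile(r"zswap_(compress|decompress|cpu_comp_dead)")


def correlate(lines, max_gap=5.0):
    """Pair zswap-related trace lines with CPU offline events logged
    within max_gap seconds of them. Returns (fault_ts, cpu, offline_ts)."""
    offline = []   # (timestamp, cpu)
    faults = []    # timestamps of zswap-related lines
    for line in lines:
        m = TS.match(line)
        if not m:
            continue
        ts, msg = float(m.group(1)), m.group(2)
        off = OFFLINE.search(msg)
        if off:
            offline.append((ts, int(off.group(1))))
        elif FAULT.search(msg):
            faults.append(ts)
    return [
        (f_ts, cpu, off_ts)
        for f_ts in faults
        for off_ts, cpu in offline
        if abs(f_ts - off_ts) <= max_gap
    ]
```

An empty result does not rule the bug out, since hotplug events may be logged elsewhere or not at all; a match only indicates that the trace deserves a closer look in the crash dump.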

Remediation guidance and operational playbook​

Definitive fix​

  • Install a kernel package from your vendor or distribution that includes the upstream stable commit(s) that implement the synchronization fix, then reboot the host into the patched kernel. Because this is kernel‑level code, a reboot is required for remediation to take effect. Verify vendor changelogs for commit IDs or explicit CVE references before marking hosts as remediated.

Prioritization​

  • Highest priority: multi‑tenant hosts, cloud hypervisors, CI runners and shared build boxes where an untrusted process could trigger zswap activity and a hotplug event could be initiated (intentionally or by system power management).
  • Medium priority: long‑running servers that use zswap and dynamic power management.
  • Lower immediate priority: single‑user desktops where workloads are trusted and CPU hotplug is uncommon, but patching is still recommended.

Short‑term mitigations (if you cannot patch immediately)​

  • Disable zswap where practical: echo 0 > /sys/module/zswap/parameters/enabled or remove the kernel parameter that enables zswap at boot. Note: disabling zswap may increase swap IO and reduce performance under memory pressure—test the impact before broad rollout.
  • Avoid CPU hotplug operations and dynamic CPU online/offline changes while running high‑compression workloads.
  • Disabling hardware compression acceleration, where it is in use, is environment specific and may reduce throughput. The per‑CPU acomp_ctx is also looked up without pinning on the software compression path, so this should not be treated as a reliable mitigation on its own.
  • Isolate untrusted workloads on hosts that are already patched; do not allow unvetted tenants on vulnerable hosts.
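
For fleets where the first mitigation is applied by script, the sysfs parameter mentioned above can be read and written as follows. This is a minimal sketch: the function names are hypothetical, the write requires root, and the change is runtime‑only, so a boot parameter that enables zswap would turn it back on after a reboot.

```python
from pathlib import Path

ZSWAP_ENABLED = Path("/sys/module/zswap/parameters/enabled")


def parse_enabled(text):
    """Interpret the sysfs boolean; the kernel prints Y or N for bool parameters."""
    value = text.strip().upper()
    if value in ("Y", "1"):
        return True
    if value in ("N", "0"):
        return False
    raise ValueError(f"unexpected zswap 'enabled' value: {text!r}")


def zswap_enabled(path=ZSWAP_ENABLED):
    """Return True/False, or None if the zswap parameters are not exposed."""
    try:
        return parse_enabled(path.read_text())
    except FileNotFoundError:
        return None


def disable_zswap(path=ZSWAP_ENABLED):
    """Runtime-only change (needs root); it does not persist across reboots."""
    path.write_text("0\n")
```

Note that disabling zswap at runtime stops new pages from being stored but does not immediately evict pages already held in the compressed pool, so decompression can still occur until those pages are faulted back in or invalidated.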

How to verify a kernel contains the fix​

  • Check the kernel package changelog or distribution advisory for CVE‑2025‑21693 or the patch discussion that references moving the acomp_ctx.mutex initialization and checks for acomp_ctx.req being NULL.
  • If you build kernels in‑house, inspect mm/zswap.c for the mutex acquisition and NULL‑check/retry logic around acomp_ctx.req and confirm mutex_init is called at pool initialization rather than in CPU hotplug callbacks.
  • Perform a staged test: in an isolated lab, try to reproduce the hotplug + compress/decompress sequence that previously triggered the issue; the patched kernel should no longer produce UAF or oopses under the same test conditions.
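
The changelog check in the first item can be automated once the changelog text has been obtained, for example from rpm -q --changelog on RPM‑based systems (how you fetch it is distribution specific and not shown). The helper below is a hypothetical sketch that only searches the text for the CVE identifier, tolerating the non‑ASCII hyphens that sometimes appear in copied advisories.

```python
import re

CVE_ID = "CVE-2025-21693"


def changelog_mentions_fix(changelog_text, cve_id=CVE_ID):
    """True if the changelog text cites the CVE ID (case-insensitive)."""
    # Normalize typographic hyphens so pasted text still matches.
    normalized = re.sub(r"[\u2010\u2011\u2012\u2013]", "-", changelog_text)
    # The lookahead avoids matching a longer, different CVE number.
    pattern = re.escape(cve_id) + r"(?!\d)"
    return re.search(pattern, normalized, re.IGNORECASE) is not None
```

A match shows only that the installed package claims the fix; the host still has to be rebooted into that kernel. A missing match is not proof of exposure either, because some vendors reference the upstream commit rather than the CVE number.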

Why the fix is reasonable — strengths and residual risks​

Strengths​

  • The upstream patch is narrow and surgical: it adds locking, a defensive NULL check and retry semantics rather than heavy, global changes. That keeps regression risk low and makes backporting to multiple stable kernel branches straightforward.
  • By moving mutex initialization to pool creation, the patch eliminates a reinitialization race that could otherwise leave a mutex in an inconsistent state during hotplug operations.
  • The approach preserves zswap’s performance profile in normal cases while robustly guarding the small window of CPU offlining concurrency.

Potential caveats and residual attack surface​

  • The vulnerability is an availability issue, but any UAF is a cautionary flag: complex kernel UAFs have historically been chained into escalation primitives in rare, architecture‑ and allocator‑dependent exploit chains. No public PoC demonstrates escalation here, but defenders should treat the UAF as operationally significant in shared environments.
  • Vendor timelines vary. Embedded and OEM kernels (appliances, Android images, OEM devices) often have a long patch tail; these devices may remain exposed longer than mainstream distribution kernels. Verify vendor advisories for each device family.

Practical checklist for administrators​

  • Inventory
      • Find hosts where zswap is enabled: check /sys/module/zswap/parameters/enabled or kernel boot parameters.
      • Identify hosts with dynamic CPU management (power saving, orchestration that uses CPU online/offline).
      • Map kernel package versions to distribution security tracker entries for CVE‑2025‑21693.
  • Acquire and stage patches
      • Obtain vendor kernel packages that explicitly reference CVE‑2025‑21693 or the upstream stable commit IDs.
      • Build a test image and stage a pilot update on non‑production hosts.
  • Test and validate
      • Reproduce prior triggers in a controlled lab (compression/decompression workloads plus CPU hotplug) to ensure the fix prevents oopses.
      • Verify kernel changelogs list the fix, then roll out in waves with monitoring windows.
  • Monitor
      • Add SIEM or log alerts to watch for kernel traces referencing zswap or zswap_cpu_comp_dead.
      • Watch for KASAN reports and kernel oops patterns after rollout.
  • Contingency
      • If a vendor package is not available, consider compiling a stable kernel with the upstream patch applied and test carefully before production rollout.

Broader context: why this class of bug matters​

This CVE is part of a broader family of kernel correctness issues that arise when micro‑optimizations that rely on per‑CPU state interact with CPU hotplug or migration semantics. Similar mm and driver concurrency issues have produced kernel oopses, hangs and, in some rare chains, escalations. Operationally, availability problems caused by kernel races can be as damaging as data‑theft vulnerabilities in multi‑tenant environments, because a host crash affects every tenant and service running on that host. For that reason, kernel correctness fixes that close UAF windows—even if they appear niche—should be treated as operationally significant for shared infrastructure.

Conclusion​

CVE‑2025‑21693 is a real, actionable availability bug in the Linux kernel’s zswap compression path caused by changing lifetime assumptions when switching to a sleepable crypto_acomp API. Upstream maintainers chose a pragmatic, surgical synchronization fix—mutex protection around per‑CPU acomp_ctx resources plus lifecycle checks and a safer mutex initialization point—to close the race without large performance regressions. Operators running kernels that include the vulnerable commit range should treat this as a priority for multi‑tenant and hotplug‑heavy environments: obtain vendor kernel updates that reference the fix, reboot hosts into patched kernels, and, if immediate patching is not possible, apply compensating controls such as disabling zswap or avoiding CPU hotplug operations until a tested patch is deployed.

Source: MSRC Security Update Guide - Microsoft Security Response Center
 
