The Linux kernel received a targeted fix for CVE-2025-40061 — a subtle race in the RDMA rxe driver’s worker loop that can lead to a use‑after‑free when tasks are being drained — and the patch restores pre‑migration draining semantics lost when the code moved from tasklets to workqueues. The change is small and surgical, but it fixes a genuine concurrency window that could cause a task to reschedule while cleanup code assumes it has stopped; system operators running RDMA/rxe‑enabled kernels should plan to apply the stable kernel updates or distribution backports as soon as practicable.
Source: MSRC Security Update Guide - Microsoft Security Response Center
Background / Overview
RDMA (Remote Direct Memory Access) support in Linux includes rxe (also known as Soft-RoCE), a software implementation of RDMA over Converged Ethernet that runs on top of an ordinary network interface. The driver includes a worker function named do_task that processes work items according to a state machine controlling task execution, draining, and shutdown.
CVE-2025-40061 was introduced during a refactor that migrated the driver’s scheduling and execution from tasklets to workqueues. During that migration, a special-case check that handled draining tasks was accidentally dropped. As a result, if do_task exhausted its iteration budget and rescheduled itself without checking whether a concurrent cleanup path had already moved the task into the draining state, the driver could return the task to active execution while cleanup proceeded, leading in the worst case to a use‑after‑free of task-related structures. The kernel community has accepted a small fix that restores the draining behavior by forcing an additional loop iteration when the task is in TASK_STATE_DRAINING, so the task can properly reach TASK_STATE_DRAINED. This is a classic concurrency/synchronization bug with the following high‑level characteristics:
- Introduced during a refactor meant to preserve semantics (tasklets → workqueues).
- Root cause is an ordering/logic omission rather than a large design flaw.
- Fix is a defensive change that restores the original draining semantics.
- Impact is localized to systems that actually use the rxe code path (RDMA loopback and certain testing or virtualization setups).
Technical anatomy — what went wrong and how the patch fixes it
The vulnerable pattern
do_task implements a loop that processes work items against an iteration budget. When the budget is exhausted while work remains (the !ret condition in the patch discussion), do_task reschedules itself: it sets the task state to TASK_STATE_IDLE and requeues the worker on the workqueue.
Separately, two cleanup paths, rxe_cleanup_task and rxe_disable_task, can set the task state to TASK_STATE_DRAINING and then wait in a while (!is_done(task)) loop for the task to finish before tearing down resources. That loop releases the spinlock protecting state changes while it polls or sleeps, waiting for the worker to reach a completed state.
The race exists because:
- State transitions are protected by a spinlock only for the brief modify operation.
- The cleanup paths explicitly release the lock while waiting for draining to finish.
- do_task, when hitting its iteration limit, sets state to TASK_STATE_IDLE without re-checking whether the state had been set to TASK_STATE_DRAINING in the interim.
- If do_task takes the lock inside that window, it overwrites TASK_STATE_DRAINING with TASK_STATE_IDLE and reschedules itself. The cleanup path can then conclude that the worker has stopped and free task resources while the rescheduled worker is still going to run against them, producing a use‑after‑free.
The fix (restoring pre‑migration semantics)
The upstream patch restores the original behavior present before the tasklet→workqueue migration: if do_task notices that the worker’s state is TASK_STATE_DRAINING when the iteration budget is exhausted, it should not set itself to idle and immediately reschedule. Instead, the code sets a continuation flag (cont = 1) to force an additional loop iteration so the task can finish its remaining work and reach the state transition logic that puts the task into TASK_STATE_DRAINED. That guarantees the cleanup path’s wait sees the correct drained state and prevents the cleanup from racing with an erroneously rescheduled worker. The change is intentionally minimal: it retains the workqueue model and only reintroduces the draining-specific control flow lost during migration.
Affected systems and exposure
Kernel and tree ranges
Public trackers, including the NVD entry, list the kernel references and stable commit IDs associated with the fix. According to that metadata, the issue affects stable branches until the backported patches land, and the NVD entry marks it as resolved in trees that include the listed stable commits. Administrators and packagers should map those commits to their distribution kernels to determine exposure.
Real-world attack surface
The vulnerability is not a generic remote attack vector. Its exploitability requires the attacker to influence or be present in an environment where rxe code paths run and where the attacker can trigger cleanup/drain sequences concurrently with worker activity. Typical affected scenarios include:
- Hosts and virtual guests used for RDMA testing, loopback RDMA setups, and environments that explicitly enable the rxe module.
- Development, test, or CI systems that use RDMA emulation for performance testing.
- Certain virtualization configurations where the rxe implementation is used as an emulated RDMA device.
Impact classification
Security and vendor trackers classify the issue as a memory‑corruption style vulnerability (potential use‑after‑free) rooted in a race condition. Public scoring tools estimate the severity in the medium range (examples show CVSS v3 values in the mid‑5.x to 6.x area depending on assumptions about attack vector and privileges), reflecting the local nature and the fact that exploitation requires precise timing/control. There is no authoritative public proof‑of‑concept (PoC) or report of in‑the‑wild exploitation at disclosure time; that does not mean exploitation is impossible, but it does mean defenders should prioritize patching proportionate to exposure.
Detection — what to look for in logs and telemetry
Because this is a kernel‑level race that can surface as a use‑after‑free, detection focuses on kernel error traces, oops/panic reports, and unexpected crashes in RDMA or related subsystems. Useful signals include:
- Kernel log entries (dmesg / journalctl -k) showing oops traces referencing the rxe driver or RDMA function names.
- Repeated driver restarts, warnings about use‑after‑free or memory corruption, and stack traces that include rxe task worker symbols.
- Crash dumps and oops traces that correlate to times when RDMA devices or the rxe module were unloaded or disabled.
- Stack traces that name do_task, rxe_cleanup_task, rxe_disable_task, or rxe worker state transitions.
Example commands:
- dmesg | grep -i rxe
- journalctl -k | grep -iE 'use-after-free|oops|rxe'
Remediation and mitigation guidance
- Patch the kernel:
  - Apply the upstream stable kernel updates that include the rxe fix. The stable commit IDs are published in the NVD/OSV entries and in the kernel stable patch series; either apply the specific stable commit or upgrade to a stable kernel release that already contains it.
  - For distribution kernels, install your vendor's backported package (Debian/Ubuntu/Red Hat/SUSE) promptly once it ships. OSV and distribution trackers list downstream package mappings where available.
- Rebuild custom kernels:
  - If you maintain custom kernel builds (embedded devices, appliances), merge the upstream stable commit(s) referenced by the advisory into your tree, rebuild, and validate in a test environment before rolling to production. The patches are small and low risk, but kernel testing and reboot windows are still required.
- If you cannot patch immediately:
  - Unload the rxe module on hosts that do not require Soft-RoCE: the module is named rdma_rxe, so use modprobe -r rdma_rxe after removing any configured rxe links, and blacklist it so it is not loaded again.
  - Restrict who can load/unload kernel modules and who can perform device teardown operations; limiting unprivileged users' ability to trigger the cleanup paths reduces the attack surface.
  - Apply compensating controls: reduce exposure of RDMA hosts to untrusted users and tighten host-level privilege separation.
- Validate remediation:
  - After updating kernels, reproduce previously observed attach/detach or drain scenarios on representative hardware and confirm the issue no longer appears in dmesg/journalctl logs.
  - Monitor kernel logging for a window after deployment (7–14 days recommended) to detect regressions or lingering traces.
Practical patch rollout checklist (priority ordering)
- Inventory: identify hosts and VMs that have RDMA enabled or the rxe module loaded (lsmod | grep rxe). Check kernel versions (uname -r) and map them to distribution package versions.
- High-priority targets:
  - RDMA-enabled compute nodes in high-performance computing (HPC) clusters.
  - Virtual appliances and test beds that emulate RDMA via rxe.
  - Custom appliances and vendor images that ship rxe in their kernels.
- Test: build the patched kernel in a non-production environment; run your RDMA/IO test suites and any workload that exercises draining/unbind behavior.
- Deploy: stage updates by host groups (test → pilot → broad), schedule reboots for kernel activation, and document rollback steps.
- Post-deploy monitoring: collect kernel logs, look for oopses, and validate behavioral stability.
Risk assessment — severity, exploitability, and operational impact
- Severity: Public trackers classify CVE-2025-40061 as medium overall. The weakness is a race that can evolve into a use‑after‑free; that class of bug can be severe if an attacker can reliably control timing and memory layout, but the required prerequisites limit broad remote exploitation.
- Exploitability: The vector is local and timing‑sensitive. Exploitation requires the attacker to cause concurrent drain/cleanup activity while exercising do_task and, in practical terms, a local primitive or foothold. Skilled exploit developers can turn timing primitives into practical escalations, but as of public disclosure there is no widely reported PoC. Treat proof‑of‑concept absence as a temporary comfort, not a permanent guarantee.
- Operational impact: The most likely immediate outcomes for an unpatched host are kernel oopses, driver crashes, and availability problems for RDMA services. In the worst case, a reproducible use‑after‑free could be turned into a memory corruption primitive usable for local privilege escalation if chained with other bugs, although that chaining increases complexity substantially. Prioritize remediation by exposure: production RDMA clusters and vendor appliances first, general desktops last.
Developer and maintainer guidance — lessons and testing recommendations
- Migrations change semantics: code refactors that swap scheduling primitives (tasklets → workqueues or other async models) must carry explicit regression tests for lifecycle and draining semantics. This bug is a textbook example of behavior lost during migration.
- Add unit/integration tests that simulate concurrent cleanup and draining sequences. Tests should verify that when a cleanup path sets TASK_STATE_DRAINING the worker never inadvertently reschedules and that the drained transition progresses to TASK_STATE_DRAINED correctly.
- Use kernel static analysis and concurrency review for state-machine code. Look for implicit assumptions that held under the old model but no longer hold after migration.
- Keep teardown checklists for drivers: when a teardown path must drop a lock while waiting for a worker to stop, every state transition the worker makes under that lock must first re-validate the current state, so the worker cannot overwrite a draining state and re-enter active execution.
- For vendors shipping downstream kernels, make these patches part of the regular stable backport schedule and include change descriptions in release notes so operations teams can map CVEs to package versions quickly.
Cross-checking and verification
Multiple independent sources confirm the issue, the root cause, and the nature of the fix:
- The NVD/OSV entries summarize the defect and reference the kernel stable commits that contain the patch. Those entries provide authoritative CVE metadata and affected‑tree guidance.
- The kernel stable mailing lists and patch stacks (spinics / stable tree patch notes) contain the commit message and discussion explaining the reintroduced draining behavior and the rationale for the change. These show the patch was reviewed and applied to stable branches.
- Security aggregators (Tenable, cvedetails, feed trackers) show consistent summaries and severity estimates; they also emphasize that the attack vector is local and that distribution backports will be the typical remediation path for end users.
Quick operational checklist (one page)
- Inventory:
  - Run uname -r and lsmod | grep rxe across hosts.
  - Identify kernels and distribution packages that map to the upstream commits referenced by the CVE entry.
- Patch:
  - Deploy vendor kernel updates/backports that include the rxe fix; if you build kernels, merge the upstream stable commit(s).
- Short-term mitigations:
  - Unload rxe where not needed, restrict module load/unload permissions, isolate RDMA hosts.
- Validation:
  - Test drain/unbind sequences in a controlled lab and monitor dmesg/journalctl for oopses.
- Post-patch:
  - Monitor for kernel oopses and log anomalies for at least two weeks after deployment.
Conclusion
CVE‑2025‑40061 is a targeted, well‑understood concurrency bug in the Linux RDMA rxe driver that was introduced by an incomplete semantics migration and fixed by restoring the pre‑migration behavior for draining workers. The technical fix is intentionally minimal and merged into the kernel stable trees; distributions and vendors are expected to incorporate the patch promptly.
Operators should treat this as a medium‑priority kernel robustness fix: high priority for RDMA‑enabled and virtualization/test environments, routine priority for general-purpose desktops that do not load rxe. The pragmatic remediation path is straightforward: apply the stable kernel update or the distribution backport, reboot, and verify that drain/unbind scenarios no longer produce kernel oopses. Maintain robust inventory and patch‑mapping practices so that kernel CVEs like this one are quickly correlated to the correct package and applied across the estate.
Caveat: public trackers and collectors sometimes lag by a few hours in mapping kernel commit IDs to specific distribution package versions; when automating remediation, always verify CVE→commit→package mappings against your vendor’s security advisory or the kernel stable commit log before scheduling mass rollouts.