Linux kernel maintainers have landed a focused regression fix for a 9P transport scheduling bug, cataloged as CVE-2025-40305, that could hang 9P write paths when pipes become full. The fix replaces a narrow EPOLLOUT-only check with a full poll-multiplexer invocation so the receive (RX) worker is reliably woken on EPOLLIN conditions.
Source: MSRC Security Update Guide - Microsoft Security Response Center
Background
The Plan 9 (9P) protocol and its Linux in-tree client (commonly exposed as v9fs/9p) provide a lightweight network filesystem transport used in virtualized environments (for example, QEMU virtfs/virtio-9p), container toolchains, and some tooling that shares directories between hosts and guests. The 9P client implements asynchronous socket-like behavior for request/response semantics; correctness depends on precise wakeups and scheduling between readers, writers, and poll workers.
A recent optimization in the kernel I/O pipe path (commit aaec5a95d59615, which avoids waking writers when the pipe is still full) removed an earlier, albeit unnecessary, wakeup that previously triggered a chain of handlers which culminated in the 9P transport rescheduling its RX worker. In combination with how p9_read_work currently signals rescheduling, that optimization unintentionally removed the wakeup path for certain full-pipe scenarios, producing a stall in 9P writes. The regression was discovered and reproduced by kernel fuzzing infrastructure and has been fixed with a minimal, surgical change to the 9P FD transport.
What the bug is — technical anatomy
The observable symptom
Under the failing interleaving, systems that rely on a 9P export (for example, guests using virtfs or hosts exposing 9P shares to guests) could see a complete stall of writes: once a pipe filled, subsequent writes would not progress because the RX worker was never scheduled to consume data and unblock the writer. The failure mode is an availability problem. In the commonly observed cases kernels do not crash or leak secrets; the 9P session simply hangs until manual intervention or a reboot.
Where the scheduling went wrong
- p9_read_work performs a sequence of operations that ends up calling p9_fd_read and ultimately anon_pipe_read when reading from a pipe.
- Before the pipe-read optimization (commit aaec5a95d59615), anon_pipe_read frequently caused a writer wakeup that propagated through p9_pollwake → p9_poll_workfn → p9_poll_mux, and when EPOLLIN was detected p9_poll_mux would schedule_work(&m->rq) to re-run the RX worker.
- After the optimization, that side-effect wakeup no longer happens. p9_read_work does not set the Rworksched flag or call schedule_work(&m->rq) when the request list is empty, so once the pipe read stopped waking the poller, the RX worker could sit idle even though more data was waiting. The write side then stalls because nothing drains the pipe.
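The interleaving described in these bullets can be sketched as a small toy model. This is not kernel code: the function `simulate`, its flags, and the chunk counter are illustrative assumptions, and the outbound pipe is assumed to be full so that an EPOLLOUT-only check schedules nothing. The model only shows why losing the side-effect wakeup leaves data unread, and why calling the poll multiplexer from the request path drains it again.

```python
def simulate(side_effect_wakeup: bool, request_uses_poll_mux: bool, chunks: int = 4) -> int:
    """Toy model of the RX scheduling chain; returns unread chunks left in the pipe."""
    readable = chunks      # data waiting in the inbound pipe
    req_list = []          # outstanding requests (empty at the start)
    rx_scheduled = True    # the RX worker has one run queued

    def poll_mux():
        # Stand-in for p9_poll_mux: on EPOLLIN, schedule the RX worker.
        nonlocal rx_scheduled
        if readable > 0:
            rx_scheduled = True

    def read_work():
        # Stand-in for p9_read_work: consume one chunk per run.
        nonlocal readable, rx_scheduled
        rx_scheduled = False
        readable -= 1
        if side_effect_wakeup:
            # Old behavior: the pipe read woke the poller as a side effect.
            poll_mux()
        if req_list and readable > 0:
            # The worker only reschedules itself when requests are pending.
            rx_scheduled = True

    def run_workqueue():
        while rx_scheduled and readable > 0:
            read_work()

    run_workqueue()
    req_list.append("Twrite")      # a new request arrives
    if request_uses_poll_mux:
        poll_mux()                 # fixed path: EPOLLIN is noticed here
    # else: EPOLLOUT-only check; outbound pipe assumed full, nothing scheduled
    run_workqueue()
    return readable


print(simulate(side_effect_wakeup=True, request_uses_poll_mux=False))   # 0
print(simulate(side_effect_wakeup=False, request_uses_poll_mux=False))  # 3
print(simulate(side_effect_wakeup=False, request_uses_poll_mux=True))   # 0
```

In this model the middle case is the regression: one chunk is read, the request list is empty, and nothing ever queues the RX worker again.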
The fix in one sentence
Change p9_fd_request so that it consults the poll multiplexer (p9_poll_mux, which recognizes EPOLLIN and EPOLLOUT alike) rather than performing an EPOLLOUT-only check; this restores the legitimate wakeup path and ensures the RX thread is kicked when EPOLLIN conditions require additional reads. The change is intentionally tiny and localized to the net/9p transport.
Why this matters operationally
- The vulnerability is a regression in correctness and scheduling, not an outright memory-corruption exploit. Its practical impact is availability: hung 9P mounts and blocked I/O. In multi-tenant or CI environments where 9P is used for host–guest filesystem sharing, a stalled 9P channel may stall workloads and automation pipelines, producing outsized operational impact.
- The bug is local in the attacker model: an attacker must be able to execute code that exercises 9P pipes on a host or guest that uses 9P. In virtualized environments, that includes guest tenants mounting a host export via virtfs. That said, local vectors are significant in cloud and multi-tenant contexts.
- The fix is small and considered safe to backport. It has already been prepared for stable kernel trees and suggested for backporting into active kernel series so distributors can include it in kernel packages. The minimal footprint reduces regression risk compared with larger reworks.
Evidence, verification, and where to read the patch
Multiple vulnerability trackers and kernel-stable lists published the issue and the rationale for the change. The NVD entry for CVE-2025-40305 summarizes the regression and the solution: replacing an EPOLLOUT-only check with a poll-multiplexer invocation so EPOLLIN still triggers the RX scheduling path. The upstream stable patch and the stable update thread, which contains the diff and explanation for backporters, were posted to the kernel stable mailing list and stable update channels; the commit was described as a narrow one-line-plus-deletions change within net/9p/trans_fd.c. The stable-kernel announcement explains why the change is safe to backport and references the syzkaller reproducer that confirmed the stall.
The OSV and SUSE advisories also list the CVE and restate the fix rationale; SUSE marked it as a new issue with no severity set on its page, while OSV has an ingestion entry mapping the CVE to vendor trackers. These third-party trackers corroborate the technical narrative and the public patch references.
Detection and hunting: what to look for in logs and telemetry
When triaging systems for this regression, prioritize operational signals that show I/O stalls and 9P transport activity:
- Kernel logs: search dmesg and journalctl -k for signs of persistent 9P worker inactivity, unusual retry loops, or 9P-related WARN/OOPS stack traces referencing net/9p call frames. While this CVE primarily causes stalls, follow-up anomalies may be present in logs. Use targeted grep searches for 9P symbols if present. Our internal analysis and public advisories recommend scanning for call traces that contain v9fs and p9_ prefixed symbols.
- Reproducible test behavior: in a staging environment with a 9P export and a workload that repeatedly writes to a pipe, a failing kernel will show the writer blocked with no corresponding progress on the receiver side. Fuzzing infrastructure (syzkaller) produced reproductions and was cited in upstream messages.
- Configuration and presence checks: identify systems mounting 9P: run findmnt -t 9p and check kernel modules with lsmod | grep -E '(^9p|9pnet|9pnet_virtio)'. Inventorying 9P usage helps prioritize hosts and images for remediation.
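The presence checks above can also be scripted. The following is a minimal sketch that parses /proc/mounts and /proc/modules style text instead of shelling out to findmnt and lsmod; the helper names (`find_9p_mounts`, `find_9p_modules`, `read_or_empty`) are illustrative, and the module list is simply the three names mentioned in this article.

```python
from pathlib import Path

# Module names taken from the article; other 9P modules may exist on some kernels.
NINEP_MODULES = {"9p", "9pnet", "9pnet_virtio"}


def find_9p_mounts(mounts_text: str) -> list[dict]:
    """Return 9p mounts found in /proc/mounts-style text."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: source, mountpoint, fstype, options, dump, pass
        if len(fields) >= 4 and fields[2] == "9p":
            trans = None
            for opt in fields[3].split(","):
                if opt.startswith("trans="):
                    trans = opt[len("trans="):]
            found.append({"source": fields[0], "mountpoint": fields[1], "trans": trans})
    return found


def find_9p_modules(modules_text: str) -> list[str]:
    """Return loaded 9P-related module names from /proc/modules-style text."""
    loaded = {line.split()[0] for line in modules_text.splitlines() if line.strip()}
    return sorted(loaded & NINEP_MODULES)


def read_or_empty(path: str) -> str:
    """Read a file, returning an empty string if it is missing or unreadable."""
    try:
        return Path(path).read_text()
    except OSError:
        return ""


mounts = find_9p_mounts(read_or_empty("/proc/mounts"))
modules = find_9p_modules(read_or_empty("/proc/modules"))
print({"mounts": mounts, "modules": modules})
```

A kernel with 9P built in rather than built as modules will not list anything in /proc/modules, so an empty module list alone does not prove absence. Because the fix lives in net/9p/trans_fd.c, the reported trans= option may also help narrow down which mounts go through the fd-based transport.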
Mitigation and remediation: practical steps
The principal remediation is to install a kernel update that contains the stable commit fixing CVE-2025-40305 and then reboot into the patched kernel. Because this is a kernel-level scheduling regression, only updating the running kernel and restarting the component(s) will reliably restore correct behavior.
- Immediate triage (fast, low-disruption)
  - Identify hosts with active 9P mounts: run findmnt -t 9p and grep your CMDBs for virtfs/9p usage.
  - If possible, isolate or migrate critical workloads off hosts that expose 9P to untrusted tenants until patched.
  - If 9P is not required on a host, consider unloading or blacklisting 9P modules (9p, 9pnet, 9pnet_virtio) temporarily, subject to service impact testing. Blacklisting can be achieved by adding a /etc/modprobe.d/ file containing blacklist entries and rebooting. Note: unloading modules may disrupt VMs or guests that rely on virtfs.
- Definitive remediation (recommended)
  - Apply the vendor/distribution kernel package that includes the stable backport for the net/9p fix (check your distro security tracker for the CVE mapping).
  - Reboot into the updated kernel and validate 9P workloads in a controlled pilot ring before broad rollout.
  - Re-run representative write/read cycles and the syzkaller reproducer if available to confirm the stall is resolved. The stable-kernel announcement includes a recommended syzkaller reproducer for verification.
- For custom kernels
  - Merge the upstream stable commit(s) referenced in the advisories and rebuild your kernel. Verify the commit is present via git log --grep=<commit-id> and test the scenario before deploying. The patch is intentionally small; however, organizations that freeze kernel versions should still stage the change in test environments.
- Post-patch monitoring
  - Watch kernel log aggregation for recurrence of the same scheduling signatures and for regressions. Add SIEM alerts for repeated high-frequency 9P write stalls, and treat repeated occurrences as high-priority.
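For the monitoring step, one possible log rule is sketched below. It assumes the stall surfaces through the kernel's hung-task detector ("INFO: task ... blocked for more than N seconds") and that the accompanying stack trace contains p9_ or v9fs_ symbols; neither is guaranteed for every stall, so treat this as a starting point rather than a complete detector. The function name `find_9p_hung_tasks` and the 40-line window are illustrative choices.

```python
import re

# Header line printed by the kernel hung-task detector.
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>.+?):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)
# Symbols that suggest the blocked task was inside the 9P client or v9fs.
NINEP_RE = re.compile(r"\b(?:p9_|v9fs_)\w+")


def find_9p_hung_tasks(log_text: str, window: int = 40) -> list[dict]:
    """Return hung-task reports whose following trace lines mention 9P symbols."""
    lines = log_text.splitlines()
    hits = []
    for i, line in enumerate(lines):
        header = HUNG_RE.search(line)
        if not header:
            continue
        symbols = []
        for follow in lines[i + 1 : i + 1 + window]:
            if HUNG_RE.search(follow):
                break  # the next report starts here
            symbols.extend(NINEP_RE.findall(follow))
        if symbols:
            hits.append(
                {
                    "comm": header["comm"],
                    "pid": int(header["pid"]),
                    "secs": int(header["secs"]),
                    "symbols": sorted(set(symbols)),
                }
            )
    return hits
```

Feeding it the output of dmesg or journalctl -k and alerting on a non-empty result would give a coarse signal; tuning the window and symbol list to local kernels is left to the operator.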
Risk assessment and contextual analysis
Strengths of the fix
- The solution is surgical: it restores an earlier wakeup path by using the poll multiplexer, avoiding larger rearchitecting of the 9P transport.
- Because the change is small, it has a low regression surface and is straightforward for maintainers to backport into stable kernel trees. That has already happened in several stable updates and was promoted for distribution backports.
- The risk profile is constrained to availability; there is no public evidence that this exact regression enables remote code execution or privilege escalation in the general case.
Remaining and longer-tail risks
- Vendor lag: embedded systems, OEM kernels, and certain appliance images often delay upstream stable backports. These devices represent the long tail of exposure where a small regression like this can remain active for months. Organizations operating such devices should engage vendors for firmware or kernel updates or plan compensating mitigations.
- Operational impact of mitigations: blacklisting 9P modules or disallowing 9P mounts can break virtualization workflows or developer convenience environments. Mitigations must be evaluated against business needs and tested in staging.
- Detection limitations: because the bug presents as blocking behavior rather than a clear crash, it may be mistaken for application-level hangs, misconfiguration, or resource saturation. Centralized logging and targeted alerts for kernel-level 9P traces are essential to avoid false negatives.
How vendors and distributions are responding
Major distribution and security trackers (for example, SUSE, Debian, Ubuntu, and the OSV database) have ingested the CVE and listed it as a new issue; maintainers typically map the upstream stable commit(s) into kernel package updates and provide CVE-to-package mappings in their advisories. Operators should consult their distro’s security advisory to identify which binary kernel package version or kernel ABI contains the fix. The OSV and SUSE entries provide consistent summaries and point to the same upstream rationale for the fix.
Stable-kernel maintainers recommended backporting the small fix to affected stable branches and provided a syzkaller-based reproducer to validate the backport. The upstream announcement explicitly marked the change as a regression fix and included a short rationale for the backport request.
Practical checklist for sysadmins and SREs
- Inventory: Find systems using 9P (findmnt -t 9p). Flag hypervisors and images that mount host directories into guests (virtfs).
- Prioritize: Patch hosts that provide multi-tenant services, host CI runners, or run production virtualization workloads first.
- Patch path: Apply vendor-provided kernel packages that reference CVE‑2025‑40305 or include the referenced stable-kernel commit; reboot hosts in a staged fashion.
- Validate: Run representative 9P workloads and the syzkaller reproducer after patching. Confirm that blocked/written pipes are drained and that no residual stalls occur.
- Mitigate temporarily: Remove or blacklist 9P modules on hosts where 9P is not required, but test for guest impact first.
- Logging: Add search rules for the 9P transport functions involved in this fix (for example p9_poll_mux, p9_read_work, and p9_fd_request) and their callers; escalate kernel-level I/O stalls tied to 9P as high-severity incidents.
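The "Validate" item in this checklist can be approximated with a bounded write probe. The sketch below is an assumption-laden helper, not the syzkaller reproducer: `probe_write` is a hypothetical name, and the probe only reports whether a single write plus fsync finished within a deadline. It runs the write in a daemon thread so that the caller can still report a timeout if the write never returns.

```python
import os
import threading


def probe_write(path: str, payload: bytes, timeout_s: float) -> str:
    """Write payload to path; return "ok", "error", or "stalled" (deadline hit)."""
    outcome = []

    def worker():
        try:
            with open(path, "wb") as f:
                f.write(payload)
                f.flush()
                os.fsync(f.fileno())  # push the data through to the backing store
            outcome.append("ok")
        except OSError:
            outcome.append("error")

    # Daemon thread: a write that never returns must not keep the probe alive.
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout_s)
    return outcome[0] if outcome else "stalled"
```

For example, calling `probe_write("/mnt/share/9p_probe.bin", b"\0" * (1 << 20), 30.0)` against a 9P mount (the path is a placeholder) and getting "stalled" on an unpatched kernel but "ok" after patching would be one data point; a clean result does not prove the regression is absent, since the stall depends on a specific interleaving.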
A few broader takeaways for WindowsForum readers
- Even small I/O optimizations in the kernel (for example, avoiding an unnecessary wakeup to reduce spurious writer wakeups) can have unexpected cross-layer consequences when subsystems relied on side effects rather than explicit contracts. The p9 case is a textbook example of how an optimization elsewhere (pipe_read) removed an implicit wakeup that 9P previously relied upon. The lesson: explicit synchronization beats implicit side-effects.
- Local attack surfaces matter in virtualized and multi-tenant cloud environments. Even if the CVE is not remotely exploitable in the common case, a local DoS primitive can be weaponized by tenants or insider threat actors to disrupt services — operators must treat local vectors with the same care they give to network-facing ones in shared environments.
- Small, well-scoped fixes are typically the fastest and safest path to remediation for correctness regressions. The p9 patch is an example of maintaining usability while minimizing risk of regressions from larger rewrites.
Conclusion
CVE-2025-40305 is a focused correctness regression in the Linux 9P transport that arises from an earlier pipe-read optimization. The vulnerability manifests as a complete stall of 9P writes when a pipe becomes full and the RX worker is not rescheduled; upstream maintainers addressed it by changing p9_fd_request to consult the poll multiplexer (p9_poll_mux), restoring EPOLLIN-driven wakeups. The patch is small and safe to backport, and distributions are expected to ship it in kernel updates. Operators should prioritize patching hosts that mount 9P or that run untrusted tenants that could exercise 9P paths, and in the short term consider module blacklisting or mount restrictions where feasible until patching is complete. Applying the kernel update and validating with a staged rollout remains the recommended path; for environments that cannot patch promptly, temporary mitigation and careful monitoring are prudent interim steps. The incident is a reminder that correctness and explicit synchronization are paramount in filesystem and network transport code: micro-optimizations can ripple into macro-scale availability problems when implicit behavior is assumed.