Linux AF_UNIX Race Fixed: Kernel Lock Prevents Use-After-Free in unix_stream_sendpage

A subtle race in the Linux kernel's AF_UNIX code that allowed a kernel function to follow a freed pointer has been patched — the fix closes a null-pointer / use-after-free window in unix_stream_sendpage that could be triggered by carefully crafted local socket operations and file-descriptor passing.

Background​

Unix domain sockets (AF_UNIX) are the glue for high-performance local IPC on Linux, used by system daemons, desktop services, container runtimes, and countless applications. They support ancillary data semantics — most notably the ability to pass open file descriptors between processes — and that complexity increases the surface where subtle races and lifecycle bugs can appear.
Between late 2023 and 2025, a race condition in the AF_UNIX stream send path was reported and subsequently fixed. The flaw centers on unix_stream_sendpage, which tried to append data to the last skb (socket buffer) on the peer socket's receive queue without taking the peer's receive-queue lock. In a particular sequence, file descriptors are passed between sockets so that they form a reference loop, and the sockets are then closed and left for the kernel's garbage collection to clean up. In that sequence the sendpath could race with cleanup and dereference freed memory: a classic use-after-free/null-pointer-deref scenario.

Technical overview​

How the pieces fit: skb, recv queues, and FD passing​

  • skb (socket buffer) is the kernel structure that holds packet/stream data queued on a socket; AF_UNIX uses skb to queue received messages.
  • Peer recv queue is where a socket holds the incoming skbs destined for that socket; producers append skbs, consumers (readers) dequeue them.
  • Ancillary FD passing allows a process to send an open file descriptor over an AF_UNIX socket; the kernel must track and manage the lifecycle of those FD-bearing skbs.
  • Garbage collection for sockets is invoked to clean up complicated reference cycles, for example when file descriptors are passed in a way that creates loops between sockets and none of the participants hold a live reference to break the cycle.
unix_stream_sendpage composes data and appends it to the peer's receive queue. The problem was that it attempted to update or add to the last skb in the peer's recv queue without acquiring the queue lock. That operation races with other threads that may be unlinking and freeing skbs, particularly the garbage collector, which walks the queue and reparents/unlinks skbs holding FDs under protection of the recv-queue lock. The result is a code path that reads or writes an skb that may be concurrently freed, causing a use-after-free that leads to kernel memory corruption or a NULL-deref crash.
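
For readers who have not used ancillary FD passing, the following is a minimal user-space sketch of the mechanism described in the list above. It is not part of the bug or its reproduction; it only shows a descriptor travelling over an AF_UNIX stream socket. The function name pass_fd_roundtrip is hypothetical, and the sketch assumes Python 3.9 or newer (for socket.send_fds and socket.recv_fds) on a Unix-like system.

```python
import os
import socket


def pass_fd_roundtrip(payload: bytes) -> bytes:
    """Send a pipe's write end over an AF_UNIX socket, then write through it."""
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    read_end, write_end = os.pipe()
    with sender, receiver:
        # The descriptor travels as SCM_RIGHTS ancillary data next to one
        # byte of ordinary payload.
        socket.send_fds(sender, [b"x"], [write_end])
        # Closing our copy does not close the pipe: the in-flight message
        # still holds a reference to the same open file.
        os.close(write_end)
        _data, fds, _flags, _addr = socket.recv_fds(receiver, 1, 1)
    received_fd = fds[0]
    # The received descriptor is a new number that refers to the same pipe.
    os.write(received_fd, payload)
    os.close(received_fd)
    data = os.read(read_end, len(payload))
    os.close(read_end)
    return data
```

The point to notice is the lifetime: between send_fds and recv_fds the only reference to the pipe's write side lives inside the queued message, which is why the kernel has to track FD-bearing skbs and, when sockets reference each other this way, fall back on garbage collection.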

The triggering scenario (high level)​

  • Create two AF_UNIX stream sockets (A and B).
  • Pass A's FD to B, and B's FD to A, creating a looped reference where each socket holds references to the other via passed FDs.
  • Close both sockets without consuming the queued fd messages — the kernel's orphan/garbage collector will walk this structure to unlink and free skbs containing FDs under the queue lock.
  • Concurrently call sendpage (or a send API that triggers unix_stream_sendpage), which attempts to append into the last skb on the peer's queue without locking.
  • The sendpath races with the garbage collector: sendpath accesses an skb which the garbage collector may free, leading to use-after-free or NULL pointer dereference when the freed structure is accessed. Security researchers reported a concise reproduction used to exercise the race.

The fix: lock the peer recv queue​

The kernel maintainers applied a targeted fix: ensure the peer's receive queue is locked while unix_stream_sendpage inspects or appends to the last skb. This prevents the sendpath from touching an skb that another thread (garbage collector or cleanup logic) may be unlinking and freeing concurrently.
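
The locking rule can be illustrated with a user-space analogy. The sketch below is not the kernel patch and does not reproduce kernel data structures; PeerQueue, append_to_tail, purge, and run_demo are hypothetical names chosen for illustration. It shows the invariant the fix enforces: looking up the tail buffer and extending it happen inside the same critical section that the cleanup path uses.

```python
import threading


class PeerQueue:
    """Toy stand-in for a peer receive queue (illustration only)."""

    def __init__(self):
        self.lock = threading.Lock()
        self.buffers = []  # each bytearray plays the role of one skb

    def append_to_tail(self, data: bytes) -> None:
        # Find the tail and extend it under one lock hold, so a concurrent
        # purge cannot unlink the buffer between the lookup and the write.
        with self.lock:
            if self.buffers:
                self.buffers[-1].extend(data)
            else:
                self.buffers.append(bytearray(data))

    def purge(self) -> int:
        # Cleanup path: unlink everything under the same lock and report
        # how many bytes were dropped.
        with self.lock:
            dropped = sum(len(buf) for buf in self.buffers)
            self.buffers.clear()
            return dropped


def run_demo(writes: int = 2000) -> int:
    queue = PeerQueue()
    dropped = []
    done = threading.Event()

    def sender():
        for _ in range(writes):
            queue.append_to_tail(b"x")
        done.set()

    def cleaner():
        while not done.is_set():
            dropped.append(queue.purge())

    threads = [threading.Thread(target=sender), threading.Thread(target=cleaner)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    dropped.append(queue.purge())  # collect whatever the cleaner missed
    return sum(dropped)
```

Because both paths take the same lock, every appended byte is accounted for exactly once, so run_demo(n) returns n. In the kernel the consequence of skipping the lock is worse than a miscount: the writer can end up holding a pointer to a buffer that has already been freed.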
That patch was backported into stable kernel trees and appears in several distribution advisories; upstream notes also point out that the issue does not exist in 6.5+ because a broader refactor of sendpage changed the sendpath semantics in a way that avoided the race. The patch itself is a small, well-contained locking insertion authored during stabilization work, and the original writeup and reproduction came from security researcher Bing‑Jhong Billy Jheng.

Affected versions and vendor guidance​

  • Kernels that retained the older sendpage path (generally older stable series such as 5.x and the 6.1/6.4 series, which predate the refactor) are the ones primarily affected by this bug; vendor advisories and stable-tree commits show backports into long-term stable branches.
  • The issue was noted as being addressed in later trees (6.5 and newer) by the sendpage refactor, but many distributions ship older stable kernels, so vendor patches and backports remain important.
  • Distribution advisories and security trackers began listing the CVE and publishing kernel package updates. Some vendors rated the issue as "Important" and assigned CVSS scores in the 7.0 range for vendor-specific CVE entries, but ratings vary by vendor and by the exact package/kernel version shipped. Administrators should follow their distribution’s advisory for the authoritative status of a given package.

Exploitability and real-world risk​

  • Local-only requirement: This vulnerability is a local kernel bug — exploitation requires running code on the same host and the ability to open and manipulate AF_UNIX sockets and pass FDs. It is not remotely exploitable via network packets alone. This limits the attacker model to local privilege escalation or sandbox escape scenarios where an attacker already has user access.
  • Complex reproduction but feasible: The reproduction involves FD passing and orchestrated closure; security researchers produced a working repro which indicates the race is reasonably practical to trigger in a controlled environment. That increases concern for hostile local actors or chained exploits (for example, an unprivileged process exploiting this to escalate privileges when combined with other system misconfigurations).
  • No widespread public exploit evidence at time of reporting: Public references discuss the vulnerability, the repro, and patches; no broad, reliable evidence of in-the-wild privileged exploitation campaigns was present in the disclosure notes consulted. That said, once a reliable PoC exists, the time-to-exploit in the wild can shrink — especially for local attacks. This status can change quickly and should be monitored via vendor advisories. Treat "no public exploit seen" as a snapshot, not a permanent guarantee.

Patch details and backporting concerns​

  • The fix is conceptually simple: acquire the peer's recv-queue lock before accessing the last skb and make the relevant pointer accesses under protection. Simplicity helps in review and reduces regression risk, but kernel locking changes require care: adding locks can introduce deadlocks or subtle ordering problems if not integrated carefully with existing locking hierarchy.
  • Kernel stable trees received backports of the patch; vendors pushed updates into their distribution kernels. Administrators should apply vendor packages rather than attempt ad-hoc local kernel patching unless they have a tested workflow.
  • For environments that cannot immediately update the kernel, mitigations are limited: restricting untrusted local users from creating AF_UNIX sockets used by privileged daemons, reducing SUID/SGID surfaces, and hardening sandbox boundaries are defensive steps but may not be practical for many setups. Where feasible, use mechanisms like MAC (AppArmor/SELinux) policies to restrict which processes can open or pass file descriptors over privileged AF_UNIX endpoints.

Practical remediation checklist​

  • Identify kernel versions in use with: uname -r (note: exact package and distribution matter).
  • Check your distribution’s security advisory feed for the CVE label and the patched kernel package name.
  • Apply vendor-supplied kernel updates on a staged system first; reboot systems into the updated kernel following standard patch windows.
  • If vendor packages are delayed, evaluate backporting the stable-tree patch — only if you have kernel build/test experience and can validate no regressions in your environment.
  • In container or multi-tenant deployments, limit untrusted container privileges and limit /var/run/docker.sock‑like AF_UNIX interfaces to trusted workloads only.
  • Monitor for updates from your vendor, and track public vulnerability databases for emergence of PoCs or exploitation reports.
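
As a starting point for the inventory step, the sketch below reads the running kernel release and flags series older than 6.5, the version this article cites as unaffected because of the sendpage refactor. The function names are hypothetical. The result is only a triage hint: a distribution kernel older than 6.5 may already carry the backported fix, so the vendor advisory remains the authority.

```python
import platform
import re


def parse_kernel_release(release: str) -> tuple:
    """Extract (major, minor) from a string such as '5.15.0-91-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def needs_advisory_check(release: str) -> bool:
    # Assumption taken from the article: 6.5 and newer do not contain the
    # affected code path. Anything older may or may not include the
    # backported fix, so it needs a manual check against the advisory.
    return parse_kernel_release(release) < (6, 5)


def report() -> str:
    release = platform.release()  # the same string `uname -r` prints on Linux
    if needs_advisory_check(release):
        return f"{release}: check your vendor advisory for the backported fix"
    return f"{release}: 6.5 or newer, sendpage refactor already present"
```

Calling report() on each host gives a one-line status that can be collected by whatever inventory tooling is already in place.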

Developer and maintainer notes: performance and regression trade-offs​

  • Adding an extra lock in a hot sendpath can raise performance eyebrows. However, the targeted lock here protects a pointer to the last skb on the peer queue — a correctness-first decision was made to avoid a use-after-free. For most real-world AF_UNIX workloads this overhead is negligible compared with the cost of queuing, copying, and FD handling.
  • The kernel community’s preference in such cases is to prioritize correctness and safety; the sendpage refactor that landed later (and which obviates the issue) is evidence the maintainers were already working toward cleaner sendpath semantics that avoid this class of mistake.
  • Backporting a kernel change requires testing for deadlocks and ordering issues; maintainers who backport must ensure the introduced lock plays well with other socket and garbage collection locks. The stable-tree patches were reviewed and landed into older trees, implying maintainers judged the risk acceptable once vetted.

The disclosure timeline and metadata (what we can verify)​

  • The bug report and repro were publicly discussed by the reporter and in kernel mailing lists, and patches were added to stable branches by maintainers. The fix was included in stable backports and vendor advisories. The details and commit notes are consistent across multiple public trackers and vendor lists.
  • Some public trackers show slightly different CVE numbers or map the same underlying bug to multiple identifiers across vendor trackers — this is not uncommon with kernel fixes when vendors issue their own advisories or when the record-keeping differs. Administrators should correlate advisory text (function name, unix_stream_sendpage, af_unix) rather than relying solely on CVE numeric equality across every tracker. If you see multiple CVE numbers referencing the same function and description, check the advisory text and patch id.
  • For those checking vendor pages: some vendor web UIs require JavaScript or session artifacts to render advisory content; if an advisory page appears missing or shows "page not found," try the vendor's plain-text security tracker or use the vendor's package tracker to find the kernel update entry. (A server-rendered advisory page can return a tokenized response if accessed without JS.)

Risk analysis and recommendations for WindowsForum readers​

  • Server hosts and multi-user systems: High priority. Systems with untrusted local users or container platforms where unprivileged workloads may interact with privileged daemons over AF_UNIX should be patched quickly. Attackers with local access are the threat model, and many hosted environments expose local attacker surfaces.
  • Desktop systems with single-user models: Medium priority. The attack requires local code execution; if the machine is reasonably locked down and users are trusted, the risk reduces. Still, apply updates as part of routine maintenance.
  • Embedded or appliance kernels: High priority. Many appliances lag kernel updates; if they expose local shells or provide plugin interfaces that can create AF_UNIX endpoints, the vendor must issue a firmware/kernel update or provide other mitigations.
  • Developers of IPC-heavy apps: Review code that relies on FD passing; understand that passing FDs can create lifecycle loops and ensure your application consumes FDs promptly and avoids creating long-lived cycles that complicate kernel cleanup.
Recommended immediate steps
  • Inventory systems and prioritize those with multiple local users, exposed containers, or multi-tenant services.
  • Apply vendor kernel patches as soon as feasible, after staged testing.
  • For services that expose AF_UNIX endpoints (systemd, container runtimes, database sockets), consider additional access controls or transient socket permissions until kernels are updated.
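
For services you control, tightening access to an AF_UNIX endpoint can look like the sketch below. It is an illustration under stated assumptions, not a mitigation for the kernel bug itself: it assumes Linux (SO_PEERCRED is Linux-specific), and make_private_listener, peer_uid, demo, and the socket path are hypothetical names.

```python
import os
import socket
import stat
import struct
import tempfile


def make_private_listener(path: str) -> socket.socket:
    """Bind an AF_UNIX stream listener that only its owner can connect to."""
    listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    old_umask = os.umask(0o177)  # the socket file is created with mode 0600
    try:
        listener.bind(path)
    finally:
        os.umask(old_umask)
    listener.listen(1)
    return listener


def peer_uid(conn: socket.socket) -> int:
    """Return the UID of the connected peer via SO_PEERCRED (Linux only)."""
    fmt = "iII"  # struct ucred: pid, uid, gid
    creds = conn.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED,
                            struct.calcsize(fmt))
    _pid, uid, _gid = struct.unpack(fmt, creds)
    return uid


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sock")  # hypothetical socket path
        with make_private_listener(path) as listener:
            mode = stat.S_IMODE(os.stat(path).st_mode)
            with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
                client.connect(path)
                conn, _ = listener.accept()
                with conn:
                    return mode, peer_uid(conn) == os.getuid()
```

Setting the umask before bind avoids a window in which the socket file exists with looser permissions, and the peer-credential check lets a daemon reject connections from unexpected users even if the file mode is later changed.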

Critical appraisal: strengths of the fix and residual risks​

Strengths
  • The fix is surgical and small: acquire the missing lock when accessing peer queue state. Small changes are easier to audit and reason about.
  • The kernel community backported the fix to stable trees and vendors picked it up in distribution kernels, reducing the window where hosts are vulnerable.
  • The sendpage refactor in newer kernels means the long-term architecture already mitigates this class of race if systems can move to newer kernels.
Residual risks and caveats
  • Any locking change in the kernel must be carefully validated for deadlocks and ordering interactions. Backport patches can be trickier in older code contexts.
  • Real-world exploitability depends on local access, but local vulnerabilities are powerful when chained. Administrators should treat this as a practical escalation vector in hostile multi-tenant environments.
  • Vendor advisory labels and CVSS scores varied; rely on your vendor’s guidance for the exact impact and urgency for your shipping kernel.

Final verdict and guidance​

This fix plugs a concrete use-after-free/null-deref in the AF_UNIX sendpath by enforcing proper locking when manipulating peer receive queues. For security-conscious administrators, the path forward is clear: prioritize kernel updates for hosts where local attackers or multi-tenant workloads exist, test vendor kernels in staging, and apply patches per established change-control processes.
The vulnerability highlights several broader lessons for system and runtime designers: passing file descriptors across sockets is powerful but increases lifecycle complexity; kernel garbage-collection and cleanup paths must be carefully coordinated with hot data paths; and small invariants (take the lock before touching a shared pointer) remain critical for kernel reliability and safety.
Systems operators should treat AF_UNIX-related kernel fixes as important local-scope security updates and deploy vendor-supplied kernel updates or backports promptly while continuing to monitor vendor advisories for any follow-up patches or clarifications.
Source: MSRC Security Update Guide - Microsoft Security Response Center