The Linux kernel’s netfilter subsystem is getting an important corrective update for CVE-2026-23351, a flaw in the nft_set_pipapo set backend that can lead to a use-after-free condition and a local denial of service. The fix is not a simple bounds check or a small cleanup; it restructures garbage collection so that expired elements are unlinked first and only reclaimed after the live and clone data structures have safely swapped. That distinction matters because the vulnerable code can spend a long time in a non-preemptible path when many elements have expired, which can trip soft lockup warnings and RCU stall reports. (spinics.net)
Source: NVD / Linux Kernel Security Update Guide - Microsoft Security Response Center
Overview
At a high level, this CVE is another reminder that kernel memory lifetime bugs are often less about a single bad pointer and more about timing, concurrency, and what readers can still see. In nftables, the pipapo backend is designed for expressive set matching, including arbitrary concatenation of ranges, which makes it powerful but also structurally complex. The more sophisticated the data path, the more careful the code must be about when data becomes unreachable, when it can still be observed, and when it is finally safe to free. (spinics.net)
The bug was reported by Yiming Qian and fixed upstream in commit 9df95785d3d8302f7c066050117b04cd3c2048c2, then carried into stable review as part of the 6.12 series. The patch notes make clear that the old model tried to do too much inside commit-time garbage collection, mixing unlinking and freeing in a way that could leave readers exposed to freed memory. The corrected design explicitly separates those phases so the kernel can preserve the integrity of the live copy of the set while still eventually reclaiming memory. (spinics.net)
This matters beyond the immediate CVE because it reflects a pattern the kernel community has been converging on in netfilter: do not free objects merely because they are logically dead; free them only after every plausible reader has moved past them. A similar lesson was already learned in the rbtree backend, which is why the new fix cites the earlier “don’t gc elements on insert” approach as the closest precedent. In other words, this is not just a patch for one backend; it is part of a broader hardening story for nftables set implementations. (spinics.net)
Background
The nf_tables framework is Linux’s modern packet filtering and classification engine, and it has replaced a lot of the legacy iptables-era logic in contemporary deployments. Within nftables, “sets” are used to represent collections of keys or ranges that packets can be matched against efficiently. That design is central to performance, but it also means the kernel is juggling mutable structures that are visible to both packet-processing code and userspace dump operations. (spinics.net)
The pipapo backend exists to support richer matching semantics than a basic hash set or a simple tree. That flexibility is useful for enterprise firewalls, routers, and appliances that need compact rule representation and fast lookup across many fields. But that same flexibility increases the complexity of garbage collection, especially when expired elements are embedded in structures that are concurrently read without heavyweight locking. (spinics.net)
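To make the matching model concrete, here is a minimal Python sketch of a set whose elements are concatenations of ranges (an address range joined with a port range), each carrying an expiry time. It is an illustration under stated assumptions, not kernel code: RangeElement, lookup, and ip are hypothetical names, and the linear scan stands in for pipapo's far more elaborate lookup-table and bitmap algorithm. What it does show is "logical death": once an element has expired, lookups must stop matching it, even though the object still exists in memory.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address


@dataclass(frozen=True)
class RangeElement:
    """Hypothetical set element: one inclusive (low, high) range per
    concatenated field, plus an absolute expiry time."""
    fields: tuple       # e.g. ((addr_lo, addr_hi), (port_lo, port_hi))
    expires_at: float   # from this instant on, the element is logically dead

    def matches(self, key: tuple) -> bool:
        # Every component of the key must fall inside its field's range.
        return len(key) == len(self.fields) and all(
            lo <= k <= hi for k, (lo, hi) in zip(key, self.fields)
        )

    def expired(self, now: float) -> bool:
        return now >= self.expires_at


def lookup(elements, key, now):
    """Linear scan standing in for pipapo's lookup tables: return the first
    element that matches and has not expired, else None."""
    for elem in elements:
        if not elem.expired(now) and elem.matches(key):
            return elem
    return None


def ip(s: str) -> int:
    return int(IPv4Address(s))


# Roughly "192.0.2.0-192.0.2.255 . 1000-2000", expiring at t=100.
elem = RangeElement(fields=((ip("192.0.2.0"), ip("192.0.2.255")), (1000, 2000)),
                    expires_at=100.0)
key = (ip("192.0.2.7"), 1500)
print(lookup([elem], key, now=50.0) is elem)  # True: live and matching
print(lookup([elem], key, now=150.0))         # None: expired, so skipped
```

Note that expiry only removes the element from matching; freeing its memory is a separate decision, which is exactly where this CVE lives.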
The upstream patch description explains the core safety issue plainly: expired elements remain visible through the live copy of the data structure until the clone/live pointers have been swapped. If freeing happens too early, a reader can still encounter an element that has already been handed to the allocator. That is why the fix insists on unlinking elements first and deferring reclamation until after the clone/live swap, when newly arriving readers no longer have a path to the expired elements. (spinics.net)
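The ordering problem described above can be reproduced in a toy, single-threaded Python model. This is a sketch, not the kernel's data structure: Element, TwoCopySet, gc_free_before_swap, and the freed flag are hypothetical stand-ins, with freed playing the role of memory handed back to the allocator. The sketch frees an expired element while the live copy still references it, so a reader that arrives before the swap trips over it.

```python
class Element:
    def __init__(self, key):
        self.key = key
        self.freed = False  # stands in for "memory returned to the allocator"


class TwoCopySet:
    """Toy live/clone model: readers only traverse `live`; updates edit `clone`."""

    def __init__(self, elements):
        self.live = list(elements)   # copy seen by lookups and dumps
        self.clone = list(elements)  # working copy modified by the transaction

    def reader_lookup(self, key):
        for e in self.live:
            if e.key == key:
                if e.freed:
                    # In the kernel this would silently read reused memory.
                    raise RuntimeError(f"use-after-free on element {e.key!r}")
                return e
        return None

    def swap(self):
        # Commit point: the edited clone becomes what readers see.
        self.live, self.clone = self.clone, list(self.clone)


def gc_free_before_swap(s, expired_keys):
    """Hazardous ordering: unlink from the clone and free in the same pass."""
    for e in list(s.clone):
        if e.key in expired_keys:
            s.clone.remove(e)
            e.freed = True  # too early: s.live still references e


s = TwoCopySet([Element("a"), Element("b")])
gc_free_before_swap(s, {"a"})
try:
    s.reader_lookup("a")  # a reader that arrives before the swap
except RuntimeError as err:
    print(err)            # use-after-free on element 'a'
s.swap()
print(s.reader_lookup("a"))  # None: unreachable once the swap is done
```

The exception is a convenience of the model. In the kernel nothing raises; the reader simply dereferences memory that may already have been reused, which is what makes this class of bug dangerous.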
This is also why call_rcu() alone is not sufficient in this case. An RCU callback only waits for readers that were already running when it was queued; it does nothing for readers that start afterwards and can still reach the element because the design keeps old state visible to lookups or dumps. The patch notes emphasize that dump operations or lookups starting after call_rcu() can still observe the freed element unless the commit phase has already made meaningful progress in swapping the relevant pointers. That nuance is exactly why kernel memory lifetime bugs are so tricky: the free itself is not the whole story. (spinics.net)
Historically, this kind of issue tends to show up in data structures that combine lockless readers, deferred mutation, and transactional updates. The netfilter maintainers have been gradually tightening those edges, and this CVE fits that trajectory. It is also notable that the stable patch set arrived with minimal drama: the fix is conceptually straightforward once the race is understood, but the reasoning behind it is deeply concurrency-specific and easy to get wrong in a first attempt. (spinics.net)
What the Vulnerability Actually Is
At its core, CVE-2026-23351 is a use-after-free in the pipapo set type. The vulnerable behavior emerges when a large number of expired elements forces garbage collection to do too much work in a single commit path, especially in a non-preemptible context. That can create both a reliability problem and a memory-safety problem: the system can stall, and the timing window can allow readers to touch objects that are no longer valid. (spinics.net)
The key phrase in the fix is that GC must be split into an unlink phase and a reclaim phase. Unlinking removes the dead items from active reachability; reclamation frees memory only after the transactional machinery has moved the live data structure to a safe state. This is a classic kernel pattern, but it is especially important here because the same structure is observed by the packet path and by userspace dumpers. (spinics.net)
Why the old approach failed
The original design tried to free expired entries too close to the mutation step. Under light load, that can appear to work fine, which is why bugs like this live for a long time. Under heavy expiration churn, though, the commit phase can run long enough that the system starts reporting soft lockup and RCU stall symptoms, and the lifetime bug becomes much easier to trigger. (spinics.net)
- An expired element is still visible to readers during commit.
- The commit path can run for a long time when many entries expire.
- Freeing too early creates a stale-pointer hazard.
- Lockless or quasi-lockless readers amplify the risk.
- A DoS condition can appear even without full privilege escalation. (spinics.net)
Why Garbage Collection Needed a Rewrite
The fix is interesting because it does not merely “move a free later”; it recognizes that there are two different notions of death in this code. One is logical death, where an expired element should no longer participate in matching. The other is physical death, where memory can be returned to the allocator. In concurrent systems, those two moments are almost never the same. (spinics.net)
The patch therefore turns the GC routine into pipapo_gc_scan(), which only identifies expired items and links them onto a GC list. Only later, after the relevant pointer swaps have happened, are those queued objects actually reclaimed. The gc_head list added to the pipapo private data structure is the bookkeeping mechanism that makes the two-step design possible. (spinics.net)
The significance of the split
This matters because it makes the code’s intent match the memory model. A reader can safely see an object if the live version still references it; that reader cannot safely see an object once it has been returned to the allocator. The split ensures the kernel never crosses that line until it has first taken away the last live reference path. (spinics.net)
- Unlink removes the element from active structures.
- Reclaim returns memory only after state transitions are complete.
- RCU callbacks are necessary but not sufficient on their own.
- Transaction ordering determines whether stale readers can exist.
- Pointer swapping is the real safety boundary, not the callback itself. (spinics.net)
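The two-phase lifecycle can be sketched in a similar toy model. Only pipapo_gc_scan() and gc_head are names taken from the patch; PipapoLikeSet, gc_scan, commit, reclaim, and reader_lookup are hypothetical, and the model deliberately leaves out RCU. In the real kernel, readers that were already traversing the old live copy at swap time must still be allowed to finish before memory is actually released; this sketch shows only the ordering of unlink, swap, and reclaim.

```python
class Element:
    def __init__(self, key, expires_at):
        self.key = key
        self.expires_at = expires_at
        self.freed = False  # stands in for memory handed back to the allocator


class PipapoLikeSet:
    """Toy model of the split: unlink on the clone, reclaim only after the swap.
    RCU grace periods are deliberately left out."""

    def __init__(self, elements):
        self.live = list(elements)   # what readers (packet path, dumpers) traverse
        self.clone = list(elements)  # what the transaction edits
        self.gc_head = []            # expired elements awaiting reclaim

    def gc_scan(self, now):
        # Phase 1 (cf. pipapo_gc_scan()): unlink expired elements from the
        # clone and queue them on gc_head. Nothing is freed here.
        for e in list(self.clone):
            if now >= e.expires_at:
                self.clone.remove(e)
                self.gc_head.append(e)

    def commit(self):
        # Swap first, so new readers can no longer reach queued elements...
        self.live, self.clone = self.clone, list(self.clone)
        # ...then run phase 2.
        self.reclaim()

    def reclaim(self):
        # Phase 2: only now hand the queued elements back to the "allocator".
        while self.gc_head:
            self.gc_head.pop().freed = True

    def reader_lookup(self, key):
        for e in self.live:
            if e.key == key:
                if e.freed:
                    raise RuntimeError(f"use-after-free on element {e.key!r}")
                return e
        return None


s = PipapoLikeSet([Element("old", expires_at=10), Element("new", expires_at=99)])
s.gc_scan(now=20)
print(s.reader_lookup("old").key)  # old: still reachable via live, not yet freed
s.commit()
print(s.reader_lookup("old"))      # None: unreachable, and only now freed
```

Between gc_scan() and commit() the expired element is logically dead (gone from the clone) but physically alive, so a reader that still sees it through the live copy is safe.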
How the Bug Manifests Operationally
From an operator’s perspective, the first symptom may not be a crash at all. The kernel can become sluggish while handling nftables transactions, especially if a rule set has accumulated a large volume of expired entries. That can produce soft lockup warnings, RCU stall reports, and visible latency spikes long before anyone notices a memory safety issue. (spinics.net)
That makes this CVE particularly relevant to environments where nftables is used heavily for dynamic policy enforcement. Cloud hosts, container platforms, and appliance-style Linux systems often generate many rule updates over time, and those updates may interact with set expiration in ways that are hard to predict. The issue is therefore not just theoretical kernel hygiene; it is an operational reliability problem with a security label attached. (spinics.net)
What administrators might see
The practical warning signs are often indirect.
- nftables operations feel slower than usual.
- The kernel logs soft lockup or stall messages.
- Transaction-heavy firewall workloads behave erratically.
- System responsiveness degrades under ruleset churn.
- A local user or service can trigger a denial-of-service condition. (spinics.net)
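For spotting the symptoms listed above, a small log filter can help. The sketch below is a hypothetical helper, not part of any kernel or distribution tooling: it pulls soft lockup reports and RCU stall reports out of kernel log text. The patterns are modeled on the common message formats ("BUG: soft lockup - CPU#N stuck for Ns!" and "rcu: INFO: ... detected stall(s)"), but wording can differ between kernel versions, so a miss is inconclusive. A hit is only a hint, since soft lockups and RCU stalls have many causes besides this CVE.

```python
import re

# Loose patterns modeled on common kernel message formats; exact wording can
# vary between kernel versions, so a miss here is not proof of a clean log.
SOFT_LOCKUP = re.compile(
    r"BUG: soft lockup - CPU#(\d+) stuck for (\d+)s! \[(.+?):(\d+)\]"
)
RCU_STALL = re.compile(r"rcu: INFO: \S+ (?:self-)?detected stalls?")


def scan_kernel_log(lines):
    """Split kernel log lines into parsed soft-lockup reports and raw RCU-stall lines."""
    lockups, stalls = [], []
    for line in lines:
        m = SOFT_LOCKUP.search(line)
        if m:
            cpu, secs, comm, pid = m.groups()
            lockups.append(
                {"cpu": int(cpu), "seconds": int(secs), "comm": comm, "pid": int(pid)}
            )
        elif RCU_STALL.search(line):
            stalls.append(line.strip())
    return lockups, stalls


# Synthetic sample lines, not captured from an affected system.
sample = [
    "watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [nft:4242]",
    "rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:",
    "eth0: link up",
]
lockups, stalls = scan_kernel_log(sample)
print(lockups)      # [{'cpu': 2, 'seconds': 23, 'comm': 'nft', 'pid': 4242}]
print(len(stalls))  # 1
```

In practice the input would be captured output of dmesg or journalctl -k, split into lines. Correlating hits with the timing of nftables transactions is what turns a generic stall report into a lead worth following.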
Enterprise Impact
For enterprise users, this kind of flaw lands in a sensitive part of the stack. Netfilter is not an optional add-on; it is part of the machinery that many organizations rely on for segmentation, policy enforcement, and traffic control. If nftables is part of a gateway, router, Kubernetes node, or security appliance, a reliability bug in set garbage collection can become an availability incident. (spinics.net)
Enterprises should also care because the bug is tied to complexity in rule management, not just packet volume. A network team that automates frequent updates, churns sets, or deploys ephemeral policy objects is more likely to expose the timing conditions that make this issue visible. In practice, that means modern, automated environments are often more exposed than static ones, even if they are also better monitored. (spinics.net)
Why this is not just a “Linux kernel issue”
The kernel itself is the affected component, but the blast radius belongs to the systems built on top of it. Gateways, edge nodes, VPN concentrators, and container hosts often use nftables in the hot path. If one of those systems stalls, the impact can range from dropped connections to complete service interruption. (spinics.net)
- Security appliances may freeze under sustained ruleset churn.
- Container hosts may see service disruption from control-plane stalls.
- Gateway nodes can lose packet-processing responsiveness.
- Monitoring systems may mistake a stall for general resource exhaustion.
- Recovery may require a kernel update, not just a service restart. (spinics.net)
Consumer and Small Business Impact
For home users, the exploitability story is narrower, but the reliability story still matters. A workstation running nftables, a small lab firewall, or a home router based on Linux could, in theory, hit the same pathological path if a set using the pipapo backend accumulates a large number of expired elements. The likely symptom would be a slowdown or a stall rather than a dramatic compromise. (spinics.net)
Small businesses are in an awkward middle ground. They often run Linux-based gateway devices or edge servers without the deep kernel maintenance processes of a large enterprise, but they also depend on those devices for uptime. A local DoS in the firewall path can easily translate into users losing internet access, VPN connectivity, or internal segmentation controls. (spinics.net)
The practical takeaway for smaller deployments
Smaller operators should think in terms of stability hygiene rather than exploit drama.
- Keep kernel packages current.
- Watch for nftables-related lockup logs.
- Avoid unnecessary churn in ruleset automation.
- Test updates on appliances before rolling them broadly.
- Treat local kernel DoS bugs as availability risks, not just security footnotes. (spinics.net)
Why the RCU Argument Matters
One of the most technically important parts of the patch notes is the explicit rejection of the idea that call_rcu() by itself solves the problem. RCU is a powerful tool, but it only guarantees safety relative to the grace period it governs; it does not magically eliminate all concurrency hazards if the structure remains reachable to new readers through some other path. That is the subtlety at the center of this CVE. (spinics.net)
The patch explains that readers or dumpers that start after call_rcu() has been scheduled can still see the freed element if the live structure has not advanced far enough. That means the true control point is not the callback queue; it is the point at which the clone and live pointers are swapped and the old state becomes inaccessible to fresh readers. In kernel terms, that is an important ordering guarantee, not merely a cleanup step. (spinics.net)
This is a design lesson, not just a bug fix
The code is teaching an old lesson in a new place: deferred reclamation only works when the last reader boundary is well defined. If the code can still expose the old object to later lookups, then the callback timing is irrelevant. That is why the patched logic carefully separates “remove from view” from “free memory.” (spinics.net)
- RCU protects against some stale readers, but not all visibility windows.
- New lookups can arrive after the callback is queued.
- Userspace dumpers are part of the reader model.
- Memory safety depends on structure state, not just callback completion.
- Pointer swapping is the decisive boundary for safe reclaim. (spinics.net)
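The gap between "callback queued" and "element unreachable" can be shown with a toy grace-period model. ToyRCU, Element, and late_reader_hits_freed are hypothetical, single-threaded simplifications, not the kernel's RCU implementation or the pipapo code. The model keeps one rule from real RCU: a queued callback may run once every reader that was active at queue time has exited, and readers that start later are not waited for. That rule alone is enough to show why a late reader can end up holding a freed element unless the swap happens before the callback is queued.

```python
class ToyRCU:
    """Single-threaded grace-period model (not the kernel implementation):
    a callback runs once every reader active when it was queued has exited.
    Readers that start after call_rcu() are not waited for."""

    def __init__(self):
        self.active = set()
        self.next_id = 0
        self.pending = []  # (readers to wait for, callback)

    def read_lock(self):
        rid = self.next_id
        self.next_id += 1
        self.active.add(rid)
        return rid

    def read_unlock(self, rid):
        self.active.discard(rid)
        self._run_ready()

    def call_rcu(self, callback):
        # Snapshot only the readers that exist right now.
        self.pending.append((set(self.active), callback))
        self._run_ready()

    def _run_ready(self):
        ready = [cb for w, cb in self.pending if not (w & self.active)]
        self.pending = [(w, cb) for w, cb in self.pending if w & self.active]
        for cb in ready:
            cb()


class Element:
    def __init__(self, key):
        self.key = key
        self.freed = False


def late_reader_hits_freed(swap_before_call_rcu):
    """Return True if a reader that starts after call_rcu() ends up holding freed memory."""
    rcu = ToyRCU()
    elem = Element("expired")
    live = [elem]                  # the copy new readers look things up in
    dumper = rcu.read_lock()       # a reader already in progress

    if swap_before_call_rcu:
        live = []                  # swap done: elem is no longer reachable
    rcu.call_rcu(lambda: setattr(elem, "freed", True))

    late = rcu.read_lock()         # a lookup or dump that starts afterwards
    found = elem in live
    rcu.read_unlock(dumper)        # grace period ends, callback frees elem
    hazard = found and elem.freed  # late reader still holds a freed elem
    rcu.read_unlock(late)
    return hazard


print(late_reader_hits_freed(swap_before_call_rcu=False))  # True
print(late_reader_hits_freed(swap_before_call_rcu=True))   # False
```

In both runs the callback fires at the same moment; the only difference is whether the late reader could still find the element, which is the point the patch notes make about pointer swapping being the real boundary.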
Relationship to Earlier netfilter Fixes
The CVE sits in a broader line of netfilter work that has steadily improved set handling. The stable changelog and netfilter pull request both place this change beside other corrections to set insertion and cloning behavior, which suggests maintainers are tightening multiple edges of the same subsystem at once. That is not unusual for mature kernel code: once a subsystem receives enough real-world use, corner cases cluster around similar design assumptions. (spinics.net)
The patch also explicitly references the rbtree backend fix, which is a good sign from a maintenance standpoint. It shows the maintainers are reusing a proven conceptual pattern rather than inventing a one-off workaround. In security engineering, that often improves review quality because reviewers can compare implementations against a known-good model. (spinics.net)
Why prior fixes matter
Past netfilter bugs have often revolved around similar themes: transactional inconsistency, object lifetime, and visibility to lockless readers. The cumulative lesson is that nftables is functionally rich, but that richness comes with shared failure modes across set types. When one backend changes its GC semantics, the rest of the family deserves a second look. (spinics.net)
- Shared data-structure patterns often mean shared bug classes.
- Fixes in one backend can illuminate weaknesses in another.
- Transactional semantics are as important as raw pointer safety.
- Stable backports help close the gap between upstream and deployed kernels.
- Cross-backend consistency reduces the chance of regressions. (spinics.net)
Patch Mechanics and Code-Level Changes
The code diff shows several concrete adjustments beyond the headline GC split. A new gc_head list is added to the private pipapo structure, and the GC scan routine is renamed to reflect its narrower mission: identify expired elements and queue them for later reclamation. There is also a small structural cleanup around nft_trans_gc_space(), which is moved to a more suitable location as an inline helper. (spinics.net)
Those details matter because kernel fixes often fail when they patch only the symptom. Here, the code is being shaped to support a different lifecycle, not just a different function call. That is a sign the maintainer understood the root cause as a state-transition bug rather than as a simple memory free timing issue. (spinics.net)
What to notice in the diff
The visible change is modest in line count, but the semantic effect is large.
- Expired elements are no longer freed during the dangerous window.
- A GC list tracks objects awaiting reclaim.
- Memory reclaim is separated from the scan that discovers expiry.
- Reader exposure is considered part of the correctness model.
- The implementation now mirrors an established rbtree pattern. (spinics.net)
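On the nft_trans_gc_space() cleanup: as far as we understand the surrounding code, that helper reports how many more element pointers fit into the current fixed-size GC transaction batch, so callers know when a full batch has to be handed off and a new one started. The Python sketch below models only that bookkeeping. GcBatch, batch_for_reclaim, and BATCH_CAPACITY are hypothetical, and the capacity of 4 is chosen purely to keep the output readable; it is not the kernel's value.

```python
BATCH_CAPACITY = 4  # illustrative only; the kernel uses its own fixed constant


class GcBatch:
    """Hypothetical fixed-capacity batch of elements queued for reclaim."""

    def __init__(self, capacity=BATCH_CAPACITY):
        self.capacity = capacity
        self.items = []

    def space(self):
        # Analogue of what a helper like nft_trans_gc_space() answers:
        # how many more elements fit before this batch must be handed off.
        return self.capacity - len(self.items)


def batch_for_reclaim(gc_list, capacity=BATCH_CAPACITY):
    """Distribute queued elements into batches, opening a new batch when full."""
    batches = [GcBatch(capacity)]
    for elem in gc_list:
        if batches[-1].space() == 0:
            batches.append(GcBatch(capacity))
        batches[-1].items.append(elem)
    return batches


print([len(b.items) for b in batch_for_reclaim(range(10))])  # [4, 4, 2]
```

Fixed-size batches keep each unit of reclaim work bounded, which is why a "how much room is left" check sits at the center of this kind of bookkeeping.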
Strengths and Opportunities
The good news in this CVE story is that the fix addresses a clearly identified architectural flaw and does so in a way that improves the codebase beyond a single exploit scenario. It turns a risky one-pass cleanup into a safer two-phase lifecycle, and that can strengthen both correctness and maintainability.
- Cleaner memory lifetime boundaries make future reviews easier.
- Reduced reader exposure lowers the chance of hidden UAFs.
- Better alignment with RCU semantics improves concurrency reasoning.
- Potentially fewer lockup scenarios under heavy ruleset churn.
- Shared pattern with rbtree backend makes the subsystem more consistent.
- Stable backportability should help vendors ship the fix broadly.
- Operational resilience improves for firewall-heavy deployments. (spinics.net)
Risks and Concerns
Even with a solid upstream fix, there are still real concerns. Kernel security vulnerabilities do not disappear the moment a commit lands; they persist until distributions, vendors, and appliance builders pick up the patch. And because this issue is tied to a core networking path, even a modest lag in deployment can matter for uptime.
- Backport delays may leave some products exposed longer.
- Vendor-specific kernels can diverge from upstream timing.
- Firewall appliances may be slow to absorb fixes.
- Operational symptoms can look like generic instability, delaying diagnosis.
- Local DoS impact is serious even without remote code execution.
- Complex transaction paths can hide similar bugs elsewhere.
- Heavy ruleset churn may still stress adjacent code paths. (spinics.net)
Looking Ahead
The most immediate thing to watch is how quickly the fix propagates through stable trees and downstream vendor kernels. The upstream commit already exists, and the stable review process is underway, but real-world exposure will depend on packaging and distribution velocity. For organizations that rely on nftables in production, the update cadence matters as much as the patch itself. (spinics.net)
It will also be worth watching whether additional nftables backends receive similar lifetime hardening. The rbtree comparison suggests the maintainers are already thinking in terms of shared design patterns, which is encouraging. If more code paths are audited with the same lens, the payoff could be fewer latent use-after-free conditions and fewer long-running stalls in transaction-heavy environments. (spinics.net)
Watch items
- Upstream and stable kernel release notes for backported fixes.
- Vendor advisories for appliances and enterprise Linux kernels.
- Any follow-on patches that touch nftables set lifetime semantics.
- Reports of soft lockups or RCU stalls tied to ruleset expiration.
- Whether adjacent netfilter backends adopt the same two-phase reclaim pattern. (spinics.net)