CVE-2026-23393 Fix: disable delayed work to close a bridge CFM race

ChatGPT · Mar 26, 2026

When Linux kernel developers talk about a “fix” for a race condition, they are often describing more than a simple cleanup: they are closing a timing window that could turn ordinary state management into a use-after-free hazard. That is exactly what happened with CVE-2026-23393, a bridge: cfm bug in peer MEP deletion where deferred work could be re-queued after cancellation but before the object itself was finally freed. The result was a classic concurrency gap in a hot path, one subtle enough to survive review until the right interleaving exposed it. The upstream fix replaces cancel_delayed_work_sync() with disable_delayed_work_sync() in the peer MEP deletion paths, a small-looking change that meaningfully changes the safety contract.

The Linux kernel’s bridge CFM code sits in a part of the networking stack where object lifetime, softirq execution, and delayed work all meet. That combination is fertile ground for races because the code is not executing in one neat serialized thread of control; it is moving through timers, receive paths, and deferred work queues that can overlap in time. In this case, the vulnerable behavior centered on peer_mep teardown, where the code expected cancellation to be enough before freeing the underlying structure.
The report explains that br_cfm_frame_rx() executes in softirq context under rcu_read_lock(), and crucially without RTNL protection. That means it can still observe the peer MEP in the hash/list structure and decide to reschedule ccm_rx_dwork while deletion is already in progress. In other words, the receive path and the delete path were not synchronized tightly enough to prevent a narrow but dangerous race.
The kernel community has long treated this class of bug seriously because deferred work is not just “later execution”; it is another live execution context that can act on stale pointers if teardown does not fully fence it off. The distinction between canceling a work item and disabling it matters precisely because cancellation only drains the current instance, while disabling changes whether future queueing attempts are accepted. That is the difference between quiescing a worker and making it impossible to come back.
That nuance also explains why the issue is security-relevant even though it reads like a correctness defect at first glance. A race that can send delayed work back to a freed peer_mep object is the sort of lifecycle error that can collapse into memory corruption, instability, or kernel crash behavior depending on timing and architecture. The CVE record itself does not yet publish a vendor severity score, but the upstream analysis is clear enough to justify attention.
It is also worth noting how the Linux CVE process frames bugs like this. The kernel’s CVE assignment is intentionally broad for fixes that are judged security-relevant during stable-tree processing, because even issues that are not obviously exploitable at first can still matter later in real deployments. That policy explains why a race-condition patch in a niche networking path can still become a formal CVE.

What CVE-2026-23393 Actually Changes

At the center of the fix is a deceptively small API swap: cancel_delayed_work_sync() becomes disable_delayed_work_sync() in the two peer MEP deletion paths. The change is not cosmetic. It alters what happens if br_cfm_frame_rx() tries to re-arm ccm_rx_dwork after the delete path has begun but before the object’s RCU grace period has fully elapsed. The old behavior allowed the queueing attempt to race with final teardown; the new behavior rejects that requeue attempt outright.

Why the old pattern failed

cancel_delayed_work_sync() is often good enough when the intent is simply to stop the currently scheduled work item and wait for it to finish. But it does not, by itself, permanently forbid later queueing from another execution context that still sees the object. That is exactly the window the report describes: the work was canceled, the peer MEP still lived in the hlist, and softirq receive processing could still call ccm_rx_timer_start() and enqueue the work again before kfree_rcu() made destruction final.
This is the kind of race that feels tiny in prose and enormous in practice. If a worker runs on a freed peer_mep, it is not merely “late”; it is acting on memory that no longer belongs to it. Even when the immediate symptom is not a crash, the code has lost its lifetime invariants, and kernel code does not get many chances to violate those safely.

Why disabling is stronger

disable_delayed_work_sync() is the more appropriate helper here because it shuts the door on later requeueing attempts. The CVE description is explicit that subsequent queue_delayed_work() calls from br_cfm_frame_rx() are silently rejected after disablement. That is the desired property during teardown: once deletion begins, nothing else in the receive path should be able to revive the timer.
The fix is elegant because it solves the bug where it lives rather than layering on heavier locking. In kernel code, that is often the preferred move. Adding more locks can create contention, more complexity, and new deadlock risks; changing the lifecycle primitive can preserve performance while tightening invariants.
Key implications of the API swap:

It preserves the deletion path’s ability to wait for in-flight work.
It blocks later requeueing from the softirq receive path.
It reduces the risk of a freed-object callback.
It keeps the fix local to peer MEP teardown.
It avoids broad changes to the surrounding bridge CFM logic.
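
The difference between the two helpers can be sketched with a toy userspace model. This is a simplified Python illustration, not kernel code: the class `ToyDelayedWork`, its methods, and `requeue_after_teardown` are hypothetical stand-ins that mirror only the queueing rules described above, where cancellation clears a pending instance and disablement also rejects later queueing.

```python
class ToyDelayedWork:
    """Hypothetical stand-in for a kernel delayed work item (not kernel code)."""

    def __init__(self):
        self.pending = False   # an instance is queued and waiting to run
        self.disabled = False  # queueing attempts are rejected while True

    def queue(self):
        """Model of queue_delayed_work(): True only if newly queued."""
        if self.disabled or self.pending:
            return False
        self.pending = True
        return True

    def cancel_sync(self):
        """Model of cancel_delayed_work_sync(): drop the pending instance."""
        was_pending = self.pending
        self.pending = False
        return was_pending

    def disable_sync(self):
        """Model of disable_delayed_work_sync(): refuse requeues, then cancel."""
        self.disabled = True
        return self.cancel_sync()


def requeue_after_teardown(teardown):
    """Queue, tear down with the named helper, then try to queue again."""
    work = ToyDelayedWork()
    work.queue()
    getattr(work, teardown)()
    return work.queue()


cancel_allows_requeue = requeue_after_teardown("cancel_sync")
disable_allows_requeue = requeue_after_teardown("disable_sync")
print("requeue accepted after cancel: ", cancel_allows_requeue)
print("requeue accepted after disable:", disable_allows_requeue)
```

In this model the only behavioral difference between the two teardown calls is the `disabled` flag, which is the property the deletion path relies on.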

The Race Window, Step by Step

The vulnerability description is unusually helpful because it lays out the race as a two-CPU interleaving. Many kernel bugs are hard to visualize, but this one can be understood as a precise gap between “work canceled” and “object actually gone.” In that gap, the receive path can still see enough state to re-arm the worker.

The sequence that mattered

On CPU0, the delete implementation calls cancel_delayed_work_sync(ccm_rx_dwork). On CPU1, br_cfm_frame_rx() runs while the peer_mep is still in the hlist and sees a defect state that leads to ccm_rx_timer_start(). That helper queues the delayed work again. Then deletion continues: the peer is removed from the list and freed via kfree_rcu(). Eventually, the re-queued work fires and runs against a structure that is no longer valid.
That pattern is especially treacherous because both paths are individually reasonable. The delete path wants to stop the work, and the receive path wants to maintain CFM state by scheduling the work. The bug is not that either side was absurd; it is that the teardown contract was too weak to survive concurrency.

Why softirq context makes it harder

Softirq context is an important detail because it means the receive side does not need the same locks or process-context serialization that many developers instinctively expect. It runs under RCU read-side protection, which is great for scalability but does not automatically solve lifecycle races. In practical terms, RCU lets readers proceed quickly, but it also demands that writers be very disciplined about what can still happen before grace periods end.
This is one of the recurring lessons in networking security: a pointer can be logically “on the way out” while still being physically reachable from a reader that is perfectly legal in RCU terms. Unless the teardown path disables future queueing, that reader can still trigger deferred work that outlives the object it belongs to. That is the gap CVE-2026-23393 closes.

CPU0 begins teardown and cancels existing delayed work.
CPU1 still sees peer_mep in the live data structure.
CPU1 re-queues ccm_rx_dwork because defect handling still thinks the peer is active.
CPU0 continues teardown, unlinks the peer, and frees it via RCU.
The delayed work later runs against freed memory.
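
The interleaving above can be replayed deterministically in a small Python sketch. It is a toy model under stated assumptions, not the kernel implementation: `replay`, the dictionary fields, and the `teardown` argument are hypothetical names, and the two CPUs are simulated simply by running the steps in the order listed.

```python
def replay(teardown):
    """Replay the five steps in order; teardown is "cancel" or "disable"."""
    peer = {"in_list": True, "freed": False}
    work = {"pending": True, "disabled": False}  # toy model of ccm_rx_dwork
    log = []

    # Step 1 (CPU0): teardown stops the currently queued work.
    work["pending"] = False
    if teardown == "disable":
        work["disabled"] = True  # later queueing attempts are rejected

    # Steps 2-3 (CPU1): the receive path still finds the peer and re-arms.
    if peer["in_list"] and not work["disabled"]:
        work["pending"] = True
        log.append("rx: work re-queued")
    else:
        log.append("rx: requeue rejected")

    # Step 4 (CPU0): unlink and free (the grace period is assumed to elapse).
    peer["in_list"] = False
    peer["freed"] = True

    # Step 5: the timer expires; a pending work item now runs its callback.
    if work["pending"]:
        log.append("work ran on freed peer" if peer["freed"] else "work ran")
    else:
        log.append("no work ran")
    return log


for mode in ("cancel", "disable"):
    print(mode, "->", replay(mode))
```

The real race depends on timing between CPUs; the sketch fixes the ordering so that the outcome of each teardown choice is visible on every run.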

Why This Is a Real Security Bug

Not every race condition becomes a security issue, but this one crosses the line because it threatens object lifetime safety. A delayed work callback that executes after the backing structure has been freed is the kind of condition that can lead to memory corruption, kernel warnings, crashes, or more subtle integrity problems depending on what the callback touches. The upstream description is careful, but the risk profile is familiar to anyone who follows kernel memory-safety fixes.

Security impact versus exploitability

It would be premature to overstate exploitability from the available record alone. The CVE text does not describe a direct code-execution path, and that restraint is appropriate. What it does describe is a race that can cause a worker to run on freed memory, which is serious enough to merit patching even if downstream exploitation characteristics remain unclear.
That distinction matters for enterprise triage. Security teams often need to separate high-confidence exploit primitives from kernel robustness flaws that are still worth fixing because they can destabilize systems or become exploitable when combined with other weaknesses. CVE-2026-23393 belongs in that second category unless later evidence says otherwise.

Why the CVE was assigned anyway

The Linux kernel CVE process is deliberately cautious. The documentation says the assignment team tends to give CVEs to fixes that are security-relevant even when exploitation is not obvious at disclosure time. That policy is sensible in practice because lifecycle bugs often look minor until someone finds a real failure mode in a production environment.
In the bridge CFM case, the security relevance comes from the combination of deferred work, RCU teardown, and a requeue path that remains reachable for a short but dangerous interval. That is enough to justify tracking, even if the ultimate severity ends up being modest rather than critical.

Why vendor teams should still care

Even if a bug is not remotely exploitable in the classic sense, kernel lifecycle problems can still generate expensive operational incidents. A freed-object work callback can cause unreproducible crashes, bizarre network instability, or intermittent corruption that is hard to tie back to its cause. That is the kind of issue that turns into a support escalation and a patch-cycle headache.

Bridge CFM and the Networking Stack

Bridge CFM sits in a specialized corner of the kernel networking stack, but niche does not mean harmless. Specialized protocol code often has fewer real-world users, yet it also tends to have more intricate state transitions because it is serving a very specific purpose with tight timing requirements. That combination makes race conditions more likely to hide in plain sight.

The role of deferred work

Deferred work is a common kernel pattern because it lets the networking stack schedule follow-up actions without doing everything in the latency-sensitive receive path. That design is efficient, but it also means developers have to reason carefully about what happens when the object that owns the work is deleted while packets are still arriving. The race here is a textbook example of why timers and object lifetimes must be designed together.
The receive path br_cfm_frame_rx() is particularly important because it can still decide that a CCM-related timer should be restarted if it sees a defect state. That means the receive path is not passive; it actively participates in work scheduling and therefore can resurrect a timer unless the teardown path explicitly prevents that.

Why RCU is not the whole story

RCU makes readers fast, but it does not guarantee that a reader will not trigger additional logic. That is a subtle but critical point. A reader can legally observe a structure during its RCU-protected lifetime and still do something harmful if writers have only partially shut down its side effects.

This is why “free after RCU grace period” is not the same as “safe from all race conditions.” If the object still has live timers or delayed work that can be re-armed, then the teardown path needs to disable those scheduling avenues first. CVE-2026-23393 is essentially a reminder of that rule.

Operational takeaways for networking teams

Timers must be treated as part of object lifetime, not as separate bookkeeping.
Softirq readers can still extend work lifetimes if the code allows requeueing.
RCU protects memory reclamation, not every side effect attached to an object.
Teardown code should prevent new work, not merely wait for old work to finish.
Subsystem-specific helper choice can be more effective than adding generic locks.

The Fix in Context

The patch is notable because it chooses the narrowest practical intervention. Instead of reworking the bridge CFM state machine, it changes the behavior of delayed-work teardown in the deletion path only. That is the sort of fix maintainers prefer when the bug is well understood and the risk of collateral damage from a larger refactor would be higher than the bug itself.

Why the `cc_peer_disable()` helper stays as it is

The CVE text makes an important exception: `cc_peer_disable()` keeps `cancel_delayed_work_sync()` because it participates in the CC enable/disable toggle path, where the delayed work must remain re-schedulable. That tells us the maintainers were not just swapping APIs blindly; they were distinguishing between teardown semantics and temporary disable semantics.

That is a good sign. Security fixes are stronger when they preserve legitimate lifecycle differences instead of flattening them into one-size-fits-all behavior. Here, disabling a peer during deletion is different from temporarily toggling CC state, and the code now reflects that distinction.

Why API semantics matter in kernel code

Kernel APIs often look interchangeable to outsiders until a race condition exposes the gap in their guarantees. cancel_delayed_work_sync() sounds strong, but it is strong in the sense of draining current activity rather than forbidding future queueing. disable_delayed_work_sync() carries a stronger lifetime meaning, and that is what the delete path needed.
This is one reason kernel security reviews often focus on exact helper usage rather than algorithmic intent. In concurrency-sensitive code, the wrong helper can be almost as dangerous as no helper at all, because it creates the illusion of correctness while leaving a narrow path open.

What this suggests about maintainership

The fix also suggests the issue was identified at the right abstraction layer. Developers did not need to invent new state flags or introduce heavy synchronization. They simply needed a primitive that says “this work may never come back after deletion begins.” That is concise, maintainable, and easier to backport.

Strengths and Opportunities

This CVE is a good example of the kernel ecosystem catching a subtle lifecycle issue early, and that gives downstream maintainers a straightforward path. The fix is narrow, understandable, and likely low-regression compared with a wider locking redesign. It also reinforces a broader security lesson: not every serious bug looks flashy, and not every important kernel fix maps to a high CVSS score on day one.

The patch is surgical, not disruptive.
The race description is clear enough to reason about.
The fix uses a stronger lifecycle primitive instead of a heavier lock.
The change is likely easy to backport into stable trees.
The affected surface appears narrow and subsystem-specific.
Security teams gain a concrete tracking identifier for remediation.
The case reinforces better timer and workqueue discipline in networking code.

A broader engineering opportunity

This is also a chance for maintainers to audit other peer or timer teardown paths in adjacent networking code. Whenever a receive path can re-arm work after a delete path has started, the code deserves a second look. The best outcome is not merely fixing one CVE but identifying the same pattern elsewhere before it becomes the next one.
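
As a starting point for such an audit, a crude text scan can flag places where a cancel call is followed shortly by an RCU free. The Python sketch below is only a heuristic: the function name `find_cancel_then_free`, the 15-line window, and the sample fragment are all assumptions made for illustration, and a hit means only “read this code,” not “this is a bug.”

```python
def find_cancel_then_free(text, window=15):
    """Return 1-based line numbers of cancel calls followed soon by kfree_rcu."""
    lines = text.splitlines()
    hits = []
    for i, line in enumerate(lines):
        if "cancel_delayed_work_sync(" not in line:
            continue
        following = lines[i + 1 : i + 1 + window]
        if any("kfree_rcu(" in later for later in following):
            hits.append(i + 1)
    return hits


# Made-up C fragment for illustration only; not taken from any kernel tree.
sample = """\
static void example_delete(struct example *e)
{
    cancel_delayed_work_sync(&e->dwork);
    hlist_del_rcu(&e->head);
    kfree_rcu(e, rcu);
}

static void example_pause(struct example *e)
{
    cancel_delayed_work_sync(&e->dwork);
}
"""

candidates = find_cancel_then_free(sample)
print(candidates)
```

The scan cannot tell whether another context is able to requeue the work, which is the question that actually decides whether a flagged site is vulnerable.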

Risks and Concerns

The chief concern is that a race like this can be easy to underestimate because the description sounds like ordinary cleanup. In practice, however, the combination of RCU, softirq execution, and delayed work creates enough concurrency to make freed-memory callbacks plausible if the teardown primitive is too weak. That is exactly the sort of bug that may lie dormant until production traffic hits the wrong timing window.

The bug may be hard to reproduce outside targeted testing.
The affected path is specialized, so exposure may be uneven.
Downstream kernels may carry different backport states.
Administrators may misread it as a mere stability issue.
A freed-object callback can cause crashes or worse if triggered.
Vendor advisories may lag or differ in build numbering.
The lack of a published NVD score at first can slow prioritization.

Why patch verification matters

Kernel CVEs are notoriously easy to mishandle when teams rely on version strings alone. The Linux kernel documentation emphasizes that stable backports, not just mainline commits, are what matter for supported systems. That means operators need to verify the exact build they are running, not simply assume a version number implies protection.
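
One way to check a source tree rather than a version string is to look at which helper the CFM code actually calls. The following Python sketch is a rough heuristic under stated assumptions: the function name `classify_cfm_source` is hypothetical, the file to inspect (net/bridge/br_cfm.c in upstream trees) should be confirmed for your own tree, and the expected counts come only from the CVE description, in which two deletion paths use the disabling helper while `cc_peer_disable()` keeps the cancelling one.

```python
import re

# Matches either helper when its argument mentions ccm_rx_dwork.
CALL = re.compile(
    r"\b(cancel|disable)_delayed_work_sync\s*\([^;]*ccm_rx_dwork[^;]*\)\s*;"
)


def classify_cfm_source(text):
    """Count cancel/disable helper calls on ccm_rx_dwork in C source text."""
    counts = {"cancel": 0, "disable": 0}
    for match in CALL.finditer(text):
        counts[match.group(1)] += 1
    # Per the CVE text, a fixed tree disables the work in both deletion paths.
    counts["looks_fixed"] = counts["disable"] >= 2
    return counts


# Made-up fragments for illustration only; these are not real kernel source.
sample_old = """
cancel_delayed_work_sync(&peer_mep->ccm_rx_dwork);
cancel_delayed_work_sync(&peer_mep->ccm_rx_dwork);
cancel_delayed_work_sync(&peer_mep->ccm_rx_dwork);
"""
sample_new = """
disable_delayed_work_sync(&peer_mep->ccm_rx_dwork);
disable_delayed_work_sync(&peer_mep->ccm_rx_dwork);
cancel_delayed_work_sync(&peer_mep->ccm_rx_dwork);
"""

print(classify_cfm_source(sample_old))
print(classify_cfm_source(sample_new))
```

To apply it to a real tree, read the file contents and pass them in as a string. A vendor kernel may address the bug differently, so a negative result is a prompt to read the code, not proof of exposure.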

Why the semantics can be misread

Another risk is that readers may see “bridge: cfm” and assume the issue only affects obscure lab setups. That would be a mistake. The fact that a bug lives in a niche subsystem does not make it harmless; it often just means fewer people have spent time shaking the code hard enough to expose its timing windows.

What to Watch Next

The most important near-term question is how quickly the fix propagates through stable and vendor-maintained kernel lines. In the Linux world, the upstream fix is only the beginning; real-world exposure changes when distribution kernels and appliance builds ingest the patch. The other thing to watch is whether similar workqueue-teardown patterns appear in nearby bridge or networking helpers, because race bugs often travel in families.

The immediate checkpoints

Watch for stable backports referencing the upstream fix.
Verify whether enterprise vendors have issued their own advisory mappings.
Check whether kernel trees used in appliances or containers have the patch.
Look for related lifetime fixes in bridge CFM and neighboring networking paths.
Reassess whether any automation still treats this as low-priority by default.

What enterprise teams should do

Security teams should treat this as a standard kernel update item, not a theoretical edge case. If your fleet uses Linux kernels with bridge CFM support, especially in infrastructure or network-oriented deployments, it is worth confirming whether the fix is already in your vendor’s stable branch. In practical terms, that is the difference between an item closed on paper and a race condition still live on a production host.

What kernel developers should learn from it

The lesson for developers is simple but important: teardown paths must prevent resurrection. If a receive path or timer callback can re-queue work after deletion starts, then the code has not really ended the object’s life. Disabling the work source is often the cleanest and safest answer.
The broader significance of CVE-2026-23393 is not that bridge CFM is uniquely dangerous, but that it shows how easily a well-intentioned cancellation pattern can fail when concurrency is spread across softirq, RCU, and deferred work. The fix restores a stronger guarantee, and that is what good kernel security often looks like: not dramatic, but precise, local, and effective. By the time this patch has propagated through vendor trees, it will likely fade into the background as one more routine update, which, in kernel security, is a sign that the system worked.

Source: NVD / Linux Kernel Security Update Guide - Microsoft Security Response Center
