CVE-2026-23368: Fixing Linux PHY LED Trigger AB-BA Deadlock

In the Linux kernel, CVE-2026-23368 shows how a small initialization change can remove a hard-to-reproduce system hang. The bug sits where the networking PHY code meets the LED trigger code: with both LEDS_TRIGGER_NETDEV and LED_TRIGGER_PHY enabled, device bring-up could hit an AB-BA deadlock. The fix is simple in concept but important in practice: register the PHY LED triggers during probe instead of later in phy_attach_direct(), which removes the unsafe lock ordering between RTNL and triggers_list_lock. (spinics.net)

Background

The Linux kernel has long relied on carefully ordered locking to keep its networking stack from tripping over itself under concurrent configuration and hotplug events. In this case, the danger came from an interaction between the LED subsystem and PHY lifecycle management, two areas that normally evolve independently but collide when link-state LEDs are exposed as triggers. The vulnerability was marked as resolved upstream in a stable backport thread on March 9, 2026, and NVD recorded the CVE on March 25, 2026. (spinics.net)
The core issue is that LED triggers are not just cosmetic. They are runtime kernel objects with their own registration and activation paths, and those paths often involve locks shared across subsystems. Linux documentation describes triggers as kernel-based sources of LED events, and the PHY-related trigger support exists precisely so network state can be reflected on hardware LEDs without userspace polling or custom glue code.
That convenience comes with a price: locking discipline matters. The stable patch text shows the deadlock cycle very clearly. One path enters led_trigger_register() while holding RTNL, then tries to acquire triggers_list_lock. The other path does the reverse: the netdev LED trigger activation path holds triggers_list_lock and then attempts to take RTNL. That is the textbook AB-BA pattern that can freeze a system if both sides collide at the wrong time. (spinics.net)
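The cycle can be made concrete with a small lock-order check, loosely in the spirit of what lockdep does inside the kernel. This is a minimal sketch, not kernel code: the path labels and the helper names lock_order_edges and find_inversions are hypothetical, and only the two lock names and their acquisition orders come from the patch text.

```python
def lock_order_edges(paths):
    """Collect (held, wanted) pairs implied by each path's acquisition order."""
    edges = set()
    for order in paths.values():
        for i, held in enumerate(order):
            for wanted in order[i + 1:]:
                edges.add((held, wanted))
    return edges


def find_inversions(edges):
    """Return lock pairs that some path takes in one order and another path in reverse."""
    return sorted({tuple(sorted(e)) for e in edges if (e[1], e[0]) in edges})


# Acquisition orders as described in the patch text; the path labels are illustrative.
paths = {
    "PHY attach during device open": ["RTNL", "triggers_list_lock"],
    "netdev LED trigger activation": ["triggers_list_lock", "RTNL"],
}

# After the fix, registration runs at probe time without RTNL held.
fixed_paths = {
    "PHY probe": ["triggers_list_lock"],
    "netdev LED trigger activation": ["triggers_list_lock", "RTNL"],
}

print(find_inversions(lock_order_edges(paths)))        # the RTNL / triggers_list_lock pair
print(find_inversions(lock_order_edges(fixed_paths)))  # empty: only one order remains
```

The point of the second input is that the fix does not add a new lock or reorder the netdev trigger path; it removes one of the two edges, so no cycle can form.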
What makes this case notable is that the bug was not caused by an exotic exploit primitive or a memory corruption chain. It was a synchronization defect in a hot path that could be triggered by ordinary device-open and LED configuration activity. For administrators, that is often worse than a crashy edge case, because deadlocks feel intermittent, resist reproduction, and can be mistaken for unrelated firmware or driver instability. The kernel patch therefore matters less as a feature change than as a reliability hardening measure. (spinics.net)

What Actually Broke​

The failure scenario depends on two features being enabled at once: LED_TRIGGER_PHY, which adds PHY-based LED triggers, and LEDS_TRIGGER_NETDEV, which lets LEDs respond to network-device events. The patch notes explicitly say the AB-BA deadlock occurs when both are enabled, and the stack traces show one side entering through phy_attach_direct() while the network device is being opened. (spinics.net)
The important detail is that phy_led_triggers_register() does not need the RTNL lock at all. The stable patch explains that the helper does not make network-stack calls requiring RTNL protection, and it does not depend on the PHY being attached to a MAC. It only uses phydev state, which means it can safely move to an earlier lifecycle point. That design observation is the real fix, not merely the code motion itself. (spinics.net)

Why AB-BA Deadlocks Hurt So Much​

AB-BA deadlocks are pernicious because both lock sequences can look perfectly valid in isolation. Each path acquires one lock, performs what seems like legitimate work, and then reaches for a second lock that another code path already holds in the opposite order. The result is a circular wait that neither side can resolve. In kernel code, that often means an entire device open or configuration operation stalls indefinitely. (spinics.net)
In this case, the failure was especially awkward because it involved ordinary user actions. One stack trace shows dev_ioctl() holding RTNL while opening a device; another shows an LED trigger being activated through a sysfs write, holding triggers_list_lock while the netdev trigger tries to take RTNL. That means an admin switching an LED trigger and a device bring-up path could, in the wrong timing window, end up freezing each other. (spinics.net)
The practical lesson is familiar to kernel developers but easy to miss in review: lock ordering is part of the API contract. Even when a function appears to “just register some state,” the surrounding lock context can make it dangerous in one call site and safe in another. This CVE is a reminder that subsystem boundaries are logical, not transactional. (spinics.net)
  • The deadlock requires both relevant LED triggers to be enabled.
  • One path takes RTNL first, then triggers_list_lock.
  • The other path takes triggers_list_lock first, then RTNL.
  • The bug can manifest during normal device open or LED trigger operations.
  • The failure mode is a hang, not a memory corruption exploit. (spinics.net)
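The circular wait summarized in the list above can be reproduced in user space with two ordinary threads. The sketch below is only an analogy: the two threading.Lock objects stand in for the kernel locks, the worker names are made up, and a timeout is used so the demonstration terminates instead of hanging the way the kernel paths would.

```python
import threading

rtnl = threading.Lock()                # stands in for the kernel's RTNL
triggers_list_lock = threading.Lock()  # stands in for the LED core's trigger list lock

both_hold_first = threading.Barrier(2)    # both sides hold their first lock
both_tried_second = threading.Barrier(2)  # both sides finished their second attempt
results = {}


def worker(name, first, second):
    with first:
        both_hold_first.wait()
        # In the kernel this would block forever; here it gives up after a short wait.
        got = second.acquire(timeout=0.2)
        results[name] = got
        # Keep the first lock held until the other side has also finished trying.
        both_tried_second.wait()
        if got:
            second.release()


threads = [
    threading.Thread(target=worker, args=("attach", rtnl, triggers_list_lock)),
    threading.Thread(target=worker, args=("activate", triggers_list_lock, rtnl)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)  # neither side obtains its second lock
```

The barriers force the unlucky interleaving on every run. In a real system the same interleaving happens only occasionally, which is why the hang is hard to reproduce.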

The Kernel Fix​

The upstream change relocates phy_led_triggers_register() from phy_attach_direct() into phy_probe(), ensuring the trigger registration happens outside the RTNL-sensitive attach flow. The patch also updates error handling and cleanup paths so trigger registration and unregistration are paired properly in probe and remove paths. That makes the lifetime more coherent and keeps the lock ordering away from the network attach path. (spinics.net)
The patch text shows three important structural changes. First, the early call is added during phy_probe(), where the kernel is not holding RTNL. Second, the corresponding unregister call is moved to phy_remove(). Third, on probe failure the code now unwinds the trigger registration before resetting the PHY. In other words, the fix is not a narrow lock tweak; it is a small lifecycle refactor. (spinics.net)
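The pairing of registration and cleanup can be pictured with a toy lifecycle model. This is a sketch under stated assumptions rather than a rendering of the actual kernel functions: PhyModel, its method names, and the fail_probe switch are hypothetical, and the exact position of the registration call inside the real probe routine is not modeled.

```python
class PhyModel:
    """Toy model of the probe/attach/remove lifecycle after the fix; not kernel code."""

    def __init__(self, fail_probe=False):
        self.fail_probe = fail_probe  # hypothetical switch simulating a later probe error
        self.triggers_registered = False
        self.log = []

    def _register_triggers(self):
        self.triggers_registered = True
        self.log.append("register")

    def _unregister_triggers(self):
        if self.triggers_registered:
            self.triggers_registered = False
            self.log.append("unregister")

    def probe(self):
        # Registration happens here, where RTNL is not held.
        self._register_triggers()
        if self.fail_probe:
            # Error path: unwind the registration, then reset the PHY.
            self._unregister_triggers()
            self.log.append("reset")
            return False
        return True

    def attach(self):
        # Runs under RTNL in the kernel; it no longer registers triggers.
        self.log.append("attach")

    def remove(self):
        # Cleanup is paired with probe rather than with detach.
        self._unregister_triggers()


ok = PhyModel()
ok.probe()
ok.attach()
ok.remove()
print(ok.log)

bad = PhyModel(fail_probe=True)
bad.probe()
print(bad.log)
```

The model registers once per probe and unregisters exactly once, on either the failure path or the remove path. The attach step has no trigger work left in it.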

Why Probe Is Safer Than Attach​

phy_probe() is the natural place for setup that belongs to the PHY device itself, because the PHY exists before it is attached to a MAC and before the networking path starts manipulating interface state. That matters here because the patch notes explicitly state there is no requirement for the PHY to be attached to a MAC. The triggers only depend on phydev state, so registering them in probe preserves functionality while avoiding the lock trap. (spinics.net)
This is also a good example of moving work earlier only when the dependencies allow it. Kernel maintainers often prefer to shrink critical sections rather than merely reshuffling code, and that is exactly what happened here. By pulling registration out of the RTNL-controlled attach flow, the patch reduces coupling between LED trigger setup and network-device state transitions. (spinics.net)
The stable thread shows the fix was attributed upstream to commit c8dbdc6e380e7e96a51706db3e4b7870d8a9402d and then backported into stable trees. That provenance matters because it demonstrates the issue was recognized as a kernel-wide locking problem rather than a vendor-specific regression. It also signals that downstream distributions should be watching for the backport in their maintenance streams. (spinics.net)
  • Registration moved from attach to probe.
  • Cleanup moved from detach to remove.
  • Error paths were adjusted to unregister triggers on failure.
  • The fix preserves existing guards for SFP and genphy cases.
  • The change is designed to keep RTNL out of the LED trigger registration path. (spinics.net)

Why This Matters for Enterprises​

For enterprise Linux users, the headline is not “someone found a deadlock in LEDs.” The real issue is that network bring-up can stall under a condition that may be encountered during routine interface activation, especially in systems where PHY LEDs are exposed and LED trigger features are enabled. On appliances, switches, embedded controllers, and telecom gear, a hang during device initialization is operationally expensive because it can appear as a failed NIC, failed boot, or intermittent service outage. (spinics.net)
The impact surface is larger than a consumer desktop would suggest. PHY LEDs are especially relevant in managed networking equipment, industrial boards, and embedded Linux systems where the kernel owns both the link layer and the indicator logic. In those environments, trigger-based LEDs are often not merely decorative; they are part of the operator workflow for diagnostics, link status, and field troubleshooting.

Operational Consequences​

When a deadlock occurs in a critical path, the effect can propagate beyond the immediate call stack. Administrators may see devices failing to come up, watchdog resets, or user-visible timeouts while the kernel waits forever on a lock dependency that can never resolve. That means the bug can look like a hardware fault, a bad driver, or an unrelated race elsewhere in the stack. (spinics.net)
Enterprises should also note the subtlety of the trigger combination. The bug requires both LEDS_TRIGGER_NETDEV and LED_TRIGGER_PHY to be enabled, which means some distributions may never expose it in default desktop builds but may enable it in specialized appliances. That makes inventory and kernel configuration review especially important for OEMs and integrators shipping bespoke images. (spinics.net)
A further point of concern is that this kind of issue tends to surface only when systems are under real operational stress, such as boot storms, interface flaps, or remote management scripts toggling interfaces. That is why the fix is best understood as a stability patch with security packaging, rather than a remote exploitation story. In fleet terms, stability bugs that can be externally triggered still deserve the same seriousness as many traditional CVEs. (spinics.net)
  • Appliances with custom kernel configs are the likeliest exposure point.
  • Device-open operations can be delayed or stalled by the deadlock.
  • Troubleshooting can be misleading because the symptom is a hang, not a crash.
  • Configuration combinations matter more than the kernel version number alone.
  • Backported stable updates are likely the primary remediation channel. (spinics.net)

Consumer Impact Is Narrower, But Not Zero​

Most consumer PCs will never notice this CVE. Many desktop and laptop builds do not enable the PHY LED trigger path, and fewer still expose the specific network-LED configuration patterns that create the deadlock window. That said, consumer impact is not irrelevant, especially for users running home lab hardware, routers, NAS devices, SBCs, or DIY network appliances on mainline or distro kernels.
Home lab users are often the first to hit kernel corner cases because they run unusual combinations of drivers, out-of-tree hardware, and feature-rich kernels. A board that uses PHY LEDs for link indication or a router firmware build that includes both trigger families could freeze during interface activation in a way that feels random and hard to reproduce. Small bugs like this often show up only after the kernel has been customized enough to become interesting. (spinics.net)

What End Users Would Notice​

The most likely symptom is not a visible security alert but a stalled network interface bring-up. A link may fail to come up, a management UI may time out, or the whole system may appear sluggish while kernel threads wait on each other. Because the hang sits in initialization and trigger activation, the visible fallout can look like a firmware issue or a bad cable rather than a locking bug. (spinics.net)
For consumer device makers, the lesson is that trigger-heavy kernels should be tested under the exact feature combinations they ship. If a vendor uses PHY LEDs for status display, it should be validating not only functional blinking behavior but also lock-order behavior during boot, hotplug, interface toggles, and LED sysfs writes. Reliability testing in this space needs to include concurrency, not just happy-path bring-up. (spinics.net)
  • Home routers and NAS boxes are more likely than generic PCs to be affected.
  • The issue may present as a hang during interface activation.
  • LED behavior can be a clue that the PHY trigger path is involved.
  • Kernel configs bundled by vendors are more important than the brand name on the box.
  • Consumer risk rises when custom hardware indicators are integrated into the boot path. (spinics.net)

The Broader Pattern: Locking Bugs Still Dominate Kernel Risk​

Kernel CVEs often look dramatic when they involve memory corruption, privilege escalation, or remote code execution. Yet a large share of hard production problems still come from concurrency and ordering mistakes, because the kernel is fundamentally a shared-state machine. This case fits that pattern perfectly: the bug was not about unsafe data, but about two valid code paths meeting in the wrong order. (spinics.net)
That matters for how operators should think about risk. A deadlock does not always grant an attacker code execution, but it can still create denial of service, especially if the trigger can be reached remotely through normal device management or from a local admin action. In embedded and appliance environments, a denial of service on the network stack can be just as damaging as a smaller privilege bug elsewhere. (spinics.net)

The Security Angle​

The NVD entry for CVE-2026-23368 was still awaiting enrichment at publication time, and no CVSS score had been assigned yet. That absence does not mean the issue is trivial; it means the scoring process had not been completed when the record was published. For security teams, the safer assumption is to treat kernel deadlocks in active code paths as important operational vulnerabilities even before formal scoring lands.
The fix also highlights an increasingly common kernel-maintenance pattern: issues are found by lockdep-style reasoning, field reports, and subsystem maintainers coordinating across trees. The patch references a report by Shiji Yang and was marked for stable backporting, which suggests the kernel community viewed the problem as both reproducible and broadly relevant. That is exactly the kind of bug that can linger if downstreams do not track stable feeds closely. (spinics.net)
One practical takeaway is that “security” and “stability” are not cleanly separable in kernel land. A deadlock in the network path can be a security-relevant denial of service, a reliability issue, and a support nightmare all at once. In the real world, those categories collapse into the same incident ticket. (spinics.net)
  • Kernel deadlocks can create real denial-of-service exposure.
  • Operational severity may exceed the eventual formal CVSS label.
  • Stable backports are as important as upstream fixes.
  • Community reports often surface concurrency bugs before automated tools do.
  • Security and stability overlap heavily in low-level infrastructure code. (spinics.net)

What Administrators Should Do​

The first step is simple: determine whether your kernel build includes the relevant LED trigger options and whether your hardware actually uses PHY-driven LED behavior. Because the bug is configuration-sensitive, broad version checks are not enough; you need to know whether LEDS_TRIGGER_NETDEV and LED_TRIGGER_PHY are enabled in your deployed image. In fleet terms, this is a kernel config audit, not just a package-version audit. (spinics.net)
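A config audit of this kind can be scripted. The following is a minimal sketch that assumes the kernel configuration is available as .config-style text, where the two options appear with the usual CONFIG_ prefix. The /boot/config-<release> path is a common distribution convention rather than a guarantee, and the function names are illustrative.

```python
import platform
import re

OPTIONS = ("CONFIG_LEDS_TRIGGER_NETDEV", "CONFIG_LED_TRIGGER_PHY")


def parse_kconfig(text):
    """Map CONFIG_* symbols to their values from .config-style text."""
    values = {}
    for line in text.splitlines():
        m = re.match(r"^(CONFIG_\w+)=(.*)$", line.strip())
        if m:
            values[m.group(1)] = m.group(2)
    return values


def both_triggers_enabled(text):
    """True when both options are built in ('y') or built as modules ('m')."""
    cfg = parse_kconfig(text)
    return all(cfg.get(opt) in ("y", "m") for opt in OPTIONS)


if __name__ == "__main__":
    # Assumed location; many distributions ship the config here, some do not.
    path = f"/boot/config-{platform.release()}"
    try:
        with open(path) as f:
            print(path, "->", both_triggers_enabled(f.read()))
    except OSError:
        print("no kernel config found at", path)
```

Lines such as "# CONFIG_LED_TRIGGER_PHY is not set" are comments in this format, so a disabled option is simply absent from the parsed result. On images that expose /proc/config.gz instead, the same functions can be applied to the decompressed text.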
The second step is to apply the stable update once your distribution ships it. The patch was already being backported into stable branches, and the NVD record points back to kernel.org references, which is a strong sign that downstream vendors should eventually carry the fix. For managed Linux estates, the correct response is usually to schedule the update with the same priority you would give to a hard reboot hang or a networking regression. (spinics.net)

Practical Triage Checklist​

Before rollout, operators should verify whether any of their devices depend on PHY-linked LED behavior for status or control. If they do, test the updated kernel under realistic interface bring-up and LED trigger activity, including user-space writes to the LED trigger sysfs interface. A fix that compiles cleanly but fails under real-world activation patterns does not help much in production. (spinics.net)
Where possible, keep an eye on vendor advisories and stable-tree backports, especially for appliance platforms that run custom kernels. Those platforms often lag upstream by weeks or months, and the gap is where bugs like this become service-affecting. The kernel may be fixed upstream long before your device image is. (spinics.net)
  • Audit kernel configs for LED trigger support.
  • Confirm whether your hardware uses PHY LEDs in production.
  • Prioritize vendor backports and stable updates.
  • Test interface bring-up after patching.
  • Watch for hung opens, timeouts, or LED sysfs interactions during validation. (spinics.net)
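To see which LEDs on a device are actually driven by triggers, the LED class sysfs files can be read directly. The sketch below relies on the documented format of the trigger attribute, a space-separated list with the active trigger in square brackets. The function names are illustrative, and the set of triggers on any given system will differ.

```python
import glob
import os


def parse_trigger_file(content):
    """Return (active, available) from the text of an LED 'trigger' attribute."""
    active = None
    available = []
    for token in content.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        available.append(token)
    return active, available


def read_led_triggers(root="/sys/class/leds"):
    """Map LED name to (active, available); empty if the directory does not exist."""
    found = {}
    for path in glob.glob(os.path.join(root, "*", "trigger")):
        try:
            with open(path) as f:
                led = os.path.basename(os.path.dirname(path))
                found[led] = parse_trigger_file(f.read())
        except OSError:
            continue  # unreadable entries are skipped
    return found


if __name__ == "__main__":
    for led, (active, _available) in sorted(read_led_triggers().items()):
        print(f"{led}: active trigger = {active}")
```

An LED whose active trigger is netdev, on a system that also registers PHY triggers, marks hardware worth including in the validation runs described above.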

Strengths and Opportunities​

This bug also reveals some strengths in the Linux development process. The subsystem maintainers identified a real lock-order problem, the report was reproducible enough to include stack traces, and the eventual fix was small, targeted, and easy to reason about. That is exactly how mature kernel maintenance is supposed to work. The ecosystem caught the issue before it became endemic, and the cleanup stayed narrowly scoped to the lifecycle point that needed correction. (spinics.net)
More broadly, the patch is a reminder that kernel hardening often comes from subtracting unnecessary coupling, not adding more defensive code. By moving registration out of the RTNL path, the fix reduces the chance that future LED or PHY enhancements will stumble into the same lock-order trap. That creates room for future development without reintroducing the same deadlock pattern. (spinics.net)
  • Clear upstream-to-stable provenance improves confidence in the fix.
  • The patch is narrow enough to backport with low risk.
  • Lock ordering is improved by design rather than by workaround.
  • The lifecycle change reduces subsystem coupling.
  • The issue was reported with useful trace evidence, accelerating diagnosis.
  • The fix preserves existing special-case guards rather than flattening behavior. (spinics.net)

Risks and Concerns​

The main concern is that deadlock bugs are often under-tested outside of targeted reproducers. Because this issue depends on the combination of trigger options and the timing of concurrent paths, it may remain invisible in standard QA pipelines. That makes it easy for embedded vendors to ship affected kernels without realizing they are carrying a real operational hazard. (spinics.net)
Another concern is the long tail of backports. Stable patches land quickly in some trees but slowly in downstream vendor kernels, especially for devices that sit in extended-support programs. If your platform vendor is behind, the issue can persist even after the upstream fix is widely known. That lag is where many “fixed” kernel bugs stay alive in practice. (spinics.net)

Hidden Exposure Points​

The patch notes make clear that trigger registration never depended on the PHY being attached to a MAC. That means developers who assumed registration belonged in the RTNL-protected attach path, because the network stack would “own” the sequence, may have misjudged the real dependency graph. Hidden coupling like that is exactly why low-level bugs can survive code review. (spinics.net)
The other concern is observability. A deadlock does not generate the forensic richness of a crash dump unless the system is configured to capture it, so many affected deployments may just report “network came up slowly” or “box hung during boot.” Without detailed logging or lock debugging enabled, administrators may never connect the symptom to PHY LED triggers. (spinics.net)
  • Concurrency bugs may evade ordinary functional testing.
  • Vendor lag can leave fixed issues exposed for months.
  • Symptoms are easy to misdiagnose as firmware or hardware problems.
  • Lack of crash signatures makes incident triage harder.
  • Feature combinations matter more than single-package version checks.
  • Hidden lock dependencies can survive review unless they are explicitly modeled. (spinics.net)
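One way to narrow the observability gap is to scan kernel logs for hung-task reports. This sketch assumes the hung task detector is enabled in the kernel build and that its reports use the familiar "INFO: task <name>:<pid> blocked for more than <N> seconds" wording. The sample log lines are invented for illustration and are not taken from the bug report.

```python
import re

# Greedy match on the task name so names that contain ':' (for example kworker/0:1) survive.
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(log_text):
    """Return (task name, pid, seconds) for each hung-task report in the log text."""
    return [
        (m["comm"], int(m["pid"]), int(m["secs"]))
        for m in HUNG_TASK_RE.finditer(log_text)
    ]


# Illustrative input only; real logs carry their own timestamps and task names.
sample_log = """\
[  245.100000] INFO: task ifconfig:812 blocked for more than 120 seconds.
[  245.100100]       Not tainted
[  245.200000] INFO: task kworker/0:1:77 blocked for more than 120 seconds.
"""

for comm, pid, secs in find_hung_tasks(sample_log):
    print(f"{comm} (pid {pid}) blocked for over {secs}s")
```

A report naming a network configuration tool alongside a task writing to an LED trigger file would be a strong hint toward this lock pair, though the stack traces that follow such a report are what confirm it.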

Looking Ahead​

The immediate watch item is straightforward: downstream stable branches and distribution kernels should carry the backport, and operators should confirm the fix has reached their production images. Because the original report was tied to stable-tree maintenance, it is likely to show up in multiple vendor releases over time, but not all at once. The safest posture is to validate your own build, not assume the vendor’s default advisory cadence matches your deployment schedule. (spinics.net)
Longer term, this CVE may encourage maintainers to keep moving device-local initialization out of paths that still depend on broad global locks like RTNL. That trend has been visible across kernel subsystems for years: reduce global serialization where possible, register objects as early as they can stand alone, and avoid letting optional features pull core paths into lock contention. Less shared state in hot paths is often the best security fix available. (spinics.net)

Things to Watch​

  • Which stable branches receive the backport first.
  • Whether vendor appliance kernels lag behind upstream stable.
  • Whether similar LED-trigger registration patterns exist elsewhere.
  • Whether additional lock-order fixes follow in PHY or LED subsystems.
  • Whether NVD later assigns a formal CVSS score and severity. (spinics.net)
The bigger lesson is that Linux kernel security continues to be shaped by correctness work that looks mundane on the surface. Deadlocks, lifetime mismatches, and lock-order inversions rarely make splashy headlines, but they have outsized consequences in infrastructure that must boot, route, blink, and recover reliably. CVE-2026-23368 is one more reminder that in the kernel, a single registration call made at the wrong point in a device's lifecycle can be the difference between a clean boot and a silent hang.

Source: NVD / Linux Kernel Security Update Guide - Microsoft Security Response Center
 
