CVE-2026-23357: Linux mcp251x Deadlock Can Hang the Kernel

CVE-2026-23357 is a Linux kernel vulnerability in the SocketCAN mcp251x driver, a driver used for Microchip MCP251x and MCP25625 SPI-based CAN controllers. The issue is a deadlock in the error-handling path of mcp251x_open(), specifically involving free_irq() being called while the driver’s mcp_lock mutex is still held. Under the right timing conditions, an interrupt can occur before the driver finishes unwinding from a failed open operation. The interrupt handler then waits on the same mutex, while free_irq() waits for the interrupt handler to finish. Neither side can proceed, producing a kernel deadlock.
The vulnerability has been classified by NVD as a Medium-severity issue with a CVSS 3.1 base score of 5.5. The vector is AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H, which means exploitation is local, low complexity, requires low privileges, does not require user interaction, does not cross a security scope boundary, and primarily affects availability. It is also mapped to CWE-667, Improper Locking.
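As a worked check of the score, the vector can be run through the CVSS 3.1 base-score formula. The following is a minimal sketch that only handles Scope:Unchanged vectors; the weights and the round-up rule are taken from the CVSS 3.1 specification, while the function names are illustrative.

```python
import math

# CVSS 3.1 metric weights. The PR values below apply to Scope:Unchanged only.
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value):
    """Round up to one decimal place, as defined in the CVSS 3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    """Compute the base score for a Scope:Unchanged CVSS 3.1 vector string."""
    metrics = dict(part.split(":") for part in vector.split("/"))
    if metrics.get("S") != "U":
        raise ValueError("this sketch only handles Scope:Unchanged")
    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


print(base_score("AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H"))  # prints 5.5
```

The impact term comes entirely from A:H (6.42 × 0.56 ≈ 3.60) and the exploitability term is about 1.83, so the sum of roughly 5.43 rounds up to 5.5.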
Although this is not a confidentiality or integrity issue, it can matter in embedded, industrial, automotive, laboratory, and hardware-control environments where a local user or service can trigger CAN interface initialization. If the vulnerable path is reached, the result may be a hung process, a stuck network interface bring-up operation, or broader kernel availability problems depending on the platform and workload.
The affected code is in the Linux kernel CAN subsystem, specifically the mcp251x SPI CAN controller driver. This driver supports common external CAN controller chips such as the MCP2510, MCP2515, and MCP25625. These chips are widely used in embedded boards and development setups because they provide CAN bus support over SPI, making them common on single-board computers, industrial gateways, automotive prototyping boards, robotics systems, and custom hardware designs.
The vulnerable function, mcp251x_open(), is part of the driver’s network device open path. In practical terms, it is involved when a CAN network interface backed by an MCP251x device is brought up. A typical user-space action that can eventually reach this code path is enabling a CAN interface, for example with tooling that performs the equivalent of bringing can0 up. The vulnerability is not about sending malicious CAN frames over a network. It is about local interaction with a kernel driver and the timing of an interrupt while the driver is handling a failed open operation.
The core bug is a locking-order problem. In the vulnerable error path, mcp251x_open() may call free_irq() while holding the mcp_lock mutex. That is dangerous because free_irq() is not merely a bookkeeping call. It must ensure that the interrupt handler is no longer running before the IRQ is fully released. If an interrupt has already fired, the interrupt handler may be executing or waiting to execute. In this driver, the interrupt handler may also need mcp_lock. If the open path holds mcp_lock and then calls free_irq(), while the interrupt handler is waiting for mcp_lock, a deadlock is created.
The sequence is relatively simple:
  • A local action starts opening the MCP251x-backed CAN interface.
  • The driver obtains mcp_lock.
  • Something fails in the open path, sending execution into the error-handling path.
  • Before or during cleanup, an interrupt occurs.
  • The interrupt handler attempts to acquire mcp_lock.
  • The open error path, still holding mcp_lock, calls free_irq().
  • free_irq() waits for the interrupt handler to finish.
  • The interrupt handler cannot finish because it is waiting for mcp_lock.
  • The open path cannot release mcp_lock because it is blocked in free_irq().
That is the deadlock.
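The sequence above can be modelled in user space. This is only an analogy, not driver code: a Python lock stands in for mcp_lock, a thread stands in for the interrupt handler, and a hypothetical free_irq_model() that joins the thread stands in for free_irq(). Timeouts are used purely so the demonstration terminates instead of hanging.

```python
import threading


def irq_handler(mcp_lock, irq_fired):
    """Stand-in for the interrupt handler, which needs mcp_lock to do its work."""
    irq_fired.set()
    # The timeout exists only so this demonstration eventually terminates.
    if mcp_lock.acquire(timeout=1.0):
        mcp_lock.release()


def free_irq_model(handler_thread, timeout):
    """Stand-in for free_irq(): wait for the in-flight handler to finish."""
    handler_thread.join(timeout)
    return not handler_thread.is_alive()  # True if the handler completed


def buggy_open_error_path():
    mcp_lock = threading.Lock()          # stands in for the driver's mutex
    irq_fired = threading.Event()
    handler = threading.Thread(target=irq_handler, args=(mcp_lock, irq_fired))
    with mcp_lock:                       # the open path holds the lock
        handler.start()                  # an interrupt fires during the failed open
        irq_fired.wait()
        # Bug: waiting for the handler while holding the lock it needs.
        finished = free_irq_model(handler, timeout=0.2)
    handler.join()                       # lock released, so the handler can now finish
    return finished


print(buggy_open_error_path())  # prints False: the wait timed out
```

In the kernel there is no timeout on either side, so the equivalent of that False result is a wait that never ends.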
The fix is conceptually small but important: move the free_irq() call so it happens after the mutex is released. The patch also sets priv->force_quit = 1 before releasing the lock, ensuring that once the interrupt handler obtains the lock, it sees the quit condition and exits promptly. This preserves the driver’s cleanup semantics while avoiding the circular wait.
This issue is similar to an earlier MCP251x deadlock fix referenced by the kernel maintainers. The earlier bug involved an interrupt occurring during mcp251x_open(). CVE-2026-23357 addresses the same kind of deadlock pattern, but specifically in the error path. That distinction matters because error paths are often less frequently exercised and can remain unnoticed longer than the normal success path. A driver might work reliably in ordinary testing but still contain a serious availability problem when hardware initialization fails, IRQ setup races with cleanup, or a device behaves unexpectedly during open.
The impact is availability. NVD’s CVSS vector records no confidentiality impact and no integrity impact, but a high availability impact. In real-world terms, an attacker is not expected to read or modify protected data through this vulnerability. Instead, the concern is denial of service. A local user or local process with the ability to trigger the vulnerable driver path could cause the kernel or affected subsystem to hang.
The practical severity depends heavily on deployment context. On a general-purpose desktop or server that does not have MCP251x CAN hardware, does not load the mcp251x driver, and does not expose CAN device management to local users, this vulnerability may have little or no operational relevance. On an embedded gateway, vehicle test bench, industrial controller, field device, or Raspberry Pi-style system using an MCP2515 CAN module, the issue can be more important. A deadlock in a kernel driver may interrupt monitoring, telemetry, automation, diagnostics, or control-plane functions.
The vulnerability requires local access according to the CVSS vector. That does not necessarily mean an attacker needs a physical keyboard. “Local” in CVSS can include access through a local account, a compromised service, a container or workload with enough device access, a script running on the host, or a management component able to bring interfaces up and down. The key requirement is that the attacker must be able to cause the vulnerable kernel path to execute on the target system.
The “low privileges” part of the vector deserves attention. Bringing up network interfaces usually requires elevated privileges, such as root or the CAP_NET_ADMIN capability. However, many embedded systems run services with broad privileges. Some containerized or appliance-like deployments also delegate network or device control to services that are not fully trusted. If a lower-privileged local user can indirectly trigger interface initialization through a privileged helper, management daemon, or misconfigured service, the vulnerability becomes more reachable.
There is no user interaction requirement. Once the vulnerable local action is triggered, exploitation does not rely on another user clicking a link, opening a file, or responding to a prompt. It is a kernel synchronization issue, not a social-engineering issue.
The vulnerability is classified under CWE-667, Improper Locking. This class covers bugs where software uses locks incorrectly, resulting in race conditions, deadlocks, inconsistent state, or blocked execution. Kernel drivers are particularly sensitive to this because they often coordinate process context, interrupt context, hardware state, workqueues, network stack callbacks, and cleanup routines. A small ordering mistake can become a system-wide availability problem.
The affected CPE information published by NVD identifies multiple Linux kernel version ranges. The listed vulnerable configurations include:
  • Linux kernel 2.6.34
  • versions from 2.6.34.1 up to but excluding 5.10.253
  • versions from 5.11 up to but excluding 5.15.203
  • versions from 5.16 up to but excluding 6.1.167
  • versions from 6.2 up to but excluding 6.6.130
  • versions from 6.7 up to but excluding 6.12.77
  • versions from 6.13 up to but excluding 6.18.17
  • versions from 6.19 up to but excluding 6.19.7
  • several 7.0 release candidates, from rc1 through rc7
Those ranges reflect upstream and stable kernel correction points, but they should not be treated as the only source of truth for every distribution. Linux distributions often backport fixes without changing the visible upstream kernel version in a way that maps cleanly to mainline version numbers. A distribution kernel may report an older base version while still containing the security fix. Conversely, a custom or vendor kernel may identify itself with a version that appears fixed but may not include the relevant backport if it diverged from upstream. For accurate exposure assessment, administrators should check the distribution or device vendor’s security advisory, package changelog, and kernel build metadata.
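For a first-pass triage, the NVD ranges can be encoded as a simple lookup. The sketch below compares only the upstream-style version number, so for the reasons just given a True result on a distribution or vendor kernel means “check the vendor advisory”, not “confirmed vulnerable”. The function names are illustrative, and the release-candidate pattern assumes strings shaped like 7.0-rc3 or 7.0.0-rc3.

```python
import re

# (first affected, first fixed) pairs taken from the NVD version ranges.
VULNERABLE_RANGES = [
    ("2.6.34", "5.10.253"),
    ("5.11", "5.15.203"),
    ("5.16", "6.1.167"),
    ("6.2", "6.6.130"),
    ("6.7", "6.12.77"),
    ("6.13", "6.18.17"),
    ("6.19", "6.19.7"),
]


def parse_version(text):
    """Turn '6.6.129-generic' into (6, 6, 129, 0); any suffix is ignored."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    parts = [int(p) for p in match.group(0).split(".")][:4]
    return tuple(parts + [0] * (4 - len(parts)))


def in_listed_range(release):
    """True if the release string falls inside one of the NVD-listed ranges."""
    # The listed 7.0 release candidates (rc1 through rc7) are matched by name.
    if re.match(r"7\.0(?:\.0)?-rc[1-7]\b", release):
        return True
    version = parse_version(release)
    return any(parse_version(lo) <= version < parse_version(hi)
               for lo, hi in VULNERABLE_RANGES)


print(in_listed_range("6.6.129"))  # prints True
print(in_listed_range("6.6.130"))  # prints False
```

On a live system the input would typically be the output of uname -r or Python's platform.release().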
The presence of a vulnerable kernel version alone is not always enough to prove exploitability. The system also needs the relevant driver code to be present and usable. Administrators should determine whether the kernel was built with MCP251x support, whether the module is available, whether the module is loaded, and whether hardware or device-tree configuration exposes an MCP251x CAN interface. In mainline kernels the relevant build option is CONFIG_CAN_MCP251X (Microchip MCP251x and MCP25625 SPI CAN controller support). If the driver is neither built in nor installed as a module, is blocked from loading, or has no device bound to it, practical risk is reduced.
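One way to check the build-time side is to look for the driver's Kconfig symbol, which is CONFIG_CAN_MCP251X in mainline, in the running kernel's configuration. The sketch below assumes the configuration is available at one of two conventional locations; neither is guaranteed to exist on a given system, and the function names are illustrative.

```python
import gzip
import os
import platform
import re


def mcp251x_config_state(config_text):
    """Return 'y' (built in), 'm' (module), or None (not enabled or not found)."""
    match = re.search(r"^CONFIG_CAN_MCP251X=([ym])$", config_text, re.MULTILINE)
    return match.group(1) if match else None


def read_kernel_config():
    """Read the running kernel's config from a conventional location, if any."""
    # Assumed locations: many distributions ship /boot/config-<release>;
    # /proc/config.gz exists only when the kernel was built to expose it.
    candidates = [f"/boot/config-{platform.release()}", "/proc/config.gz"]
    for path in candidates:
        if os.path.exists(path):
            opener = gzip.open if path.endswith(".gz") else open
            with opener(path, "rt") as handle:
                return handle.read()
    return None


if __name__ == "__main__":
    text = read_kernel_config()
    if text is None:
        print("kernel config not found; check the vendor build instead")
    else:
        print("CONFIG_CAN_MCP251X:", mcp251x_config_state(text))
```

A result of None from mcp251x_config_state() means the symbol is not enabled in that configuration, which is useful evidence when documenting non-applicability.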
A quick local assessment should answer several questions:
  • Is this a Linux system using CAN bus functionality?
  • Is the MCP251x or MCP25625 family of CAN controllers present?
  • Is the mcp251x driver built into the kernel or available as a loadable module?
  • Is the module currently loaded?
  • Are interfaces such as can0 backed by MCP251x hardware?
  • Can untrusted local users or services bring that interface up or down?
  • Does the installed kernel contain the upstream fix or a vendor backport?
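Two of these questions, whether the module is loaded and which driver backs each CAN interface, can be answered from /proc and /sys. The following is a rough sketch assuming the usual Linux layout: CAN interfaces report link type 280 (ARPHRD_CAN) in their type file, and hardware-backed interfaces expose a device/driver symlink. The function names are illustrative.

```python
import os

ARPHRD_CAN = "280"  # link type reported in /sys/class/net/<if>/type for CAN


def module_loaded(proc_modules_text, name="mcp251x"):
    """Check /proc/modules text for a loaded module.

    A driver built into the kernel does not appear here, so False does not
    prove the driver is absent.
    """
    return any(line.split()[0] == name
               for line in proc_modules_text.splitlines() if line.strip())


def can_interfaces(sys_class_net="/sys/class/net"):
    """Map each CAN interface name to the name of its bound driver (or None)."""
    result = {}
    if not os.path.isdir(sys_class_net):
        return result
    for ifname in sorted(os.listdir(sys_class_net)):
        try:
            with open(os.path.join(sys_class_net, ifname, "type")) as handle:
                if handle.read().strip() != ARPHRD_CAN:
                    continue
        except OSError:
            continue  # not an interface directory, or unreadable
        driver_link = os.path.join(sys_class_net, ifname, "device", "driver")
        if os.path.exists(driver_link):
            result[ifname] = os.path.basename(os.path.realpath(driver_link))
        else:
            result[ifname] = None  # e.g. a virtual CAN interface
    return result


if __name__ == "__main__":
    if os.path.exists("/proc/modules"):
        with open("/proc/modules") as handle:
            print("mcp251x module loaded:", module_loaded(handle.read()))
    print("CAN interfaces:", can_interfaces())
```

An interface that maps to a driver named mcp251x is the case this CVE is concerned with; interfaces backed by other CAN drivers are outside its scope.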
On many embedded systems, the driver may load automatically due to device-tree overlays or board configuration. For example, a board using an MCP2515 module over SPI may expose a CAN interface at boot. In such cases, administrators should not assume that a small kernel driver CVE is irrelevant. If the board’s purpose depends on CAN communication, a driver deadlock can be operationally significant.
Symptoms of the vulnerability may resemble a system hang or stalled interface initialization rather than a clear security event. A user may attempt to bring up a CAN interface, and the command may never return. Kernel logs may show MCP251x initialization messages before the hang, but logs may not always capture the precise deadlock. Because the issue occurs in an interrupt and cleanup interaction, reproducing it may depend on timing, hardware behavior, and whether the open path enters the relevant failure branch. It may not occur every time the interface is opened.
The vulnerability is especially interesting because it lives in the error path. Error paths in drivers are notoriously difficult to test exhaustively. Normal open and close operations may be tested repeatedly, while rare failures in clock setup, transceiver handling, hardware reset, IRQ behavior, SPI transfer, or CAN controller initialization may not receive the same coverage. In this case, the driver’s cleanup path attempted to release an IRQ while still holding a mutex that the IRQ handler itself needed. That cleanup ordering is exactly the kind of situation that can pass ordinary functional testing until a specific timing window occurs.
The fix strategy is an example of a standard kernel locking principle: do not call a function that waits for interrupt completion while holding a lock that the interrupt handler may need. free_irq() can wait for an in-flight interrupt handler to complete. Therefore, if the handler may acquire a mutex, the caller should not hold that mutex while freeing the IRQ. Releasing the lock before calling free_irq() breaks the circular wait.
The priv->force_quit part of the fix is also important. Simply releasing the lock might avoid the deadlock, but the interrupt handler still needs to behave correctly during teardown. Setting a flag before releasing the lock gives the handler a clean signal that the driver is shutting down or aborting the open operation. Once the handler obtains the mutex, it can see that it should exit quickly rather than continuing normal processing against partially initialized or unwinding state. This is a common pattern in kernel driver cleanup: set state under lock, release the lock, then perform teardown operations that may sleep or wait.
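The corrected ordering can be illustrated with a small user-space model. This is an analogy under stated assumptions, not the actual patch: the Driver class, its fields, and the thread join standing in for free_irq() are all stand-ins chosen for illustration.

```python
import threading


class Driver:
    """Toy stand-in for the driver's private state (not the real structure)."""

    def __init__(self):
        self.mcp_lock = threading.Lock()
        self.force_quit = False
        self.handled = []
        self.irq_fired = threading.Event()


def irq_handler(priv):
    """Stand-in for the interrupt handler: take the lock, then honour force_quit."""
    priv.irq_fired.set()
    with priv.mcp_lock:
        if priv.force_quit:
            return                       # teardown in progress: exit promptly
        priv.handled.append("event")     # normal processing would happen here


def fixed_open_error_path(priv):
    handler = threading.Thread(target=irq_handler, args=(priv,))
    priv.mcp_lock.acquire()              # the open path holds the lock
    handler.start()                      # an interrupt fires during the failed open
    priv.irq_fired.wait()
    priv.force_quit = True               # 1. set the quit state under the lock
    priv.mcp_lock.release()              # 2. release the lock
    handler.join(timeout=1.0)            # 3. only now wait for the handler
    return not handler.is_alive()        # True if the handler completed


priv = Driver()
print(fixed_open_error_path(priv), priv.handled)  # prints: True []
```

The handler completes because the lock is free by the time anything waits on it, and the empty list shows that it exited on the quit flag without touching state that is being torn down.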
For remediation, the best answer is to update the kernel to a fixed version supplied by the operating system, hardware vendor, or distribution maintainer. Based on the published fixed boundaries, systems should move to kernel releases at or beyond the corrected stable versions for their branch, such as 5.10.253, 5.15.203, 6.1.167, 6.6.130, 6.12.77, 6.18.17, or 6.19.7 where applicable. Systems tracking newer release candidates should avoid the listed vulnerable 7.0 release candidates and move to a build that includes the fix.
For enterprise and embedded fleets, patching should follow vendor guidance rather than blindly installing an upstream kernel. Many devices rely on board support packages, custom device trees, real-time patches, vendor SPI drivers, or hardware-specific kernel modifications. Replacing the kernel without vendor validation can break device functionality. The safer approach is to obtain a vendor kernel update or apply the relevant patch to the maintained kernel tree and test it against the target hardware.
If immediate patching is not possible, mitigation depends on whether the MCP251x driver is required. If the system does not need MCP251x CAN support, administrators can reduce risk by preventing the driver from loading. This may involve module blacklisting, removing unused device-tree overlays, disabling SPI-attached CAN hardware definitions, or building kernels without the driver. These mitigations should be tested carefully because disabling the driver on a system that expects a CAN interface may break applications or boot-time services.
If the driver is required, access control becomes the main interim mitigation. Limit which users and services can bring CAN interfaces up and down. Review use of Linux capabilities, especially network administration capabilities. Avoid granting broad device or network privileges to containers unless necessary. Restrict access to management APIs or helper scripts that manipulate CAN interfaces. On embedded appliances, ensure that web interfaces, diagnostic shells, update agents, and field-service tools do not expose CAN interface control to unauthenticated or weakly authenticated users.
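When reviewing which services hold network administration rights, one quick indicator is the effective capability mask in /proc/<pid>/status. The sketch below checks that mask for CAP_NET_ADMIN, which is capability number 12 in linux/capability.h. It is a simplification: a capability held inside a non-initial user namespace does not necessarily grant control over host network devices. The function name is illustrative.

```python
CAP_NET_ADMIN = 12  # capability bit number from linux/capability.h


def has_net_admin(status_text):
    """Check the CapEff line of /proc/<pid>/status text for CAP_NET_ADMIN."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            mask = int(line.split()[1], 16)
            return bool(mask & (1 << CAP_NET_ADMIN))
    return False  # no CapEff line found


if __name__ == "__main__":
    try:
        with open("/proc/self/status") as handle:
            print("CAP_NET_ADMIN effective:", has_net_admin(handle.read()))
    except OSError:
        print("/proc/self/status is not available on this system")
```

Running the same check against the status file of each long-lived service gives a rough list of processes that could bring a CAN interface up or down.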
Another defensive step is to monitor for unexpected CAN interface resets or repeated attempts to bring MCP251x interfaces up. While this vulnerability may not produce a clean exploit signature, operational logs can still reveal suspicious or unstable behavior. Repeated ip link set can0 up operations, unusual service restarts, or kernel messages around MCP251x initialization failures may be useful indicators during investigation.
Organizations should also examine container and virtualization boundaries. If a container has access to host network administration functions or CAN devices, a compromised container could potentially trigger host kernel driver paths. Containers share the host kernel, so kernel driver vulnerabilities are not contained in the same way ordinary user-space bugs may be. Systems that expose SocketCAN devices into containers should be reviewed carefully.
For Microsoft environments, the presence of this CVE in the Microsoft Security Update Guide can be confusing because the vulnerability is in the Linux kernel, not in the Windows NT kernel. The relevance may involve Microsoft’s tracking of vulnerabilities that affect Microsoft-distributed Linux components, Linux-based products, cloud images, container hosts, Azure-related offerings, or security inventory feeds. Administrators should not assume that a Windows desktop is vulnerable merely because MSRC lists the CVE. Instead, they should identify whether they operate Linux systems, Linux containers with host device access, Azure Linux images, WSL-related environments, or embedded Linux devices that include the affected kernel driver.
For Windows Subsystem for Linux specifically, practical exposure would depend on whether the WSL kernel includes the affected driver and whether the relevant hardware path is available. Most ordinary WSL installations are unlikely to be using an SPI-attached MCP251x CAN controller directly. However, organizations should still rely on Microsoft’s update channel for WSL kernel updates and avoid making broad assumptions in specialized hardware-in-the-loop or lab configurations.
The vulnerability is also a reminder that not all kernel CVEs are remotely exploitable network vulnerabilities. The CAN bus is a network technology, but this issue is not about remote packets sent over a CAN bus causing memory corruption. It is about local driver initialization and interrupt synchronization. That distinction affects prioritization. Internet-facing systems without this hardware path may treat it as low practical risk. Embedded CAN gateways, vehicle diagnostics platforms, industrial control nodes, and robotics systems may treat it as more urgent because availability of CAN communication can be central to the device’s function.
A suggested administrator response plan would look like this:
  • Inventory Linux systems that use CAN interfaces.
  • Identify systems using MCP251x or MCP25625 SPI CAN controllers.
  • Check whether the mcp251x driver is built, installed, or loaded.
  • Determine the running kernel version and vendor patch status.
  • Compare against fixed vendor advisories or stable kernel correction points.
  • Patch to a fixed kernel or apply the driver fix through the vendor kernel tree.
  • If patching is delayed, restrict local ability to manipulate CAN interfaces.
  • Disable or blacklist the driver on systems where it is not needed.
  • Test CAN interface bring-up, shutdown, and failure recovery after patching.
  • Monitor for hangs or abnormal MCP251x initialization errors.
Testing after remediation is important. Because this bug is in the open error path, a simple “does CAN traffic work?” test may not fully exercise the fixed behavior. Where possible, test interface up and down cycles, hardware absent or misconfigured conditions, IRQ behavior, SPI communication failure handling, and service restarts. Embedded systems should be tested under the same boot sequence and hardware conditions used in production.
Developers maintaining downstream kernels should review whether their trees contain local modifications to the MCP251x driver. If a vendor tree has diverged from upstream, the exact patch may not apply cleanly, but the locking rule still applies: the driver should not call free_irq() while holding mcp_lock if the IRQ handler may need that lock. The correct logic should set the quit state while protected, release the mutex, and then free the IRQ.
Security teams should score the vulnerability in context. The generic CVSS score is Medium, but environmental severity can be higher or lower. A lab workstation with no CAN hardware may have negligible exposure. A production industrial gateway where local service compromise could hang CAN communications may rate higher operationally. A safety-sensitive embedded environment should evaluate whether a denial-of-service condition in the CAN interface could affect monitoring, fallback behavior, or service continuity.
It is also worth noting that kernel CVE affected-version ranges can look surprisingly broad. That does not always mean every Linux machine from those versions is practically exploitable. It often means the vulnerable code existed across those branches until the fix was backported. Kernel configuration, module loading, hardware presence, permissions, and vendor backports all influence real exposure. For asset management, mark the CVE as applicable only where the driver and usage conditions exist, but verify patch status across the broader Linux fleet.
The safest long-term mitigation is to keep embedded and appliance kernels within supported maintenance streams. Many CAN-enabled systems run old board support package kernels for years because the hardware is stable and application changes are minimal. That creates risk when driver fixes land upstream but are not pulled into the product kernel. Even availability-only CVEs can accumulate into reliability and security debt. Vendors and operators should maintain a process for selectively backporting stable kernel fixes, especially for drivers used by the product.
CVE-2026-23357 is therefore a focused but real kernel availability issue. It does not provide remote code execution, data theft, or privilege escalation based on the available description. Its significance lies in the ability to deadlock a kernel driver through improper locking during MCP251x CAN interface open failure handling. Systems using affected Linux kernels with MCP251x CAN hardware should receive the fixed kernel update or a vendor backport. Systems that do not use the driver should consider disabling it to reduce attack surface. Access to CAN interface management should be limited, especially on shared, containerized, embedded, or remotely administered devices.
The main takeaway is straightforward: if your Linux system uses an MCP251x or MCP25625 SPI CAN controller, treat this as a relevant kernel availability fix and patch through your trusted kernel supplier. If your systems do not include that driver or hardware, document the non-applicability, continue normal kernel update hygiene, and avoid unnecessary exposure of unused kernel modules.

Source: NVD / Linux Kernel Security Update Guide - Microsoft Security Response Center
 
