The Linux kernel is once again at the center of a subtle but important reliability fix, and this time the issue sits inside irqchip/sifive-plic, the interrupt-controller driver used on SiFive RISC-V platforms. The vulnerability, identified as CVE-2026-23287, is described as a frozen interrupt caused by affinity-setting behavior — the kind of bug that rarely makes headlines on its own, but can quietly undermine system responsiveness in ways that are painful to diagnose and even harder to reproduce.

Microsoft’s vulnerability guidance page for the CVE is unreachable at the moment, so the most reliable picture comes from the upstream Linux kernel discussion and related documentation, which help explain why this fix matters beyond a single driver.

PLIC, or Platform-Level Interrupt Controller, is a central piece of the RISC-V interrupt story. In practical terms, it arbitrates external interrupts from devices and routes them to the right CPU, which means it sits on the critical path between hardware events and software response. When that routing logic behaves incorrectly, the result is not always a dramatic crash; sometimes it is a device that appears to stop talking to the system, or an interrupt line that seems to go dark after a configuration change.
That is why an affinity bug in the PLIC driver deserves attention. CPU affinity is not just an optimization knob for throughput tuning. It is part of the kernel’s contract with hardware, allowing interrupt handling to be steered toward specific cores for latency, balance, or locality reasons. Linux documents that IRQ affinity can be manipulated through core interrupt APIs, but it also makes clear that these mechanisms are constrained by platform support and the interrupt controller’s own behavior.
On mature server platforms, interrupt-affinity changes are routine. On emerging architectures like RISC-V, they can still expose edge cases because the controller, the kernel abstractions, and the platform firmware are all still maturing together. That makes a bug in a driver like irqchip/sifive-plic especially interesting: it may not be a broad architectural failure, but it can reveal how delicate the relationship is between generic Linux IRQ handling and vendor-specific implementation details. The kernel’s generic IRQ documentation shows how much is assumed by the framework, from affinity updates to thread-affinity handling, and that context helps explain why a seemingly modest bug can have a real operational cost.
There is also a broader stability lesson here. Interrupt controllers are often treated as invisible infrastructure because they work most of the time. Yet when affinity changes interact badly with controller state, devices can become effectively silent even though the hardware is still alive. That is the kind of failure administrators describe as frozen, stuck, or gone missing — and those are the worst bugs, because they often masquerade as unrelated application faults.
What the Vulnerability Appears to Be
At face value, CVE-2026-23287 points to a frozen interrupt problem triggered by affinity setting in the SiFive PLIC path. The phrase matters. It suggests the vulnerability is not about malformed input or memory corruption in the classic exploit sense, but about state becoming inconsistent after the kernel attempts to steer interrupts between CPUs. That can leave an interrupt source effectively stranded, which in turn can stall the device that depends on it.

This is an important distinction for security teams. A bug that freezes interrupts is often operationally dangerous even when it is not obviously exploitable for code execution. If the affected device is a storage controller, a network adapter, or a timing-sensitive peripheral, the symptom may be degraded service, lost I/O, or a hang that only clears after a reset. In enterprise settings, that is still a serious availability issue, and availability failures are every bit as relevant to security posture as overt memory-safety defects.
The generic Linux IRQ model gives some clues about the nature of the problem. The kernel supports setting IRQ affinity and even forcing affinity in controlled contexts, but those operations are expected to be safe only when the interrupt controller and online CPU state agree. The documentation explicitly notes that affinity operations can fail if the mask does not contain an online CPU, and that certain affinity-related calls are tightly bounded by init state and managed-interrupt rules.
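The online-CPU constraint can be sketched in a few lines. The snippet below is a hypothetical userspace model of the rule, loosely mirroring the check Linux applies when affinity is written through /proc/irq/&lt;n&gt;/smp_affinity; it is not kernel code, and the function name and masks are invented for illustration:

```python
def validate_affinity(requested: set[int], online: set[int]) -> set[int]:
    """Model of the kernel's guardrail: an affinity request is usable
    only if it intersects the set of online CPUs."""
    effective = requested & online
    if not effective:
        # The real interface refuses such a request (the /proc writer
        # returns -EINVAL) rather than pointing the interrupt at a CPU
        # that cannot service it.
        raise ValueError("affinity mask contains no online CPU")
    return effective

# A 4-hart system with hart 3 offline: requesting {2, 3} degrades
# safely to {2}; requesting only {3} must be rejected.
print(validate_affinity({2, 3}, {0, 1, 2}))  # -> {2}
```

The interesting failure mode in this CVE lives below that check: the mask can be perfectly valid and the controller can still end up misprogrammed.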
Why affinity bugs are subtle
Affinity changes are not simple bookkeeping. They often involve a live transition from one CPU target to another while interrupts may already be pending. That means the driver has to preserve continuity while updating routing state, and if the update sequence is incomplete or out of order, the interrupt can disappear into a gray zone where neither CPU receives it cleanly.

- An interrupt may be acknowledged on the wrong core.
- A route may be updated before the new affinity is fully valid.
- A pending interrupt may become stranded during migration.
- The device may continue waiting for a completion that never arrives.
- The bug may only appear under specific CPU topology or hotplug conditions.
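A toy simulation makes the ordering hazard above concrete. Everything here is hypothetical and deliberately simplified — a single source with per-CPU enable bits, loosely inspired by the PLIC's per-context enable registers, not the real driver logic — but it shows how tearing down the old route before the new one is live can strand a pending interrupt:

```python
class ToyController:
    """Hypothetical one-source interrupt controller with per-CPU enable
    bits; a simplified stand-in, not the actual PLIC programming model."""

    def __init__(self, cpus):
        self.enabled = {cpu: False for cpu in cpus}
        self.pending = False       # source has fired, not yet delivered
        self.delivered_to = None   # CPU that serviced the interrupt

    def raise_irq(self):
        self.pending = True
        self.try_deliver()

    def try_deliver(self):
        # Delivery requires some CPU to have the source enabled.
        if self.pending:
            for cpu, on in self.enabled.items():
                if on:
                    self.pending = False
                    self.delivered_to = cpu
                    return

def buggy_migrate(ctl, old, new):
    ctl.enabled[old] = False   # old route torn down first: window opens
    ctl.raise_irq()            # interrupt fires inside the window
    ctl.enabled[new] = True    # new route live, but nobody re-scans
                               # pending state: the interrupt is frozen

def safe_migrate(ctl, old, new):
    ctl.enabled[new] = True    # new route live before the old one dies
    ctl.raise_irq()            # same mid-migration interrupt
    ctl.enabled[old] = False
    ctl.try_deliver()          # re-scan pending state after reprogramming

ctl = ToyController([0, 1])
ctl.enabled[0] = True
buggy_migrate(ctl, old=0, new=1)
print(ctl.pending, ctl.delivered_to)  # -> True None (a frozen interrupt)
```

Note that the buggy path never fails loudly: every individual write looks reasonable, and only the combined ordering loses the event.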
Why the SiFive PLIC Matters
SiFive’s PLIC implementation is one of the more important interrupt-controller designs in the RISC-V ecosystem, especially for systems that rely on Linux as their primary OS. The PLIC handles external interrupt prioritization and delivery, so it is foundational rather than optional. If it misbehaves, devices above it are only as reliable as the routing logic underneath.

The reason this matters for Linux security journalism is that interrupt-controller bugs can be deceptively systemic. A flaw in a filesystem driver usually affects the filesystem. A flaw in the interrupt path, by contrast, can cascade into many different subsystems because interrupts are the signal mechanism that keeps the kernel’s event loop responsive. A frozen interrupt can look like a network hang one moment and a storage timeout the next, depending on which device is waiting on the line.
This is also one of the areas where RISC-V platform maturity shows through. The architecture is moving fast, and Linux support has had to keep pace across CPU bring-up, firmware coordination, and device-tree integration. The more the ecosystem scales, the more likely it is that edge cases in interrupt routing will surface under real workloads rather than only in lab conditions. That is why a bug like this deserves more than a “just patch it” reaction.
The control plane behind the data plane
Interrupt affinity is, in practice, a control-plane feature for the kernel. It doesn’t directly move packets or blocks of data, but it decides where the response logic runs, and that shapes throughput, latency, and fairness. When a PLIC driver mishandles that control plane, the failure can emerge in the data plane as an apparent freeze or stall.

The Linux IRQ documentation makes clear that affinity mechanisms are not purely advisory; they affect how interrupts are managed, and some are meant only for low-level CPU hotplug or pre-online setup. That makes the contract stricter than it looks from userspace. If a platform driver gets the sequencing wrong, the kernel can still accept the configuration while the hardware stops behaving as expected.
Interpreting the “Frozen Interrupt” Symptom
The most useful way to think about a frozen interrupt is as a liveness failure. The system has not necessarily lost the interrupt source itself, but it has lost the ability to route or service the event reliably after an affinity transition. That distinction matters because it changes how engineers diagnose the issue. You are not looking for a single crash site; you are looking for a stuck state machine.

In kernel terms, this often means the interrupt controller retained stale state, or the destination CPU changed before the source was reattached in a way that preserved delivery guarantees. The effect may be intermittent, because the race or sequencing problem only appears under specific timing conditions. That is what makes these bugs so frustrating: they are deterministic in root cause but probabilistic in appearance.
This kind of failure can be especially nasty in long-running systems. A device may run fine for hours and then quietly stop generating useful interrupts after a CPU affinity change, a power-management event, or a topology update. The system may not panic, which means logs can look deceptively normal while an application experiences timeouts, retries, or silent degradation.
Typical operational symptoms
When an interrupt is frozen or effectively lost, administrators may notice:

- Device timeouts under load
- Reduced throughput without an obvious cause
- Stalled I/O completion on a specific peripheral
- Missing wakeups or delayed event handling
- Recovery only after driver reset or reboot
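One practical way to spot the symptom is to diff successive /proc/interrupts snapshots and flag lines whose counters stop moving while others advance. The sketch below runs against inlined sample text with a simplified field layout, so the device names and numbers are illustrative only:

```python
def parse_counts(snapshot: str) -> dict[str, int]:
    """Sum per-CPU counters for each IRQ line in /proc/interrupts-style
    text (simplified layout: 'IRQ: count0 count1 ... name')."""
    counts = {}
    for line in snapshot.strip().splitlines():
        head, _, rest = line.partition(":")
        fields = rest.split()
        nums = [int(f) for f in fields if f.isdigit()]
        counts[head.strip()] = sum(nums)
    return counts

def stalled_irqs(before: str, after: str) -> list[str]:
    """IRQ lines whose counters did not advance between snapshots."""
    b, a = parse_counts(before), parse_counts(after)
    return [irq for irq in b if a.get(irq, 0) == b[irq]]

# Sample snapshots: IRQ 5 (a hypothetical NIC) stops counting after an
# affinity change, while the timer (IRQ 1) keeps advancing.
t0 = """
 1:  1000  980   riscv-timer
 5:  4321    0   eth0
"""
t1 = """
 1:  1200  1105  riscv-timer
 5:  4321     0  eth0
"""
print(stalled_irqs(t0, t1))  # -> ['5']
```

A stalled counter is only a hint, of course — an idle device also stops counting — but under known load it is often the first hard evidence that the line went dark.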
Linux IRQ Affinity and the Kernel Contract
Linux has a well-developed interrupt-affinity framework, but it is built on assumptions that only hold when drivers implement the contract correctly. The generic IRQ documentation exposes the primitives: irq_set_affinity(), irq_force_affinity(), and thread-affinity support for interrupt handlers. These are powerful tools, but they are also sharp ones, especially in the context of platform-specific controllers.

The key point is that affinity is not a cosmetic preference. It is used to keep CPUs loaded fairly, improve cache locality, and reduce contention. On the networking side, for example, Linux documentation encourages interrupt affinity tuning as part of scaling strategy, illustrating how deeply the concept is embedded in kernel performance engineering. That same mechanism, if mishandled at the controller level, can become a source of functional failure rather than improvement.
This is where a PLIC-specific bug becomes more than a niche issue. Generic IRQ code assumes the hardware layer can honor requests or fail them cleanly. If the SiFive PLIC path leaves an interrupt in limbo after an affinity change, the kernel’s higher-level assumptions start to wobble. In a well-behaved stack, affinity is a routing hint; in a broken one, it can become a trigger for liveness loss.
Why online-CPU constraints matter
The kernel explicitly restricts some affinity operations to online CPUs and controlled contexts. That is not bureaucracy — it is a guardrail. If the target CPU is not actually available, the interrupt should not be pointed there, because delivery semantics become ambiguous. The generic API documentation makes that dependency plain, and the SiFive driver bug appears to live somewhere in the gap between “requested affinity” and “safe delivery target.”

That gap is exactly where frozen interrupts are born. A driver may accept a change because the mask looks valid, while the underlying controller state still has to be updated atomically or in a particular order. If that order breaks, the interrupt does not have to fail loudly; it only has to fail silently.
Enterprise Impact vs. Consumer Impact
For consumer systems, this kind of vulnerability may present as an annoying board-specific bug: a device not waking up, a peripheral dropping off the bus, or a kernel upgrade changing behavior around CPU affinity. Many desktop users never manually adjust IRQ affinity, so they may never hit the edge case unless their distro or firmware does it automatically in the background.

For enterprise deployments, the story is different. Servers and appliances are far more likely to rely on deterministic interrupt behavior under load, and they are more likely to be tuned, automated, or managed in ways that interact with affinity settings. A frozen interrupt can become a service-affecting incident if it hits a network appliance, storage endpoint, or real-time workload.
The broader implication is that “small” controller bugs can be operationally large. Enterprises usually plan for crashes, but they do not always plan for a device that stays up while quietly losing event delivery. That kind of failure can delay root-cause analysis, prolong outages, and make ordinary remediation steps seem ineffective.
Why security teams should care
Even when there is no clear remote exploit, a reliability CVE can still matter to security because it affects availability and incident response. If a device becomes unstable after affinity tuning, the remediation path may involve reboots, hardware replacement, or disabling performance features. That is expensive, disruptive, and often hard to roll out across a fleet.

- It can break HA assumptions.
- It can trigger false positives in monitoring.
- It can mask other driver issues.
- It can complicate cluster failover.
- It can force conservative kernel settings.
Why This Bug Is Hard to Reproduce
Interrupt-affinity bugs often only appear under the right combination of topology, timing, and workload. That makes them easy to miss in routine testing and hard to reproduce on demand. A system may need multiple CPUs online, a specific device configuration, and an affinity transition at just the wrong moment for the failure to show up.

This is one of the reasons kernel developers spend so much time on invariants. The issue is not just “does the code work?” but “does the code keep working when the system changes under it?” That includes CPU hotplug, IRQ balancing, power management, and platform-specific controller behavior. The more dynamic the environment, the more likely a routing bug will surface only occasionally.
It is also worth noting that many interrupt-controller bugs are discovered only after they have been used in anger by real users. That is especially true for newer architectures or boards that do not have the same volume of deployment history as x86 platforms. The Linux ecosystem benefits from diversity, but it also means corner cases emerge gradually rather than all at once.
A debugging mindset
When investigating this class of bug, engineers usually need to look for evidence in a specific order:

- Confirm which interrupt source stopped firing.
- Verify whether affinity changed around the time of failure.
- Check whether the target CPU was online and eligible.
- Inspect controller programming state for the affected line.
- Compare expected delivery routing with actual handler activity.
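The first few steps of that checklist can be automated. This sketch decodes /proc/irq/&lt;n&gt;/smp_affinity-style hex masks and cross-checks them against the online CPU set; the function names and return strings are invented here, and on a real system the inputs would be read from /proc and /sys rather than passed in as literals:

```python
def mask_to_cpus(hexmask: str) -> set[int]:
    """Decode a /proc/irq/<n>/smp_affinity-style hex mask ('3' -> {0, 1})."""
    value = int(hexmask.replace(",", ""), 16)
    return {i for i in range(value.bit_length()) if value >> i & 1}

def diagnose(smp_affinity: str, effective_affinity: str, online: set[int]) -> str:
    """First-pass triage for a silent IRQ, following the checklist order.
    On a live system both masks would come from /proc/irq/<n>/."""
    requested = mask_to_cpus(smp_affinity)
    effective = mask_to_cpus(effective_affinity)
    if not requested & online:
        return "requested affinity contains no online CPU"
    if not effective:
        return "no effective target: the interrupt is routed nowhere"
    if not effective <= online:
        return "effective target includes an offline CPU"
    return "routing looks consistent; inspect controller registers next"

# Hart 1 went offline, but the controller still targets it.
print(diagnose("3", "2", online={0}))
# -> effective target includes an offline CPU
```

The useful property of a check like this is that it separates "the kernel's bookkeeping is wrong" from "the bookkeeping is fine, so the controller itself must be misprogrammed" — which is roughly the boundary where a driver bug like this one lives.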
The Broader RISC-V and Linux Ecosystem Angle
The RISC-V ecosystem has reached the point where kernel bugs are no longer just bring-up curiosities. They are production issues. That is a sign of maturity, but it also means controller behavior now matters in the same way it has long mattered on more established architectures. The SiFive PLIC is part of that transition.

This CVE is therefore interesting not because it proves RISC-V is unstable, but because it shows the opposite: the platform is being exercised enough that the kernel community is finding and fixing real-world defects in core infrastructure. That is how an ecosystem gets stronger. Hidden issues are not a sign of failure; unresolved hidden issues are.
The larger market implication is that vendors, OEMs, and Linux distributors increasingly have to treat RISC-V platform firmware and interrupt handling as part of their security and reliability story. An interrupt-controller bug might not appear in a vulnerability brief next to a browser exploit, but in embedded, appliance, and edge deployments it can be just as disruptive.
Why vendors should pay attention
Vendors shipping RISC-V systems should treat interrupt-affinity behavior as a validation target, not a background detail. The cost of testing is low compared with the cost of field failures, and the test surface is concrete.

- Exercise affinity changes across multiple CPUs.
- Test with CPU hotplug scenarios.
- Validate behavior under interrupt load.
- Confirm recovery after controller reprogramming.
- Check for hangs after suspend-like transitions if applicable.
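A harness for that validation list starts with a scenario matrix. The sketch below only enumerates the cases; an actual run would apply each one by writing to /proc/irq/&lt;n&gt;/smp_affinity and toggling /sys/devices/system/cpu/cpuN/online while the device is under load. All names here are hypothetical:

```python
import itertools

def affinity_test_matrix(irqs, cpus, hotplug_cpus):
    """Enumerate migration scenarios worth exercising: move every IRQ to
    every CPU, alone and while another CPU is taken offline mid-test."""
    for irq, target in itertools.product(irqs, cpus):
        yield (irq, target, None)            # plain migration
        for victim in hotplug_cpus:
            if victim != target:
                yield (irq, target, victim)  # migration racing a hotplug

cases = list(affinity_test_matrix(irqs=[5], cpus=[0, 1], hotplug_cpus=[1]))
print(len(cases))  # -> 3
```

Even a small matrix like this covers the transition-under-change cases that routine functional testing tends to skip.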
Strengths and Opportunities
This CVE also highlights several strengths in the Linux ecosystem and a few opportunities for improvement. The most encouraging part is that the bug exists in a place where the kernel community is actively watching for correctness issues. The generic IRQ framework, upstream driver maintenance, and architecture-specific testing all contribute to catching subtle failures before they become widespread.

- The bug is likely fixable with a targeted change, not a redesign.
- The affected code sits in a well-defined driver path, which limits blast radius.
- Linux’s IRQ framework already documents the expected affinity behavior, giving maintainers a clear standard to check against.
- RISC-V platform growth is driving better scrutiny of core infrastructure.
- Enterprise visibility into Linux CVEs is improving, which helps remediation teams respond faster.
- The issue creates a useful validation target for vendors, especially around hotplug and affinity testing.
- The problem is operationally visible, which makes regression testing more straightforward once the bug is understood.
Risks and Concerns
The main concern is that this class of bug can look minor in a commit message while being major in production. A frozen interrupt does not always cause an immediate crash, so it can linger until the affected device starts timing out or the system enters a workload that depends heavily on that interrupt path.

- Silent failure is harder to spot than a panic.
- The bug may surface only after an affinity change or CPU topology event.
- Devices can appear healthy while actually losing event delivery.
- Recovery may require rebooting or reinitializing the device, not just restarting a service.
- The issue may be board- or firmware-specific, complicating validation.
- Operators may misattribute symptoms to unrelated storage, network, or scheduler problems.
- Automation that tunes IRQ affinity could trigger the bug at scale.
What to Watch Next
The next step is to see how the upstream kernel patch is described in more detail and whether it is backported into stable lines that matter for production RISC-V systems. That will tell us whether maintainers view the fix as a narrow correctness adjustment or as part of a broader interrupt-controller hardening effort.

It will also be worth watching for follow-on reports from integrators and board vendors. Bugs in the interrupt path often surface differently across firmware stacks and device trees, so a fix in mainline Linux does not always mean the issue is gone everywhere. The real test is whether vendors validate the behavior under CPU affinity changes and load.
Finally, the episode is a reminder that RISC-V platform maturity now includes the boring but vital parts of kernel engineering: routing, affinity, controller state, and recovery behavior. Those are not glamorous topics, but they are the foundations of a stable system.
- Watch for stable backports into distribution kernels.
- Look for vendor advisories tied to specific SiFive-based boards or SoCs.
- Monitor reports of interrupt delivery regressions after kernel updates.
- Check whether CPU hotplug or affinity tuning interacts with the fix.
- See whether downstream kernels add extra guardrails or diagnostics.
This is how a platform matures: not through perfect code, but through the steady discovery and elimination of edge cases in the most critical layers. CVE-2026-23287 is a small label for a problem with outsized importance, because when interrupt routing fails, the whole machine can start to feel unreliable even if nothing else has changed. The fix will matter not just to the SiFive PLIC driver, but to anyone who depends on Linux systems behaving predictably when the hardware asks for attention.
Source: MSRC Security Update Guide - Microsoft Security Response Center