CVE-2026-23217: RISC-V Linux Kernel Ftrace Deadlock Fixed by Build Time Exclusion

A newly assigned Linux kernel vulnerability, CVE-2026-23217, exposes a subtle but serious deadlock risk on RISC‑V systems when the kernel’s function tracer (ftrace) is configured with a snapshot trigger on SBI ecall functions, a situation that can hang the entire system. The fix merged into the kernel trees avoids the deadlock by preventing ftrace from instrumenting the low-level SBI ecall implementation, closing off what looked like a tracing convenience but was in practice a denial‑of‑service hazard.

Background

Tracing, ringbuffers and SBI: why this matters
Tracing subsystems such as ftrace are indispensable for kernel debugging, performance tuning, and post‑mortem analysis. On Linux, ftrace implements function tracing by patching or instrumenting function entry points and writing events into an in‑kernel ringbuffer that tools and userspace can read. The mechanism supports triggers like "snapshot" that take an on‑demand copy of in‑memory trace buffers at precise points in execution.
RISC‑V's Supervisor Binary Interface (SBI) is the low‑level interface used by the supervisor (kernel) to request services from machine mode (or firmware such as OpenSBI). The kernel implements SBI call helpers (sbi_ecall.c) that abstract the actual ecall exchange; these are invoked frequently for timers, interprocessor interrupts (IPI), and other platform services. When those helpers are traced as normal functions, the interactions between ftrace’s snapshot triggers, the ringbuffer, and interrupt delivery can create unusual reentrancy patterns. The new CVE documents precisely that intersection: a snapshot triggered by __sbi_ecall can cause an IPI that results in another __sbi_ecall and another snapshot, producing an endless loop and effectively deadlocking the kernel.
What the official records say
The NVD entry for CVE‑2026‑23217 summarizes the issue and points to the upstream kernel commits addressing it. Distribution advisories (Ubuntu and others) have catalogued the CVE and are in the process of evaluating and backporting fixes. The kernel patch itself is a small, targeted change to the RISC‑V Makefile that ensures sbi_ecall.c is not instrumented by ftrace when ftrace is enabled. The upstream commit was authored and signed off within the RISC‑V developer tree and was queued through the stable patch process.

What exactly went wrong: a technical walkthrough

The reentrancy chain: snapshot → IPI → __sbi_ecall → snapshot

The core of the bug is a reentrancy problem triggered by ftrace's snapshot action. When a function is registered with a snapshot command in set_ftrace_filter (for example, echo "__sbi_ecall:snapshot" > set_ftrace_filter), ftrace will invoke snapshot logic whenever that function runs. The snapshot procedure raises an interprocessor interrupt (IPI) to coordinate or copy the ringbuffer state across CPUs; on RISC‑V, handling that IPI involves an SBI call in certain configurations, which leads back into __sbi_ecall. That second invocation then causes another snapshot, and so on — a self‑sustaining loop that does not exit, leaving the kernel in a locked state.
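The chain described above can be illustrated with a toy model. This is a minimal sketch, not kernel code: the function names `sbi_ecall`, `take_snapshot`, and `send_ipi` are illustrative stand-ins, and the `max_depth` cap is an artificial assumption added only so the example terminates. The real kernel has no such cap, which is why the chain never unwinds.

```python
# Toy model of the snapshot -> IPI -> __sbi_ecall chain. Not kernel code;
# every name here is an illustrative stand-in.

def sbi_ecall(traced, depth=0, max_depth=50):
    """Model one __sbi_ecall invocation; return the nesting depth reached."""
    if depth >= max_depth:
        # Artificial cap so the model terminates. The kernel has no such cap.
        return depth
    if traced:
        # The ftrace hook fires on function entry and runs the snapshot trigger.
        return take_snapshot(depth, max_depth)
    # Not instrumented: the ecall completes without running any trigger.
    return depth


def take_snapshot(depth, max_depth):
    """Model the snapshot trigger, which raises an IPI."""
    return send_ipi(depth, max_depth)


def send_ipi(depth, max_depth):
    """Model IPI delivery that is itself carried out through an SBI ecall."""
    return sbi_ecall(True, depth + 1, max_depth)


print(sbi_ecall(traced=True))   # prints 50: nesting only stops at the cap
print(sbi_ecall(traced=False))  # prints 0: no trigger, no nesting
```

In the model, excluding the function from tracing corresponds to `traced=False`: the hook never runs, so the first link of the chain is never formed.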

Why timers and the lack of SSTC make it easy to hit

On many RISC‑V systems that do not implement the SSTC extension, periodic clock events (timer callbacks) invoke SBI ecalls routinely. That means the conditions required to enter the snapshot → IPI → __sbi_ecall cycle can happen simply as part of normal timer activity, without any exotic workload or deliberate crafted sequence. In short: the behavior is easy to reproduce on affected hardware if ftrace has been told to snapshot SBI ecall functions.
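To judge whether a given machine falls into the easy-to-trigger category, one option is to look for the `sstc` extension in the ISA string the kernel reports. The sketch below assumes the usual `/proc/cpuinfo` layout on RISC‑V Linux, with one `isa` line per hart and multi-letter extensions separated by underscores; the sample strings are made up for illustration.

```python
def all_harts_have_sstc(cpuinfo_text):
    """Return True only if every 'isa' line in the text lists the sstc extension."""
    isa_strings = []
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "isa":
            isa_strings.append(value.strip().lower())
    # Assumption: multi-letter extensions are underscore-separated tokens.
    return bool(isa_strings) and all(
        "sstc" in isa.split("_") for isa in isa_strings
    )


# Hypothetical samples; on a real system pass the contents of /proc/cpuinfo.
with_sstc = "processor\t: 0\nhart\t: 0\nisa\t: rv64imafdc_zicsr_zifencei_sstc\n"
without_sstc = "processor\t: 0\nhart\t: 0\nisa\t: rv64imafdc_zicsr_zifencei\n"

print(all_harts_have_sstc(with_sstc))
print(all_harts_have_sstc(without_sstc))
```

A `True` result only means the periodic timer path is less likely to supply the initial ecall. It does not mean the kernel is safe, since any other `__sbi_ecall` can still start the loop.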

The patch: build‑time exclusion of sbi_ecall from function instrumentation

The change applied in the kernel is surgical and conservative: it modifies arch/riscv/kernel/Makefile so that whenever CONFIG_FTRACE is enabled, sbi_ecall.o is compiled without ftrace instrumentation (using CFLAGS_REMOVE to strip the mcount-style compiler flags). Previously, that exclusion applied only when CONFIG_RISCV_ALTERNATIVE_EARLY was set. As a result, the function tracer no longer inserts callbacks into the sbi_ecall implementation, so a snapshot trigger can never be attached to those functions. The change does not otherwise alter __sbi_ecall's runtime behavior; SBI calls can still be observed using trace events.
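For teams auditing their own trees, the following sketch checks which conditional guards surround the exclusion line in a Makefile. It assumes the exclusion follows the usual kbuild convention `CFLAGS_REMOVE_<object>.o = $(CC_FLAGS_FTRACE)`. The exact spelling of the upstream line, the sample Makefile text, and the helper names are assumptions, so compare against the real commit before relying on it. `else` branches are deliberately ignored to keep the example short.

```python
import re

# Assumed kbuild spelling of the exclusion; verify against the actual commit.
REMOVE_RE = re.compile(r"^\s*CFLAGS_REMOVE_sbi_ecall\.o\s*[:+]?=")


def sbi_ecall_exclusion_guards(makefile_text):
    """Return one tuple of enclosing if-guards per exclusion line found."""
    stack, found = [], []
    for line in makefile_text.splitlines():
        norm = " ".join(line.split())
        if norm.startswith(("ifdef ", "ifndef ", "ifeq ", "ifneq ")):
            stack.append(norm)
        elif norm == "endif" or norm.startswith("endif "):
            if stack:
                stack.pop()
        elif REMOVE_RE.match(line) and "CC_FLAGS_FTRACE" in line:
            found.append(tuple(stack))
    return found


def excluded_whenever_ftrace(makefile_text):
    """True if the exclusion is unguarded or guarded only by CONFIG_FTRACE."""
    guards = sbi_ecall_exclusion_guards(makefile_text)
    return any(g in ((), ("ifdef CONFIG_FTRACE",)) for g in guards)


# Hypothetical fragment resembling a tree without the fix.
SAMPLE = """\
ifdef CONFIG_RISCV_ALTERNATIVE_EARLY
CFLAGS_REMOVE_sbi_ecall.o = $(CC_FLAGS_FTRACE)
endif
"""
print(excluded_whenever_ftrace(SAMPLE))  # prints False
```

On a real tree the input would be the contents of arch/riscv/kernel/Makefile. A `False` result suggests the exclusion still depends on another config option.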

Impact analysis

Who is affected?
  • RISC‑V systems running kernels where ftrace is enabled and sbi_ecall.c is not already excluded from tracing by another config switch (notably CONFIG_RISCV_ALTERNATIVE_EARLY, which excludes it when set).
  • Distributions that ship RISC‑V kernels with function tracing available will need to evaluate kernels and, where necessary, apply the patch or backport equivalent changes. Ubuntu has already published an advisory entry and assigned a medium priority for tracking.
Severity and exploitability
  • The practical effect is a denial of service: a local, privileged user who attaches a snapshot trigger to specific SBI functions can hang the kernel completely. There is no indication this leads to remote code execution. Multiple public vulnerability trackers classify this as an information/DoS issue rather than arbitrary code execution.
  • Importantly, the attack surface is constrained by ftrace's access controls. Interacting with set_ftrace_filter and related tracefs controls typically requires root or equivalent administrative privileges on the system. That reduces the risk of remote exploitation, but does not remove the severity for multi‑tenant environments and development setups where users or processes may have tracing capabilities or privileged access. Kernel tracing controls are powerful and therefore restricted; however, system misconfiguration, container escapes, or elevated developer access can create paths to abuse. The kernel documentation and common practice show ftrace controls are administrative operations.
Operational costs and developer impact
  • The fix reduces one tracing capability (function instrumentation of sbi_ecall). For kernel developers who rely on that exact instrumentation, the trade‑off may be inconvenient. The kernel maintainers explicitly note that SBI ecalls are still loggable via trace events, which remain usable alternatives for debugging. Because the change is a build‑time exclusion, the actual runtime behavior and performance characteristics of SBI remain unchanged beyond the absent function‑tracing hooks.

Mitigation and remediation: practical steps for system owners

Immediate mitigations (if you cannot patch immediately)
  • Avoid enabling function tracing for SBI ecall functions. Do not echo "__sbi_ecall:snapshot" (or similar) into set_ftrace_filter. Where possible, maintain a policy of excluding low‑level syscall/firmware function families from function tracing.
  • Use trace events instead of function tracing for SBI activity. Tracepoints and event-based tracing provide visibility with less reentrancy risk and are recommended by the kernel maintainers for this class of low‑level operations.
  • Restrict tracing to trusted administrative users. Confirm that only genuine administrators have write access to tracefs control files, and audit who can mount or alter tracefs / debugfs on your systems. This is standard hardening for production RISC‑V hosts.
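The permission audit in the last item can be partly automated. The sketch below flags tracing control files that are not owned by root or that are writable by group or others. It looks only at classic mode bits and ownership, so capabilities, ACLs, and mount options are out of scope, and the sample entries are made up. On a real host the tuples would come from `os.stat()` on files such as /sys/kernel/tracing/set_ftrace_filter.

```python
import stat


def writable_by_non_owner(mode):
    """True if the mode bits grant write access to group or others."""
    return bool(mode & (stat.S_IWGRP | stat.S_IWOTH))


def needs_review(entries):
    """entries: iterable of (path, st_mode, st_uid); return paths to inspect."""
    return [
        path
        for path, mode, uid in entries
        if uid != 0 or writable_by_non_owner(mode)
    ]


# Hypothetical entries for illustration only.
sample = [
    ("/sys/kernel/tracing/set_ftrace_filter", 0o100644, 0),
    ("/sys/kernel/tracing/snapshot", 0o100664, 0),
]
print(needs_review(sample))  # prints ['/sys/kernel/tracing/snapshot']
```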
Patch and update guidance
  • Identify affected kernels:
      • Check whether the kernel was built with CONFIG_FTRACE enabled and whether CONFIG_RISCV_ALTERNATIVE_EARLY affects your setup.
      • Review your distribution's kernel changelog for the commit that adds unconditional removal of sbi_ecall from ftrace instrumentation. The upstream commit identifier and patch discussion were recorded in the stable patch lists.
  • Apply vendor updates:
      • For packaged kernels (distribution kernels), watch your vendor advisories and apply the vendor-supplied kernel updates as soon as they are available and tested in your environment. Ubuntu and other vendors are tracking the CVE and evaluating backports.
  • For custom or embedded builds:
      • Rebuild your kernel with the updated arch/riscv/kernel/Makefile changes (or backport the specific CFLAGS_REMOVE change) and redeploy. The change is small but must be deployed to any kernels that ship on RISC‑V hardware in your fleet.
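The configuration check in the first item can be scripted as a first-pass triage. This is a heuristic sketch using only the two options named above. A kernel config cannot reveal whether the Makefile fix has been applied, so a positive result means "check the changelog", not "vulnerable". The helper names and sample text are illustrative. Config text is commonly found in /boot/config-<release> or /proc/config.gz, depending on how the kernel was built.

```python
def parse_kconfig(text):
    """Map CONFIG_* names to values; '# CONFIG_X is not set' lines are skipped."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            name, _, value = line.partition("=")
            options[name] = value
    return options


def config_warrants_review(text):
    """True if ftrace is on and the pre-existing exclusion switch is off."""
    options = parse_kconfig(text)
    ftrace_on = options.get("CONFIG_FTRACE") == "y"
    alt_early_on = options.get("CONFIG_RISCV_ALTERNATIVE_EARLY") == "y"
    return ftrace_on and not alt_early_on


# Hypothetical config excerpt.
sample = "CONFIG_FTRACE=y\n# CONFIG_RISCV_ALTERNATIVE_EARLY is not set\n"
print(config_warrants_review(sample))  # prints True
```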
Detection and validation
  • After patching or applying a workaround, verify that function tracing does not instrument sbi_ecall by inspecting available_filter_functions and enabled_functions, and ensure that set_ftrace_filter will not accept SBI ecall names. Validate by attempting to add a snapshot trigger in a safe, controlled test environment rather than production. The kernel tracing documentation and available_filter_functions listing are the authoritative interfaces for these checks.
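The validation step can be sketched as two small parsers, one for available_filter_functions and one for set_ftrace_filter. Both take file contents as text so they can be tried on captured output first. Only `__sbi_ecall` is named in the advisory, so other symbols from sbi_ecall.c must be supplied by the caller. The `function:command:count` layout assumed for trigger lines should be checked against your kernel's tracing documentation.

```python
def traceable_symbols(available_text, names=("__sbi_ecall",)):
    """Return the watched names that appear in available_filter_functions text."""
    found = []
    for line in available_text.splitlines():
        fields = line.split()
        # Lines are "symbol" or "symbol [module]"; the first field is the name.
        if fields and fields[0] in names:
            found.append(fields[0])
    return found


def snapshot_triggers(filter_text, names=("__sbi_ecall",)):
    """Return set_ftrace_filter lines that attach a snapshot to a watched name."""
    hits = []
    for line in filter_text.splitlines():
        parts = line.strip().split(":")
        # Assumed layout: function:command[:count]
        if len(parts) >= 2 and parts[0] in names and parts[1] == "snapshot":
            hits.append(line.strip())
    return hits


# Hypothetical captured output.
available = "do_one_initcall\n__sbi_ecall\nsbi_set_timer\n"
print(traceable_symbols(available))  # prints ['__sbi_ecall']
```

On a patched kernel, `traceable_symbols` would be expected to return an empty list for the sbi_ecall.c functions. A non-empty result from `snapshot_triggers` on an unpatched kernel means the risky trigger is already armed.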

Developer and maintainer perspective: trade‑offs and engineering judgement

Why a build‑time exclusion is the chosen fix
The selected fix is pragmatic: rather than redesigning snapshot/IPI handling or making complex special cases in snapshot code, the maintainers have prevented instrumentation of a small, well‑defined set of low‑level functions that should not require the profiling semantics of function tracing. This keeps the core tracing and ringbuffer code simpler and avoids subtle reentrancy problems that are expensive to reason about and test across all hardware configurations. The approach aligns with conservative kernel engineering: avoid adding tracing hooks into paths that interact with interrupt or firmware services.
Downsides worth noting
  • Visibility loss: Developers who once relied on function trace hooks to debug SBI interactions may need to adapt their tooling and use trace events or other instrumented points.
  • Backport effort: Embedded RISC‑V vendors and long‑term support distributions may need to backport the Makefile modification; while the patch is small, differences in tree layout across kernel versions can add friction. The stable kernel reviewers have signaled that the change is appropriate for backporting, but operational teams should still validate the patch in their kernels.
Opportunities for tooling and process improvement
This incident underscores the importance of trace surface hygiene: tracing systems should maintain safe defaults and provide clear guidance for what can be instrumented. Vendors and integrators should:
  • Document functions and subsystems that are unsafe to instrument at function granularity.
  • Offer packaged profiles for tracing that exclude fragile subsystems by default.
  • Encourage the use of event tracing for low‑level firmware interactions where possible.

Broader implications: tracing, RISC‑V adoption, and hazard classes

Tracing is a double‑edged sword: it can reveal hidden behavior but also create subtle hazards when it interacts with low‑level platform services. The CVE demonstrates how kernel telemetry and firmware contracts can collide — especially on newer architectures like RISC‑V where the ecosystem is evolving rapidly.
For RISC‑V adoption in cloud, edge, and embedded contexts, this event has three practical lessons:
  • Security posture for tracing interfaces matters. Production images should lock down tracing capabilities or supply hardened, well‑tested tracing profiles.
  • Hardware/firmware differences influence software safety. Extensions like SSTC can alter whether timer code invokes SBI ecalls and therefore influence exploitability and reproducibility; this variability must be part of risk assessments for platform support.
  • Rapid upstream fixes are possible and effective when the change is narrow. The kernel community applied a small, well‑reviewed Makefile change rather than a large redesign — a favorable outcome for stability and maintainability.

Verification and sources check

What we verified
  • The technical description of the issue (snapshot reentrancy through __sbi_ecall causing deadlock) is documented in the upstream commit and in the NVD/CVE records. The kernel patch text and stable patch email threads make the cause, reasoning, and specific code‑level remedy clear.
  • Distribution tracking (Ubuntu) reflects that vendors are evaluating the CVE and planning updates/backports for their RISC‑V kernels. That advisory also classifies the issue and provides guidance on status tracking.
  • Kernel documentation about ftrace semantics and the snapshot command confirms the mechanism that allows a function to trigger snapshot behavior; this supports the explanation of how a traced function can cause snapshot logic to run.
  • The Microsoft Security Response Center page for this CVE appears to require client‑side JavaScript to render; without it, the MSRC update guide entry may show only placeholder content, so simple crawlers may not retrieve it. Administrators relying on vendor portals should confirm advisories via vendor dashboards or distribution security trackers. (msrc.microsoft.com)
Caveats and unverifiable claims
  • There is no public evidence that this CVE has been weaponized in the wild to cause mass disruption; the record and upstream commit describe a vulnerability class and a deterministic way to trigger it under specific tracing conditions. Public trackers label the issue as an information/DoS type of vulnerability and list the kernel commits and advisories, but no proof of exploit chains beyond the described trigger is present in the public record. If new exploit reports appear, operators should reassess priorities.

Recommendations: concise checklist for operators

  • Inventory: Identify all RISC‑V hosts and check whether their kernels were built with CONFIG_FTRACE enabled. Confirm whether CONFIG_RISCV_ALTERNATIVE_EARLY affects your configuration.
  • Patch: Prioritize applying vendor or upstream kernel updates that include the Makefile exclusion change. Test patched kernels in staging before rolling to production.
  • Harden: Restrict tracefs/debugfs mounts and permissions to a minimal set of administrators; audit who can write to tracing controls.
  • Avoid risky traces: Do not enable function tracing for low‑level SBI helpers; use tracepoints and event‑based instrumentation instead.
  • Monitor: Watch distribution advisories, the NVD entry, and upstream stable patch notes for backports and related fixes. Validate kernels after upgrades to ensure sbi_ecall functions are excluded from function tracing.

Conclusion

CVE‑2026‑23217 is a textbook example of how powerful kernel features — implemented to help engineers diagnose and understand system behavior — can unintentionally create system hazards when they intersect with low‑level platform services and interrupt semantics. The Linux kernel maintainers applied a narrow, practical fix: remove ftrace instrumentation from sbi_ecall at build time to prevent the snapshot → IPI → __sbi_ecall reentrancy loop that causes a deadlock. For operators and developers running RISC‑V kernels, the priority is clear: avoid tracing these functions, apply vendor updates, and constrain who can manipulate tracing controls. The incident also reminds the community that tracing defaults and safe‑instrumentation guidelines should be part of platform hardening as RISC‑V matures in production deployments.

Source: MSRC Security Update Guide - Microsoft Security Response Center