CVE-2025-68224: Guarding SCSI tagset init to fix SRCU race during UFS probe

A small change in the kernel’s block‑layer iterator code has produced an outsized operational headache: CVE‑2025‑68224 fixes a regression where a call to scsi_host_busy can race against tagset initialization, causing kernel stack traces during device probe and potentially blocking UFS platform initialization on some platforms.

Background

The Linux kernel’s blk‑mq layer and SCSI infrastructure are tightly coupled: SCSI drivers rely on blk‑mq tagsets to manage command tags and scheduling. A refactor that replaced tags->lock with SRCU (Sleepable Read‑Copy Update) for tag iterators (commit 995412e23bb2) altered the timing and concurrency characteristics of tag iteration. That change inadvertently exposed a race: scsi_host_busy could be invoked before the SCSI host’s tag set had finished initialization, triggering SRCU read‑side activity on an uninitialized tagset and producing kernel call traces during platform driver probe. Public CVE summaries and vendor advisories document both the trace and the root cause.

This regression most commonly surfaced during UFS platform probe on certain SoC platforms (for example, the Rockchip UFS platform glue driver), where the driver's probe sequence and tagset setup order relied on a particular serialization. The upstream fix makes scsi_host_busy check whether the tag set has been initialized before invoking tagset iterators, effectively restoring the pre‑refactor safety property at runtime.

What happened, in technical terms

The regression chain

  • A blk‑mq refactor replaced an explicit tags->lock with SRCU to iterate tags safely without holding a heavy lock.
  • The new iteration logic takes an SRCU read lock on the tag set, so it relies on the tagset being fully initialized (ops pointer set and allocation complete) before any iteration starts.
  • On certain boot/probe interleavings, scsi_host_busy was called early: before scsi_mq_setup_tags had set tag_set->ops and before blk_mq_alloc_tag_set had completed.
  • scsi_host_busy invoked a tagset busy iterator that entered __srcu_read_lock, operating on an uninitialized tagset and producing kernel tracebacks during platform/driver probe. The call stack observed in public reports includes blk_mq_tagset_busy_iter -> scsi_host_busy -> ufshcd_print_host_state -> ufshcd_link_startup -> ufshcd_init -> platform probe flows.
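To make the ordering problem concrete, here is a small userspace C model of the pre‑fix path. It is a hypothetical sketch, not kernel code. The names model_tag_set, model_busy_iter(), model_setup_tags() and model_probe_with_early_busy_query() are invented for illustration. A boolean flag stands in for the SRCU state that blk_mq_alloc_tag_set would initialize, and where the kernel would emit a call trace, the model returns -1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace model; not the kernel's struct blk_mq_tag_set. */
struct model_tag_set {
    const void *ops;   /* stands in for tag_set->ops */
    bool srcu_ready;   /* stands in for SRCU state set up at allocation */
    int in_flight;     /* commands currently "busy" */
};

static const int model_ops_table = 0;  /* placeholder for an ops table */

/* Models the busy iterator: entering the SRCU read side requires
 * initialized state. Instead of a kernel call trace, the model returns -1. */
static int model_busy_iter(const struct model_tag_set *set)
{
    assert(set != NULL);
    if (!set->srcu_ready) {
        fprintf(stderr, "model: SRCU read lock on uninitialized tag set\n");
        return -1;
    }
    return set->in_flight;
}

/* Pre-fix behaviour: the busy query iterates unconditionally. */
static int model_host_busy_unguarded(const struct model_tag_set *set)
{
    return model_busy_iter(set);
}

/* Models tag set setup, collapsing ops assignment and allocation into one step. */
static void model_setup_tags(struct model_tag_set *set)
{
    set->ops = &model_ops_table;
    set->srcu_ready = true;
}

/* Models the reported probe order: link startup fails and prints host state,
 * which queries the busy count, before the tag set has been set up. */
static int model_probe_with_early_busy_query(void)
{
    struct model_tag_set set = {0};
    int early = model_host_busy_unguarded(&set);  /* runs too early */

    model_setup_tags(&set);
    return early;
}
```

In this model, model_probe_with_early_busy_query() returns -1, the stand‑in for the __srcu_read_lock trace. Once model_setup_tags() has run, the same unguarded query returns the in‑flight count.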

Why the check works

The pragmatic upstream remedy is narrowly targeted. Before scsi_host_busy performs tagset iteration, it now verifies that the SCSI host’s tagset has been initialized (i.e., that tag_set->ops has been set by scsi_mq_setup_tags). That check prevents SRCU read‑side activity against an uninitialized structure. It relies on the assumption that calls to scsi_host_busy and scsi_mq_setup_tags are serialized for a given host, an assumption that holds for the affected UFS driver probe sequences. The change is intentionally small to minimize regression risk and to make backporting into stable kernel branches feasible.
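A minimal userspace sketch of the guard follows. It uses the same kind of hypothetical model, so the names are invented. Only the idea of checking the ops pointer before iterating comes from the fix described above. The model collapses setup into a single step, so it does not capture the window that the serialization assumption covers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a SCSI host's tag set; not kernel code. */
struct model_tag_set {
    const void *ops;   /* non-NULL once setup has run (stands in for tag_set->ops) */
    bool srcu_ready;   /* stands in for SRCU state initialized at allocation */
    int in_flight;
};

static const int model_ops_table = 0;

/* Models the busy iterator; returns -1 where the kernel would emit a trace. */
static int model_busy_iter(const struct model_tag_set *set)
{
    return set->srcu_ready ? set->in_flight : -1;
}

/* Guarded busy query: skip iteration entirely until ops has been set,
 * reporting zero busy commands for a host whose tag set is not set up yet. */
static int model_host_busy(const struct model_tag_set *set)
{
    int cnt = 0;

    assert(set != NULL);
    if (set->ops)
        cnt = model_busy_iter(set);
    return cnt;
}

/* Models setup: assigns ops and initializes the SRCU stand-in together. */
static void model_setup_tags(struct model_tag_set *set)
{
    set->ops = &model_ops_table;
    set->srcu_ready = true;
}
```

Returning 0 before setup is a reasonable answer rather than a silent error: no command can be in flight on a host whose tag set does not exist yet.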

Who is affected and why it matters operationally

This is a correctness/regression‑class bug with operational impact, not a remote code‑execution primitive. A wrongly ordered probe sequence can cause kernel warnings and call traces that stop device initialization or cause UFS platform probe failures. This can leave the platform device unbound and become the root cause of failed boots on embedded and appliance devices. Systems most likely to encounter the problem include:
  • Embedded boards and appliances using UFS controllers with vendor platform drivers (for example, Rockchip UFS platform glue that probes early in boot).
  • Vendor‑supplied kernels or OEM kernel forks that incorporated the blk‑mq refactor but did not include the scsi_host_busy guard.
  • Distributions or cloud images that use kernels built from upstream trees containing the refactor prior to the fix, especially where UFS platforms are present.
Although the immediate symptom is probe‑time stack traces and failed UFS probe sequences, the practical impact may be broader: anything that depends on proper UFS initialization (boot from UFS, mounted storage devices, or devices that enumerate later based on UFS readiness) may be affected until the kernel contains the upstream guard. That makes this a higher‑priority patch for embedded vendors, appliance maintainers, and distributions that ship kernels on UFS‑equipped hardware.

The upstream fix, succinctly

  • Detection point: scsi_host_busy was updated to first verify whether the host’s tag_set has been initialized (i.e., tag_set->ops set) before performing tagset‑iterator operations.
  • Rationale: Avoid invoking tag iteration via SRCU on an uninitialized tag set; rely on driver probe serialization guarantees where present (notably in the UFS driver).
  • Scope: Minimal, localized to SCSI core checks — preserving existing logic while preventing unsafe early calls. This approach is chosen to make backports simple and to keep regression risk low.
Because kernel probe sequences vary by platform, the patch does not attempt a heavy redesign; instead it prevents the invalid sequence from entering the SRCU iterator code path.

Detection and triage: what to look for in logs and telemetry

Operators and maintainers should treat this as an availability/regression issue and hunt for the specific probe‑time call traces and UFS probe failures described in advisories. Practical detection guidance:
  • Kernel log signatures: look for the exact stack trace chain beginning with __srcu_read_lock and including blk_mq_tagset_busy_iter and scsi_host_busy; many public reports include these frames in the calltrace. These traces often appear during early boot or module probe sequences.
  • Failed probe messages: platform driver probe failures for UFS (ufs_rockchip_probe, ufshcd_pltfrm_init, ufshcd_init) appearing in dmesg/journalctl around system startup.
  • Device absence: missing block devices or file systems that depend on UFS device initialization after boot. This symptom may be subtle on complex images where UFS is not the root device but still required for secondary devices.
  • Reproducer: reproduction is environment‑specific. Embedded vendors often have test kernels and rootfs images that exhibit the problematic probe ordering, and reproducing the incorrect interleaving typically requires the same vendor platform code path seen in field traces.
For triage, collect full early‑boot logs (journalctl -b, dmesg) and preserve any kmsg outputs. If possible, capture the uncut dmesg around the time of platform driver registration and driver_attach events — those records help match the calltrace to a specific probe ordering.
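The signature check can be scripted. The following C sketch is a hypothetical helper: the function name count_signature_frames() and the frame table are illustrative, while the frame names come from the call traces quoted above. It counts how many of the three characteristic frames appear in a captured log buffer. This is a heuristic because it does not verify that the frames belong to the same trace, so treat a full match as a prompt to read the surrounding log by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Frames that appear together in the reported call traces. */
static const char *const signature_frames[] = {
    "__srcu_read_lock",
    "blk_mq_tagset_busy_iter",
    "scsi_host_busy",
};

/* Returns how many of the signature frames occur anywhere in log_text
 * (0 to 3). A result of 3 suggests the CVE-2025-68224 probe-time trace. */
static size_t count_signature_frames(const char *log_text)
{
    size_t hits = 0;
    size_t n = sizeof(signature_frames) / sizeof(signature_frames[0]);

    assert(log_text != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strstr(log_text, signature_frames[i]) != NULL)
            hits++;
    }
    return hits;
}
```

Feed it the output of dmesg or journalctl -k -b after saving it to a file and reading it into a NUL‑terminated buffer. File loading is omitted to keep the sketch short.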

Patching, verification and distribution mapping

Definitive remediation is to run a kernel that includes the upstream scsi_host_busy guard. The fix was merged upstream and referenced in CVE records; vendors and distributions are expected to backport it into their stable kernels and package updates. Practical steps:
  • Inventory hosts for UFS and SCSI tagset usage:
      • Check the running kernel: uname -r
      • Inspect loaded modules with lsmod | grep -iE 'ufs|scsi'. UFS host drivers may be built in on embedded kernels and will then not appear in lsmod. Also check compiled‑in drivers in /boot/config-$(uname -r) (or /proc/config.gz, where available) for the CONFIG_SCSI_UFSHCD* options. Note that CONFIG_UFS_FS is the unrelated UNIX File System, not Universal Flash Storage.
  • Map to vendor updates:
      • Consult distribution kernel changelogs and security advisories for the CVE ID (CVE‑2025‑68224) or for the upstream stable commit that added the guard; distributions will list the package versions and backports that contain the fix. Vendor advisories are authoritative for packaged kernels.
  • Apply and reboot:
      • Install the vendor kernel package that includes the fix and reboot into the updated kernel; probe‑time issues usually require a reboot to fully validate.
  • Verify:
      • After boot, check dmesg/journalctl -k for absence of the earlier call traces. Confirm that UFS devices are present and that platform probe messages show successful initialization.
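For the inventory step, a small C helper can report whether a given tristate option is built in (y), modular (m) or absent (n) in a kernel config file. This is a hypothetical sketch, and config_state() is an illustrative name. The symbols mentioned here, CONFIG_SCSI_UFSHCD and CONFIG_SCSI_UFSHCD_PLATFORM, are assumed to be the mainline UFS host‑controller options. Vendor trees may use additional or different symbols.

```c
#include <assert.h>
#include <string.h>

/* Returns 'y' or 'm' when "SYMBOL=y" / "SYMBOL=m" appears at the start of a
 * line in config_text, and 'n' when the symbol is unset or absent (lines such
 * as "# CONFIG_FOO is not set" never match). Intended for bool/tristate options. */
static char config_state(const char *config_text, const char *symbol)
{
    size_t len;
    const char *line = config_text;

    assert(config_text != NULL && symbol != NULL);
    len = strlen(symbol);
    while (*line != '\0') {
        /* Require '=' right after the name so CONFIG_FOO does not match CONFIG_FOO_BAR. */
        if (strncmp(line, symbol, len) == 0 && line[len] == '=') {
            char value = line[len + 1];
            return (value == 'y' || value == 'm') ? value : 'n';
        }
        line = strchr(line, '\n');
        if (line == NULL)
            break;
        line++;  /* skip the newline to reach the next line */
    }
    return 'n';
}
```

To use it, read /boot/config-$(uname -r) (or the decompressed contents of /proc/config.gz) into a NUL‑terminated buffer and query each symbol of interest.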
Note: For embedded or appliance vendors shipping custom kernels, backporting the small upstream change is straightforward in principle, but testing across vendor hardware variants (firmware, device trees, and SoC quirks) is essential to ensure there are no unintended side effects. The kernel community deliberately kept the fix minimal to reduce backport risk.

Mitigations and short‑term workarounds

If applying a kernel update immediately is impossible, practical mitigations are limited because this defect is a probe‑time ordering problem in the kernel — not a userland policy. Possible temporary approaches:
  • Rebuild or reconfigure init sequences (where feasible) to delay the offending platform driver’s probe until after tagset setup — this is highly platform‑specific and typically unrealistic for shipped appliances.
  • Use a vendor‑provided kernel known to avoid the blk‑mq refactor, or compile a kernel tree that includes the upstream guard. For maintainers managing fleets, staging a controlled vendor kernel rollout to groups of similar hardware is the safest corrective path.
Short‑term mitigations such as access restrictions or LSM policies will not prevent probe‑ordering races; the only durable solution is a patched kernel.

Why this change needed to be small: analysis of risk vs. reward

The scsi_host_busy patch is an excellent example of a targeted regression fix: the upstream community preserved existing functional contracts while eliminating an unsafe early execution path by adding a simple initialization check. That approach brings important benefits:
  • Low regression surface: small, local changes are easier to reason about and backport. Upstream maintainers intentionally chose a narrow guard check to reduce risk.
  • Clear testing vectors: probe sequences and UFS driver paths provide concrete scenarios for QA teams to validate; the fix’s small size simplifies reproducer creation and test coverage.
  • Operational clarity: by checking tagset initialization, the kernel avoids invoking SRCU iterators on incomplete state — restoring robustness without large design changes.
However, there are residual caveats and risks:
  • Assumption about serialization: the fix relies on the assumption that scsi_host_busy and scsi_mq_setup_tags are serialized for the same host. While this is true for the UFS driver and typical SCSI driver probe flows, the assumption is an explicit part of the fix rationale — if other drivers or future refactors alter serialization, the assumption might break and require more structural changes. Advisories mention this assumption explicitly.
  • Vendor kernel divergence: appliance vendors or OEMs that heavily patch kernel trees might not immediately incorporate the upstream guard; the remediation status will therefore vary across distributions and product images. Operators must map upstream commits to their vendor packages carefully.
  • Hidden timing cases: small guards handle the immediate regression window, but ongoing maintenance should include careful review of probe ordering and initialization sequences in driver code. Otherwise, future upstream refactors of concurrency primitives (locks, SRCU, RCU, etc.) could introduce similar races. Historical kernel CVEs show that many subtle races arise from refactors of locking semantics.
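The serialization caveat can be illustrated with the same style of hypothetical model, this time splitting setup into two steps: assign ops, then initialize the SRCU state (modelled as a flag). The two‑step split and all names are assumptions made for illustration. A busy query that lands between the two steps passes the ops check even though the SRCU state is still uninitialized. That cannot happen when the two calls are serialized.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; names and the two-step split are illustrative only. */
struct model_tag_set {
    const void *ops;
    bool srcu_ready;
    int in_flight;
};

static const int model_ops_table = 0;

/* Guarded busy query as in the fix: iterate only when ops is set.
 * Returns -1 where the real iterator would hit uninitialized SRCU state. */
static int model_host_busy(const struct model_tag_set *set)
{
    assert(set != NULL);
    if (!set->ops)
        return 0;
    return set->srcu_ready ? set->in_flight : -1;
}

/* Step 1 of setup: publish the ops pointer. */
static void model_setup_step_ops(struct model_tag_set *set)
{
    set->ops = &model_ops_table;
}

/* Step 2 of setup: allocation, including the SRCU stand-in. */
static void model_setup_step_alloc(struct model_tag_set *set)
{
    set->srcu_ready = true;
}

/* Replays an interleaving: when busy_between_steps is true, the busy query
 * runs after step 1 but before step 2, i.e. the calls are NOT serialized. */
static int model_replay(bool busy_between_steps)
{
    struct model_tag_set set = {0};
    int result;

    model_setup_step_ops(&set);
    if (busy_between_steps) {
        result = model_host_busy(&set);
        model_setup_step_alloc(&set);
    } else {
        model_setup_step_alloc(&set);
        result = model_host_busy(&set);
    }
    return result;
}
```

model_replay(false) returns 0, the serialized case, and model_replay(true) returns -1. That second result is the window a future refactor would have to close with a stronger initialization signal than the ops pointer alone.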

Practical checklist for administrators and integrators

  • Inventory and prioritize:
      • Identify hosts with UFS controllers and SCSI/blk_mq usage; prioritize devices where UFS is boot‑critical or where platform drivers are vendor‑specific.
  • Patch roadmap:
      • Check distribution advisories for CVE‑2025‑68224 and kernel package mappings.
      • Test vendor kernels containing the upstream fix in a staging environment that mirrors production hardware.
      • Roll out patched kernels in stages, monitoring dmesg and device enumeration during and after reboots.
  • Post‑patch verification:
      • Confirm successful UFS platform probe messages and the absence of __srcu_read_lock / blk_mq_tagset_busy_iter / scsi_host_busy call traces.
  • For embedded/OEM vendors:
      • Backport the minimal upstream change and subject it to hardware soak tests across all supported SoC variants and board configurations. Pay attention to device tree variants and SoC initialization order.

Broader context: why small concurrency refactors produce outsized risk

The block‑mq refactor that replaced an explicit tags lock with SRCU illustrates a recurring theme in kernel development: changing the concurrency primitive often changes the implicit ordering and liveness assumptions in other code. Tag iteration under SRCU improves scalability and avoids heavy locking, but it also changes the lifecycle expectations for objects being iterated. Many recent kernel fixes follow the pattern of surgical guards or ordering corrections to restore previously implicit assumptions after concurrency improvements. This CVE continues that pattern — the right long‑term remedy is careful review of driver‑side initialization and explicit serialization points, but the minimal check addresses immediate instability while keeping the refactor’s scalability benefits.

Final assessment and recommendations

CVE‑2025‑68224 is a targeted, correctness‑class regression introduced by a concurrency‑primitive refactor in blk‑mq. The operational impact is real for systems that execute the affected probe sequences (notably UFS platform probes on some SoCs), but it is not a privilege‑escalation or remote code‑execution vulnerability. The upstream remedy is appropriately conservative: it adds an initialization check in scsi_host_busy to prevent SRCU iteration on uninitialized tag sets, making the minimal change necessary to preserve runtime safety while enabling backporting.

Administrators should prioritize patched kernels for affected hardware, validate vendor backports before broad rollout, and monitor early‑boot logs for the specific call‑trace signatures described in advisories. Embedded vendors should backport the minimal fix and perform thorough hardware‑level testing across platform variants. Finally, this CVE underscores a lesson for kernel maintainers and integrators. When changing low‑level concurrency primitives, explicit initialization and serialization guarantees should be audited and documented, because implicit ordering assumptions frequently outlive the original locking strategy.
CVE metadata and the upstream commit references used to frame this article were confirmed in public CVE records and vendor advisories documenting CVE‑2025‑68224 and the related upstream stable commits. The CVE entry was published on December 16, 2025 and the upstream discussion and small guard‑patch are reflected in the kernel stable references associated with the announcement.
Source: MSRC Security Update Guide - Microsoft Security Response Center