A high-risk flaw in the Linux kernel’s software RAID code, tracked as CVE-2024-35808, has been quietly fixed upstream and pushed into vendor updates. The dm-raid driver (md/dm-raid) called md_reap_sync_thread from raid_message without the necessary synchronization. This opened a window in which the state of the kernel’s RAID sync thread could be modified unsafely, and an attacker with local access could use it to cause a denial of service against RAID-managed storage. The patches replace the unsafe direct call with the safer stop_sync_thread helper (the same approach md/raid uses), and major distributions have issued kernel updates. System administrators should treat this as a local-access denial-of-service risk and prioritize kernel updates or mitigations on hosts that use software RAID (md) or device-mapper RAID (dm-raid).
Background / Overview
The Linux kernel’s md (multiple devices) and device-mapper RAID (dm-raid) subsystems implement software RAID, providing block-level redundancy along with the resynchronization and reshape operations that protect data. These subsystems coordinate a background sync thread that performs parity rebuilds, resyncs, and other reconstruction tasks; the thread’s lifecycle and internal state are protected by locking primitives, notably the per-array mutex reconfig_mutex.
CVE-2024-35808 arises from an unsafe code path in the dm-raid implementation: raid_message invoked md_reap_sync_thread directly without holding reconfig_mutex. Because md_reap_sync_thread can modify fields that are expected to be protected by reconfig_mutex, calling it without the proper mutual exclusion creates a race condition between the message-handling code and sync-thread management. Simply acquiring reconfig_mutex is not a clean fix either: holding the mutex in that context can trigger deadlocks, as earlier kernel refactors demonstrated.
The pragmatic upstream fix replaced the direct call to md_reap_sync_thread with the exported helper stop_sync_thread, which unregisters and stops the sync thread using a safer sequence designed to avoid the locking and deadlock problems. Vendors integrated the upstream corrections into stable kernels and released distribution-specific kernel packages to remediate the vulnerability.
What happened (technical summary)
- The problem: raid_message in the dm-raid driver could call md_reap_sync_thread without holding reconfig_mutex, creating unsynchronized mutation of sync-thread-controlled fields.
- Why this matters: md_reap_sync_thread can alter internal mddev/md structures; without the mutex those changes can race with other code paths and cause inconsistent state, hangs, or other availability problems.
- Why the easy fix didn’t work: simply grabbing reconfig_mutex inside raid_message introduces a deadlock risk that upstream kernel developers had already addressed in prior commits—those prior changes reworked the sync-thread / freeze/unfreeze semantics to avoid deadlocks.
- The chosen fix: replace direct md_reap_sync_thread calls with stop_sync_thread, a safer exported helper that unregisters the sync thread in a way that avoids the deadlock-prone sequence while still preventing concurrent access to the same fields.
This mirrors how the md/raid implementation handles the same situation and brings dm-raid in line with md/raid’s safer pattern.
Why this is a Denial-of-Service (DoS) risk
At its core, CVE-2024-35808 is a synchronization bug that affects availability rather than confidentiality or integrity. The primary risk scenarios are:
- Race-induced hangs: unsynchronized state changes can cause kernel threads to block indefinitely waiting on conditions that will never be satisfied, producing a sustained outage for I/O on affected arrays.
- Repeated exploitation: an attacker who can trigger the vulnerable message handler repeatedly may be able to repeatedly drive the kernel into the unsafe state, producing repeated or persistent denial-of-service conditions.
- Blocked new I/O: even if in-flight I/O completes, new I/O requests can be stalled while the md/dm layer is stuck or reconfiguring, denying access to the block devices and to any filesystems built on them.
Most publicly available assessments and vendor advisories classify the impact as availability-only (the flaw does not inherently cause confidentiality or integrity loss), but with high availability impact: the kernel can become unable to service storage I/O managed by md/dm-raid. The attack vector is local; an attacker needs local access to trigger the vulnerable code paths (for example, a low-privileged local user or a process running inside a container on the host).
Affected systems and kernel ranges
Vendor advisories and open vulnerability records identify a broad range of affected kernel versions. In practice, the affected kernels include long-standing stable releases as well as newer series that still contained the faulty code path. Distribution advisories and upstream records indicate that the defect remained until a set of fixes landed in the stable trees; typical affected ranges reported by vendors and vulnerability databases include:
- Kernels in the general lineage up to (but not including) the upstream fixes: reports list series from 3.10 up to mid-6.x release points as affected, and 6.8-series kernels from 6.8.0 through early 6.8.x releases were also impacted until the fixes were backported.
- Distribution-specific kernels: vendors have listed specific kernel package versions that fix the issue for particular releases (for example, some Ubuntu kernels in the 6.8.* packaging were marked fixed at a particular package release).
Because distributions apply fixes to their own kernel trees and often backport patches to stable kernels, the precise affected kernel packages vary by vendor and release. Administrators must consult their distribution’s security advisory or package changelog for the exact fixed kernel build for their release.
Note: if you deploy vendor-supplied kernels (Red Hat, SUSE, Ubuntu, Debian, Oracle Linux, etc.), rely on the vendor advisory for the authoritative fixed package name and version; generic upstream kernel version numbers are helpful, but distribution packaging and backports change the concrete upgrade action.
Exploitability, prerequisites, and real-world risk
- Attack vector: Local — an attacker must be able to execute code or trigger message handling on the host. This includes low-privileged users, untrusted local processes, or containerized workloads with sufficient access to device-mapper or md control interfaces.
- Privilege required: Low — many assessments indicate low privileges are sufficient to trigger the condition, though access to md/dm control paths or device files is required.
- Complexity: Low — the race is a straightforward misuse of a public helper from a message handler.
- Scope: Unchanged — the issue does not change scope to other security domains (it remains within kernel md/dm RAID management).
- Impact: Availability — High — the core consequence is denial of service for RAID-managed storage.
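The bullets above read like CVSS v3.1 base metrics. As a worked example of the base-score arithmetic from the CVSS v3.1 specification, the sketch assumes the vector AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H. User Interaction is not listed above, so UI:N is an assumption, and vendors may publish different vectors for this CVE; check the advisory you rely on.

```python
import math

# CVSS v3.1 metric weights (FIRST CVSS v3.1 specification).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}  # weights for Scope:Unchanged only
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal number >= value."""
    int_input = round(value * 100000)
    if int_input % 10000 == 0:
        return int_input / 100000.0
    return (math.floor(int_input / 10000) + 1) / 10.0


def base_score_scope_unchanged(av, ac, pr, ui, c, i, a):
    """Base score for a Scope:Unchanged vector (the only branch handled here)."""
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    impact = 6.42 * iss
    exploitability = 8.22 * AV[av] * AC[ac] * PR_UNCHANGED[pr] * UI[ui]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


# AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H  (UI:N is an assumption, see above)
score = base_score_scope_unchanged("L", "L", "L", "N", "N", "N", "H")
print(score)  # 5.5
```

Only the Scope:Unchanged branch is implemented because that is the scope listed above; Scope:Changed uses different Privileges Required weights and a different impact formula.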
There are no widely reported, authoritative public proofs of concept or reports of exploitation in the wild. Because exploitation requires local access, this vulnerability is primarily a concern on multi-tenant systems, shared hosts, hosting providers, and container platforms where untrusted workloads might reach the kernel message handlers. Wherever local userspace can call into the device-mapper/RAID control paths, an attacker could deliberately try to trigger the condition. Administrators should treat the status of in-the-wild exploitation as unknown unless a vendor or incident response team publishes a confirmed campaign.
What the patch changes (developer-level detail)
The upstream changes that fix CVE-2024-35808 remove direct calls to md_reap_sync_thread from dm-raid’s raid_message and use the exported stop_sync_thread helper instead. The reasoning and advantages are:
- stop_sync_thread encapsulates the correct unregister/stop sequence for the sync thread, ensuring that waiting and wake-up semantics are handled consistently and that code paths that might otherwise race on protected fields do not run concurrently.
- The change avoids forcing a reconfiguration mutex acquisition inside the message handler, which earlier attempts to fix similar issues had shown could introduce deadlocks.
- The patch mirrors existing md/raid logic, aligning dm-raid with the established pattern and minimizing the risk of regressions caused by divergent implementations.
At a practical level the patch replaces a small set of calls and adds checks around the state transitions so that a sync-thread unregister is done via the exported helper with proper wakeups and wait semantics.
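The locking idea can be illustrated with a toy Python model. This is not kernel code and not the real helper: ToyArray, stop_sync, reap_locked, check_recovery and management_daemon are hypothetical names. The model assumes, for illustration only, that the code holding reconfig_mutex is the only code that tears down ("reaps") the sync thread, while the stop path merely requests an interrupt and waits for that teardown without taking the mutex itself.

```python
import threading


class ToyArray:
    """Toy model of an array with a background sync thread (hypothetical names)."""

    def __init__(self):
        self.reconfig_mutex = threading.Lock()      # guards running / sync_thread
        self.state_changed = threading.Condition()  # stands in for a wait queue
        self.interrupt = threading.Event()          # "please stop" request flag
        self.running = False
        self.sync_thread = None

    def _sync_work(self):
        # The "resync" simply runs until an interrupt is requested.
        self.interrupt.wait()

    def start_sync(self):
        with self.reconfig_mutex:
            self.interrupt.clear()
            self.running = True
            self.sync_thread = threading.Thread(target=self._sync_work)
            self.sync_thread.start()

    def reap_locked(self):
        """Tear down the sync thread; the caller must hold reconfig_mutex."""
        # Lock.locked() only says *some* thread holds it; Python locks have no owner.
        assert self.reconfig_mutex.locked()
        self.sync_thread.join()
        self.sync_thread = None
        with self.state_changed:
            self.running = False
            self.state_changed.notify_all()

    def check_recovery(self):
        """Management path: the only place that reaps, and it holds the lock."""
        with self.reconfig_mutex:
            if self.running and self.interrupt.is_set():
                self.reap_locked()

    def management_daemon(self, shutdown):
        while not shutdown.is_set():
            self.check_recovery()
            shutdown.wait(0.001)

    def stop_sync(self, timeout=5.0):
        """Request an interrupt, then wait WITHOUT holding reconfig_mutex.

        Holding reconfig_mutex while waiting would deadlock in this model,
        because check_recovery needs the same lock to finish the reap.
        """
        if not self.running:
            return True
        self.interrupt.set()
        with self.state_changed:
            return self.state_changed.wait_for(lambda: not self.running, timeout)


arr = ToyArray()
shutdown = threading.Event()
manager = threading.Thread(target=arr.management_daemon, args=(shutdown,))
manager.start()
arr.start_sync()
stopped = arr.stop_sync()  # the "message handler" path in this toy model
shutdown.set()
manager.join()
print(stopped, arr.running, arr.sync_thread)  # True False None
```

The contrast with the original bug is that the message handler in this model never calls reap_locked itself; calling it without the mutex would mutate running and sync_thread concurrently with check_recovery, which is the kind of race the text describes.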
Detection and how to check your systems
Administrators should perform a quick audit to determine exposure:
- Check kernel version:
  - Run: uname -r
  - Compare the running kernel to your vendor’s list of fixed kernels for CVE-2024-35808.
- Check for presence of MD or device-mapper RAID usage:
  - cat /proc/mdstat (shows active md arrays).
  - lsmod | grep '^md' and lsmod | grep dm_raid (check for loaded md/dm modules; md may also be built into the kernel, in which case it does not appear in lsmod).
  - For device-mapper RAID: list device-mapper mappings (dmsetup ls) and inspect their tables (dmsetup table) for raid targets.
- Check distribution security advisories:
  - Use your distro’s security tracker or CVE pages to map the vulnerability to a fixed kernel package for your release.
- Monitor kernel logs for related hang symptoms:
  - dmesg, journalctl -k, or tail -f /var/log/kern.log: look for md/dm warnings, repeated sync-thread wakeups, or unexpected md stop/registration messages.
- Container and VM hosts:
  - On hypervisor hosts or container runtimes where untrusted containers run, check whether containers have access to device-mapper control (/dev/mapper/control), udev devices, or the /dev/md* device nodes.
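The inventory part of these checks is easy to script. The sketch parses the text of /proc/mdstat and of dmsetup table output (whose lines, when all devices are listed, take the form "name: start length target args...") and reports md arrays and dm devices that use the raid target. The sample inputs are illustrative rather than captured from a real host, and the parsing rules are assumptions based on the usual layout of these files.

```python
def parse_mdstat(text):
    """Return (name, state, level) tuples for the array lines of /proc/mdstat."""
    arrays = []
    for line in text.splitlines():
        # Array lines look like "md0 : active raid1 sdb1[1] sda1[0]".
        if not line.startswith("md") or " : " not in line:
            continue
        name, rest = line.split(" : ", 1)
        tokens = rest.split()
        if not tokens:
            continue
        state = tokens[0]
        # Skip flags such as "(auto-read-only)" that can precede the level.
        level = next((t for t in tokens[1:] if not t.startswith("(")), None)
        if level is not None and not (level.startswith("raid") or level in ("linear", "multipath")):
            level = None  # inactive arrays list member devices, not a level
        arrays.append((name.strip(), state, level))
    return arrays


def dm_raid_devices(table_text):
    """Names of dm devices with at least one 'raid' target in dmsetup table output."""
    names = []
    for line in table_text.splitlines():
        if ": " not in line:
            continue  # e.g. "No devices found"
        name, rest = line.split(": ", 1)
        fields = rest.split()  # start, length, target type, target args...
        if len(fields) >= 3 and fields[2] == "raid" and name not in names:
            names.append(name)
    return names


def uses_software_raid(mdstat_text, table_text):
    return bool(parse_mdstat(mdstat_text)) or bool(dm_raid_devices(table_text))


SAMPLE_MDSTAT = """Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      1047552 blocks super 1.2 [2/2] [UU]

md127 : inactive sdc[0](S)
      1046528 blocks super 1.2

unused devices: <none>
"""

SAMPLE_TABLE = """vg0-root: 0 41943040 linear 8:2 2048
vg0-mirror: 0 2097152 raid raid1 3 0 region_size 1024 2 253:2 253:3 253:4 253:5
"""

print(parse_mdstat(SAMPLE_MDSTAT))   # [('md0', 'active', 'raid1'), ('md127', 'inactive', None)]
print(dm_raid_devices(SAMPLE_TABLE))  # ['vg0-mirror']
```

On a real host, pass open("/proc/mdstat").read() and the captured output of dmsetup table (which normally needs root). An empty result only means that no arrays were visible at the moment of the check.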
If your kernel package is older than the vendor’s fixed package for CVE-2024-35808 and the host uses md arrays or dm-raid, treat the host as vulnerable.
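The rule above can be written down directly. This sketch uses a deliberately simplified version comparison (numeric runs compared left to right) and hypothetical Ubuntu-style package versions. It ignores epochs, tildes, and letter components, so for real decisions use dpkg --compare-versions or rpmdev-vercmp, and only compare versions within the same vendor kernel series.

```python
import re


def version_key(version: str) -> tuple:
    """Simplified ordering key: the numeric runs of a version string, in order.

    NOT equivalent to dpkg or rpm comparison: epochs, '~' and letter
    components are ignored.
    """
    return tuple(int(n) for n in re.findall(r"\d+", version))


def should_treat_as_vulnerable(installed: str, vendor_fixed: str, uses_raid: bool) -> bool:
    """Older than the vendor's fixed package AND md/dm-raid in use."""
    return uses_raid and version_key(installed) < version_key(vendor_fixed)


# Hypothetical package versions, for illustration only.
print(should_treat_as_vulnerable("6.8.0-31.31", "6.8.0-35.35", uses_raid=True))  # True
```

Keeping the RAID check as a separate input makes it easy to combine with an inventory step such as parsing /proc/mdstat.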
Mitigation and remediation guidance
The primary and recommended remediation is to patch the kernel with the vendor-supplied update that contains the upstream fix. Where immediate patching is not possible, consider the following mitigations as temporary risk-reduction measures:
- Patch first: apply the fixed kernel package from your distribution as soon as feasible, and reboot into the patched kernel.
- Use vendor livepatch services where available: for distributions that support livepatching (Ubuntu Livepatch, SUSE Live Patching, Red Hat kpatch, etc.), check whether a livepatch exists and apply it to avoid immediate reboots in critical environments.
- Limit local untrusted access: reduce the number of accounts and services allowed to run untrusted code on hosts that manage RAID devices. Prevent untrusted containers or users from having access to the device-mapper APIs or /dev/md* nodes.
- Constrain container privileges: ensure containers cannot open or manipulate device-mapper nodes; avoid granting CAP_SYS_ADMIN or direct device access to untrusted containers.
- Isolate software RAID hosts: if possible, isolate hosts with md/dm-managed devices from multi-tenant or untrusted user workloads until patched.
- Monitor and respond: set up monitoring to detect unusual md/dm activity in kernel logs and automated alerts for kernel messages relating to sync thread starts/stops or md reconfiguration events.
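One quick check for the container-privilege point is whether a process inside a container holds CAP_SYS_ADMIN in its effective capability set. The sketch parses the CapEff line of /proc/<pid>/status (a hexadecimal bitmask) and tests bit 21, the value of CAP_SYS_ADMIN in <linux/capability.h>. It covers only that capability; access to /dev/mapper/control or /dev/md* nodes granted through device mounts or cgroup rules has to be checked separately. The status excerpts are illustrative.

```python
CAP_SYS_ADMIN = 21  # capability number from <linux/capability.h>


def effective_capabilities(status_text: str) -> int:
    """Return the CapEff bitmask from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split(":", 1)[1].strip(), 16)
    raise ValueError("no CapEff line found")


def has_cap_sys_admin(status_text: str) -> bool:
    return bool((effective_capabilities(status_text) >> CAP_SYS_ADMIN) & 1)


# Illustrative status excerpts (not captured from a specific runtime).
FULL = "Name:\tbash\nCapEff:\t000001ffffffffff\n"
REDUCED = "Name:\tbash\nCapEff:\t00000000a80425fb\n"
print(has_cap_sys_admin(FULL), has_cap_sys_admin(REDUCED))  # True False
```

Inside a container, pass open("/proc/self/status").read(); capsh --decode=<hex> from libcap gives a human-readable view of the same mask.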
Concrete step-by-step patching (general):
- Identify fixed kernel package for your distribution and release (consult distro advisory).
- Update package index and install the kernel package:
  - For Debian/Ubuntu: apt update && apt upgrade linux-image-<pkg>
  - For RHEL/CentOS: dnf update kernel
  - For SUSE: zypper patch or zypper up kernel-default
- Reboot into the new kernel (plan reboots during maintenance windows).
- Verify patched kernel: uname -r and confirm package version matches vendor advisory.
- Monitor arrays post-reboot for normal sync behavior and health.
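For fleets with mixed distributions, the steps above can be turned into a per-host plan. This sketch only prints commands and runs nothing. It reads the ID and ID_LIKE fields of /etc/os-release, and its small ID-to-family map is a hypothetical starting point, not a complete list; the <pkg> placeholder still has to be filled in from your vendor advisory.

```python
import shlex

# Update commands per family, taken from the step list above.
UPDATE_COMMANDS = {
    "debian": ["apt update", "apt upgrade linux-image-<pkg>"],
    "rhel": ["dnf update kernel"],
    "suse": ["zypper patch"],
}

# Hypothetical, deliberately small ID -> family map; extend it for your fleet.
FAMILIES = {
    "debian": "debian", "ubuntu": "debian",
    "rhel": "rhel", "centos": "rhel", "fedora": "rhel",
    "sles": "suse", "suse": "suse", "opensuse": "suse",
}


def parse_os_release(text: str) -> dict:
    """Parse KEY=VALUE lines, removing shell-style quotes from values."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        tokens = shlex.split(value)
        info[key] = tokens[0] if tokens else ""
    return info


def patch_plan(os_release_text: str) -> list:
    info = parse_os_release(os_release_text)
    candidates = [info.get("ID", "")] + info.get("ID_LIKE", "").split()
    for distro_id in candidates:
        family = FAMILIES.get(distro_id)
        if family:
            return UPDATE_COMMANDS[family] + [
                "reboot into the new kernel (maintenance window)",
                "uname -r and compare with the vendor advisory",
            ]
    raise ValueError("unknown distribution; consult its advisory manually")


print(patch_plan('ID="rocky"\nID_LIKE="rhel centos fedora"\n')[0])  # dnf update kernel
```

Falling back to ID_LIKE lets derivatives such as Rocky Linux or openSUSE Leap resolve to a known family without listing every ID.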
Operational and forensic considerations
- If you suspect exploitation or repeated hangs, collect volatile evidence before rebooting: dmesg, /proc/mdstat, output of ps and top, and kernel oops traces if present.
- Reboots can mask transient race conditions; preserve logs and enable persistent logging so you can correlate pre- and post-update behavior.
- For cloud or managed environments, coordinate with the cloud provider: host-level kernels may be controlled by the provider; ask for their remediation timeline if you cannot patch yourself.
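When triaging saved logs (dmesg or journalctl -k output captured before a reboot), a small scanner can separate hung-task reports from RAID-related lines. The hung-task pattern follows the kernel hung-task detector’s "INFO: task <comm>:<pid> blocked for more than <N> seconds." message; the md and dm-raid prefixes are heuristic assumptions, not an exhaustive list, and the sample log lines are illustrative.

```python
import re

# Hung-task detector: "INFO: task <comm>:<pid> blocked for more than <N> seconds."
HUNG_TASK = re.compile(r"INFO: task (.+?):(\d+) blocked for more than (\d+) seconds")
# Heuristic prefixes for md ("md:", "md/raid1:") and dm-raid ("device-mapper: raid").
RAID_LINE = re.compile(r"\bmd(/\w+)?:|device-mapper: raid")


def scan_kernel_log(text: str):
    """Return (hung_tasks, raid_lines) found in kernel log text."""
    hung_tasks, raid_lines = [], []
    for line in text.splitlines():
        m = HUNG_TASK.search(line)
        if m:
            hung_tasks.append((m.group(1), int(m.group(2)), int(m.group(3))))
        elif RAID_LINE.search(line):
            raid_lines.append(line.strip())
    return hung_tasks, raid_lines


SAMPLE_LOG = """[ 1200.000000] md: md0: resync done.
[ 1234.567890] INFO: task kworker/u8:2:1234 blocked for more than 120 seconds.
[ 1300.000000] md/raid1:md0: Disk failure on sdb1, disabling device.
[ 1400.000000] device-mapper: raid: Loading target version 1.15.1
[ 1500.000000] EXT4-fs (sda1): mounted filesystem
"""

hung, raid = scan_kernel_log(SAMPLE_LOG)
print(hung)       # [('kworker/u8:2', 1234, 120)]
print(len(raid))  # 3
```

Hung-task reports that cluster around md or dm-raid messages are worth preserving with the rest of the volatile evidence before rebooting.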
Risk analysis — strengths and limitations of the patch
Strengths:
- The upstream change follows an established, safer pattern used elsewhere in the md subsystem (stop_sync_thread), reducing the risk of regressions.
- Distribution vendors have rapidly integrated fixes and published advisories, enabling system administrators to remediate with their usual package management workflows.
- The fix avoids the riskier alternative (blindly acquiring reconfig_mutex in the message path), which could have created deadlock bugs.
Limitations and residual risks:
- The vulnerability requires local access, so the risk profile is concentrated in multi-tenant and shared-host environments. However, in those environments local access is often easier to obtain than administrators expect — e.g., through untrusted containers or weak user isolation.
- Because the fix touches sync-thread lifecycle code, there is a chance of regressions in complex array states. Vendors have backported and tested, but administrators should still monitor array health after updating.
- There are distribution- and packaging-specific differences in how kernels are patched and backported; relying on generic upstream version numbers without checking vendor advisories can lead to a false sense of safety.
Cautionary note: at the time of this writing there is no authoritative public evidence of widespread active exploitation. Nonetheless, the nature of the bug (a race that can be triggered by local operations on RAID control interfaces) makes it a real operational risk where hosts expose device-mapper controls to untrusted users or run untrusted workloads.
Practical checklist for sysadmins (executive summary)
- Immediately determine whether your hosts use md or dm-raid (cat /proc/mdstat, dmsetup ls).
- Map your running kernel to vendor advisories and confirm whether your kernel package is fixed.
- If fixed kernel packages are available, schedule and apply updates and reboot as required.
- Where reboots are problematic, check for vendor livepatch availability and apply livepatches as appropriate.
- Restrict local access and container privileges on hosts that manage software RAID until patched.
- Enable and review kernel logs for unusual md/dm activity; collect forensic data if you observe hangs or repeated md thread behavior.
- Document the update and monitoring actions for audit and incident response readiness.
Timeline and vendor response (what to expect)
- Discovery and upstream patching: an upstream patch series aligning dm-raid with md/raid (swapping md_reap_sync_thread calls for stop_sync_thread) was posted and merged into the kernel stable lines.
- Vendor advisories: major distributions (Ubuntu, SUSE, Red Hat, Debian-derived advisories) tracked the issue and published security notices listing package versions that contain the fix. Vendors typically backported the change into their stable kernel trees, and the fixes were made available via regular package updates.
- Suggested action: treat vendor advisories as the authoritative mapping from CVE to fixed package; when possible, apply the vendor-released patch rather than trying to backport or apply upstream patches manually to production kernels.
Final assessment and recommendations
CVE-2024-35808 is a synchronization bug in the md/dm-raid code path that affects kernel-managed software RAID and device-mapper RAID. The vulnerability’s primary impact is availability: a local attacker can induce a denial of service against storage arrays. While it is not a remote code-execution risk, the vulnerability is operationally significant in shared-host environments, cloud hosts, and any system where untrusted code can run locally.
Recommended actions, prioritized:
- Patch kernel packages from your vendor and reboot into the fixed kernel as soon as operationally practical.
- Use vendor livepatch services where available to avoid emergency reboots while still applying the fix.
- Harden hosts by removing unnecessary local access and restricting container capabilities until patched.
- Monitor kernel logs and md/dm-raid activity, and capture forensic data if you observe hangs or repeated sync-thread issues.
- Maintain a documented remediation timeline and verify patch deployment across all affected hosts.
This vulnerability underscores the importance of locking discipline in kernel subsystems that manage critical I/O paths. The upstream fix’s approach (reusing stop_sync_thread) is pragmatic and consistent, and vendor patches have been made available; timely patching is the most effective way to remove the denial-of-service risk from production systems.
Conclusion
CVE-2024-35808 is a local-access denial-of-service vulnerability in Linux software RAID’s dm-raid driver caused by an unsafe direct call to md_reap_sync_thread. The issue has been fixed upstream by switching to stop_sync_thread, and distributions have released kernel updates. Administrators running software RAID or device-mapper RAID should prioritize applying vendor kernel updates or livepatches, reduce untrusted local access to device-mapper controls, and monitor md/dm logs closely until all sensitive hosts are patched.
Source: MSRC Security Update Guide (Microsoft Security Response Center)