CVE-2024-49885: SLUB Redzone Bug and Kernel Availability Risk Explained

A subtle mistake in the SLUB allocator’s handling of kmalloc redzones has been tracked as CVE-2024-49885: a kernel-level bug that can turn defensive memory-initialization into a self-inflicted availability failure. The defect is narrow and surgical in scope — it only appears when SLUB debugging and the init_on_free option interact — but its observable symptom is blunt: kernel BUG/OOPS messages and potential system instability (a denial-of-service) on affected kernels. This article breaks down what went wrong, why it matters to system operators and cloud hosts, which kernels and distributions are affected, what the upstream fix does, and pragmatic steps to detect, mitigate, and validate remediation in production environments.

Background / Overview

The Linux kernel’s SLUB allocator includes several debugging features that help detect memory corruption early. One of those is the concept of a redzone — a deliberately poisoned region placed around allocated objects so that overruns are detected when the allocator checks the redzone pattern. The bug tracked as CVE-2024-49885 arises from an interaction between a SLUB change that extended redzone checks and the allocator’s behavior when configured to initialize freed memory (init_on_free). In short: under certain configurations the kernel ended up zeroing the redzone and clearing metadata, which then caused the redzone checks themselves to misinterpret object boundaries and report redzone overwrites, producing kernel BUG traces. The public vulnerability records summarize this root cause and the reproduction traces produced by SLUB debug checks.

The technical anatomy: SLUB, kmalloc redzones, and the orig_size mistake

What SLUB’s redzone and orig_size are supposed to do

  • SLUB redzones: When SLUB is built with debug options, kmalloc may allocate extra bytes beyond the requested size and mark the surplus as a redzone filled with a known poison pattern (commonly 0xCC). Later integrity checks compare those bytes to the poison; any deviation signals an overwrite.
  • orig_size: To support these checks, SLUB sometimes records an orig_size (the originally requested allocation size) separately from the allocator’s internal object_size. The allocator uses orig_size to know what portion of the slab was legitimately used by the caller and which part is the redzone/padding to be checked.
These mechanisms are defensive: they detect and report out-of-bounds writes in kernel code early during development or in hardened builds.
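The redzone idea can be sketched in a few lines of Python. This is a toy model, not kernel code: the function names alloc and redzone_intact are hypothetical, and the 8-byte object with a 5-byte request is an arbitrary illustration. Only the 0xCC poison value and the object_size/orig_size split come from the description above.

```python
# Toy model of a kmalloc redzone; names and sizes are illustrative assumptions.
POISON = 0xCC  # poison byte mentioned in the text


def alloc(object_size: int, orig_size: int) -> bytearray:
    """Return a slab object whose unused tail is filled with the poison byte."""
    if not 0 <= orig_size <= object_size:
        raise ValueError("orig_size must fit inside object_size")
    obj = bytearray(object_size)
    obj[orig_size:] = bytes([POISON]) * (object_size - orig_size)
    return obj


def redzone_intact(obj: bytearray, orig_size: int) -> bool:
    """True when every byte past orig_size still holds the poison pattern."""
    return all(b == POISON for b in obj[orig_size:])


obj = alloc(object_size=8, orig_size=5)  # caller asked for 5 bytes
obj[0:5] = b"hello"                      # in-bounds write
in_bounds_ok = redzone_intact(obj, 5)
obj[5] = 0x41                            # one-byte overrun into the redzone
overrun_ok = redzone_intact(obj, 5)
print(in_bounds_ok, overrun_ok)
```

The check depends entirely on knowing orig_size: the same bytes are either caller data or redzone depending on where that boundary is recorded.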

Exactly what went wrong

A commit that extended redzone checks to the extra space kmalloc allocates beyond the request introduced logic in which orig_size marks the unused tail (object_size - orig_size) as redzone. But when SLUB debugging was active and init_on_free=1 was set (the public reproduction boots with slub_debug=FUZ init_on_free=1), the allocator’s free path zeroed the full object_size, redzone included, rather than only the caller’s orig_size region. It also cleared the object metadata area that stores orig_size, effectively setting orig_size to zero. With orig_size cleared, subsequent check_object calls treated the entire object as redzone and reported that the redzone had been overwritten; the kernel printed an explicit BUG report (a "Redzone overwritten" message) and, depending on the environment, could produce an OOPS or panic. The NVD and public tracker descriptions lay out that exact sequence and include the diagnostic trace that reproduces the condition.
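The failure sequence can be reproduced in a small simulation. This is a sketch under stated assumptions, not the kernel's code: SlabObject, meta_orig_size, and buggy_init_on_free are hypothetical names, and the metadata is modeled as a plain attribute rather than bytes stored next to the object.

```python
# Toy reproduction of the self-inflicted redzone report; not kernel code.
POISON = 0xCC


class SlabObject:
    def __init__(self, object_size: int, orig_size: int):
        self.object_size = object_size
        self.meta_orig_size = orig_size  # stand-in for the stored orig_size
        self.data = bytearray(object_size)
        self.data[orig_size:] = bytes([POISON]) * (object_size - orig_size)


def check_object(obj: SlabObject) -> bool:
    """Everything from the stored orig_size to object_size must be poison."""
    return all(b == POISON for b in obj.data[obj.meta_orig_size:])


def buggy_init_on_free(obj: SlabObject) -> None:
    """Model of the pre-fix behavior described in the text."""
    obj.data[:] = bytes(obj.object_size)  # zeroes the redzone too
    obj.meta_orig_size = 0                # metadata wipe resets orig_size


obj = SlabObject(object_size=8, orig_size=5)
before = check_object(obj)
buggy_init_on_free(obj)
after = check_object(obj)
print(before, after)
```

No caller wrote out of bounds here; the check fails purely because the free path erased both the poison and the boundary that defines it.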

Why this isn’t a classic memory-corruption exploit (but still serious)

This is not a traditional remote code-execution bug. The problem is triggered by allocator behavior while initializing freed memory and by SLUB debug features that are often enabled only on debug/hardened kernels. The vulnerability’s practical impact is availability-oriented: repeated triggering of this path can cause kernel BUGs and host instability, a reliable denial-of-service primitive for an attacker who already has local or guest code execution. Public trackers classify the impact as availability-only, with a CVSS vector consistent with a local attack that requires some privileges and leaves scope unchanged, and they give a medium score (CVSS 3.1 base score ~5.5).
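For readers who want to see where a score of about 5.5 comes from, the CVSS 3.1 base-score arithmetic can be worked through directly. The vector used below (AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H) is an assumption chosen to match the description above; the article does not quote the vector itself, so confirm it against the tracker you rely on. The weights and the Roundup rule follow the CVSS 3.1 specification for unchanged scope.

```python
import math

# ASSUMED vector: AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H (not quoted in the article)
AV_LOCAL, AC_LOW, PR_LOW_UNCHANGED, UI_NONE = 0.55, 0.77, 0.62, 0.85
C_NONE, I_NONE, A_HIGH = 0.0, 0.0, 0.56


def roundup(value: float) -> float:
    """CVSS 3.1 Roundup: smallest number with one decimal that is >= value."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


iss = 1 - (1 - C_NONE) * (1 - I_NONE) * (1 - A_HIGH)
impact = 6.42 * iss  # unchanged-scope formula
exploitability = 8.22 * AV_LOCAL * AC_LOW * PR_LOW_UNCHANGED * UI_NONE
# A non-positive impact would force the base score to 0; that is not the case here.
base_score = roundup(min(impact + exploitability, 10))
print(base_score)
```

The whole score is carried by the availability term; confidentiality and integrity contribute nothing, which matches the availability-only framing.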

Affected kernels and where the bug was introduced / fixed

  • The regression was introduced by commit 946fa0dbf2d8 (described as "mm/slub: extend redzone check to extra allocated kmalloc space than requested") in the v6.2-rc1 kernel series.
  • The upstream fix landed in mainline during the v6.12 development cycle (it is recorded as present from v6.12-rc1 onward) and was carried into the stable commit stream. Public kernel CVE announcements list the commit ranges and note that the bug is effectively present in kernels that include the offending commit but not the fix (i.e., kernels built from v6.2 up to but not including the v6.12 fixes).
Because distributions often backport fixes selectively, operators must not rely solely on kernel series numbers; they must check their vendor package changelogs to confirm the upstream fix or a vendor backport is present. Many distribution trackers and vendor advisories (Ubuntu, Red Hat, SUSE, Snyk, etc.) list their own package mappings and fixed versions. For example, Ubuntu and other mainstream trackers include this CVE and list affected packages and remediation statuses.
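A first-pass triage filter over uname -r output might look like the sketch below. It only flags kernels whose upstream series falls in the window described above, so that they get a changelog review; it cannot tell whether a vendor backported the fix (or the offending commit). The function names and the sample release strings are hypothetical.

```python
import re

INTRODUCED = (6, 2)   # series that first carries the offending commit, per the text
FIXED = (6, 12)       # mainline series that carries the fix, per the text


def upstream_series(release: str) -> tuple:
    """Extract (major, minor) from a `uname -r` style string."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if not match:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def needs_changelog_review(release: str) -> bool:
    """True when the series is in [6.2, 6.12); a hint, never a verdict."""
    return INTRODUCED <= upstream_series(release) < FIXED


# Illustrative release strings, not taken from any advisory.
for release in ["5.15.0-91-generic", "6.8.0-45-generic", "6.12.1"]:
    print(release, needs_changelog_review(release))
```

A True result means "check the vendor changelog", not "vulnerable"; a False result on a heavily patched vendor kernel deserves the same skepticism.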

Realistic attack model and risk profile

Attack vector and prerequisites

  • Local-only: An attacker needs the ability to run code on the target host (or inside a guest/container that can exercise the host kernel’s reclaim/free paths) to reliably trigger the condition. Remote, unauthenticated exploitation is not indicated by the public advisories.
  • Configuration-dependent: The most common reproduction traces appear when SLUB debug modes are active and init_on_free=1 is set. Those are non-default options on many production kernels; they’re more common in hardened or debug builds.
  • Multi-tenant relevance: The risk profile is highest on systems that host untrusted workloads — cloud hypervisors, CI runners, multi-tenant nodes, or systems where guest workloads can provoke reclaim/free behavior that touches kmalloc paths.
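Whether a host was booted with the triggering combination can be estimated from its kernel command line. The sketch below parses a command-line string (on a live system, the contents of /proc/cmdline) and reports whether init_on_free=1 appears together with a slub_debug setting that requests redzoning (the Z flag, or a bare slub_debug that enables all checks). The function names are hypothetical, and the parser ignores build-time defaults such as CONFIG_INIT_ON_FREE_DEFAULT_ON and more elaborate slub_debug syntax, so treat a negative result as inconclusive.

```python
def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {name: value or None}."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def redzoning_requested(slub_debug_value) -> bool:
    """Bare slub_debug or empty flags means all checks; otherwise look for Z."""
    if not slub_debug_value:  # None or ""
        return True
    flags = slub_debug_value.split(",", 1)[0]
    return flags == "" or "Z" in flags


def trigger_options_present(cmdline: str) -> bool:
    params = parse_cmdline(cmdline)
    init_on_free = params.get("init_on_free") == "1"
    redzone = "slub_debug" in params and redzoning_requested(params["slub_debug"])
    return init_on_free and redzone


# Illustrative command line; on a real host read /proc/cmdline instead.
sample = "root=/dev/sda1 ro slub_debug=FUZ init_on_free=1"
print(trigger_options_present(sample))
```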

What an attacker can achieve

  • The observable effect is denial of availability: kernel BUG messages, oopses, and potentially host crashes that lead to reboots or service interruptions.
  • There is no authoritative public evidence that this bug, by itself, yields privilege escalation or remote code execution at disclosure time. Attackers who already have local code execution can weaponize availability issues, and a reliable crash primitive is valuable in multi-tenant disruption campaigns. Public EPSS and exploitation telemetry were low, but operators should treat the defect as urgent where hosts run untrusted code.

The upstream fix: what changed in code and why it’s safe

The corrective approach is surgical and straightforward:
  • When clearing memory on free, use orig_size (the originally requested size) to determine the region to zero, not the allocator’s larger object_size; this ensures the redzone bytes remain untouched.
  • After clearing the used portion, restore the stored orig_size metadata so subsequent check_object calls see the correct orig_size and treat only the intended trailing bytes as redzone.
  • When SLUB debugging is not enabled, orig_size falls back to object_size, which simplifies the init path.
The upstream maintainers implemented a small, conservative change focusing on the free-path zeroing logic and metadata preservation. The change is intentionally minimal to reduce regression risk and to make stable-branch backports easy. Multiple public advisories indicate the fix was accepted upstream and backported into stable kernels and vendor packages.
Why this is a good fix: it preserves the security benefit of init_on_free (freed caller data is still zeroed) while preventing the allocator from destroying the debug metadata that marks the redzone boundary. The patch does not remove redzones or disable checks; it simply ensures the allocator initializes memory only in the region actually claimed by the caller and keeps orig_size intact.
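The corrected behavior can be modeled the same way as the bug. This is a sketch of the approach described above, not the upstream patch: SlabObject, meta_orig_size, and fixed_init_on_free are hypothetical names, and the metadata clear is modeled as a simple attribute reset followed by a restore.

```python
# Toy model of the fixed free path; not the upstream patch.
POISON = 0xCC


class SlabObject:
    def __init__(self, object_size: int, orig_size: int):
        self.object_size = object_size
        self.meta_orig_size = orig_size  # stand-in for the stored orig_size
        self.data = bytearray(object_size)
        self.data[orig_size:] = bytes([POISON]) * (object_size - orig_size)


def check_object(obj: SlabObject) -> bool:
    """Everything from the stored orig_size to object_size must be poison."""
    return all(b == POISON for b in obj.data[obj.meta_orig_size:])


def fixed_init_on_free(obj: SlabObject) -> None:
    orig_size = obj.meta_orig_size            # remember the boundary first
    obj.data[:orig_size] = bytes(orig_size)   # zero only what the caller used
    obj.meta_orig_size = 0                    # metadata area is still cleared...
    obj.meta_orig_size = orig_size            # ...then orig_size is restored


obj = SlabObject(object_size=8, orig_size=5)
obj.data[0:5] = b"hello"
fixed_init_on_free(obj)
caller_bytes_zeroed = obj.data[:5] == bytes(5)
redzone_ok = check_object(obj)
print(caller_bytes_zeroed, redzone_ok)
```

Both goals hold at once: the caller's bytes are wiped, and the redzone check still passes because neither the poison nor the recorded boundary was disturbed.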

Detection: forensic signs and telemetry to hunt for

This vulnerability produces clear kernel diagnostic output when triggered — a useful signal for detection and triage:
  • Look in kernel logs (dmesg / journalctl -k) for messages similar to:
      • "BUG kmalloc-8: kmalloc Redzone overwritten" (the exact slab name and offset will vary with object size and config).
      • The typical trace includes a displayed redzone and object bytes, a "Restoring kmalloc Redzone" message, and a backtrace that includes check_object and free_to_partial_list or other SLUB/slab functions.
  • Example hunting commands:
      • journalctl -k | grep -i "kmalloc Redzone overwritten"
      • dmesg | grep -Ei "Redzone|FIX kmalloc-"
  • Enable kernel crashdump/kdump if you need to preserve evidence after a host crash; OOPS traces are often ephemeral and lost on reboot.
The NVD and other public descriptions include the canonical trace; matching that trace in your environment is an effective indicator that you are encountering the same condition.
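For fleet-wide hunting, the same match can be applied to collected log text. The sketch below scans kernel log lines for the redzone report and returns the slab cache names involved. The function name and the sample log lines are hypothetical (the lines are synthetic, shaped after the messages quoted above), and real traces may carry extra fields such as taint flags between the cache name and the message.

```python
import re

# Matches "BUG <cache> ... kmalloc Redzone overwritten" and captures the cache name.
REDZONE_BUG = re.compile(r"BUG (kmalloc-[\w-]+).*kmalloc Redzone overwritten")


def find_redzone_reports(log_text: str) -> list:
    """Return the slab cache names from every matching report line."""
    caches = []
    for line in log_text.splitlines():
        match = REDZONE_BUG.search(line)
        if match:
            caches.append(match.group(1))
    return caches


# Synthetic sample, for illustration only.
sample_log = """\
[    0.000000] BUG kmalloc-8: kmalloc Redzone overwritten
[    0.000000] FIX kmalloc-8: Restoring kmalloc Redzone
[    1.234567] usb 1-1: new high-speed USB device
"""
hits = find_redzone_reports(sample_log)
print(hits)
```

On a live host the input could be the output of journalctl -k or dmesg captured to a string.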

Mitigation and remediation: practical checklist

  • Inventory and prioritization
      • Identify hosts running kernels built from the v6.2 commit series up to the v6.12 fix window (i.e., kernels that include the offending commit but lack the fix). Use uname -r and vendor package metadata to map kernels to advisories. Cross-check your distribution’s security tracker for package-level mappings.
  • Patch
      • Apply vendor-supplied kernel updates that explicitly list CVE-2024-49885 or the upstream stable commit. Many distribution advisories have already published fixed package versions; if your vendor has an advisory (Ubuntu, Red Hat, SUSE, etc.), follow their recommended upgrade path.
  • Reboot
      • Kernel-level fixes require a reboot into the updated kernel; schedule maintenance windows accordingly.
  • If a vendor package is not available
      • Consider building a kernel with the upstream patch applied and validate in a staging environment before rolling out. This is a fallback only when vendor packagers haven’t yet issued updates.
  • Reduce exposure while you patch
      • Restrict which workloads can run untrusted code on vulnerable hosts (isolate untrusted tenants).
      • Avoid enabling SLUB debug options and init_on_free=1 on production hosts unless you intentionally run hardened/debug kernels; these options increase the likelihood of encountering the specific condition described here.
  • Post-patch validation
      • After patching and rebooting, re-run the workload or test-cases that previously triggered the issue (if available) and monitor kernel logs for recurring traces.
      • Validate vendor kernel changelogs and package metadata to confirm the upstream fix commit or vendor backport is present.
If you operate multi-tenant cloud or CI infrastructure, prioritize remediation accordingly: these environments provide the most attractive attack surface for a local/guest-triggered DoS primitive.
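The changelog validation step lends itself to automation. The following sketch checks changelog text for a set of markers; only the CVE identifier is used here, and operators may want to add the upstream fix commit ID from their vendor's advisory. The function name and the sample changelog are hypothetical, and the changelog text would come from whatever changelog query your package manager provides.

```python
def changelog_mentions(changelog_text: str, markers) -> list:
    """Return the markers that appear (case-insensitively) in the changelog."""
    haystack = changelog_text.lower()
    return [marker for marker in markers if marker.lower() in haystack]


MARKERS = ["CVE-2024-49885"]

# Synthetic changelog excerpt, for illustration only.
sample_changelog = """\
kernel (hypothetical-1.2.3) stable; urgency=medium
  * mm/slub fix backport (CVE-2024-49885)
  * unrelated driver update
"""
found = changelog_mentions(sample_changelog, MARKERS)
print(found)
```

An empty result does not prove the fix is absent, since some vendors reference only a commit ID or a bug number; it signals that a manual look at the advisory is needed.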

Strengths and residual risks: a critical security analysis

Strengths of the upstream response
  • The fix is small and focused — it targets the zeroing and metadata-handling logic without changing redzone semantics. That makes it low-risk to backport and deploy across stable kernel trees.
  • The patch preserves the defensive intention of redzones and the init_on_free option; it simply prevents the allocator from self-defeating its own checks. Public maintainers and vendors have accepted this approach and published backports.
Residual risks and operational caveats
  • Vendor and OEM lag: embedded appliances, vendor-forked kernels, and long‑tail OEM devices commonly lag upstream fixes; these may remain vulnerable even after mainstream distributions are patched. Operators must verify each kernel artifact or device image rather than assuming uniform coverage.
  • Local attack vector remains: while the bug does not represent a remote RCE, it does provide a reliable crash primitive to actors who already have local code execution or guest access — a realistic threat model for shared-hosting environments.
  • Potential for mis-scoped scanning: automated scanners that only look for kernel major.minor versions can be misled when distributions backport fixes under different package names. Always map the upstream commit ID to your vendor’s package changelog or release notes for a definitive remediation check.
Flagging unverifiable claims
  • There is no public, authoritative proof of in-the-wild exploitation for CVE-2024-49885 at disclosure; telemetry and EPSS values indicate low observed exploitation. This absence of proof does not mean the bug is harmless — it simply indicates that, at publication, no confirmed exploitation campaign has been documented. Treat that absence conservatively: for shared hosts, prioritize remediation.

Operator playbook: short, medium, long-term actions

  • Short-term (0–48 hours)
      • Inventory kernels: run uname -r across the fleet and map the results to vendor advisories.
      • Isolate highly exposed systems: limit who can deploy untrusted workloads to hosts that remain unpatched.
      • Enable kernel logging and preserve dmesg/kernel crash dumps for any suspicious resets or OOPSes.
  • Medium-term (48 hours–2 weeks)
      • Apply vendor kernel updates and schedule staggered reboots.
      • Validate that patched images contain the upstream fix (check package changelogs for CVE-2024-49885 or the upstream commit).
      • Run representative post-patch QA tests for workloads that exercise memory allocation/free paths, especially any that previously triggered the trace.
  • Long-term
      • Harden multi-tenant infrastructure: disallow unnecessary kernel debug options in production images unless you maintain a clear testbed separation.
      • Improve automated mapping between upstream commit IDs and vendor package changelogs to avoid ambiguity in future kernel CVE responses.
      • Maintain crashdump/kdump infrastructure to preserve forensic artifacts for kernel OOPSes.

Final takeaways

CVE-2024-49885 is a textbook example of how defensive debugging features and memory-initialization options can interact with allocator metadata in subtle ways that create availability problems. The bug’s impact is not speculative — it generates reproducible kernel BUG traces — yet it is not a high-confidence remote code execution vector in the wild. The upstream fix is minimal, low-risk, and focused on preserving orig_size semantics while zeroing only what the caller used; that means remediation is straightforward for most operators: install vendor kernel updates and reboot.
Prioritization should be driven by exposure: cloud hypervisors, VM hosts, CI runners, and any environment that hosts untrusted workloads should treat this as a higher priority and validate vendor package mappings carefully. For single-user desktop systems with default production kernels (no SLUB debug and no init_on_free), the operational urgency is lower — but verification is still prudent because vendor kernels and custom builds vary.
Operators should confirm their specific kernel packages contain the upstream fix before declaring systems remediated, preserve kernel logs and crash dumps for any incidents, and follow a staged rollout strategy to minimize operational disruption while closing this availability gap.
Source: MSRC Security Update Guide - Microsoft Security Response Center