Linux fscache CVE-2024-45000 DoS: Kernel NULL Pointer Dereference Explained

A subtle race-condition bug in the Linux kernel’s fscache subsystem — tracked as CVE-2024-45000 — can allow the kernel to dereference a NULL pointer and crash, producing a denial-of-service condition on affected systems. The flaw stems from a missing check of the cookie access counter (the internal n_accesses field) during an LRU-discard transition in the fscache state machine; the upstream fix adds that check so a cookie is never withdrawn while active accesses remain. This is an availability-focused vulnerability: it does not directly disclose data or elevate privileges, but unpatched hosts that use fscache (commonly with network filesystems such as NFS) can be made unstable or completely unavailable under crafted or race-provoking workloads.

Background / Overview

fscache is the Linux kernel subsystem that implements a persistent cache for network filesystems and remote content backends. It hands out small internal objects called cookies to represent cacheable resources; those cookies are referenced concurrently by many kernel threads and asynchronous workqueues. Proper lifecycle management of cookies depends on a counter, n_accesses, that tracks how many active users currently reference a cookie. If the kernel withdraws or tears down a cookie while one or more users still hold accesses, subsequent code paths can dereference cleared pointers and crash the kernel. The CVE-2024-45000 fix adds the missing defensive check to the LRU-discard path so the state machine will not proceed to discard while n_accesses is non-zero.
Why this matters operationally: network caching is often used in high-throughput or highly concurrent environments (virtual desktop infrastructures, large NFS exports, caching appliances). The bug cannot be triggered by a single thread acting alone; it requires the fscache state machine to run concurrently with code that is still using the cookie. Those conditions are realistic under heavy load or aggressive cache-pressure operations. When the race occurs, the kernel oops points to cachefiles_prepare_write and a read of a small, fixed address (for example, 0x8), which is typical of dereferencing a NULL pointer plus a structure-member offset. The vulnerability therefore maps squarely to an availability impact (kernel panic, reboot, or a stuck and unresponsive host) rather than confidentiality or integrity.
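To see why a faulting address of 0x8 suggests a NULL base pointer rather than a wild one, here is a minimal Python sketch. The structure layout is hypothetical: the field names cookie and volume are assumptions chosen for illustration, not the kernel's actual definition. It only shows that reading the second pointer-sized member through a NULL base touches an address equal to that member's offset.

```python
import ctypes


class HypotheticalCacheObject(ctypes.Structure):
    # Assumed layout for illustration only: two pointer-sized members.
    _fields_ = [
        ("cookie", ctypes.c_void_p),  # offset 0
        ("volume", ctypes.c_void_p),  # offset == size of one pointer
    ]


def faulting_address(base: int, field: str) -> int:
    """Address that would be read when accessing `field` through `base`."""
    return base + getattr(HypotheticalCacheObject, field).offset


if __name__ == "__main__":
    # With a NULL base, the access lands at the member's offset.
    print(hex(faulting_address(0, "volume")))
```

On a 64-bit interpreter the pointer size is 8 bytes, so the second member sits at offset 8, matching the shape of the address reported in the oops.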

What exactly went wrong: a technical walk-through

The internal actors: cookies, n_accesses, and state transitions

  • A cookie represents a cacheable object and contains:
    • a pointer to cache backend state (e.g., cachefiles private data)
    • a flags field including an LRU-discard marker
    • an atomic counter n_accesses tracking "how many users are inside this cookie now"
  • The fscache state machine drives cookie lifecycle changes (active → relinquishing → withdrawing, etc.).
  • Certain paths cause an LRU discard to be scheduled (FSCACHE_COOKIE_DO_LRU_DISCARD). The state machine must ensure a cookie isn't torn down while accesses remain.
The bug: the LRU-discard transition in the FSCACHE_COOKIE_STATE_ACTIVE case omitted a check of atomic_read(&cookie->n_accesses) before moving to the LRU-discarding state. Consequently, when the discard and a concurrent access overlapped, the state machine could withdraw the cookie and clear cookie->cache_priv while another CPU was still inside cachefiles_prepare_write() using it, producing the NULL dereference. The upstream patch adds the missing n_accesses check and returns early when active accesses remain; the discard flag stays pending, and the next fscache_end_cookie_access() call requeues the state machine so the discard can proceed safely once accesses drop to zero.
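The logic described above can be sketched as a toy, single-threaded model in Python. This is not kernel code: the class, field, and function names are hypothetical stand-ins, and the `patched` switch exists only to contrast the two behaviours. It shows the decision the fix adds, not the real locking or workqueue machinery.

```python
from dataclasses import dataclass
from enum import Enum, auto


class State(Enum):
    ACTIVE = auto()
    LRU_DISCARDING = auto()


@dataclass
class Cookie:
    state: State = State.ACTIVE
    n_accesses: int = 0            # users currently inside the cookie
    do_lru_discard: bool = False   # stands in for FSCACHE_COOKIE_DO_LRU_DISCARD
    cache_priv: object = "backend-object"


def state_machine(cookie: Cookie, patched: bool = True) -> None:
    """One pass over the ACTIVE case of the toy state machine."""
    if cookie.state is State.ACTIVE and cookie.do_lru_discard:
        if patched and cookie.n_accesses != 0:
            return  # still in use: leave the flag pending and retry later
        cookie.do_lru_discard = False
        cookie.state = State.LRU_DISCARDING
        cookie.cache_priv = None  # withdrawal clears the backend pointer


def end_access(cookie: Cookie, patched: bool = True) -> None:
    """Drop one access; the last user out re-runs the state machine."""
    cookie.n_accesses -= 1
    if cookie.n_accesses == 0:
        state_machine(cookie, patched)


if __name__ == "__main__":
    busy = Cookie(n_accesses=1, do_lru_discard=True)
    state_machine(busy, patched=False)
    # Unpatched model: backend pointer is gone while a user is still inside.
    print(busy.cache_priv, busy.n_accesses)
```

In the patched mode the same call leaves cache_priv intact until end_access() brings the counter to zero, which is the ordering the fix is meant to guarantee.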

Evidence in kernel oops traces

Operators debugging the issue will typically see an oops trace with:
  • BUG: kernel NULL pointer dereference, address: 0000000000000008
  • a RIP pointing to cachefiles_prepare_write (or adjacent cachefiles functions)
  • call traces that show simultaneous workqueue and cookie end/access activity
Those symptoms are explicitly recorded in the vulnerability record and were used to reason that the underlying cause is a missing counter check rather than a straightforward memory corruption bug. Searching logs for the oops message together with the named cachefiles function is a quick detection heuristic.

Affected kernels, CVSS and scope

  • CVE designation: CVE-2024-45000 (public 4 September 2024).
  • Severity (most trackers): Medium. A representative CVSS v3 base score is 5.5 (AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H), reflecting a local attack vector, low attack complexity, low privileges required, no user interaction, and a pure availability impact.
  • Affected kernel trees and backports: multiple stable series were patched upstream; distribution maintainers backported the fix to their kernels. Commonly cited patched versions include the 6.1, 6.6 and 6.10 stable series (for example, the fix appears in 6.1.107+, 6.6.48+, and 6.10.7+ backports used by vendors). Administrators should treat any distribution kernel built from an earlier commit as potentially vulnerable until their vendor confirms the patch is present.
Cross-checking public advisories (NVD, Ubuntu, Amazon Linux/ALAS, OSV and vendor trackers) shows consistent public exposure and consistent patch messaging across vendors: upgrade the kernel to a patched release and reboot. Those advisories also reiterate the limited scope — no confidentiality or integrity loss is described — but the availability hit can be severe in production.
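For readers who want to see how the 5.5 base score follows from the vector quoted above, here is a short Python sketch of the CVSS v3.1 base-score arithmetic. It is deliberately limited to Scope: Unchanged vectors; the function names are my own, and the metric weights are those published in the CVSS v3.1 specification.

```python
import math

# CVSS v3.1 base-metric weights (Scope: Unchanged only).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, following the CVSS v3.1 definition."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Base score for a Scope: Unchanged vector such as AV:L/AC:L/..."""
    metrics = dict(part.split(":", 1) for part in vector.split("/"))
    if metrics.get("S") != "U":
        raise ValueError("this sketch only handles Scope: Unchanged")
    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


if __name__ == "__main__":
    print(base_score("AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H"))
```

For this vector the exploitability term is about 1.83 and the impact term about 3.60; their sum, roughly 5.43, rounds up to 5.5.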

Exploitation: how realistic is abuse?

  • Attack type: local denial-of-service (DoS) via race. An attacker or a buggy local process that repeatedly triggers fscache usage and eviction can increase the probability of hitting the race window. The flaw is not remotely exploitable: an attacker must be able to run code or actions locally (or be able to provoke workload combinations within a shared kernel environment, for example in some multi-tenant virtualization setups).
  • PoC status: public trackers and vulnerability feeds indicate no widely circulated PoC and no confirmed reports of active exploitation in the wild at the time the advisories were collected. That said, local DoS conditions have been reproduced in lab scenarios by carefully orchestrating fscache accesses and forced LRU operations. Administrators of shared or untrusted-host environments should treat the risk as real.
  • Practical trigger conditions: heavy concurrent access to cached entries, combined with LRU pressure or cache eviction events that flip the discard flag at the wrong time. Operators can artificially increase probability by:
    • aggressive, parallel reads/writes of many cached objects
    • invoking cache eviction (explicit or via resource starvation)
    • forcing asynchronous workqueue scheduling races (for research/adversarial testing only)
In short: the vulnerability is realistic for local or multi-tenant attack models and reproducible with sufficient concurrency and cache churn; it is not remotely exploitable and is not known to yield code execution.

Detection, logging and forensics guidance

Logs and kernel oops traces

  • Search dmesg and journal logs for kernel oops patterns with:
    • cachefiles_prepare_write or other cachefiles functions in the RIP line or call trace
    • NULL pointer dereference, address: 0000000000000008
  • Correlate oops timestamps with high concurrency workloads or cache eviction events.
These oops traces are the most direct evidence that the race hit the kernel. The NVD entry and multiple distro advisories reproduce example oops output — use that as a signature to detect affected systems.
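The signature search described above can be sketched in Python as follows. This is a heuristic under stated assumptions: the regular expressions follow the usual x86-64 oops layout (a "BUG:" line followed by a "RIP:" line), the 40-line look-ahead window is an arbitrary choice, and the function name is my own.

```python
import re

NULL_DEREF = re.compile(
    r"BUG: kernel NULL pointer dereference, address: (?P<addr>[0-9a-f]{16})"
)
# Kernel-mode RIP lines look like "RIP: 0010:<symbol>+0x<off>/0x<len>".
RIP_LINE = re.compile(r"RIP: [0-9a-f]{4}:(?P<func>[A-Za-z_]\w*)\+0x")


def find_signature(lines, window=40):
    """Return (line_no, address, function) for each NULL-dereference oops
    whose first following RIP line names a cachefiles function."""
    lines = list(lines)
    hits = []
    for i, line in enumerate(lines):
        m = NULL_DEREF.search(line)
        if not m:
            continue
        for follow in lines[i + 1 : i + 1 + window]:
            r = RIP_LINE.search(follow)
            if r:
                if r.group("func").startswith("cachefiles_"):
                    hits.append((i + 1, m.group("addr"), r.group("func")))
                break  # only the first RIP line belongs to this oops
    return hits
```

Feeding it the lines of a saved dmesg or kern.log file yields one tuple per matching oops; an empty list means the signature was not found, not that the host is patched.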

Runtime detection

  • Monitor for repeated kernel panics, reboots, or host unresponsiveness originating from hosts that mount NFS or use other fscache-backed network filesystems.
  • On systems with systemd, audit the journal for repeated kernel messages and map them against workloads that touch network mounts.
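On systemd hosts, the journal audit mentioned above can be approximated with a small Python sketch that counts NULL-dereference messages per boot. It assumes journalctl is available and that persistent journal storage is enabled (otherwise only the current boot is retained); the function names are hypothetical. The `_TRANSPORT=kernel` match is used instead of `-k` because `-k` restricts output to the current boot.

```python
import json
import subprocess
from collections import Counter

SIGNATURE = "kernel NULL pointer dereference"


def oops_per_boot(json_lines):
    """Count signature matches per boot ID from `journalctl -o json` lines."""
    counts = Counter()
    for raw in json_lines:
        raw = raw.strip()
        if not raw:
            continue
        entry = json.loads(raw)
        message = entry.get("MESSAGE")
        # MESSAGE can be a byte array for non-UTF-8 data; skip those entries.
        if isinstance(message, str) and SIGNATURE in message:
            counts[entry.get("_BOOT_ID", "unknown")] += 1
    return dict(counts)


def read_kernel_journal():
    """Kernel messages from all retained boots, one JSON object per line."""
    result = subprocess.run(
        ["journalctl", "_TRANSPORT=kernel", "--no-pager", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    for boot_id, count in oops_per_boot(read_kernel_journal()).items():
        print(boot_id, count)
```

Several boots with a non-zero count on a host that mounts NFS with fscache enabled is a reasonable prompt to inspect the full oops traces.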

Forensics

  • If a host crashed and rebooted, capture the kernel oops, copy /var/log/kern.log (where the distribution writes one) and journal output, and preserve the workload logs that coincide with the crash window (application logs, NFS client logs).
  • Recreate the conditions in a controlled lab environment with the same kernel and cache workload to confirm the root cause before taking any remedial action beyond patching.

Mitigation and remediation (what admins must do now)

Short answer: patch the kernel and reboot. The fix is small, upstream, and widely backported; vendors have released updated kernels or advisories for their distributions. If you cannot immediately patch, consider temporary mitigations that reduce the chance of the race (but note: there is no foolproof workaround that eliminates the underlying logic error).
Concrete steps:
  • Inventory:
    • Identify hosts that use fscache or that mount network filesystems that rely on fscache (NFS, AFS, cachefilesd-backed systems).
    • Gather kernel versions: uname -r and distribution kernel package details.
  • Check vendor advisories:
    • Confirm whether your distribution has published a security advisory that backports the patch.
    • Distro trackers (Ubuntu, Amazon Linux, Red Hat/EL) show fixed package names and kernel versions; follow vendor guidance.
  • Upgrade:
    • Apply the vendor-provided kernel update containing the fscache fix and reboot the host.
    • If you maintain custom kernels, merge the upstream commit(s) or rebase to a stable kernel that includes the patch, then rebuild and deploy.
  • Temporary risk-reduction (if patching is delayed):
    • Reduce cache churn where possible: avoid aggressive background cache trimming or immediate heavy concurrent file churn on NFS clients.
    • Where practical, move critical workloads to hosts that do not enable fscache until those hosts are patched.
    • For multi-tenant hypervisors, consider isolating potentially untrusted guests until hosts are updated.
  • Post-patch validation:
    • After updating, reproduce representative workloads; verify that previous oops patterns are no longer triggered under stress testing.
Vendor advisories and open-source trackers uniformly recommend updating kernels and rebooting as the primary remediation. Several trackers list the patched stable series and the CVSS/impact.
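As a first-pass triage aid for the inventory step, a kernel release string can be compared against the fixed point releases cited in this article (6.1.107, 6.6.48, 6.10.7). The Python sketch below is an assumption-laden helper, not an authoritative check: distribution kernels often carry backports under older version numbers, so anything other than "fixed-upstream" means "consult the vendor advisory", not "vulnerable". The function name and status strings are my own.

```python
import platform
import re

# First fixed point release per stable series, as cited in this article.
FIXED = {(6, 1): 107, (6, 6): 48, (6, 10): 7}


def upstream_status(release: str) -> str:
    """Classify a `uname -r` style string against the cited fixed releases."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not m:
        return "unknown"
    major, minor = int(m.group(1)), int(m.group(2))
    patch = int(m.group(3) or 0)
    first_fixed = FIXED.get((major, minor))
    if first_fixed is None:
        return "unknown"  # series not listed here; check the vendor advisory
    return "fixed-upstream" if patch >= first_fixed else "check-vendor"


if __name__ == "__main__":
    release = platform.release()  # same string that `uname -r` prints
    print(release, upstream_status(release))
```

Returning "check-vendor" rather than "vulnerable" is a deliberate choice: the version number alone cannot show whether a vendor backported the fix.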

Patch details and verification

The upstream fix is a small, targeted defensive check inserted in the fscache_cookie_state_machine() logic that prevents the LRU-discard path from proceeding while n_accesses is non-zero. Multiple stable-tree commits and distribution backports were merged and published following the public disclosure in September 2024. Administrators who wish to verify the presence of the fix in a kernel can:
  • Inspect the kernel source tree used to build the running kernel for the fscache_cookie_state_machine() change that adds an atomic_read(&cookie->n_accesses) check in the FSCACHE_COOKIE_STATE_ACTIVE/LRU discard branch.
  • Confirm the distribution package changelog or security advisory lists CVE-2024-45000 or references the fscache cookie n_accesses check.
Because the fix is intentionally conservative and specifically targets the race window (rather than rearchitecting fscache), the risk of regressions is low — but standard practice is to test updated kernels in a staging environment before mass deployment. The kernel community and distributions have published patch references and backport notices; vendors’ package changelogs are the authoritative confirmation that your build contains the fix.
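For those inspecting a source tree, the check can be roughly automated. The Python sketch below is a text heuristic, not a parser: it assumes the state machine is written as a switch with a `case FSCACHE_COOKIE_STATE_ACTIVE:` label and looks for an atomic_read(&cookie->n_accesses) between that label and the next case label. The function name is hypothetical, and the file to feed it (for example fs/netfs/fscache_cookie.c in recent trees) is an assumption to adapt to your kernel version.

```python
import re

ACTIVE_CASE = re.compile(r"case\s+FSCACHE_COOKIE_STATE_ACTIVE\s*:")
NEXT_CASE = re.compile(r"^\s*(?:case\s+\w+|default)\s*:", re.MULTILINE)
N_ACCESSES_READ = re.compile(r"atomic_read\(\s*&cookie->n_accesses\s*\)")


def active_case_checks_n_accesses(source: str) -> bool:
    """True if any ACTIVE case body handles the LRU-discard flag and also
    reads n_accesses before the next case label begins."""
    for match in ACTIVE_CASE.finditer(source):
        rest = source[match.end():]
        nxt = NEXT_CASE.search(rest)
        body = rest[: nxt.start()] if nxt else rest
        if "FSCACHE_COOKIE_DO_LRU_DISCARD" in body and N_ACCESSES_READ.search(body):
            return True
    return False
```

A True result is a hint that the check is present; a False result may simply mean the source is laid out differently, so the vendor changelog remains the authoritative confirmation.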

Operational impact and risk analysis

  • Short-term impact: Hosts that run unpatched kernels and that use fscache can suffer unpredictable kernel oops/panics under specific concurrency conditions. That may cause service interruption for the workloads they host. In clustered storage or shared compute environments, an attacker or misbehaving process could repeatedly trigger a denial-of-service across multiple machines.
  • Attack surface model: Because the bug requires local action or the ability to provoke workloads that trigger cache churn and evictions, it is a local DoS. Cloud and hosting providers should pay particular attention when a tenant could induce the race on a shared host or when the host executes third-party workload code.
  • Likelihood of exploitation: Medium in environments where fscache is actively used and hosts see heavy concurrent file activity. Low in typical desktops or servers that do not enable fscache or that use modest caching workloads.
  • Potential for chained attacks: The vulnerability is not known to yield code execution or data leaks by itself. However, a persistent DoS in a critical server can enable adjacent risks such as failover storms, degraded visibility, or operational errors during incident response — so treat denial-of-service as a high-priority operational risk despite the medium CVSS score.

Recommended remediation timeline and checklist

  • Immediately (within 24–72 hours)
    • Identify hosts using fscache or cachefiles-backed mounts.
    • Check vendor advisories and obtain fixed kernel packages if available.
  • Short-term (1 week)
    • Schedule kernel updates and reboots for production windows; prioritize multi-tenant and high-availability hosts.
    • For hosts where immediate reboot is impossible, isolate or migrate workloads off the host if feasible.
  • Medium-term (2–4 weeks)
    • Verify updated hosts under representative load; ensure no regression in filesystem or caching performance.
    • Tune monitoring and alerting for kernel oops traces and for repeated reboots related to cachefiles activity.
  • Long-term
    • Review fscache usage across your estate: if the subsystem is unnecessary, consider disabling it in kernel builds where that’s practical.
    • Incorporate kernel security backport tracking into patch management to avoid similar long-tail exposure.

Final analysis: strengths and residual risks

The positive: the fix is small, surgical, and merged upstream promptly; distro vendors have been able to backport and ship the patch to users. The root cause was an omitted concurrency check — a classically fixable defect — and upstream reviewers addressed it without major API changes or compatibility tradeoffs. Multiple independent trackers (NVD, distribution security pages, OSV, vendor advisories) converge on the same technical explanation and remediation path, increasing confidence in the analysis and the available fixes.
The residual risk: even after patching, multi-tenant operators must recognize that other fscache race windows might exist, and the presence of a patched kernel in the wild does not guarantee all upstream fixes or backports are present in every vendor image. Similarly, environments that delay kernel updates (long‑tail vendor kernels, embedded appliances, third‑party vendor images) remain exposed. Detection is straightforward when kernel oops messages show the characteristic traces, but some operational incidents may be attributed to other causes unless logs are retained and correlated. Finally, because the bug is a local or workload-triggered DoS, adversaries with local access or the ability to create pathological workloads retain a practical attack avenue until all hosts are patched.

Key takeaways (quick reference)

  • CVE-2024-45000 is a Linux kernel race condition in the fscache cookie state machine (fs/netfs/fscache_cookie.c in current upstream trees) that results from a missing n_accesses check and can produce a kernel NULL pointer dereference and DoS.
  • The vulnerability is local in nature (requires local action or workload control) and has a reported CVSS v3 base score of about 5.5 (Medium); the impact is limited to availability.
  • Immediate action: update the kernel to a vendor-patched release and reboot; confirm the distribution's advisory and kernel changelog before declaring a host remediated.
  • Detection: scan logs for kernel oops referencing cachefiles_prepare_write and NULL pointer dereference traces; correlate with heavy fscache usage.

CVE-2024-45000 is a textbook example of how a tiny missing defensive check in concurrent kernel code can cascade into real-world availability failures. The remedy is straightforward and widely available; administrators should treat this as a moderate‑priority kernel update for affected hosts and incorporate the lessons — stricter concurrency checks, robust staging tests under stress, and disciplined kernel patch management — into their long-term platform hardening plans.

Source: MSRC Security Update Guide - Microsoft Security Response Center