A subtle race-condition bug in the Linux kernel’s fscache subsystem — tracked as CVE-2024-45000 — can allow the kernel to dereference a NULL pointer and crash, producing a denial-of-service condition on affected systems. The flaw stems from a missing check of the cookie access counter (the internal n_accesses field) during an LRU-discard transition in the fscache state machine; the upstream fix adds that check so a cookie is never withdrawn while active accesses remain. This is an availability-focused vulnerability: it does not directly disclose data or elevate privileges, but unpatched hosts that use fscache (commonly with network filesystems such as NFS) can be made unstable or completely unavailable under crafted or race-provoking workloads.
Source: MSRC Security Update Guide - Microsoft Security Response Center
Background / Overview
fscache is the Linux kernel subsystem that implements a persistent cache for network filesystems and remote content backends. It hands out small internal objects called cookies to represent cacheable resources; those cookies are referenced concurrently by many kernel threads and asynchronous workqueues. Proper lifecycle management of cookies depends on a counter, n_accesses, that tracks how many active users currently reference a cookie. If the kernel withdraws or tears down a cookie while one or more users still hold accesses, subsequent code paths can dereference cleared pointers and crash the kernel. The CVE-2024-45000 fix restores a missing defensive check in the LRU-discard path so the state machine will not proceed to discard while n_accesses is non-zero.

Why this matters operationally: network caching is often used in high-throughput or highly concurrent environments (virtual desktop infrastructures, large NFS exports, caching appliances). This is not an ordinary single-threaded bug: triggering it requires concurrency between the fscache state machine and cookie usage, and that concurrency is realistic under heavy load or aggressive cache-pressure operations. When the race occurs, the kernel oops points to cachefiles_prepare_write and an attempt to read a small, fixed address (for example, address 0x8), which is typical of a NULL-plus-offset dereference. The vulnerability therefore maps squarely to an availability impact (kernel panic, reboot, or stuck unresponsive hosts) rather than confidentiality or integrity.
What exactly went wrong: a technical walk-through
The internal actors: cookies, n_accesses, and state transitions
- A cookie represents a cacheable object and contains:
  - a pointer to cache backend state (e.g., cachefiles private data)
  - a flags field including an LRU-discard marker
  - an atomic counter n_accesses tracking "how many users are inside this cookie now"
- The fscache state machine drives cookie lifecycle changes (active → relinquishing → withdrawing, etc.).
- Certain paths cause an LRU discard to be scheduled (FSCACHE_COOKIE_DO_LRU_DISCARD). The state machine must ensure a cookie isn't torn down while accesses remain.
In fscache_cookie_state_machine(), the FSCACHE_COOKIE_STATE_ACTIVE case omitted a check of atomic_read(&cookie->n_accesses) before moving to the LRU-discarding state. Consequently, when the discard and a concurrent access overlapped, the code could follow a path that cleared cache_priv while another CPU still accessed it, producing the NULL dereference. The upstream patch adds the missing n_accesses check and returns early when active accesses remain. The subsequent fscache_end_cookie_access() call then requeues the state machine once accesses drop to zero, at which point the discard can safely proceed.

Evidence in kernel oops traces
Operators debugging the issue will typically see an oops trace with:
- BUG: kernel NULL pointer dereference, address: 0000000000000008
- a RIP pointing to cachefiles_prepare_write (or adjacent cachefiles functions)
- call traces that show simultaneous workqueue and cookie end/access activity
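The missing check described in the walk-through can be illustrated with a toy user-space model. This is a sketch only: the `Cookie` class, `state_machine_step()` and `end_access()` are hypothetical names invented for illustration, the model is single-threaded, and it does not reproduce the kernel's actual data structures or locking. It only shows why skipping the n_accesses test lets the backend pointer be cleared while a user is still inside the cookie.

```python
from dataclasses import dataclass
from enum import Enum, auto


class State(Enum):
    ACTIVE = auto()
    LRU_DISCARDING = auto()


@dataclass
class Cookie:
    """Toy stand-in for an fscache cookie; not the kernel structure."""
    state: State = State.ACTIVE
    n_accesses: int = 0            # users currently inside the cookie
    do_lru_discard: bool = False   # models FSCACHE_COOKIE_DO_LRU_DISCARD
    cache_priv: object = "backend-object"


def state_machine_step(cookie: Cookie, check_accesses: bool) -> None:
    """One pass of a simplified ACTIVE-state handler."""
    if cookie.state is not State.ACTIVE or not cookie.do_lru_discard:
        return
    if check_accesses and cookie.n_accesses > 0:
        # Patched behaviour: leave the flag pending and return early.
        return
    cookie.do_lru_discard = False
    cookie.state = State.LRU_DISCARDING
    cookie.cache_priv = None  # withdrawal clears the backend pointer


def end_access(cookie: Cookie, check_accesses: bool = True) -> None:
    """Drop one access and re-run the state machine when none remain."""
    cookie.n_accesses -= 1
    if cookie.n_accesses == 0:
        state_machine_step(cookie, check_accesses)


# Unpatched ordering: the discard proceeds while a writer is still inside.
buggy = Cookie(n_accesses=1, do_lru_discard=True)
state_machine_step(buggy, check_accesses=False)
print(buggy.cache_priv)  # None: an in-flight writer would now see a cleared pointer

# Patched ordering: the discard waits until the last access ends.
fixed = Cookie(n_accesses=1, do_lru_discard=True)
state_machine_step(fixed, check_accesses=True)
print(fixed.cache_priv, fixed.state.name)  # backend-object ACTIVE
end_access(fixed)
print(fixed.cache_priv, fixed.state.name)  # None LRU_DISCARDING
```

In the patched ordering the discard flag stays pending rather than being dropped, which mirrors the article's point that the end-of-access path requeues the state machine once the counter reaches zero.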
Affected kernels, CVSS and scope
- CVE designation: CVE-2024-45000 (public 4 September 2024).
- Severity (most trackers): Medium. A representative CVSS v3 base score is 5.5 (AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H), reflecting a local attack vector, low attack complexity, low privileges required, no user interaction, and a pure availability impact.
- Affected kernel trees and backports: multiple stable series were patched upstream; distribution maintainers backported the fix to their kernels. Commonly cited patched versions include the 6.1, 6.6 and 6.10 stable series (for example, the fix appears in 6.1.107+, 6.6.48+, and 6.10.7+ backports used by vendors). Administrators should treat any distribution kernel built from an earlier commit as potentially vulnerable until their vendor confirms the patch is present.
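The 5.5 base score can be reproduced from the vector string. The sketch below follows the CVSS v3.1 base-score equations for the Scope: Unchanged case only; the function names `roundup()` and `base_score()` are illustrative choices, and the metric weights are the published v3.1 values.

```python
import math

# CVSS v3.1 metric weights (PR values shown are for Scope: Unchanged).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value: float) -> float:
    """Round up to one decimal place as defined by the CVSS v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a Scope: Unchanged vector string."""
    m = dict(part.split(":") for part in vector.split("/"))
    if m.get("S") != "U":
        raise ValueError("this sketch only handles Scope: Unchanged")
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    impact = 6.42 * iss
    exploitability = (
        8.22 * AV[m["AV"]] * AC[m["AC"]] * PR_UNCHANGED[m["PR"]] * UI[m["UI"]]
    )
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


print(base_score("AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H"))  # 5.5
```

For this vector the impact sub-score is 6.42 × 0.56 ≈ 3.60 and the exploitability sub-score is 8.22 × 0.55 × 0.77 × 0.62 × 0.85 ≈ 1.83; their sum, about 5.43, rounds up to 5.5.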
Exploitation: how realistic is abuse?
- Attack type: local denial-of-service (DoS) via race. An attacker or a buggy local process that repeatedly triggers fscache usage and eviction can increase the probability of hitting the race. The flaw is not remotely exploitable: an attacker must be able to run code or actions locally (or provoke the relevant workload combinations within a shared kernel environment, for example in some multi-tenant virtualization setups).
- PoC status: public trackers and vulnerability feeds indicate no widely circulated PoC and no confirmed reports of active exploitation in the wild at the time the advisories were collected. That said, local DoS conditions have been reproduced in lab scenarios by carefully orchestrating fscache accesses and forced LRU operations. Administrators of shared or untrusted-host environments should treat the risk as real.
- Practical trigger conditions: heavy concurrent access to cached entries, combined with LRU pressure or cache eviction events that flip the discard flag at the wrong time. Operators can artificially increase probability by:
  - aggressive, parallel reads/writes of many cached objects
  - invoking cache eviction (explicit or via resource starvation)
  - forcing asynchronous workqueue scheduling races (for research/adversarial testing only)
Detection, logging and forensics guidance
Logs and kernel oops traces
- Search dmesg and journal logs for kernel oops patterns with:
  - cachefiles_prepare_write or similar cachefiles functions
  - NULL pointer dereference, address: 0000000000000008
- Correlate oops timestamps with high concurrency workloads or cache eviction events.
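The log search above can be sketched as a small scanner. This is a minimal example under stated assumptions: the function name `find_fscache_oopses()` and the 40-line pairing window are arbitrary choices, and the sample lines are made up to resemble the pattern described in this article rather than copied from a real host.

```python
import re

NULL_DEREF = re.compile(
    r"BUG: kernel NULL pointer dereference, address: ([0-9a-f]{16})"
)
RIP_LINE = re.compile(r"RIP: \S+:(cachefiles_\w+)")


def find_fscache_oopses(lines, window=40):
    """Return (address, function) pairs where a NULL-dereference banner is
    followed within `window` lines by a RIP inside a cachefiles function."""
    hits = []
    pending = None  # [address, lines remaining] for the most recent banner
    for line in lines:
        banner = NULL_DEREF.search(line)
        if banner:
            pending = [banner.group(1), window]
            continue
        if pending is None:
            continue
        rip = RIP_LINE.search(line)
        if rip:
            hits.append((pending[0], rip.group(1)))
            pending = None
        else:
            pending[1] -= 1
            if pending[1] <= 0:
                pending = None  # banner too old; stop pairing it
    return hits


# Illustrative sample only; timestamps and offsets are invented.
sample = """\
[ 1234.5] BUG: kernel NULL pointer dereference, address: 0000000000000008
[ 1234.5] Workqueue: events_unbound netfs_rreq_write_to_cache_work
[ 1234.5] RIP: 0010:cachefiles_prepare_write+0x30/0xa0
""".splitlines()

print(find_fscache_oopses(sample))
```

On a live system the same function could be fed the lines of `dmesg` or `journalctl -k` output, for example by iterating over `sys.stdin`.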
Runtime detection
- Monitor for repeated kernel panics, reboots, or host unresponsiveness originating from hosts that mount NFS or use other fscache-backed network filesystems.
- On systems with systemd, audit the journal for repeated kernel messages and map them against workloads that touch network mounts.
Forensics
- If a host crashed and rebooted, capture the kernel oops, copy /var/log/kern.log and journal output, and preserve the workload logs that coincide with the crash window (application logs, NFS client logs).
- Recreate the conditions in a controlled lab environment with the same kernel and cache workload to confirm the root cause before remedial actions beyond patching.
Mitigation and remediation (what admins must do now)
Short answer: patch the kernel and reboot. The fix is small, upstream, and widely backported; vendors have released updated kernels or advisories for their distributions. If you cannot immediately patch, consider temporary mitigations that reduce the chance of the race (but note: there is no foolproof workaround that eliminates the underlying logic error).

Concrete steps:
- Inventory:
  - Identify hosts that use fscache or that mount network filesystems that rely on fscache (NFS, AFS, cachefilesd-backed systems).
  - Gather kernel versions: uname -r and distribution kernel package details.
- Check vendor advisories:
  - Confirm whether your distribution has published a security advisory that backports the patch.
  - Distro trackers (Ubuntu, Amazon Linux, Red Hat/EL) show fixed package names and kernel versions; follow vendor guidance.
- Upgrade:
  - Apply the vendor-provided kernel update containing the fscache fix and reboot the host.
  - If you maintain custom kernels, merge the upstream commit(s) or rebase to a stable kernel that includes the patch, then rebuild and deploy.
- Temporary risk-reduction (if patching is delayed):
  - Reduce cache churn where possible: avoid aggressive background cache trimming or immediate heavy concurrent file churn on NFS clients.
  - Where practical, move critical workloads to hosts that do not enable fscache until those hosts are patched.
  - For multi-tenant hypervisors, consider isolating potentially untrusted guests until hosts are updated.
- Post-patch validation:
  - After updating, reproduce representative workloads; verify that previous oops patterns are no longer triggered under stress testing.
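For the inventory step, one rough heuristic is to look for the `fsc` mount option, which NFS clients use to opt in to fscache. The sketch below parses text in `/proc/mounts` format; the function name `fscache_mounts()` and the `markers` parameter are hypothetical, and the approach assumes the option is visible in the mount table. Filesystems that enable caching without a visible option will be missed, so treat the result as a starting point rather than a complete inventory.

```python
def fscache_mounts(mounts_text, markers=("fsc",)):
    """Return (mountpoint, fstype) for mounts whose options include a marker
    suggesting fscache use. Input is text in /proc/mounts format."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mountpoint, fstype, options = fields[:4]
        opts = options.split(",")
        if any(marker in opts for marker in markers):
            found.append((mountpoint, fstype))
    return found


# Illustrative mount table; hosts and paths are invented.
example = """\
server:/export /mnt/data nfs4 rw,relatime,vers=4.2,fsc,addr=10.0.0.1 0 0
/dev/sda1 / ext4 rw,relatime 0 0
server:/home /home nfs4 rw,relatime,vers=4.2 0 0
"""

print(fscache_mounts(example))
```

On a host, the input would typically come from `open("/proc/mounts").read()`.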
Patch details and verification
The upstream fix is a small, targeted defensive check inserted in the fscache_cookie_state_machine() logic that prevents the LRU-discard path from proceeding while n_accesses is non-zero. Multiple stable-tree commits and distribution backports were merged and published following the public disclosure in September 2024. Administrators who wish to verify the presence of the fix in a kernel can:
- Inspect the kernel source tree used to build the running kernel for the fscache_cookie_state_machine() change that adds an atomic_read(&cookie->n_accesses) check in the FSCACHE_COOKIE_STATE_ACTIVE / LRU discard branch.
- Confirm the distribution package changelog or security advisory lists CVE-2024-45000 or references the fscache cookie n_accesses check.
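As a first-pass triage before inspecting source or changelogs, a release string can be compared against the stable versions cited in this article (6.1.107, 6.6.48, 6.10.7). This sketch is only meaningful for kernels whose `uname -r` output follows upstream stable numbering; distribution kernels often backport fixes without changing that number, so a "predates-fix" or "unknown" result means "check the vendor advisory", not "vulnerable". The function name and return labels are illustrative.

```python
import re

# First stable releases reported to contain the fix, per the versions cited above.
FIXED_IN = {(6, 1): 107, (6, 6): 48, (6, 10): 7}


def stable_fix_status(release):
    """Classify a `uname -r` style string as 'fixed', 'predates-fix' or 'unknown'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if not match:
        return "unknown"
    major, minor, patch = map(int, match.groups())
    fixed_patch = FIXED_IN.get((major, minor))
    if fixed_patch is None:
        return "unknown"  # series not listed here; consult the vendor advisory
    return "fixed" if patch >= fixed_patch else "predates-fix"


for release in ("6.6.48", "6.6.47-generic", "6.10.7", "5.15.0-100-generic"):
    print(release, stable_fix_status(release))
```

The release string itself can be obtained in Python with `platform.release()`, which returns the same value as `uname -r`.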
Operational impact and risk analysis
- Short-term impact: Hosts that run unpatched kernels and that use fscache can suffer unpredictable kernel oops/panics under specific concurrency conditions. That may cause service interruption for the workloads they host. In clustered storage or shared compute environments, an attacker or misbehaving process could repeatedly trigger a denial-of-service across multiple machines.
- Attack surface model: Because the bug requires local action or the ability to provoke workloads that trigger cache churn and evictions, it is a local DoS. Cloud and hosting providers should pay particular attention when a tenant could induce the race on a shared host or when the host executes third-party workload code.
- Likelihood of exploitation: Medium in environments where fscache is actively used and hosts see heavy concurrent file activity. Low in typical desktops or servers that do not enable fscache or that use modest caching workloads.
- Potential for chained attacks: The vulnerability is not known to yield code execution or data leaks by itself. However, a persistent DoS in a critical server can enable adjacent risks such as failover storms, degraded visibility, or operational errors during incident response — so treat denial-of-service as a high-priority operational risk despite the medium CVSS score.
Recommended remediation timeline and checklist
- Immediately (within 24–72 hours)
  - Identify hosts using fscache or cachefiles-backed mounts.
  - Check vendor advisories and obtain fixed kernel packages if available.
- Short-term (1 week)
  - Schedule kernel updates and reboots for production windows; prioritize multi-tenant and high-availability hosts.
  - For hosts where immediate reboot is impossible, isolate or migrate workloads off the host if feasible.
- Medium-term (2–4 weeks)
  - Verify updated hosts under representative load; ensure no regression in filesystem or caching performance.
  - Tune monitoring and alerting for kernel oops traces and for repeated reboots related to cachefiles activity.
- Long-term
  - Review fscache usage across your estate: if the subsystem is unnecessary, consider disabling it in kernel builds where that’s practical.
  - Incorporate kernel security backport tracking into patch management to avoid similar long-tail exposure.
Final analysis: strengths and residual risks
The positive: the fix is small, surgical, and merged upstream promptly; distro vendors have been able to backport and ship the patch to users. The root cause was an omitted concurrency check, a classically fixable defect, and upstream reviewers addressed it without major API changes or compatibility tradeoffs. Multiple independent trackers (NVD, distribution security pages, OSV, vendor advisories) converge on the same technical explanation and remediation path, increasing confidence in the analysis and the available fixes.

The residual risk: even after patching, multi-tenant operators must recognize that other fscache race windows might exist, and the presence of a patched kernel in the wild does not guarantee all upstream fixes or backports are present in every vendor image. Similarly, environments that delay kernel updates (long-tail vendor kernels, embedded appliances, third-party vendor images) remain exposed. Detection is straightforward when kernel oops messages show the characteristic traces, but some operational incidents may be attributed to other causes unless logs are retained and correlated. Finally, because the bug is a local or workload-triggered DoS, adversaries with local access or the ability to create pathological workloads retain a practical attack avenue until all hosts are patched.
Key takeaways (quick reference)
- CVE-2024-45000 is a Linux kernel race condition in fs/netfs/fscache_cookie that results from a missing n_accesses check and can produce a kernel NULL pointer dereference and DoS.
- The vulnerability is local in nature (requires local action or workload control) and has a reported CVSS v3 base score of 5.5 (Medium); the impact is limited to availability.
- Immediate action: update the kernel to a vendor-patched release and reboot; confirm the distribution's advisory and kernel changelog before declaring a host remediated.
- Detection: scan logs for kernel oops referencing cachefiles_prepare_write and NULL pointer dereference traces; correlate with heavy fscache usage.
CVE-2024-45000 is a textbook example of how a tiny missing defensive check in concurrent kernel code can cascade into real-world availability failures. The remedy is straightforward and widely available; administrators should treat this as a moderate‑priority kernel update for affected hosts and incorporate the lessons — stricter concurrency checks, robust staging tests under stress, and disciplined kernel patch management — into their long-term platform hardening plans.