The Linux kernel received a targeted fix that prevents the GFS2 filesystem from triggering recursive memory reclaim through page-cache allocations. The change is recorded as CVE-2025-68356 and removes a subtle but real deadlock and stack-exhaustion hazard that arises when GFS2 code allocates page-cache objects while already inside filesystem transaction paths.
Background / Overview
GFS2 (Global File System 2) is a clustered filesystem used in environments where multiple nodes share the same block device and require coordinated metadata and data locking. Because GFS2 integrates deeply with kernel page cache and transaction semantics, unexpected re-entry of memory reclaim into filesystem code creates a risky class of failures: stack exhaustion, livelocks, and deadlocks when reclaim attempts re-enter the filesystem and start a new transaction while the original transaction is still active.
CVE-2025-68356 addresses exactly this class of problem. In short, the call path that creates new inodes returned an inode mapping whose allocation flags permitted filesystem-driven memory reclaim (the __GFP_FS bit). With that bit set, a page-cache allocation made inside a transaction (for example via filemap_grab_folio) could trigger memory reclaim, and reclaim could recurse into filesystem code and start another transaction: a recipe for deadlock. The NVD describes the underlying cause and the corrective principle: ensure inode address-space gfp_mask values do not include __GFP_FS so that page-cache allocations do not recurse into filesystem reclaim paths. Multiple independent vulnerability databases and downstream trackers imported the same description and the upstream stable commits that implement the fix. These include OSV and SUSE’s CVE tracker, each mirroring the kernel team's description of the problem and the remedial approach.
Why this matters: deadlocks, stack consumption, and real-world triggers
The technical hazard in plain language
- When kernel code must free memory it can invoke memory reclaim workers. Memory reclaim paths can be complex and can cause other kernel subsystems to allocate memory while reclaiming.
- If a filesystem callback or helper runs inside a transaction and calls into the page cache in a way that can allocate, and that allocation triggers memory reclaim, the reclaim code may attempt to re-enter the same filesystem code that’s already holding transaction context.
- Re-entering transaction-handling code while a first transaction hasn't completed can easily produce deadlocks (two pieces of code waiting on each other) or exhaust limited kernel stack capacity by nested reclaim/transaction sequences.
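The sequence in the list above can be sketched as a toy model. This is only an illustration under simplifying assumptions, not GFS2 code: the class ToyFilesystem, its methods, and the allow_fs_reclaim parameter are all hypothetical names, and an exception stands in for what would be a hang in a real kernel.

```python
class ReentryError(RuntimeError):
    """Raised when reclaim tries to start a transaction inside another one."""


class ToyFilesystem:
    """Hypothetical stand-in for a filesystem with non-reentrant transactions."""

    def __init__(self):
        self.in_transaction = False

    def begin_transaction(self):
        if self.in_transaction:
            # In a real kernel this would be a wait that never completes.
            raise ReentryError("transaction started inside an active transaction")
        self.in_transaction = True

    def end_transaction(self):
        self.in_transaction = False

    def reclaim(self):
        # In this model, freeing memory needs a transaction of its own.
        self.begin_transaction()
        self.end_transaction()

    def allocate_page(self, allow_fs_reclaim, memory_is_tight=True):
        # Under memory pressure the allocator may reclaim; it only calls back
        # into the filesystem when the caller's mask permits it.
        if memory_is_tight and allow_fs_reclaim:
            self.reclaim()
        return "page"

    def operation_inside_transaction(self, allow_fs_reclaim):
        # A page-cache allocation performed while a transaction is open.
        self.begin_transaction()
        try:
            return self.allocate_page(allow_fs_reclaim)
        finally:
            self.end_transaction()
```

Calling operation_inside_transaction(allow_fs_reclaim=True) raises ReentryError in this model, while the same call with allow_fs_reclaim=False completes. That difference is the behavior the fix aims for by withholding the filesystem-reclaim permission from those allocations.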
This is not hypothetical: the kernel commit notes tie the fix to xfstests generic/273, and the kernel’s own test suites and fuzzers have produced practical reproductions of similar reclaim recursion and stack-pressure issues, motivating defensive changes across filesystems. The kernel-team remedy for GFS2 follows that same defensive pattern.
Practical triggers and exposed populations
Hosts most likely to encounter this problem include:
- Clustered servers and virtual appliances that actually mount and use GFS2.
- Multi-tenant or shared-storage platforms where tenants can create workloads that perform unusual page-cache and transaction interactions.
- Long-running nodes where occasional reclaim recursion could cause intermittent deadlocks that are difficult to correlate with user workloads.
Desktop systems or servers that do not build/load GFS2 are not in scope, but any distribution or cloud image that ships a kernel with GFS2 enabled (either built-in or as a module) should be inventoried and treated as potentially affected until the fix is shown present. Vendor kernel builds and cloud-targeted kernels (for example, Azure-targeted kernels) are a logical distribution surface to verify. Public vendor advisories routinely call out cloud-targeted kernel builds where filesystem modules are included.
What went wrong — the precise technical root cause
The allocation flag misconfiguration
- The vulnerable behavior arose because the kernel function that constructs new inodes set the inode mapping’s gfp_mask to GFP_HIGHUSER_MOVABLE.
- GFP_HIGHUSER_MOVABLE (at the time of the report) included the __GFP_FS flag. __GFP_FS allows allocator paths to call into filesystem helpers during allocation — in effect permitting reclaim and filesystem re-entry.
- That means a seemingly benign page-cache allocation (for example through filemap_grab_folio) made while a transaction is open could push the allocator into a reclaim path, and reclaim could call back into the same filesystem and try to start a second transaction, leading to a potential deadlock.
The fix approach avoids that re-entry by changing the gfp_mask for the inode address spaces so that __GFP_FS is not set there; this prevents reclaim from entering filesystem context during those allocations. The kernel upstream rationale emphasizes making inode address-space GFP flags conservative with respect to reclaim re-entry, while avoiding overly restrictive flags that would reduce performance or break legitimate allocation semantics.
Why stack exhaustion is a real risk (and not only theoretical)
Recursive reclaim consumes additional stack frames quickly: memory reclaim and filesystem code paths are non-trivial and invoke multiple helpers, increasing stack depth. Kernel stack space is limited; repeated recursion can overflow kernel stack or produce deep call chains that make deadlocks and corruption more likely. The kernel team cited this as a concrete reason to block recursive reclaim into filesystem code for inode mappings.
What was changed upstream (the fix and its intent)
- The upstream kernel patches ensure inode address spaces’ i_mapping->gfp_mask values do not include __GFP_FS, removing the allocator permission to run filesystem reclaim during page-cache allocations for those address spaces.
- The “meta” and resource-group address spaces had already used GFP_NOFS (explicitly avoiding __GFP_FS). To avoid overconstraining the allocator, the change uses the default gfp mask but removes the __GFP_FS bit specifically for the inode address spaces that previously had GFP_HIGHUSER_MOVABLE set. This tries to strike a balance between safety and avoiding unnecessary allocation restrictions.
- The changes were implemented as small, targeted edits and were merged into stable kernel trees; the kernel commit messages reference xfstests generic/273 as the test the change addresses and describe the change as loosely inspired by prior XFS defensive work on page-cache allocation safety.
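The difference between switching to GFP_NOFS and clearing only the __GFP_FS bit can be shown with simple flag arithmetic. The sketch below is a user-space model, not kernel code: the bit positions are arbitrary, the flag compositions are simplified from the kernel's gfp definitions as I understand them (for instance, reclaim is treated as a single bit), and without_fs is a hypothetical helper name.

```python
from enum import Flag, auto


class Gfp(Flag):
    """Illustrative flag bits; positions are NOT the kernel's real values."""
    RECLAIM = auto()
    IO = auto()
    FS = auto()
    HARDWALL = auto()
    HIGHMEM = auto()
    MOVABLE = auto()


# Simplified compositions (assumption: real kernels define further bits).
GFP_NOFS = Gfp.RECLAIM | Gfp.IO
GFP_HIGHUSER_MOVABLE = (
    Gfp.RECLAIM | Gfp.IO | Gfp.FS | Gfp.HARDWALL | Gfp.HIGHMEM | Gfp.MOVABLE
)


def without_fs(mask: Gfp) -> Gfp:
    """Clear only the FS bit, keeping every other flag in the mask."""
    return mask & ~Gfp.FS


# The mask the fix aims for on inode address spaces in this model.
inode_mask = without_fs(GFP_HIGHUSER_MOVABLE)
```

In this model inode_mask no longer contains Gfp.FS but still carries Gfp.HIGHMEM and Gfp.MOVABLE, which GFP_NOFS lacks. That is the sense in which the change avoids overconstraining the allocator.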
Those upstream commits are referenced by vulnerability databases and vendor trackers as the authoritative remediation artifacts. Note: in some environments the git.kernel.org stable commit pages may be blocked or return restricted status, but multiple independent trackers cite the same commit IDs and diffs, providing cross-verification.
Severity, exploitability, and realistic risk model
Impact class
- Primary impact: availability. The bug can cause kernel deadlock or stack exhaustion leading to hung systems or kernel oops/panic.
- Privilege requirements: Local. An attacker or process must be able to trigger the relevant GFS2 transactions and page-cache allocation patterns (typical on systems that mount and operate GFS2). In multi-tenant setups where untrusted code can influence the shared storage stack, this local constraint is easier to satisfy.
- Remote exploitation: Not indicated. The vulnerability is not a remote code execution primitive by itself. However, a local denial-of-service on critical hosts can have service-wide effects in shared/cloud environments.
Likelihood and real-world threat
- The vulnerability is concrete and reproducible in controlled tests (kernel test suites and fuzzers reported related failures). That raises confidence in the diagnosis and supports treating the issue as operationally relevant.
- There was no authoritative public proof-of-concept for privilege escalation or remote compromise at disclosure. The consensus guidance treats this as a DoS/deadlock vector requiring prompt patching for exposed hosts.
Detection and hunting: how to find affected systems and evidence of the bug firing
Inventory checks (identify potentially affected hosts)
- Check whether the running kernel includes GFS2 support:
  - Inspect kernel config: grep -i gfs2 /boot/config-$(uname -r) || zcat /proc/config.gz | grep -i gfs2
  - Check loaded modules: lsmod | grep gfs2
  - Check for mounted GFS2 filesystems: grep gfs2 /proc/mounts (grep gfs2 /proc/filesystems shows only whether the gfs2 filesystem type is currently registered with the kernel, not whether anything is mounted)
- Map kernel package versions to vendor advisories and patch lists:
  - Review your distribution’s security tracker and kernel package changelogs to verify whether the stable commit(s) implementing the fix are present in the kernel package you run. Vendor advisories and OSV/NVD entries list upstream commit IDs for cross-verification.
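For fleets where these checks are collected centrally, the same logic can be applied to captured file contents. The following is a minimal sketch, assuming the kernel config text and the /proc/mounts text have already been gathered from each host; the function names are hypothetical, and CONFIG_GFS2_FS is the config symbol I would expect the grep above to surface.

```python
def gfs2_config_state(config_text: str) -> str:
    """Return 'builtin', 'module', or 'absent' for CONFIG_GFS2_FS."""
    for line in config_text.splitlines():
        line = line.strip()
        if line == "CONFIG_GFS2_FS=y":
            return "builtin"
        if line == "CONFIG_GFS2_FS=m":
            return "module"
    # Covers both a missing symbol and "# CONFIG_GFS2_FS is not set".
    return "absent"


def gfs2_mounts(mounts_text: str) -> list[str]:
    """Return mount points whose filesystem type is exactly gfs2."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, ...
        if len(fields) >= 3 and fields[2] == "gfs2":
            points.append(fields[1])
    return points


# Example use on a live host (paths as in the checks above):
#   from pathlib import Path
#   print(gfs2_mounts(Path("/proc/mounts").read_text()))
```

Matching the filesystem-type field exactly avoids false positives from device names or mount paths that merely contain the string gfs2, which a plain grep would also report.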
Operational signals that the defect was hit
- Kernel oops/panic traces that show stack frames entering gfs2 transaction functions with nested filemap or page-cache allocation frames.
- Hang symptoms or tasks stuck in reclaim/workqueue code around transactions or filemap_grab_folio sequences.
- Repro runs of failing xfstests generic cases that test corner-case page cache allocations and reclaim behavior (the upstream discussion references xfstests generic/273 as related test coverage).
If a crash occurs, capture kdump/vmcore and full kernel logs before rebooting — those artifacts are critical to map traces to the CVE and support vendor escalations.
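A first-pass filter over saved kernel logs can help decide which captures deserve manual review. The sketch below is a rough heuristic, not a detector: the two regular expressions are assumptions about how GFS2 and page-cache frames might appear in a trace (any symbol starting with gfs2_, and any symbol containing filemap_), and a match only means the log is worth reading, not that this CVE was hit.

```python
import re

# Assumed patterns: GFS2 symbols start with "gfs2_"; page-cache helpers such
# as filemap_grab_folio (named in the CVE description) contain "filemap_".
GFS2_FRAME = re.compile(r"\bgfs2_\w+")
PAGE_CACHE_FRAME = re.compile(r"filemap_\w+")


def suspicious_lines(log_text: str) -> dict[str, list[int]]:
    """Map each pattern name to the 1-based line numbers where it matches."""
    hits = {"gfs2": [], "page_cache": []}
    for number, line in enumerate(log_text.splitlines(), start=1):
        if GFS2_FRAME.search(line):
            hits["gfs2"].append(number)
        if PAGE_CACHE_FRAME.search(line):
            hits["page_cache"].append(number)
    return hits


def worth_a_closer_look(log_text: str) -> bool:
    """True when both kinds of frame appear somewhere in the same log."""
    hits = suspicious_lines(log_text)
    return bool(hits["gfs2"]) and bool(hits["page_cache"])
```

The line numbers returned by suspicious_lines make it easier to jump to the relevant part of a long capture when attaching it to a vendor case.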
Remediation and mitigation: a practical playbook
Immediate triage (short checklist)
- Inventory and prioritize hosts that mount GFS2 or have the module built/installed.
- Check vendor advisories for fixed kernel package versions that include the upstream stable-commit backport.
- Plan and schedule kernel package updates and reboots (or evaluate livepatch availability from your vendor), prioritizing multi-tenant/shared-storage hosts and cluster nodes.
Patching guidance
- Apply the vendor-provided kernel update that includes the upstream stable commit IDs referenced in NVD/OSV. Because this is a kernel memory/allocation semantics change, a reboot is typically required to fully remediate.
- If vendor livepatch is available for your kernel version, confirm the livepatch explicitly contains the gfs2 inode gfp_mask change before relying on it in production.
Post-patch verification
- After rebooting into the patched kernel, re-run the inventory checks (lsmod, /proc/mounts, kernel config) and validate that the kernel package changelog or running image metadata contains the upstream commit IDs mentioned in vendor advisories.
- Monitor kernel logs for absence of repeat oops traces and for successful completion of GFS2 transactions under typical workloads.
Compensating controls (if immediate patching is impossible)
- Restrict who can mount or present shared block devices on critical hosts.
- Temporarily avoid operations that stress GFS2 transaction internals (heavy metadata operations, unusual page-cache stress tests) until patched.
- Isolate high-risk workloads that could cause pathological reclaim behavior onto patched or segregated nodes.
Vendor advisories will be the authoritative package-to-commit mapping; operators should always match the vendor kernel package changelog to upstream commit IDs listed in vulnerability trackers before declaring remediation complete.
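The changelog-to-commit matching described above can be partly automated. This is a minimal sketch under stated assumptions: the changelog text has already been exported (for example from the package manager), the commit IDs are supplied by the operator from the tracker entry (none are hard-coded here), and the function name is hypothetical. Some vendor changelogs cite only the CVE ID or only a bug number, so a negative result still needs a manual check.

```python
import re


def changelog_mentions_fix(changelog: str, cve_id: str, commit_ids: list[str]) -> bool:
    """True if the changelog names the CVE or any of the given commit IDs.

    Hex strings of 12 to 40 characters in the changelog are compared with
    each supplied ID; either side may be an abbreviation of the other.
    """
    text = changelog.lower()
    if cve_id.lower() in text:
        return True
    hex_runs = re.findall(r"\b[0-9a-f]{12,40}\b", text)
    for wanted in (c.lower() for c in commit_ids):
        for seen in hex_runs:
            if seen.startswith(wanted) or wanted.startswith(seen):
                return True
    return False
```

Prefix matching in both directions is a deliberate choice here, since trackers tend to list full 40-character IDs while changelogs often abbreviate them.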
Why the upstream change is sensible — strengths and residual risks
Strengths
- The change is surgical: it removes the __GFP_FS permission from a specific address-space gfp_mask rather than broadly disabling reclaim functionality or radically altering allocation APIs. That minimizes regression risk.
- It follows an established defensive pattern used in other filesystems (XFS and others) to avoid page-cache allocation recursing into filesystem reclaim paths.
- The patch is straightforward to review and backport to stable branches, which accelerates distribution propagation.
Residual risk and caveats
- Distribution lag: as with any kernel fix, stable backports across all vendors and OEM kernels take time. Embedded appliances and vendor-supplied kernels are the longest tail.
- Incomplete coverage: the fix targets inode address spaces where this pattern was observed, but analogous code paths in other filesystems or in other mapping types could potentially have similar flags; operators should watch vendor trackers for related follow-ups.
- Livepatch availability varies by vendor and kernel version; an absence of a livepatch means reboots are needed for full remediation.
- The NVD record did not (as of initial publication) list an assigned CVSS score; operators should rely on operational exposure rather than a single numeric severity when prioritizing patching.
Verification commands and operator checklist (actionable)
- Inventory GFS2 presence:
  - grep -i gfs2 /boot/config-$(uname -r) || (zcat /proc/config.gz 2>/dev/null || true) | grep -i gfs2
  - lsmod | grep -i gfs2
  - grep gfs2 /proc/mounts || cat /proc/filesystems | grep gfs2
- Find kernel package changelog / vendor advisory:
  - On Debian/Ubuntu: apt changelog linux-image-$(uname -r) or check the distribution’s security tracker for the CVE mapping.
  - On RHEL/SUSE: consult the vendor CVE advisories and package metadata (rpm -q --changelog kernel-package-name) and match upstream commits.
- After update:
  - Reboot into updated kernel.
  - Confirm uname -r matches the patched kernel package.
  - Check dmesg/journalctl for absence of prior traces and for clean GFS2 mounts.
- Preserve evidence on active incidents:
  - Capture dmesg and kdump/vmcore before rebooting.
  - Collect syslogs and any reproducible test cases that reproduce the deadlock/hang.
Timeline and disclosure status
- NVD entry for CVE-2025-68356 was published and last modified on 24 December 2025 with the kernel-team-provided description and references to upstream stable commits.
- OSV and various distribution trackers imported the CVE and referenced the upstream commits and fixes shortly after the kernel-team submission.
Note: if an expected vendor page (for example an MSRC page) returns a not-found or unavailable error, use NVD/OSV and your distribution’s security tracker as the canonical technical sources for the kernel-side fix, then consult the vendor-specific advisory later for product-scoped attestations. Public distributor trackers often provide the package-level mapping that vendor cloud advisories later summarize for their images and marketplaces. Discussions of Microsoft’s advisory practice in related GFS2 advisories emphasize that cloud-targeted kernels (for example Azure-targeted builds) are the vendor surfaces to inspect for kernel inclusion. The absence of a single vendor page therefore does not mean the issue is irrelevant to that vendor’s cloud images; inventory all deployed vendor kernels.
Critical analysis: strengths, implementation risks, and operational advice
Why the fix is good engineering
- The change removes a permissive flag (__GFP_FS) from a clearly delineated allocation context (inode i_mappings) where reclaim recursion is known to be dangerous. That narrowly eliminates the recursion vector without broadly disabling reclaim or performance-affecting behaviors.
- It mirrors similar defensive changes done in other filesystems and in prior kernel work (the upstream discussion references analogous XFS fixes), indicating that the approach is proven and low-regression.
Potential limitations and things to watch
- Distribution backport quality: a correct backport must precisely replicate the intent (clear only the __GFP_FS bit without introducing over-restrictive allocation masks). Operators should validate that vendor backports were applied directly, not replaced by other unrelated changes that might alter allocation behavior in unexpected ways.
- Residual or parallel code paths: other filesystem or kernel subsystems with similarly permissive mapping masks might still allow reclaim recursion; this patch is targeted, not a global allocator policy change. Comprehensive hardening requires a combination of test coverage, fuzzing, and follow-up audits.
- Regression testing: because this touches low-level allocation flags, vendors should include targeted tests (for example the xfstest case referenced upstream) when validating backports to avoid performance regressions or allocation failures in corner cases.
Final recommendations for administrators
- Immediately inventory all hosts for GFS2 presence and prioritize patching for cluster nodes, virtualization hosts that mount shared block devices, and any multi-tenant or appliance systems that present or mount GFS2 volumes.
- Apply vendor-supplied kernel updates that map to the upstream stable commits referenced in NVD/OSV. If livepatch is offered, confirm the livepatch contains the specific gfs2 inode gfp_mask change before relying on it.
- Capture kernel logs and kdump/vmcore if any crash or hang is observed before rebooting; those artifacts are essential for forensic validation and vendor engagement.
- Treat this as an availability-first remediation: while not an immediate remote code-execution vector, the deadlock or kernel-level hang risk in shared environments demands rapid attention.
The fix for CVE-2025-68356 is a precise, low-regression kernel hardening for GFS2 that eliminates a dangerous allocation-flag configuration enabling recursive memory reclaim into filesystem transaction code. The remedy is focused, supported by upstream test-case correlation, and, when applied in vendor backports, should remove a practical deadlock and stack-exhaustion hazard for systems that run GFS2. Administrators should verify vendor package mappings, patch quickly on exposed hosts, and continue to monitor kernel logs and vendor advisories for any follow-up fixes or related hardenings.
Source: MSRC (Security Update Guide - Microsoft Security Response Center)