CVE-2025-68198: Linux crashkernel shrinking bug and availability risk

  • Thread Author
Neon blue server scene with a prominent “Mismatch” error and crashkernel options.
A serious kernel-level bug has been recorded as CVE-2025-68198: a defect in Linux’s crashkernel handling that can leave invalid crashkernel resource objects and, under repeated shrink operations, produce a kernel NULL-pointer dereference and panic — in short, a reproducible availability hazard when administrators shrink a crashkernel reservation configured with a high reservation.

Background​

When a Linux system is configured with kexec-style crash capture, the kernel reserves a block of memory for the crash kernel (commonly set with the kernel boot parameter crashkernel= or adjusted at runtime via /sys/kernel/kexec_crash_size). The crashkernel mechanism maintains separate resource objects for the high and low crashkernel reservations. A logic mistake in the shrink path meant the kernel sometimes updated the wrong resource object when reducing the crashkernel size — leaving stale or incorrect resource descriptors in /proc/iomem and, if shrinking was repeated below the low-reservation boundary, triggering a NULL-pointer dereference that crashes the system. This bug is documented in multiple public vulnerability trackers and was published to the NVD and OSV databases on December 16, 2025. The public CVE summary concisely states the two primary impacts: (1) invalid crashkernel resource objects after a shrink and (2) a kernel crash when shrink operations are performed twice crossing the low-reservation boundary.

What happened technically​

The root cause, in plain terms​

The vulnerable code lives in kernel/crash_core.c (the crashkernel management code). When an administrator changes the crashkernel size at runtime (writing to /sys/kernel/kexec_crash_size), the kernel executes a shrink routine that must update the correct in-kernel resource object representing the reserved crash memory region.
Due to an incorrect update path in __crash_shrink_memory, the shrink code could update the wrong resource entry (the high-reservation object rather than the low-reservation object, or vice versa). That mismatch leaves resource entries inconsistent with the actual reserved range. A later shrink operation that reduces the reserved size further can then touch freed or nullified structures and dereference NULL, producing a kernel oops and panic on x86. The canonical stack trace posted in the public advisories shows the NULL-pointer fault occurring in the crash shrink functions.

Observable symptoms and logs​

When the bug is triggered the kernel may produce a clear OOPS backtrace that references crash_shrink_memory or kexec_crash_size_store. On affected systems you can see mismatches between expected and actual crash reservations in /proc/iomem after a shrink operation: the reserved region reported there will still reflect the old low-reservation value instead of the new, smaller size. Repeating the shrink across the boundary can then escalate to a kernel NULL-pointer dereference and a full panic. The NVD entry includes an example trace and explains the conditions that reproduce this behavior.

Who’s affected and the real-world risk​

  • Affected component: the Linux kernel’s crashkernel management code. This is a local, kernel-internal operation and not a network-facing service by itself.
  • Typical exposure: systems that use crashkernel with a high reservation (for example crashkernel=200M,high) and which may adjust the crash reservation at runtime. Systems that programmatically change /sys/kernel/kexec_crash_size or that use automation which resizes crashkernel fields are most at risk.
  • Primary impact: availability. The bug is a deterministic crash primitive when the shrink sequence is performed in the problematic order; that makes it a denial-of-service (DoS) vector. There is no public evidence that this vulnerability can be chained to privilege escalation or remote code execution — it is an availability-first defect.
High-priority targets for remediation include:
  • Multi-tenant hypervisor hosts and cloud infrastructure where an untrusted tenant, operator automation, or misbehaving management tooling could cause a resize sequence.
  • CI runners, build farms and shared developer infrastructure that automatically manipulate kernel parameters or perform runtime tuning.
  • Any system where high uptime or automated crash reservation changes are common.
Lower immediate priority: single-user desktops or embedded systems that do not change crashkernel at runtime. Nonetheless, administrators of such systems should still plan to update as part of routine kernel maintenance.

What the upstream fix does​

The upstream patch corrects which resource object is updated when shrink operations occur: __crash_shrink_memory now ensures the appropriate crashkernel resource (low vs. high) is updated rather than incorrectly touching the other structure. The patch is corrective and narrowly scoped — it fixes the resource bookkeeping so /proc/iomem reflects the new reservation and removes the code path that could later dereference freed/null resources during a subsequent shrink. Stable-tree commits implementing the fix were published alongside the CVE entries. This is a typical kernel pattern: a small, defensive change that restores invariants in resource management. Such fixes are usually easy to backport into distribution-stable kernels and present a low regression risk when compared with broad architectural rewrites. Public tracking tools and downstream distributions generally treated the fix as a surgical patch suitable for stable branches.

How operators should respond (remediation and verification)​

Immediate action (recommended)​

  1. Identify systems with crashkernel configured and running kernels older than the fixed commits: run uname -r and check your distribution’s security tracker or changelog for CVE-2025-68198 mappings.
  2. If your distribution has released an updated kernel package containing the upstream fix, schedule and apply the update and reboot into the patched kernel — kernel-level fixes require a reboot to take effect.

Short-term mitigations (if you cannot patch immediately)​

  • Avoid resizing crashkernel in production until the kernel has been patched. In particular, do not repeatedly shrink the crashkernel value across the low/high boundary. Treat any scripts or automation that adjust /sys/kernel/kexec_crash_size as risky until kernels are patched.
  • If resizing is unavoidable, perform careful pre-validation: check /proc/iomem both before and after a shrink and avoid sequences that cross the low-reservation threshold. If you must shrink, avoid doing it twice in rapid succession — the crash occurs specifically when the shrinking action is performed a second time across the boundary.
  • For automated environments, add gating that prevents automated tuning of crashkernel size on hosts that have not been validated. These are temporary controls only; they do not substitute for a patched kernel.

Verification after patching​

  • Confirm the patched kernel’s changelog or vendor advisory lists CVE-2025-68198 or contains the upstream stable commit IDs related to the crash_core.c fix.
  • Validate behavior in a lab: set crashkernel=200M,high (or your equivalent configuration), then reduce /sys/kernel/kexec_crash_size to a lower value and confirm /proc/iomem correctly reflects the new reservation and no kernel oops occurs on repeated shrink attempts. The patched kernel should not produce the previous NULL-pointer crash trace.

Detection and hunting guidance​

Look for the following telemetry and logs:
  • Kernel OOPS traces referencing crash_shrink_memory, __crash_shrink_memory, or kexec_crash_size_store. These traces are the canonical signs that the vulnerable path was executed and crashed.
  • Inconsistencies in /proc/iomem after performing a shrink: if you shrink the reservation to X but /proc/iomem still shows a larger low reservation value, the code may have updated the wrong resource object. That mismatch is the first signal that the buggy code path ran.
  • Unexpected reboots or watchdog timeouts correlated with management-plane changes that adjust crashkernel or with automation that writes to /sys/kernel/kexec_crash_size.
For forensic collection:
  • Preserve dmesg and journalctl -k logs prior to any reboot; OOPS traces are ephemeral and must be captured immediately.
  • Collect kdump crash dumps (if available) and correlate the stack trace to the crash_shrink_memory functions. These captures make vendor or upstream triage far more effective.
Operational playbook and prioritization are consistent with general kernel-availability CVE guidance: prioritize multi-tenant systems, hypervisors, CI runners and other hosts that expose attack surface to untrusted code. The public advisories recommend mapping vendor package versions to the upstream commits to determine whether your packages include the fix.

Why this patch is important — strengths and residual concerns​

Strengths of the upstream fix​

  • The repair is small and narrowly scoped to the crashkernel shrink path, which reduces regression risk and makes vendor backports straightforward. This enables faster patch distribution across stable kernel branches.
  • The change restores an explicit resource-management invariant: update the resource object that actually represents the region being resized. Restoring such invariants fixes both the user-visible mismatch (/proc/iomem) and the deeper crash primitive.

Potential caveats and residual risks​

  • This is an availability bug that requires local actions to trigger; the most realistic threat is a local or tenant-caused DoS, not an unauthenticated remote exploit. However, deterministic crash primitives are attractive in multi-stage chains or disruptive campaigns in multi-tenant infrastructures. Treat DoS vulnerabilities as high priority for hypervisors and shared infrastructure.
  • Distribution and vendor lag: upstream commits may be available quickly, but vendor kernel packages (especially for appliance or embedded kernels) can lag. Ensure you verify vendor advisories rather than assume immediate coverage. For long-lived appliances or vendor images, tracking and vendor coordination remain necessary.

Practical checklist for administrators​

  • Inventory: find hosts with crashkernel configured: grep -i crash /proc/iomem; check boot parameters for crashkernel= and inspect /sys/kernel/kexec_crash_size.
  • Map: consult your distribution security tracker and package changelogs for CVE-2025-68198 to identify which packaged kernels include the fix.
  • Patch: install vendor-supplied kernel updates that reference CVE-2025-68198 or the upstream commit IDs, and reboot into the patched kernel. Kernel updates are the definitive remediation.
  • Validate: in a staging environment, reproduce a shrink sequence (including a sequence that previously crashed) to confirm the system no longer panics and /proc/iomem reflects the new reservation.
  • Monitor: add SIEM rules for kernel OOPS traces that mention crash_shrink_memory and alert on unexpected /proc/iomem reservations after administrators manipulate crashkernel size.
A short, conservative rollout plan: (1) pilot patch on a small representative set of hosts that exercise crashkernel adjustments, (2) expand to staging, (3) full production rollout — with monitoring windows and rollback plans. This reduces the risk of widespread unintended disruption from any kernel update.

Broader context and lessons​

This CVE is another instance of a common kernel-class problem: resource bookkeeping errors that can turn benign administrative operations into availability hazards. The kernel community’s preferred pattern — small, surgical fixes that restore invariants and add defensive checks — is effective here because it minimizes regression risk and permits rapid backporting. Operators should treat such fixes as high priority for infrastructure that hosts untrusted code or requires high availability. Operational hygiene that reduces exposure to similar classes of bugs includes:
  • Limiting automated runtime adjustments to low-level kernel parameters on production hypervisors unless necessary.
  • Keeping a minimal and well-audited set of kernel tuning scripts under change control.
  • Maintaining an inventory of vendor/kernel package versions and mapping them to upstream commit IDs for rapid triage when new CVEs are published.

Final assessment​

CVE-2025-68198 is an availability-focused Linux kernel vulnerability caused by incorrect resource-object updates during crashkernel shrinking. The defect is deterministic and reproducible under the right configuration (a high crash reservation that is later shrunk), and the upstream fix is a surgical correction to the shrink path that removes the crash primitive and corrects /proc/iomem bookkeeping. Operators should treat multi-tenant and hypervisor hosts as the highest priority for remediation, install vendor-supplied patched kernels, and avoid runtime crashkernel resizing until the fix is applied. There is no public evidence of privilege escalation or remote-code-execution chains built from this bug; those claims remain unverified in public trackers as of the disclosures.
By following the verification steps and prioritized remediation checklist above, administrators can eliminate the immediate availability risk posed by CVE-2025-68198 and reduce the chance that routine administrative operations will convert into system-wide outages.

Source: MSRC Security Update Guide - Microsoft Security Response Center
 

Back
Top