
A recently published Linux kernel fix corrects a subtle but consequential KVM SVM fastpath bug that could cause host instability when the CPU does not supply the “next RIP” value. The patch makes SVM skip fastpath handling for WRMSR and HLT VM-exits when the next RIP isn’t valid, so that the instruction is skipped safely through full instruction decode/emulation instead.
Background
KVM’s SVM backend (used for AMD virtualized guests) implements numerous fastpaths to handle common VM-exit reasons quickly and with minimal overhead. Fastpaths avoid heavy emulation work in the normal case by using CPU-provided state: for example, when the CPU supplies the instruction’s “next RIP”, the hypervisor can skip emulation and advance the guest instruction pointer cheaply. Those optimizations are performance-critical across cloud and virtualization workloads, but they must obey kernel context rules: some fastpath code runs with interrupts disabled and cannot perform operations that may block or sleep.
A recent upstream commit (applied to mainline and backported into stable trees) changes that behavior for two particular fastpaths, WRMSR and HLT, when the CPU does not provide a valid next RIP (for example, when KVM is operating with nrips=false). The change stops using the fastpath in those cases and forces a decode/emulation path that reads guest memory to obtain instruction bytes safely. The commit author explains the rationale: reading guest memory during emulation may fault (and thus can sleep), which cannot happen inside fastpath handlers that run with IRQs disabled. The bug was demonstrated by kernel BUG traces showing "sleeping function called from invalid context" originating from an instruction fetch through the emulator while in the fastpath context.
What exactly was the problem?
Fastpaths, NRIP, and the “next RIP” signal
- Modern SVM implementations and KVM optimizations rely on the CPU providing a next RIP for many VM-exits. When that value is present, KVM can advance the guest IP without decoding the instruction or walking guest memory.
- Some configurations or CPUs may not provide a valid next RIP; upstream discussion refers to this situation as nrips=false. In these cases the hypervisor must determine how far to advance the guest IP by decoding the guest instruction stream, and that requires reading guest memory.
Unsafe sleep-from-fastpath scenario
- Certain SVM VM-exit fastpaths (notably WRMSR and HLT) historically attempted the fastpath even when the next RIP was not available. That made KVM call into the fastpath handler which assumed it would not need to read guest memory.
- When the CPU did not supply next RIP, KVM ended up invoking the emulator code path to decode and skip the instruction — and that decode requires fetching instruction bytes from guest memory.
- Fetching guest memory can cause page faults and requires the kernel to be in a context where sleeping/rescheduling is permitted. But the fastpath handlers run with IRQs disabled and in contexts where sleeping is forbidden. The resulting behavior produced kernel BUG messages and could destabilize or crash the host.
The upstream fix — what changed
The fix is straightforward in principle and surgical in execution:
- For WRMSR and HLT VM-exits in the SVM VM-exit handler, the code now skips the fastpath when the next RIP isn’t valid and forces the emulator path to decode and emulate the instruction while holding the proper reader protections (SRCU or equivalent) and in a context that allows guest memory reads.
- The commit re-applies earlier logic (there was a prior related fix for WRMSR) with updated justification and locking choices so that the emulator path does the safe work rather than risking a sleep inside the fastpath. The upstream change carries commit id metadata that has been propagated into stable branches.
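The decision described above can be illustrated with a small toy model. This is only a sketch of the logic as the article describes it, not the kernel's C code: the function name `may_use_fastpath`, the string exit reasons, and the `next_rip_valid` flag are all hypothetical names introduced for illustration.

```python
# Toy model of the fixed decision, assuming only what the article states:
# WRMSR and HLT exits may take the fastpath only when the CPU supplied a
# valid next RIP. All names here are hypothetical, not kernel identifiers.

# Exit reasons this model treats as fastpath candidates.
FASTPATH_EXITS = frozenset({"WRMSR", "HLT"})


def may_use_fastpath(exit_reason: str, next_rip_valid: bool) -> bool:
    """Return True if the exit can be handled with IRQs disabled.

    Without a valid next RIP, skipping the instruction requires decoding
    it from guest memory, which may fault and sleep, so the model refuses
    the fastpath and leaves the exit to the full emulation path.
    """
    if exit_reason not in FASTPATH_EXITS:
        return False
    return next_rip_valid


if __name__ == "__main__":
    for reason in ("WRMSR", "HLT"):
        for valid in (True, False):
            path = "fastpath" if may_use_fastpath(reason, valid) else "full emulation"
            print(f"{reason:5s} next_rip_valid={valid!s:5s} -> {path}")
```

The point of the model is that the check happens before any guest memory access: the unsafe combination (fastpath context plus instruction fetch) is ruled out up front rather than detected after the fact.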
Technical analysis and implications
Why this is subtle but important
This vulnerability is not a classic memory-corruption or speculative-execution data leak. It’s a context/locking correctness bug where the code’s assumptions about what operations are allowed in a critical fastpath no longer hold under certain CPU/config combinations. That makes the issue subtle:
- The bug requires specific conditions (fastpath taken, CPU not supplying next RIP, emulator path invoked while in fastpath context).
- It’s rooted in the intersection of CPU-provided state (next RIP), KVM’s performance micro-optimizations (fastpaths), and kernel-context rules (no sleeping with IRQs disabled).
- Small changes to CPU features, guest behavior, or KVM configuration (for example turning NRIP off) can expose the latent bug.
Impact class: host instability / denial-of-service (DoS)
Based on the commit text and kernel traces, this bug’s most likely impact is host instability (up to a kernel BUG or oops) caused by a sleeping function being invoked from invalid context. That can manifest as:
- Kernel BUG messages and stack traces in host logs.
- vCPU failures that lead to QEMU/kvm threads printing call traces.
- Potential host-level instability or resource corruption if the BUG escalates to oops/panic.
Exploitability considerations
- Preconditions: The host must be running the vulnerable KVM SVM code path for AMD guests and the CPU/host configuration must lead to next RIP being unavailable (the commit cites nrips=false as an example).
- Attack vector: a guest that executes specific instructions that hit the WRMSR or HLT VM-exit handling fastpath while the CPU doesn’t provide next RIP could trigger the unsafe code path. Because guests already cause VM-exits by executing privileged instructions, an attacker with a guest VM can plausibly craft trigger conditions.
- Outcome: stable reproduction would likely produce kernel BUG traces, possible vCPU/thread termination, or in worst case host-level oops/panic. That equates to a denial-of-service against the host. There is no published evidence that the bug allows arbitrary code execution in the host kernel or data leakage. However, denial-of-service is a serious concern for multi-tenant clouds.
Who is affected?
- Hosts running Linux kernels that include the vulnerable SVM KVM codepath. Public vulnerability databases indicate the condition shows up in kernel trees and that upstream patches have been applied; the JSON metadata used by downstream trackers lists kernels at and above certain recent versions as affected until the upstream patch is present. Distribution kernels that incorporate the fix or backport it to stable branches will be considered fixed.
- Virtual machines (guests) themselves are not directly compromised by this bug, but malicious or buggy guests can trigger host instability.
- Cloud providers and anyone running untrusted or third-party guests on shared infrastructure should prioritize patching due to the DoS implications.
Mitigation and remediation guidance
Immediate steps (short term)
- If vendor kernels with the upstream fix are available, apply them to hypervisor hosts immediately. This is the recommended and correct remediation.
- If you cannot immediately update the host kernel, consider limiting exposure: avoid running untrusted guests, or schedule critical hosts for maintenance windows to upgrade as soon as vendor packages are available.
- Monitor host logs for kernel BUG traces related to KVM, SVM, __might_fault, or messages that match "sleeping function called from invalid context" originating from KVM call traces. These traces are the primary indicator of attempted/triggered failure modes for this bug.
Workarounds (if patching is delayed)
- Review whether your environment can ensure the CPU/KVM configuration supplies a valid next RIP (i.e., avoid nrips=false). Note: changing low-level KVM/CPU feature flags can have other side effects; exercise caution and validate in test environments before changing live hosts.
- As a temporary measure, restrict or sandbox guests that may attempt to trigger the fault (for example, by limiting customers’ ability to execute privileged instructions or by isolating risky workloads).
- In private datacenter / lab scenarios you could run guests on hosts where KVM uses the emulation path by default (but that will almost certainly impose performance penalties). The correct long-term fix is the kernel patch.
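As a starting point for the review suggested above, a host's current setting can be read from sysfs. The sketch below assumes the `kvm_amd` module exposes its `nrips` parameter at `/sys/module/kvm_amd/parameters/nrips`, and that the value is printed either as `1`/`0` or `Y`/`N` depending on kernel version; verify both assumptions on your own hosts. The helper names are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Assumed sysfs location of the kvm_amd "nrips" module parameter.
NRIPS_PARAM = Path("/sys/module/kvm_amd/parameters/nrips")


def parse_module_bool(raw: str) -> Optional[bool]:
    """Interpret a sysfs module-parameter value; None if unrecognized."""
    value = raw.strip()
    if value in ("1", "Y", "y"):
        return True
    if value in ("0", "N", "n"):
        return False
    return None


def nrips_enabled(param_path: Path = NRIPS_PARAM) -> Optional[bool]:
    """Return the nrips setting, or None if it cannot be determined
    (module not loaded, non-AMD host, unreadable file, unknown value)."""
    try:
        return parse_module_bool(param_path.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    state = nrips_enabled()
    if state is None:
        print("nrips: unknown (kvm_amd not loaded or parameter unreadable)")
    else:
        print(f"nrips: {'enabled' if state else 'disabled'}")
```

A result of "enabled" only reports the module setting; it does not prove that a given host kernel is patched, so it complements rather than replaces the kernel update.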
How to verify the fix
- Confirm the host kernel includes the upstream commit or a vendor backport of it. The commit id is referenced on the stable lists, and vendor advisories should name the fixed package version.
- After upgrading, reproduce any previous test-case that triggered the BUG trace in a controlled lab; the host should no longer hit the "sleeping function called from invalid context" call trace when guests exercise WRMSR or HLT without next RIP.
Vendor and distribution status (what to watch for)
Upstream kernel maintainers published the patch and it has been proposed/accepted into stable/minor trees; distribution maintainers are expected to issue patched kernel packages or backports. Monitor these advisory channels:
- Your Linux distribution’s security tracker and package updates (Debian/Ubuntu/Red Hat/SUSE release advisories).
- Cloud provider host kernel updates and managed platform advisories if you are on a managed hosting/compute service.
Detection and incident response
- Detection: look for kernel BUG/OOPS logs mentioning __might_fault, __might_resched, kvm_vcpu_read_guest_page, x86_decode_insn/x86_emulate_instruction, and the text "sleeping function called from invalid context". Those call traces usually include vcpu_run and kvm_amd frames that point to the SVM fastpath/emulator boundary.
- Response: If you observe these traces in production, isolate the affected host and correlate the offending vCPU/guest. If the guest is untrusted or from a third party, consider suspending or quarantining it until you can upgrade the host kernel.
- Post-incident: upgrade the host kernel to the fixed version and re-run the failing workload in a test environment to confirm the issue no longer appears.
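The detection step above can be partly automated. The following is a minimal sketch, assuming kernel log text is available as lines (for example, saved output of `dmesg` or `journalctl -k`). It flags each "sleeping function called from invalid context" line that is followed, within a fixed window, by one of the KVM-related frames named above. The function name, the window size, and the exact frame list are assumptions chosen for illustration, and real traces may differ between kernel versions.

```python
from typing import Iterable, List

# Message text quoted in the article.
SLEEP_BUG = "sleeping function called from invalid context"

# KVM-specific frames named in the article. __might_fault and
# __might_resched are left out here because they appear in any such
# BUG trace, whether or not KVM is involved.
KVM_FRAMES = (
    "kvm_vcpu_read_guest_page",
    "x86_decode_insn",
    "x86_emulate_instruction",
    "kvm_amd",
)


def find_suspect_traces(lines: Iterable[str], window: int = 40) -> List[int]:
    """Return indices of BUG lines followed by a KVM frame within `window` lines."""
    log = list(lines)
    hits = []
    for i, line in enumerate(log):
        if SLEEP_BUG not in line:
            continue
        following = log[i + 1 : i + 1 + window]
        if any(frame in later for later in following for frame in KVM_FRAMES):
            hits.append(i)
    return hits


if __name__ == "__main__":
    import sys

    # Usage: dmesg | python3 this_script.py
    for index in find_suspect_traces(sys.stdin):
        print(f"suspect trace starts at log line {index + 1}")
```

A match is a hint for triage, not proof that this specific bug was hit; correlate any hit with the vCPU thread and guest before acting on it.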
Risk assessment for different environments
- Single-host virtualization (self-managed): medium-to-high risk if you run untrusted guests or multi-tenant workloads. Hosts exposed to potentially malicious guests should be patched immediately to avoid local DoS attacks.
- Cloud providers and hyper-scale tenants: high risk due to multi-tenancy; an attacker with a guest VM can intentionally try to force host instability. Providers should prioritize kernel updates and coordinate rollout with customer change windows.
- Desktop or single-user hosts running trusted VMs: lower risk if guests are fully trusted and controlled. Still, patching is advisable to avoid accidental host instability.
Why this matters beyond a single bug
This vulnerability highlights an enduring theme in hypervisor/hardware cooperation: micro-optimizations that rely on hardware-provided metadata (like next RIP) carry subtle correctness requirements. When the hardware or configuration does not provide the expected metadata, the hypervisor must revert to safe, but heavier, paths, and those paths must be executed in contexts that permit the operations they require (memory reads, potential faults, etc.). The cost of incorrect assumptions is host reliability, which has outsized implications in shared cloud infrastructure.
Practical checklist for administrators
- Verify whether host kernels are marked as vulnerable in your vendor advisories; prioritize hosts running AMD SVM guests.
- Apply vendor-supplied kernel updates or backports as soon as they are available.
- If updates are delayed, restrict untrusted guests and monitor host logs for the characteristic BUG traces.
- Test patched hosts in staging with representative guest workloads to confirm the issue is resolved.
- Keep an eye on distribution and cloud-provider advisories for additional guidance and coordinated fix rollouts.
Closing analysis
CVE-2025-40038 is a classic example of a correctness bug at the intersection of hardware-provided optimizations and kernel context rules. It does not appear to enable direct code execution or data exfiltration, but it poses a credible denial-of-service and host-stability risk in production virtualization environments, particularly multi-tenant clouds where malicious guests can provoke host behavior. The upstream kernel fix is narrowly scoped, corrects the context-safety problem by avoiding unsafe fastpaths when next RIP is not available, and has been pushed into stable trees and distribution trackers. Administrators should treat this as a high-priority patch for hypervisor hosts and follow vendor advisories to apply fixes as soon as possible.
Conclusion: patch promptly, monitor host logs for the BUG trace symptoms, and if you must defer updates, reduce exposure by isolating or suspending untrusted guests until host kernels are updated.
Source: MSRC Security Update Guide - Microsoft Security Response Center