RISC-V KVM VMID Fix: Correct HGATP Write for Bare MODE Compliance

A subtle specification mismatch in the RISC‑V KVM code has been fixed upstream. While probing for supported VMID bits, the kernel’s VMID detection routine wrote non‑zero fields into the hgatp CSR, violating the RISC‑V Privileged Architecture requirement that the remaining fields of hgatp be zero when MODE = Bare. The error is tracked as CVE‑2025‑40065 and was corrected with a narrow patch to the KVM VMID probe.

Background / Overview

RISC‑V hypervisor translation uses the hgatp CSR (the G‑stage address translation register) to select the guest translation mode, and uses its VMID field to tag TLB entries so that they can be multiplexed between guests. The RISC‑V Privileged Architecture defines both how hgatp should be written and how VMID bits are probed: to discover the number of implemented VMID bits, software may write ones into the VMID field and read back which bits stuck. When MODE = Bare, however, software must write zero to the remaining fields of hgatp. The requirement exists because selecting Bare with a non‑zero pattern in the other fields leaves both the stored field values and the resulting G‑stage translation behavior unspecified.

The Linux KVM RISC‑V implementation runs a G‑stage VMID probe at boot to determine how many VMID bits the hardware supports. The original probe ORed the VMID mask into whatever value hgatp already held, so when the register still selected MODE = Bare the write carried a non‑zero VMID field, which is the combination the spec forbids. This off‑spec write was reported and fixed upstream; the issue was assigned CVE‑2025‑40065 and the fix landed in stable trees via surgical commits.
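To make the field layout concrete, the following C sketch packs and unpacks an RV64 hgatp value. The bit positions (MODE in bits 63:60, VMID in bits 57:44, PPN in bits 43:0) follow the RV64 layout in the Privileged Architecture; the struct and the helper names hgatp_encode and hgatp_decode are illustrative assumptions, not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* RV64 hgatp layout: MODE[63:60], reserved[59:58], VMID[57:44], PPN[43:0]. */
#define HGATP_MODE_SHIFT 60
#define HGATP_VMID_SHIFT 44
#define HGATP_MODE (UINT64_C(0xF) << HGATP_MODE_SHIFT)
#define HGATP_VMID (UINT64_C(0x3FFF) << HGATP_VMID_SHIFT)
#define HGATP_PPN  ((UINT64_C(1) << HGATP_VMID_SHIFT) - 1)

#define HGATP_MODE_BARE   0u /* no G-stage translation */
#define HGATP_MODE_SV39X4 8u /* one of the translating modes */

/* Illustrative container for the three architected fields. */
struct hgatp_fields {
    unsigned mode;
    unsigned vmid;
    uint64_t ppn;
};

/* Pack fields into a register image; out-of-range inputs are a caller bug. */
uint64_t hgatp_encode(unsigned mode, unsigned vmid, uint64_t ppn)
{
    assert(mode <= 0xF && vmid <= 0x3FFF && ppn <= HGATP_PPN);
    return ((uint64_t)mode << HGATP_MODE_SHIFT) |
           ((uint64_t)vmid << HGATP_VMID_SHIFT) | ppn;
}

/* Split a register image back into its fields. */
struct hgatp_fields hgatp_decode(uint64_t value)
{
    struct hgatp_fields f;
    f.mode = (unsigned)((value & HGATP_MODE) >> HGATP_MODE_SHIFT);
    f.vmid = (unsigned)((value & HGATP_VMID) >> HGATP_VMID_SHIFT);
    f.ppn = value & HGATP_PPN;
    return f;
}
```

Decoding the bare VMID mask with hgatp_decode(HGATP_VMID) yields MODE = Bare with all 14 VMID bits set, which is the combination the spec tells software not to write.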

What the bug actually was

The mechanics

  • KVM needs to determine the number of hardware VMID bits (VMIDLEN) so the hypervisor can allocate guest VMIDs safely.
  • The canonical probe writes a temporary HGATP value with VMID bits set and reads the register back; the bits that remain indicate implemented VMID positions.
  • The RISC‑V spec requires that if MODE=Bare is selected, the rest of the hgatp fields be zero; a Bare write with other fields set leaves the stored value and the resulting G‑stage translation behavior unspecified.
  • The KVM probe previously constructed its HGATP value by ORing HGATP_VMID into the old HGATP value. When the old value selected MODE=Bare, the write set every VMID bit while MODE stayed Bare, a mismatch that contravenes the spec.
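The write-ones-and-read-back mechanic described above can be sketched against a small software model. Everything here is hypothetical scaffolding for illustration: struct model_hart, model_write_hgatp, model_read_hgatp and probe_vmid_bits are invented names, and the model only captures the one behavior that matters for the probe, namely that unimplemented VMID bits read back as zero.

```c
#include <assert.h>
#include <stdint.h>

#define HGATP_VMID_SHIFT 44
#define HGATP_VMID_WIDTH 14
#define HGATP_VMID (UINT64_C(0x3FFF) << HGATP_VMID_SHIFT)

/* Hypothetical software model of a hart; not real hardware or kernel code. */
struct model_hart {
    uint64_t hgatp;          /* current register image */
    unsigned impl_vmid_bits; /* VMIDLEN the modelled hardware implements */
};

/* Model of a CSR write: unimplemented VMID bits read back as zero. */
void model_write_hgatp(struct model_hart *h, uint64_t value)
{
    assert(h->impl_vmid_bits <= HGATP_VMID_WIDTH);
    uint64_t implemented =
        ((UINT64_C(1) << h->impl_vmid_bits) - 1) << HGATP_VMID_SHIFT;
    h->hgatp = (value & ~HGATP_VMID) | (value & implemented);
}

uint64_t model_read_hgatp(const struct model_hart *h)
{
    return h->hgatp;
}

/* Write all-ones to VMID under the given MODE bits, read back, count, restore. */
unsigned probe_vmid_bits(struct model_hart *h, uint64_t mode_bits)
{
    uint64_t old = model_read_hgatp(h);
    model_write_hgatp(h, mode_bits | HGATP_VMID);
    uint64_t stuck = (model_read_hgatp(h) & HGATP_VMID) >> HGATP_VMID_SHIFT;
    model_write_hgatp(h, old); /* put the previous value back */

    unsigned bits = 0;
    while (stuck) { /* position of the highest set bit */
        bits++;
        stuck >>= 1;
    }
    return bits;
}
```

For a model with impl_vmid_bits set to 7, probe_vmid_bits(&hart, UINT64_C(8) << 60) reports 7 and leaves the register as it found it. A probe on real hardware would also need to fence guest TLB entries afterwards (HFENCE.GVMA); that step is outside this model.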

The practical symptom

There is no public report showing immediate remote code‑execution or privilege escalation from this bug. The issue is a compliance and correctness error that could produce incorrect G‑stage translation behavior, TLB aliasing, or subtle VM isolation problems on affected RISC‑V hardware, depending on how an implementation treats off‑spec hgatp writes. For this reason it was handled as a kernel correctness fix rather than as a wide‑scale exploit primitive; upstream and distributions prioritized a small, low‑risk patch that limits the change surface. The public advisories (NVD/OSV, distribution trackers) so far describe the fix and do not report active exploitation.

The upstream fix: surgical, minimal, and targeted

The upstream patch replaces the unsafe write with an expression that builds the hgatp value from only the previously validated mode and the VMID mask. Concretely, the VMID detection code changes from:
  • csr_write(CSR_HGATP, old | HGATP_VMID);
to a form that explicitly writes the detected G‑stage mode, shifted into the MODE field, together with the VMID probe mask:
  • csr_write(CSR_HGATP, (kvm_riscv_gstage_mode << HGATP_MODE_SHIFT) | HGATP_VMID);
Because the probe now selects a mode the hardware has already been shown to support instead of inheriting whatever hgatp held, a non‑zero VMID field is no longer written while MODE=Bare, and nothing from the old register value carries over into the write. The patch was reviewed and signed off by KVM/RISC‑V maintainers and merged into stable trees; it has associated upstream commit IDs and backports in the kernel stable series.
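The difference between the two writes can be checked mechanically. The sketch below assumes that hgatp still holds zero (MODE = Bare) when the probe runs and that the detected mode is a translating one such as Sv39x4; both are assumptions for illustration. The function names are hypothetical, while the RV64 masks mirror the architectural layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HGATP_MODE_SHIFT 60
#define HGATP_VMID_SHIFT 44
#define HGATP_MODE (UINT64_C(0xF) << HGATP_MODE_SHIFT)
#define HGATP_VMID (UINT64_C(0x3FFF) << HGATP_VMID_SHIFT)
#define HGATP_MODE_BARE   UINT64_C(0)
#define HGATP_MODE_SV39X4 UINT64_C(8)

/* Spec rule: if MODE is Bare, every other hgatp field must be written as zero. */
bool hgatp_write_is_compliant(uint64_t value)
{
    uint64_t mode = (value & HGATP_MODE) >> HGATP_MODE_SHIFT;
    return mode != HGATP_MODE_BARE || (value & ~HGATP_MODE) == 0;
}

/* Value the probe wrote before the fix: old register contents plus the mask. */
uint64_t probe_value_before_fix(uint64_t old_hgatp)
{
    return old_hgatp | HGATP_VMID;
}

/* Value written after the fix: a previously detected, non-Bare mode plus the mask. */
uint64_t probe_value_after_fix(uint64_t gstage_mode)
{
    assert(gstage_mode != HGATP_MODE_BARE && gstage_mode <= 0xF);
    return (gstage_mode << HGATP_MODE_SHIFT) | HGATP_VMID;
}
```

Under those assumptions, hgatp_write_is_compliant(probe_value_before_fix(0)) is false, while hgatp_write_is_compliant(probe_value_after_fix(HGATP_MODE_SV39X4)) is true. The same predicate could serve as a review aid for any other code path that composes hgatp values.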

Affected code and versions

  • The vulnerable code was introduced by an earlier commit that implemented the VMID allocator/probe. OSV and Debian tracking indicate the introduction point and the fixes applied; stable backports and stable kernel updates include the upstream commits that fix the probe. Affected git ranges and specific commits are referenced in OSV/NVD entries and in the kernel stable review logs. Administrators should treat kernels built from upstream trees containing the introducing commit (identified in the public OSV metadata) as potentially affected until they consume the fixed commits.
  • Distribution tracking (for example Debian’s security tracker) lists several 6.x kernel package versions that shipped with code requiring the fix and shows fixed versions rolled into subsequent point releases. Operators running RISC‑V Linux kernels from distributions should consult vendor advisories for the kernel package release that contains the stable commit IDs.

Impact analysis: what administrators should care about

Technical consequences

  • Specification non‑compliance: Writing hgatp with disallowed fields set while MODE=Bare contradicts the RISC‑V Privileged Architecture. The architectural contract exists to keep G‑stage translations deterministic and to constrain TLB / VMID behavior.
  • Potential translation / isolation issues: Depending on specific hardware behavior, off‑spec hgatp writes could lead to unexpected translation behavior or TLB aliasing. Such issues are hardware‑dependent and uncommon, but the kernel must avoid creating them in the first place.
  • No confirmed code execution or wide‑scale remote exploit: Public advisories and canonical trackers do not present evidence of exploitation leading to remote code execution or privilege escalation; the fix is correctness‑oriented and preventive. Treat the risk as integrity / translation correctness rather than an immediate RCE vector.

Operational risk profile

  • Hosts running RISC‑V guests or RISC‑V native systems with KVM should be prioritized for patching in environments that rely on strong VM isolation (multi‑tenant hosts, security‑sensitive clusters).
  • Single‑user experimental machines or systems that do not enable virtualization may be lower priority, but patching remains recommended to maintain a correct kernel state and avoid subtle hardware corner cases.

Mitigation and remediation guidance

Recommended immediate actions

  • Inventory RISC‑V systems and KVM hosts: identify kernels and check whether the distribution’s kernel package includes the fixed commits. Useful checks:
    • uname -r
    • Check distribution advisories or package changelogs for the CVE or the upstream commit IDs referenced in OSV/NVD.
    • Consult your vendor’s security tracker or kernel changelog to confirm remediation status.
  • Apply vendor kernel updates that include the fixed commits and reboot hosts as required. The upstream patch is intentionally small; vendors are expected to backport or include it in stable releases.
  • If patching is delayed, reduce exposure of untrusted RISC‑V guests or postpone VM migrations that would place untrusted workloads on RISC‑V hypervisors until those hosts are patched. This is a conservative operational control rather than a technical workaround.

How to verify the remediation

  • Confirm that the kernel package changelog or vendor advisory explicitly references CVE‑2025‑40065 or the stable commit IDs (the OSV/NVD entries list the fixes and commit IDs).
  • In a staging environment, exercise the boot‑time init path that runs kvm_riscv_gstage_vmid_detect and confirm that the probe constructs hgatp writes with only the permitted MODE and VMID fields. The fixed code composes the hgatp value explicitly from the detected gstage mode and the HGATP_VMID mask.

Detection and monitoring

  • There is no specific kernel oops signature tied to this issue the way other KVM fastpath bugs produce traceable failures; this is an off‑spec write that can be silent unless the hardware reacts unexpectedly.
  • Focus monitoring on:
    • Vendor advisories and kernel package changelogs for hosts in the inventory.
    • Any unusual G‑stage translation behavior, TLB invalidation anomalies, or guest isolation reports after kernel upgrades or firmware changes.
  • When in doubt, reproduce the kernel boot path in a controlled lab and inspect the sequence where kvm_riscv_gstage_vmid_detect runs to ensure it writes only the intended fields. The RISC‑V privileged spec documents the canonical VMID detection method and the need for HFENCE.GVMA when changing VMID reuse semantics; administrators should validate that the kernel’s VMID probe follows that guidance.

Why the fix is the right engineering choice, and its strengths

  • Small, localised change: The upstream correction alters a single write expression in the VMID probe to compose the HGATP write from explicitly permitted fields. This is intentionally minimal and low‑risk: it removes the off‑spec write without reworking the VMID probe logic.
  • Spec‑aware behavior: By aligning the kernel write construction with the RISC‑V privileged spec (write zero to remaining fields when MODE=Bare), the kernel avoids depending on undefined hardware behavior and reduces the chance of hardware‑specific surprises.
  • Easier backport and verification: Small diffs are easier to backport into vendor stable kernels and to audit in patch reviews; maintainers have applied similar surgical fixes in past KVM/G‑stage corrections.

Remaining caveats and potential risks

  • Hardware variability: Different RISC‑V implementations may treat off‑spec hgatp writes differently; while the kernel fix prevents the kernel from triggering off‑spec behavior, unpatched kernels on RISC‑V hardware may exhibit subtle, hardware‑specific translation anomalies.
  • No public exploit reports ≠ no risk: The absence of public exploits or active PoCs does not imply zero risk; off‑spec writes affecting translation and TLB behavior could, in theory, be combined with other implementation faults in multi‑stage exploit chains. Until hosts are patched, limit exposure for high‑value multi‑tenant deployments.
  • Patch verification required across vendor packages: Distribution kernels and vendor‑supplied images vary. Confirm each vendor’s kernel package changelog includes the upstream commit IDs noted in OSV/NVD rather than assuming every vendor has already shipped the fix.

Practical checklist for IT and security teams

  • Inventory all RISC‑V hosts and KVM‑enabled configurations.
  • Prioritize patching multi‑tenant hosts and hypervisors with untrusted workloads.
  • Apply distribution kernel updates that reference CVE‑2025‑40065 or the upstream commit IDs.
  • Reboot hosts into patched kernels during maintenance windows.
  • Validate the fix in staging by confirming the VMID probe writes are constrained to MODE and VMID fields only.
  • Monitor vendor advisories and upstream stable kernel announcements for any follow‑ups or additional fixes.
  • Confirm whether your kernel packages list the CVE or the stable commit(s).
  • Test patch in a representative environment that reproduces KVM boot paths.
  • Roll out the patch in staged waves and monitor logs for TLB/translation anomalies.
  • If immediate patching is impossible, restrict untrusted guest mobility to patched hosts.

Broader context: KVM fixes follow a pattern

This CVE is a clear example of the recurring need for hypervisor code to match the hardware and architecture spec precisely. Similar KVM issues in recent years have shown that fastpaths and micro‑optimizations are fragile across CPU features and hardware nuances, and that small spec violations can produce outsized operational risk. Upstream maintainers typically prefer narrow, well‑reviewed fixes that restore conformance without imposing broad behavioral changes — the approach taken here. Operators should continue the practice of close coordination between kernel, vendor, and firmware updates for platform hardening.

Final assessment and recommendation

CVE‑2025‑40065 is a correctness and spec‑compliance defect in the RISC‑V KVM VMID probe: the probe wrote hgatp fields in a way that violated the Privileged Architecture’s MODE = Bare requirement. The fix is small, targeted, and upstreamed into stable trees; distributions are tracking it and shipping updated packages through their standard channels. There is no public evidence that this specific defect was the root of an active exploitation campaign, but the potential for hardware‑dependent translation anomalies makes patching the correct operational choice, especially for multi‑tenant and security‑sensitive RISC‑V virtualization hosts.
Action summary (concise):
  • Treat KVM RISC‑V hosts as high priority for patching.
  • Verify kernel package changelogs include the CVE or upstream commit IDs.
  • Apply updates, reboot, and validate VMID probe behavior in staging before widespread rollout.
Conclusion: The kernel community’s fix restores architectural conformance with the RISC‑V Privileged Spec using a minimal, well‑scoped patch. Operators should apply vendor updates promptly and verify patched kernels to preserve correct G‑stage translation semantics and VM isolation.

Source: MSRC Security Update Guide - Microsoft Security Response Center