CVE-2025-38158: Linux DMA Bug in Hisilicon VFIO Patch and Azure Attestation

The Linux kernel fix tracked as CVE-2025-38158 addresses a subtle but consequential DMA address assembly bug in the Hisilicon VFIO accelerator driver (hisi_acc_vfio_pci) that can leave guest kernel‑mode encryption services broken after live migration. Microsoft’s short MSRC attestation that “Azure Linux includes this open‑source library and is therefore potentially affected” should be read as a product‑level inventory statement, not as proof that no other Microsoft product could carry the same vulnerable code.

Background / Overview​

CVE-2025-38158 was assigned to a kernel patch that corrects how DMA addresses for XQE/EQE/AEQE structures are assembled and handled after device migration in the Hisilicon VFIO PCI driver. The bug produced incorrect DMA addresses following migration and created a compatibility gap where guests migrated from older kernels could observe wrong data — effectively breaking in‑guest kernel‑mode encryption services that rely on the device. Upstream developers corrected the register‑read and address‑composition logic and added migration‑version checks so older magic numbers cause the driver to recompute DMA addresses on the destination side.
Multiple independent trackers and vendor bug databases published the same technical summary: the change lives in drivers/vfio/pci/hisilicon/* and is a targeted fix to address mis‑constructed DMA pointers during migration and recovery. The patch and its backports were reviewed and merged through the normal kernel MAINTAINERS and stable backport channels.

What exactly happened technically?​

The root cause in plain English​

When the VF device migrates between host kernels, the driver reads device registers and composes a DMA address from multiple register words. The code combined those words in the wrong sequence, which produced an incorrectly assembled address. On the destination side the driver used that incorrect address and attempted DMA operations against it, producing failures in the guest encryption path. Separately, guests coming from older kernels, where the earlier “magic number” format was present, could end up with stale address values unless the migration logic recognized and adjusted for the older format. The upstream patch corrects the address assembly and adds a version/magic check so the destination updates the DMA address when it detects an older-format migration payload.
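
To make the failure mode concrete, here is a minimal sketch of composing a 64-bit address from two 32-bit register words. The word values, the 32-bit shift, and the function name compose_dma_addr are illustrative assumptions, not the driver's actual register layout or code. The only point is that combining the same words in a different order yields a very different address.

```shell
#!/bin/sh
# Illustrative sketch only: the register values and word layout below are
# hypothetical and are not taken from the hisi_acc_vfio_pci source.

# compose_dma_addr HIGH LOW: place HIGH in the upper 32 bits, LOW in the lower.
compose_dma_addr() {
    printf '0x%x\n' "$(( ($1 << 32) | $2 ))"
}

HIGH_WORD=0x1          # hypothetical upper register word
LOW_WORD=0x2345f000    # hypothetical lower register word

echo "intended order: $(compose_dma_addr "$HIGH_WORD" "$LOW_WORD")"
echo "swapped order:  $(compose_dma_addr "$LOW_WORD" "$HIGH_WORD")"
```

With these sample words the intended order gives 0x12345f000, while the swapped order gives 0x2345f00000000001, an address that points somewhere else entirely. A device handed the second value would perform DMA against memory the guest never intended to expose.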

Why this matters for virtualization​

  • DMA addresses are the fundamental mapping between device memory descriptors and host physical memory; incorrect DMA addresses can mean the device reads or writes the wrong physical memory regions.
  • For crypto offload accelerators used by guests to perform kernel‑mode encryption, corrupt DMA pointers can break cryptographic sessions silently, producing data‑loss, kernel errors, or service failure inside the VM.
  • Live migration is a core cloud operation. When device state is migrated, the receiver must reconstruct addresses in a way that makes sense on the new host. A mismatch here is operationally severe: it can turn a routine migration into an in-guest service outage for tenants.

Where the fix landed and how you can verify it​

The upstream fixes were merged into the kernel trees and then incorporated into stable backports; the changes touch drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c and its header. Distributors and downstream vendors (enterprise Linux and cloud kernel streams) have replicated those stable commits into their kernel packages and advisories. Administrators should map their running kernel to upstream commit IDs or to vendor package advisory numbers rather than rely on major kernel series alone.
Quick verification steps for operators:
  • Identify the kernel binary and its build provenance (uname -r plus vendor package metadata).
  • Check whether the hisi_acc_vfio_pci driver is present (lsmod | grep hisi_acc_vfio_pci or check /lib/modules/$(uname -r)/kernel/drivers/vfio/pci/hisilicon/).
  • Match the running kernel’s commit range or package release against the distributor advisory that lists the commit or package that contains the fix.
  • If you run live migration or host guests that use Hisilicon acceleration, stage and test patched kernels in a quarantine environment before broad rollout.
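
The first two verification steps can be sketched as a small script. The function names are hypothetical, and the file-name glob is an assumption: it accepts either hyphens or underscores because the on-disk module name may not match the name lsmod prints. A driver compiled directly into the kernel would not show up in either check.

```shell
#!/bin/sh
# Sketch: report whether the Hisilicon VFIO driver exists on disk and is loaded.

# find_hisi_module DIR: list candidate module files under a modules tree.
# The glob accepts '-' or '_' because the on-disk file name can differ from
# the name lsmod prints; compressed modules (.ko.xz, .ko.zst) match as well.
find_hisi_module() {
    find "$1" -type f -name 'hisi[-_]acc[-_]vfio*' 2>/dev/null
}

# module_loaded [FILE]: succeed if the module is listed in a
# /proc/modules-style file (defaults to /proc/modules).
module_loaded() {
    grep -q '^hisi_acc_vfio_pci[[:space:]]' "${1:-/proc/modules}" 2>/dev/null
}

KVER=$(uname -r)
echo "running kernel: $KVER"

if [ -n "$(find_hisi_module "/lib/modules/$KVER")" ]; then
    echo "driver file: present"
else
    echo "driver file: not found"
fi

if module_loaded; then
    echo "driver loaded: yes"
else
    echo "driver loaded: no"
fi
```

A result of "present" but not loaded still deserves attention, because the module can be loaded later when a device is passed through to a guest.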

Microsoft’s MSRC statement: what it actually says — and what it doesn’t​

Microsoft’s public CVE page for this vulnerability includes the line you quoted: “Azure Linux includes this open‑source library and is therefore potentially affected by this vulnerability.” That is a product‑level mapping (an attestation) asserting that Microsoft inspected Azure Linux build outputs and found the implicated upstream component. Microsoft has also publicly committed to publishing machine‑readable CSAF/VEX attestations (a phased rollout started in October 2025) and to updating CVE product mappings if other Microsoft products are later identified as carriers.
Crucially, this attestation should be read as:
  • An authoritative statement that Azure Linux (the product named) is a known carrier and therefore a remediation priority.
  • Not an exclusivity guarantee that no other Microsoft product includes the same vulnerable kernel code. The statement is an inventory outcome for the product Microsoft explicitly checked, not a universal scan across every Microsoft distribution, image, or kernel artifact.
Multiple independent community analyses and vendor discussions demonstrate this pattern repeatedly: Microsoft names Azure Linux when it verifies the component in that product, and will expand attestations as the company’s VEX rollout inventories more artifacts. Until an artifact (e.g., a WSL2 kernel, a Marketplace VM image, an AKS node image, or a curated linux-azure kernel) is explicitly attested or proven absent, it should be treated as unverified rather than proven safe.

Is Azure Linux the only Microsoft product that includes the library and is therefore potentially affected?​

Short answer: No — not necessarily. Azure Linux is the only Microsoft product Microsoft has publicly attested (so far) to ship the vulnerable upstream component; that attestation is authoritative and actionable for Azure Linux customers. However, the presence of the same upstream code in other Microsoft artifacts is an artifact‑level property that depends on kernel version, build configuration, module inclusion, and backporting decisions. Until Microsoft attests additional products or you verify an artifact yourself, other Microsoft kernels and images remain possible carriers and must be validated.
Why that nuance matters:
  • Microsoft — like other large vendors — ships many kernel artifacts: WSL2 kernel builds, curated Azure VM images, Marketplace appliances, AKS node images, linux-azure kernels, custom distribution snapshots, and more.
  • Each artifact is built from a particular upstream commit set and with particular CONFIG_* flags; a vulnerable driver present in one artifact may be absent from another simply because it was not compiled, was built as a module and not loaded, or the vendor backported the fix into only some kernel streams.
  • A VEX/CSAF attestation is a high‑value automation primitive, but the phased rollout means coverage is incremental. Treat the attestation as a strong signal for the named product, and treat the absence of attestations elsewhere as “not yet attested,” not “not affected.”
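
Because exposure depends on build configuration, one quick artifact-level signal is how the driver's Kconfig symbol is set. The sketch below assumes the symbol is CONFIG_HISI_ACC_VFIO_PCI and that the kernel config is available at /boot/config-$(uname -r), which is common but not universal; the function name config_state is hypothetical.

```shell
#!/bin/sh
# Sketch: classify how CONFIG_HISI_ACC_VFIO_PCI is set in a kernel config file.

# config_state FILE: print "builtin", "module", or "not-enabled".
config_state() {
    line=$(grep '^CONFIG_HISI_ACC_VFIO_PCI=' "$1" 2>/dev/null)
    case "$line" in
        *=y) echo "builtin" ;;
        *=m) echo "module" ;;
        *)   echo "not-enabled" ;;
    esac
}

CFG="/boot/config-$(uname -r)"   # common location; not present on every system
if [ -r "$CFG" ]; then
    echo "CONFIG_HISI_ACC_VFIO_PCI: $(config_state "$CFG")"
else
    echo "no readable config at $CFG"
fi
```

A "not-enabled" result means the driver was not compiled for that kernel, which is one way two artifacts built from the same source tree can differ in exposure.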

Practical guidance for defenders and cloud operators​

If you manage infrastructure that could be affected by CVE‑2025‑38158, here are prioritized, practical actions.

1) Immediate actions (hours → days)​

  • Patch Azure Linux images immediately: follow Microsoft’s advisory and update Azure Linux hosts to the kernel package versions that include the upstream hisi_acc_vfio_pci fix. Microsoft’s attestation makes Azure Linux a confirmed carrier and a remediation priority.
  • Inventory Microsoft artifacts in your estate: identify all Microsoft‑provided Linux kernels/images you run (WSL2 kernels, Azure Marketplace images, AKS images, linux-azure kernels, image templates). Do not rely on the absence of a Microsoft attestation as proof of safety.
  • Verify presence of the driver at artifact level:
      • On each image: run uname -r; inspect /lib/modules/$(uname -r)/ for drivers/vfio/pci/hisilicon; run lsmod to see if the module is loaded.
      • If the driver is present but not loaded, determine whether your workloads will ever load it (for example, if you pass a Hisilicon device to a guest).
  • For VMs that use Hisilicon accelerators: consider a maintenance window to test patched kernels. Live migration tests that exercise device migration paths should be part of validation.
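
For inventory work it is often easier to inspect an image without booting it. The following sketch assumes the image's root filesystem is mounted at a path you supply through a hypothetical IMAGE_ROOT variable, and it reports every installed kernel rather than only the running one. The function name scan_image and the file-name glob are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: list every kernel under an image root and whether it ships the driver.

# scan_image [ROOT]: print "<kernel-version> present|absent" per modules tree.
scan_image() {
    root=${1:-/}
    for dir in "$root"/lib/modules/*/; do
        [ -d "$dir" ] || continue
        kver=$(basename "$dir")
        if find "$dir" -type f -name 'hisi[-_]acc[-_]vfio*' 2>/dev/null | grep -q .; then
            echo "$kver present"
        else
            echo "$kver absent"
        fi
    done
}

# IMAGE_ROOT is a hypothetical variable naming the mounted image; default is
# the local root filesystem.
scan_image "${IMAGE_ROOT:-/}"
```

Running it as IMAGE_ROOT=/mnt/image sh scan.sh (both names are placeholders) gives one line per kernel, which can be collected across images to build the inventory.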

2) Short to medium term (days → weeks)

  • Integrate Microsoft’s CSAF/VEX outputs (when available) into your vulnerability‑management pipeline. These machine‑readable attestations will automate part of artifact mapping as Microsoft expands coverage. However, keep artifact‑level checks as a parallel control.
  • For images you cannot patch immediately, implement mitigations where feasible:
      • Avoid migrating VMs that are actively using the Hisilicon accelerator device until you’ve validated a patched host image.
      • In multi‑tenant clouds, limit allocation of Hisilicon devices to trusted workloads until fixes are applied.
  • If you operate live‑migration automation, add migration smoke tests that validate device state post‑migration for critical acceleration paths.
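
The "avoid migrating" mitigation can be approximated with a coarse host-side guard that holds migrations while any PCI function is bound to the driver. This is a sketch under assumptions: the sysfs directory /sys/bus/pci/drivers/hisi_acc_vfio_pci is inferred from the module name, and the function names are hypothetical. It does not tell you which VM owns a device, so treat it as a pre-check rather than a policy engine.

```shell
#!/bin/sh
# Sketch: refuse migration while any PCI function is bound to the driver.
# The sysfs directory name is an assumption based on the module name.

# bound_devices [DRIVER_DIR]: print PCI addresses (e.g. 0000:75:00.1)
# that appear as entries in the driver's sysfs directory.
bound_devices() {
    drv_dir=${1:-/sys/bus/pci/drivers/hisi_acc_vfio_pci}
    for entry in "$drv_dir"/*:*:*.*; do
        [ -e "$entry" ] || continue
        basename "$entry"
    done
}

# migration_allowed [DRIVER_DIR]: succeed only when nothing is bound.
migration_allowed() {
    [ -z "$(bound_devices "$1")" ]
}

if migration_allowed; then
    echo "no bound devices: migration not blocked by this check"
else
    echo "devices bound, hold migration:"
    bound_devices
fi
```

Migration tooling could call migration_allowed before starting a move and skip or defer hosts where it fails, until a patched kernel has been validated.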

3) Long term (policy and lifecycle)​

  • Maintain a CMDB/SBOM that tracks kernel artifacts, kernel source commit provenance, and build configurations for cloud images and appliance images.
  • Require per‑artifact attestations (vendor or in‑house) for any kernel image you run in production. A single statement that Azure Linux is affected is a useful start; per‑artifact verification is the operational closure you need.

Critical analysis: strengths and remaining risks​

Strengths in the current landscape​

  • Upstream developers fixed the bug in a focused, low‑risk manner; the patch itself is surgical and targets register assembly and migration checks, minimizing regression risk. The Linux‑kernel maintainers and stable‑tree process moved the fix through standard channels.
  • Microsoft’s MSRC attestation model and the shift to publishing CSAF/VEX attestations is a tangible improvement in transparency for downstream consumers: machine‑readable attestations reduce ambiguity and enable automation for Azure Linux customers.

Risks and gaps to watch​

  • The major practical risk is assumptive complacency: organizations that treat MSRC’s Azure Linux line as an exclusivity statement risk missing other Microsoft‑distributed artifacts that bundle the same vulnerable driver. Numerous community analyses emphasize that an attestation for product A does not imply an exhaustive scan of products B–Z.
  • Inventory completeness takes time. Microsoft’s VEX rollout is phased; until attestations cover the entire product surface, defenders must perform artifact‑level verification and cannot solely rely on vendor attestations.
  • In complex environments the path to full remediation is operationally heavy: kernel updates require reboots, and backports vary by distribution, so mapping “fixed” to a package name is vendor specific. This is especially acute in large fleets or in regulated change‑control environments.

When a vendor names one product, what internal checks should you run?​

To move from uncertainty to confidence, perform these concrete artifact‑level checks:
  • Confirm which kernel binary is running (uname -a) and obtain the kernel package metadata from your package manager (dpkg -l, rpm -qa).
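  • Search the kernel modules tree for hisi_acc_vfio_pci: find /lib/modules/$(uname -r) -type f -name 'hisi[-_]acc[-_]vfio*' and inspect the module or source package version. The pattern needs the trailing wildcard to match module files, and it accepts hyphens as well as underscores because the on-disk file name may differ from the name lsmod shows.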
  • If possible, compare your kernel’s commit range with the upstream commit IDs referenced in upstream fixes (match commit hashes or release tags).
  • For images supplied by cloud vendors (Azure Marketplace, curated images), request or retrieve the SBOM / VEX attestation for the image and validate whether the implicated kernel component is present in the image build. If no SBOM/VEX is available, treat the artifact as unverified and escalate to the image owner for remediation.
  • Run a dry migration in a non‑production environment that exercises the VF migration path and validate guest encryption service behavior pre/post migration.
These steps produce an evidence trail you can act on; they convert vendor‑level attestations into artifact‑level risk decisions.
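
One way to add to that evidence trail is to search the kernel package changelog for the CVE identifier. The sketch below is an assumption-laden example for RPM-based systems: the package name "kernel" varies between distributions, and the function name changelog_mentions_cve is hypothetical.

```shell
#!/bin/sh
# Sketch: look for the CVE identifier in a package changelog read from stdin.

CVE_ID="CVE-2025-38158"

# changelog_mentions_cve: succeed if stdin contains the CVE identifier.
changelog_mentions_cve() {
    grep -q -F "$CVE_ID"
}

# Example for RPM-based systems; the package name "kernel" is an assumption
# and differs between distributions.
if command -v rpm >/dev/null 2>&1; then
    if rpm -q --changelog kernel 2>/dev/null | changelog_mentions_cve; then
        echo "$CVE_ID is mentioned in the kernel package changelog"
    else
        echo "$CVE_ID not found in the changelog; consult the vendor advisory"
    fi
fi
```

A match is useful positive evidence. A miss proves nothing on its own, since not every vendor records CVE identifiers in changelogs, so fall back to the advisory or commit comparison in that case.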

Final assessment and recommendations​

CVE‑2025‑38158 is a targeted kernel fix that resolves incorrect DMA address assembly and migration handling in the Hisilicon VFIO PCI driver; the technical risk is real for hosts that both include the driver and actually use the Hisilicon accelerator device for in‑guest kernel‑mode encryption. Upstream and downstream vendors have applied the corrective commits.
Microsoft’s MSRC sentence naming Azure Linux as including the implicated open‑source code is accurate and actionable for Azure Linux customers: if you run Azure Linux images, prioritize the vendor’s kernel updates. But that MSRC line is an attestation for the product Microsoft inspected; it is explicitly not a technical guarantee that no other Microsoft product includes the same code. Treat Microsoft’s attestation as the start of your triage: follow it with artifact‑level verification for any other Microsoft‑provided kernels and images you run, and ingest Microsoft’s CSAF/VEX outputs into your workflows when they are published.
Practical checklist (summary):
  • Patch Azure Linux hosts now if they are in your estate.
  • Inventory Microsoft kernel artifacts and verify per‑artifact exposure (WSL2 kernels, Marketplace VM images, AKS node images, linux‑azure).
  • Where device migration is used in production, stage migration smoke tests after patching.
  • Integrate CSAF/VEX attestations and keep watching MSRC updates for expansion of product mappings.
  • If you cannot patch immediately, avoid migrations for VMs that use the affected Hisilicon device and reduce or isolate usage of the accelerator until the fix is in place.
The responsible operational posture is twofold: act on Microsoft’s attestation for Azure Linux without delay, and simultaneously treat other Microsoft artifacts as unverified until you have performed the artifact‑level checks described above. That dual approach closes the practical gaps between vendor attestations and the messy reality of enterprise artifact sprawl — and it’s how you avoid turning a single confirmed remediation target into an overlooked exposure across your estate.

Source: MSRC Security Update Guide - Microsoft Security Response Center