CVE-2025-37958: Linux THP Migration Bug and Azure Linux Exposure

The Linux kernel vulnerability tracked as CVE‑2025‑37958 — described in upstream as mm/huge_memory: fix dereferencing invalid pmd migration entry — is a concurrency bug in the Transparent Huge Page (THP) migration code that can trigger invalid memory accesses and kernel crashes during certain THP split/migration operations. Microsoft’s public advisory for this CVE confirms that Azure Linux images contain the upstream component in which the flaw was present and that Microsoft has published CSAF/VEX attestations for Azure Linux; the advisory also states that if additional Microsoft products are later found to include the vulnerable code, Microsoft will update the CVE record accordingly. This article explains what the bug is, how it behaves, which products and build choices determine exposure, whether Azure Linux is the only Microsoft artifact that could be affected, and what practical steps teams should take now to inventory, mitigate and remediate risk.

Background / Overview

At its core, CVE‑2025‑37958 is a race-condition / invalid-reference bug inside the Linux kernel memory management path that handles huge pages — specifically, transparent huge pages (THP). The bug was discovered in fuzzing work and reproduced as a kernel oops during THP split and migration activity. When a THP is being migrated or split, an internal PMD (Page Middle Directory) migration entry can be read concurrently by a deferred split scan. Under a particular interleaving, code paths attempt to dereference a PMD migration entry that is no longer valid for the target folio, producing an invalid address access and a kernel fault.
This issue is located in upstream kernel source in the mm/huge_memory subsystem and affects kernels that contain the vulnerable commit range and that also include the relevant THP and migration code in the build. Multiple Linux distributions and vendors produced advisories and kernel patches for the flaw. Vendor severity ratings and CVSS scores vary by product and packaging, but most vendor assessments classify the issue as a medium-risk vulnerability that can cause denial-of-service (kernel crash) and, in some contexts, more severe consequences based on local privileges and environment.

Technical deep dive: what went wrong and how the fix works

How THP and PMD migration interact

  • Transparent Huge Pages (THP): THP is a kernel feature that maps larger physical pages (commonly 2 MiB on x86_64) transparently to userland to reduce TLB pressure and improve performance for some workloads.
  • PMD (Page Middle Directory): When the kernel maps a THP, a PMD entry represents the huge-page mapping at the mid-level of the page table hierarchy. During migration or split, the PMD entry can be transformed into migration entries and then inspected or replaced as part of splitting a huge page into regular pages (4 KiB) or moving a folio to another location.
  • Migration entries and deferred split scans: The kernel’s page migration and splitting logic uses temporary migration markers in PMDs. A deferred split scan iterates folios and pages to check for candidates to split or migrate later. A race can occur when a migration entry is inspected while another thread or CPU is concurrently performing a split or migration operation on the same folio.
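To make the size relationship concrete, the short calculation below assumes the common x86_64 geometry mentioned above (4 KiB base pages and 2 MiB huge pages). Other architectures and page sizes differ, and the function name is only illustrative.

```python
# Assumed x86_64 geometry: 4 KiB base pages, 2 MiB PMD-level huge pages.
BASE_PAGE_SIZE = 4 * 1024
HUGE_PAGE_SIZE = 2 * 1024 * 1024


def base_pages_per_huge_page(huge: int = HUGE_PAGE_SIZE,
                             base: int = BASE_PAGE_SIZE) -> int:
    """Return how many base pages one huge-page mapping covers."""
    if base <= 0 or huge % base != 0:
        raise ValueError("huge page size must be a multiple of the base page size")
    return huge // base


print(base_pages_per_huge_page())  # 512
```

Under these assumptions, splitting one THP replaces a single PMD-level mapping with 512 base-page mappings, which is why split and migration both need a consistent view of the same PMD entry.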

The bug pattern

  • A thread begins migrating a THP and inserts a migration entry into the PMD to indicate the folio is the target of migration.
  • Concurrently, the deferred split scan inspects the PMD and attempts to perform checks assuming the migration entry maps to a folio it can operate on.
  • A PMD migration entry is a temporary marker rather than a present huge-page mapping, so it cannot be interpreted the way a mapped PMD is; code paths that assumed they could safely dereference it end up reading an invalid address or the wrong folio.
  • The result is an invalid address access (kernel oops) and potentially a system crash.

The fix

The upstream fix verifies whether the PMD is a migration entry before dereferencing it, and avoids the conversions (pmd_to_swp_entry / pfn_swap_entry_to_page) as a way of verifying that the entry refers to the target folio. Those conversions are unnecessary: a PMD migration entry is locked, so it cannot serve as the target, and the code can return early when it encounters one. In short: verify the PMD entry’s state before dereferencing it and do not rely on conversions that can be invalid under a concurrent interleaving.
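The check-before-dereference pattern can be illustrated with a toy Python model. This is not kernel code: PmdEntry, Folio, folio_of and the two should_split_* helpers are hypothetical names invented for the illustration, and the RuntimeError merely stands in for the invalid address access.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Folio:
    name: str


@dataclass(frozen=True)
class PmdEntry:
    """Toy PMD slot: a present huge-page mapping or a migration marker."""
    is_migration: bool
    folio: Optional[Folio] = None  # only meaningful when is_migration is False


def folio_of(entry: PmdEntry) -> Optional[Folio]:
    """Model a dereference that is valid only for present mappings."""
    if entry.is_migration:
        # Stand-in for the kernel oops: a migration marker maps no folio.
        raise RuntimeError("invalid dereference of a migration entry")
    return entry.folio


def should_split_unchecked(entry: PmdEntry, target: Folio) -> bool:
    # Buggy pattern: derive a folio without checking the entry kind.
    return folio_of(entry) == target


def should_split_checked(entry: PmdEntry, target: Folio) -> bool:
    # Fixed pattern: test for a migration entry first and return early.
    if entry.is_migration:
        return False
    return folio_of(entry) == target
```

With a migration marker in the slot, should_split_unchecked raises, while should_split_checked simply declines to act, which mirrors the early return described above.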

Where the vulnerable code lives and why product mapping matters

The vulnerable logic is in the upstream Linux kernel source tree (mm/huge_memory). That means the code can appear in any kernel binary built from upstream kernel sources that include the offending commits — including kernels distributed by cloud vendors, Linux distributions, appliance vendors and any product that ships its own kernel binary or kernel modules.
Two factors control whether a given build is actually exposed:
  • Kernel version / commit range: If the kernel binary includes the commit(s) that introduced the bug (or lacks the later fix/backport), it may be vulnerable. Distributors typically backport fixes to stable kernels as advisories are released; different vendors will have different timelines.
  • Kernel configuration: The presence of THP and associated migration code depends on kernel config options (for example, CONFIG_TRANSPARENT_HUGEPAGE). A kernel built without THP support, or with a configuration that disables the relevant subsystems, is not exposed even if the source contained the problematic code path.
Because these two axes vary across vendor artifacts, simply knowing that a vendor uses Linux does not prove exposure. Effective product-level triage requires mapping the upstream component to the exact artifacts a vendor ships.
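As a rough sketch, the two axes can be combined into a small triage function. The names Exposure and classify are hypothetical, and the rule that missing information means "unknown" is an assumption that follows this article’s advice to treat unverified artifacts as potentially affected.

```python
from enum import Enum
from typing import Optional


class Exposure(Enum):
    NOT_EXPOSED = "not exposed"
    EXPOSED = "exposed"
    UNKNOWN = "unknown, treat as potentially affected"


def classify(thp_built_in: Optional[bool],
             vulnerable_code_present: Optional[bool]) -> Exposure:
    """Combine kernel configuration and commit-range facts for one artifact.

    thp_built_in: True/False if the kernel config is known, None otherwise.
    vulnerable_code_present: True if the build is in the affected commit
        range and lacks the fix, False if fixed or never affected,
        None if that could not be determined.
    """
    # Either axis alone can rule out exposure.
    if thp_built_in is False or vulnerable_code_present is False:
        return Exposure.NOT_EXPOSED
    # Both axes must be confirmed to call the artifact exposed.
    if thp_built_in is True and vulnerable_code_present is True:
        return Exposure.EXPOSED
    return Exposure.UNKNOWN
```

For example, classify(True, None) returns Exposure.UNKNOWN: THP is compiled in, but the fix status has not been established, so the artifact stays on the triage list.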

Vendor responses and the Microsoft position

After the upstream fix was merged, many distributions published advisories and kernel updates. Typical vendor actions included:
  • Issuing CVE entries and vendor advisories listing which kernel packages and releases were patched.
  • Assigning CVSS or priority ratings — scores vary across vendors because impact assessment (local vs. network vector, required privileges, and actual reachable consequences) differs by packaging context.
  • Releasing updated kernel packages or backports, or classifying certain builds as “no fix planned” where the package variant is out of support or not maintained.
Microsoft’s public advisory for CVE‑2025‑37958 (the MSRC update guide entry) explicitly states that Microsoft identified Azure Linux as a Microsoft product that includes the upstream component and is therefore potentially affected. The advisory also emphasizes that Microsoft began publishing CSAF/VEX (machine-readable vulnerability-exposure mappings) to improve transparency and that they will update the CVE product mapping if more Microsoft products are found to be affected.
This wording is important: Microsoft has confirmed Azure Linux contains the examined upstream code and has published the corresponding CSAF/VEX attestation. Microsoft’s statement does not assert that every other Microsoft artifact has been scanned and cleared — rather, it reflects a phased, product-by-product attestation process.

Is Azure Linux the only Microsoft product that could be affected?

Short answer: No — but with an important operational nuance.
  • Public attestation: As of Microsoft’s published advisory for this CVE, Azure Linux is the Microsoft product Microsoft has publicly attested to include the upstream kernel component in question. Microsoft’s CSAF/VEX data and their advisory text explicitly map the vulnerable upstream component to Azure Linux artifact families.
  • Absence of attestation ≠ absence of risk: The absence of a CSAF/VEX attestation for other Microsoft products (for example, WSL2 kernels distributed with Windows, kernel artifacts embedded in specific Azure Marketplace images, Azure-managed appliances, or specialized kernels used in partner devices) is not proof those artifacts do not include the code. Microsoft has many independently produced kernel artifacts and images; each artifact is built separately and may have differing kernel versions and configuration flags.
  • Technical feasibility of exposure in other Microsoft artifacts:
      • Any Microsoft-distributed kernel binary built from an upstream commit range that predates the fix could contain the vulnerable code.
      • If that artifact’s kernel configuration enabled THP and the migration code, the artifact may be functionally exposed.
      • For a given Microsoft product, the deciding factors are the exact kernel version/commit and the kernel configuration used in that shipped artifact.
In practical terms: Azure Linux is the only Microsoft product publicly identified and attested to by Microsoft as containing the vulnerable upstream component. However, the technical possibility remains that other Microsoft artifacts could include the same code depending on their build choices. Microsoft’s advisory language — promising to update the CVE mapping if additional products are impacted — is a phased transparency approach, not a blanket safety guarantee for other Microsoft deliveries.

How to verify exposure in a given Microsoft-provided artifact

If you are responsible for Microsoft-delivered artifacts in your environment (Azure VM images, Marketplace images, WSL2 kernel, Azure-managed services), follow these steps to determine whether a particular instance is exposed:
  • Identify the running kernel:
      • Run: uname -a
      • Record the kernel version string, build tag and build date it reports.
  • Check the kernel configuration for THP and relevant options:
      • If /boot/config-$(uname -r) exists: grep TRANSPARENT_HUGEPAGE /boot/config-$(uname -r)
      • Otherwise, if /proc/config.gz is available: zgrep TRANSPARENT_HUGEPAGE /proc/config.gz
      • If the kernel was built without THP support (CONFIG_TRANSPARENT_HUGEPAGE is not set), exposure is unlikely.
  • Look for dmesg / kernel oops evidence:
      • Search kernel logs for strings like split_huge_pmd_locked, deferred_split_scan, or other THP/migration traces that match the oops pattern reported in test reproduction logs.
  • Compare with vendor advisories / package changelogs:
      • For Azure Linux images, consult Microsoft’s published CSAF/VEX and the Azure Linux advisory metadata to confirm whether the specific image or kernel package you run was mapped to the CVE.
      • For other Microsoft artifacts (WSL2, marketplace images), examine the vendor-supplied kernel version and configuration metadata. If Microsoft has published CSAF metadata for that product, use it.
  • If you cannot determine exposure from the artifact metadata, treat the artifact as potentially affected and follow mitigations (below) until you can obtain a definitive answer.
Note: where you do not control the kernel (for example, a managed PaaS service), consult the service’s security/maintenance bulletins or contact Microsoft support to confirm remediation status.
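The configuration check above can be scripted. The sketch below is a minimal Python version, assuming the kernel config is exposed at one of the two usual locations (/boot/config-<release> or /proc/config.gz); the function names are illustrative, and many images ship neither file, in which case the result is "unknown" rather than "safe".

```python
import gzip
import platform
from pathlib import Path
from typing import Optional


def thp_built_in(config_text: str) -> bool:
    """True if the config text enables CONFIG_TRANSPARENT_HUGEPAGE."""
    # Disabled options appear as "# CONFIG_... is not set", so an exact
    # match on the "=y" line avoids false positives from sub-options.
    return any(line.strip() == "CONFIG_TRANSPARENT_HUGEPAGE=y"
               for line in config_text.splitlines())


def read_kernel_config(release: Optional[str] = None) -> Optional[str]:
    """Return the running kernel's config text, or None if unavailable."""
    release = release or platform.release()  # same value as `uname -r`
    boot_config = Path(f"/boot/config-{release}")
    proc_config = Path("/proc/config.gz")
    try:
        if boot_config.is_file():
            return boot_config.read_text()
        if proc_config.is_file():
            with gzip.open(proc_config, "rt") as handle:
                return handle.read()
    except OSError:
        return None
    return None


if __name__ == "__main__":
    text = read_kernel_config()
    print("kernel release:", platform.release())
    if text is None:
        print("kernel config not found; treat THP support as unknown")
    else:
        print("THP built in:", thp_built_in(text))
```

A True result only establishes that the THP code is compiled in; whether the fix is present still has to be confirmed against the vendor’s advisory or package changelog.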

Practical mitigations and remediation steps

Short‑term mitigations can reduce risk while you coordinate an update; long‑term remediation requires applying vendor-provided kernel updates or using vendor-supported livepatches.
Immediate mitigations
  • Apply vendor kernel updates as soon as they are available: This is the only full remediation. Azure Linux customers should apply patched kernels provided by Microsoft for their image family or ensure they redeploy patched images.
  • Temporarily disable Transparent Huge Pages: If patching is delayed, disabling THP reduces the code paths that handle huge-page split/migration and therefore reduces exposure.
      • Example (as root): echo never > /sys/kernel/mm/transparent_hugepage/enabled
      • Caveat: This runtime setting does not persist across reboots unless it is also applied at boot. Disabling THP may affect application performance (positive or negative depending on workload) and is not a perfect fix for every environment.
  • Use kernel livepatch services where supported: Enterprise kernels and cloud vendors frequently provide livepatch mechanisms to apply critical fixes without full reboots. Check vendor advisories to see whether a livepatch is available for this CVE for your kernel variant.
  • Restrict local access and privilege elevation: Because the bug typically requires local activity that triggers THP migration or split behavior, tighten controls on untrusted processes and consider restricting access to workloads that can allocate and migrate huge pages.
  • Isolate untrusted workloads: Use stricter container or VM isolation, limit unprivileged users, and consider removing host-level capabilities that could trigger complex memory management operations for untrusted containers.
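To confirm that the THP mitigation actually took effect on a host, the active mode can be read back from sysfs, where the kernel marks the current choice with square brackets (for example "always madvise [never]"). The following is a minimal sketch; the function names are illustrative, and the path is the standard one used in the example above.

```python
import re
from pathlib import Path
from typing import Optional

THP_ENABLED_PATH = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def active_thp_mode(sysfs_text: str) -> Optional[str]:
    """Return the bracketed (active) mode, or None if none is marked."""
    match = re.search(r"\[(\w+)\]", sysfs_text)
    return match.group(1) if match else None


def current_thp_mode(path: Path = THP_ENABLED_PATH) -> Optional[str]:
    """Read the active THP mode from sysfs; None if it cannot be read."""
    try:
        return active_thp_mode(path.read_text())
    except OSError:
        # File missing (THP not built in, or not Linux) or not readable.
        return None


if __name__ == "__main__":
    print("active THP mode:", current_thp_mode())
```

A result of "never" indicates the mitigation is in place for the running system; None means the file was absent or unreadable and the state should be checked another way.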
Longer-term remediation
  • Patch and reboot (if required): Install the vendor-patched kernel packages and reboot where necessary to run the fixed kernel.
  • Inventory and automation: Consume CSAF/VEX attestations where available to automate asset triage. For Microsoft customers, ingest Microsoft’s machine-readable product mapping for Azure Linux and other Microsoft artifacts as Microsoft publishes them.
  • Test workloads after disabling THP or patching: Because THP changes can affect performance, validate critical workloads under the patched kernel or with THP disabled.
  • Operationalize early-alerting: Add kernel oops strings and vendor advisory identifiers to your SIEM/monitoring so you detect evidence of exploitation attempts or kernel instability quickly.
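As a starting point for the monitoring item above, the sketch below filters kernel log lines for the function names this article cites as appearing in the reported traces. The helper name suspicious_lines is hypothetical, and a match is only a hint: these functions can also appear in unrelated stack traces.

```python
import re
from typing import Iterable, List

# Function names cited in this article as appearing in the reported traces.
THP_TRACE_MARKERS = ("split_huge_pmd_locked", "deferred_split_scan")
_PATTERN = re.compile("|".join(re.escape(m) for m in THP_TRACE_MARKERS))


def suspicious_lines(log_lines: Iterable[str]) -> List[str]:
    """Return log lines that mention any of the THP trace markers."""
    return [line.rstrip("\n") for line in log_lines if _PATTERN.search(line)]


if __name__ == "__main__":
    import sys
    # Example: dmesg | python3 this_script.py
    for hit in suspicious_lines(sys.stdin):
        print(hit)
```

The same marker list can be translated into whatever query language your SIEM uses; the Python version is mainly useful for ad hoc checks against saved dmesg or journalctl -k output.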

How to operationalize Microsoft’s CSAF/VEX information

Microsoft began publishing CSAF/VEX attestations to provide machine‑readable mappings between CVEs and Microsoft product artifacts. For customers:
  • Automate ingestion: Feed CSAF/VEX outputs into your asset management and vulnerability orchestration tools so that when a CVE mapping appears for a Microsoft artifact, you can automatically correlate the mapping to deployed assets.
  • Map by artifact not product: Treat each kernel build / image as a discrete artifact. Microsoft’s attestations will usually be artifact-specific (for example, Azure Linux kernel builds for particular VM SKU families).
  • Prioritize confirmed carriers: Assets that match published attestations should be treated as priority remediation items; assets not listed still require triage, but there is less certainty about whether they carry the vulnerable code.
  • Watch for updated attestations: Microsoft’s advisory language explicitly commits to updating CVE mappings if additional products are identified; subscribe to MSRC advisories (or vendor push channels) to get those updates.
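A minimal ingestion sketch might look like the following. It assumes the attestation is a CSAF 2.0 JSON document that has already been downloaded and parsed into a dict, and it relies only on the standard product_tree and vulnerabilities[].product_status fields; the function names are illustrative, and the exact layout of Microsoft’s files should be checked against a real document before depending on it.

```python
from typing import Dict, List


def product_names(doc: dict) -> Dict[str, str]:
    """Map product_id -> display name from a CSAF product_tree."""
    names: Dict[str, str] = {}

    def add(product: dict) -> None:
        product_id = product.get("product_id")
        if product_id:
            names[product_id] = product.get("name", product_id)

    def walk(branches: list) -> None:
        # Branches nest arbitrarily; leaves carry a "product" object.
        for branch in branches:
            if "product" in branch:
                add(branch["product"])
            walk(branch.get("branches", []))

    tree = doc.get("product_tree", {})
    walk(tree.get("branches", []))
    for product in tree.get("full_product_names", []):
        add(product)
    for relationship in tree.get("relationships", []):
        add(relationship.get("full_product_name", {}))
    return names


def statuses_for_cve(doc: dict, cve_id: str) -> Dict[str, List[str]]:
    """Return {status: [product names]} for one CVE in a CSAF document."""
    names = product_names(doc)
    result: Dict[str, List[str]] = {}
    for vuln in doc.get("vulnerabilities", []):
        if vuln.get("cve") != cve_id:
            continue
        for status, ids in vuln.get("product_status", {}).items():
            # Fall back to the raw ID when the tree has no name for it.
            result.setdefault(status, []).extend(names.get(i, i) for i in ids)
    return result
```

Calling statuses_for_cve(doc, "CVE-2025-37958") yields the product names grouped by status (for example known_affected or fixed), which can then be joined against your asset inventory by artifact rather than by product family.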

Risk analysis: exploitability and likely impact

  • Exploitability: The bug is primarily a local race that was demonstrated by fuzzing frameworks (e.g., syzkaller). Triggering the exact interleaving may require specific memory layout and workload conditions — for example, heavy memory churn, THP splits, and migration activity. That said, cloud or multi-tenant environments with noisy neighbors, or systems running intensive memory-management workloads, can increase the chance of hitting the race condition.
  • Privileges required: Many vendor advisories indicate the vulnerability is reachable with local privileges; some vendors rate its exploitation complexity as high, which lowers the immediate severity for remote attackers. However, untrusted code running locally, or a privileged process, could deliberately trigger the condition.
  • Primary impact: Denial-of-service (kernel crash / oops) is the most straightforward outcome. Kernel state corruption or more severe memory safety consequences depend on timing and environment, making full impact variable.
  • CVSS variance: Vendors differ in scoring; you will see CVSS assessments in the medium range in mainstream vendor advisories. Operationally, treat it as medium-to-high priority for internet-facing or shared-host environments where untrusted actors have local access.

Practical Q&A: short, actionable answers

  • Is Azure Linux the only Microsoft product Microsoft has identified as including the vulnerable code?
      • Microsoft’s public advisory lists Azure Linux as the product they have identified and attested via CSAF/VEX. That is the only Microsoft product Microsoft has publicly mapped to this upstream component for this CVE so far.
  • Does that prove no other Microsoft artifacts are affected?
      • No. Absence of attestation is not proof of absence. Other Microsoft artifacts may or may not contain the code depending on kernel version and configuration.
  • What should Azure customers do now?
      • Apply the patched Azure Linux kernel package or redeploy patched images, or follow Microsoft’s published remediation guidance for affected Azure Linux artifacts.
  • What should customers running other Microsoft artifacts do now?
      • Inventory the kernel artifacts you run (including WSL2, Marketplace images, managed VM images), check kernel versions/configs, consult Microsoft’s product attestation feed and vendor advisories, and treat unverified artifacts as potentially affected until triaged or patched.

Recommendations: prioritized checklist for security and ops teams

  • Inventory all Linux kernel artifacts in your environment, including Microsoft-supplied images and kernels.
  • For all Azure Linux instances, apply the vendor patches that map to CVE‑2025‑37958 without delay.
  • For other Microsoft artifacts (WSL2, Marketplace images, managed services), verify whether published CSAF/VEX mappings exist for those artifacts; if not, treat them as potential exposures and perform configuration checks and mitigations.
  • If you cannot immediately patch, disable THP as a temporary mitigation and assess performance impact.
  • Use kernel livepatch where supported and available as an interim remediation.
  • Add kernel oops / THP-migration strings to monitoring so you detect vulnerability exploitation or instability.
  • Automate CVE attestation ingestion (CSAF/VEX) into your vulnerability orchestration platform to avoid manual triage errors.
  • Coordinate with Microsoft support for any environment-specific questions about WSL2, Azure Marketplace images, or hosted services.

Final analysis and outlook

CVE‑2025‑37958 is a reminder that subtle concurrency races in low-level kernel subsystems can have outsized operational impact. The good news is that the vulnerability is well-scoped to the THP migration code and that upstream developers fixed it with targeted checks that avoid unsafe dereferences. Distributors and major cloud vendors moved to produce advisories and patches, and Microsoft’s decision to publish CSAF/VEX attestations for Azure Linux is a positive step toward automating product-to-CVE mappings for customers.
That said, Microsoft’s advisory language — confirming Azure Linux as an attested carrier and promising to update the CVE if additional products are identified — must be read carefully. Operationally, Azure Linux is the only Microsoft product Microsoft has publicly attested to include the affected upstream component for this CVE; however, the technical possibility remains that other Microsoft artifacts could include the same code depending on kernel version and build choices. Customers should not equate the absence of an attestation with absence of risk.
In practice, the safest course is simple and familiar: inventory, patch, and validate. Use the newly available machine‑readable attestations where provided, patch Azure Linux instances as priority, and triage other Microsoft artifacts by checking kernel versions and configurations. Where patching is delayed, apply well-understood mitigations such as disabling THP or using livepatch services and monitor for kernel oops traces that indicate the problem.
The fix is available in upstream kernels and has been backported by multiple distributors; the remaining operational task is mapping that fix to every kernel binary you run, not assuming that a single vendor attestation guarantees safety across an entire vendor’s diverse artifact set.

Source: MSRC Security Update Guide - Microsoft Security Response Center