CVE-2025-3001: PyTorch 2.6.0 LSTM Cell Memory Corruption

A critical memory‑corruption flaw in PyTorch’s low‑level LSTM cell implementation — tracked as CVE‑2025‑3001 — has been publicly disclosed and reproduced, creating an urgent, if narrowly scoped, operational risk for systems that run untrusted or local model code built against the affected release. The vulnerability affects the function torch.lstm_cell in PyTorch 2.6.0 and can cause a segmentation fault or memory corruption when invoked with crafted inputs; the defect is exploitable only with local access but has a low attack complexity and a public proof‑of‑concept, so operators must act to identify and mitigate exposure immediately.

Background / Overview​

LSTM (Long Short‑Term Memory) cells are fundamental building blocks for many recurrent neural network models used in time series, speech, and language tasks. PyTorch exposes both high‑level LSTM modules and a lower‑level utility, torch.lstm_cell, that performs the internal gate calculations for single‑step LSTM operations. This low‑level function is typically used inside custom recurrent layers, for testing, or by toolchains that manipulate model internals.
CVE‑2025‑3001 targets that low‑level routine in PyTorch 2.6.0 and is described as an out‑of‑bounds / memory‑corruption issue (CWE‑119). National and vendor vulnerability repositories classify the flaw with a medium severity rating (CVSS mid‑range scores reported across trackers) and emphasize that exploitation requires local access. Several public trackers and security vendors have ingested the report and reproduced the crash, and the original PyTorch issue includes a minimal reproducible example that triggers a segfault.

What the vulnerability is (technical summary)​

The immediate symptom​

  • The observable failure mode is a hard crash (segmentation fault) and memory corruption when torch.lstm_cell is called with specific input shapes and values.
  • The upstream GitHub issue includes a concise reproducer that exercises empty tensors and unusually large or extreme weight constants, resulting in a segfault on CPU builds running PyTorch 2.6.0.

Root cause class​

  • The defect is an out‑of‑bounds memory access in native code paths used by torch.lstm_cell, consistent with a classic buffer‑read/write beyond an allocated memory region (CWE‑119).
  • Public vulnerability descriptions reference an improper restriction of operations within the bounds of a memory buffer and categorize the problem as memory corruption rather than a logic‑only miscompute.

Attack model and constraints​

  • Attack vector: Local — an attacker needs the ability to run Python code on the target system or trick a process into calling the vulnerable API with attacker‑controlled inputs.
  • Privileges required: Low — the PoC requires only the ability to execute unprivileged code on the host.
  • Complexity: Low — the reported reproducer is short and deterministic and was posted publicly, which lowers the bar for weaponization.
  • Impact: Memory corruption leading to process termination; while in theory memory corruption can be escalated to remote code execution, there is no public, reliable RCE chain published at the time of disclosure. Treat claims of RCE as speculative until an exploit chain is demonstrated.

Reproducer and developer evidence​

The canonical reproducer was filed as GitHub issue #149626 in the PyTorch repository and demonstrates the crash in a few lines of Python that construct zero‑length/edge‑case tensors and call torch.lstm_cell. The issue contains the exact input shapes and sample constants used to trigger the segfault, plus metadata showing the failure on torch 2.6.0. This public issue is the authoritative technical evidence for the bug’s existence and reproducibility. Key facts confirmed in the public issue and consolidated vulnerability records:
  • The failing invocation uses corner‑case tensor shapes (including empty dimensions) combined with tensor values that push native arithmetic into corner cases.
  • The crash reproduces reliably on the reported PyTorch build (2.6.0) in standard environments.
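The exact shapes and constants live in the upstream issue; purely as an illustration of the input class it describes (a hypothetical sketch, not the published PoC), tensors with empty dimensions and extreme constants look like the following. The sketch deliberately does not call `torch.lstm_cell` and skips cleanly where PyTorch is absent.

```python
import importlib.util

def describe_corner_case_inputs() -> str:
    """Illustrate the *class* of inputs described in issue #149626.

    The shapes and the 3.4e38 constant here are illustrative assumptions,
    not the values from the published reproducer.
    """
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch
    x = torch.zeros(0, 4)             # zero-length batch dimension
    w = torch.full((16, 4), 3.4e38)   # near-float32-max "extreme" constants
    return f"batch dim = {x.shape[0]}, max weight = {w.max().item():.3e}"

print(describe_corner_case_inputs())
```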

Cross‑verification with multiple sources​

To ensure robustness of the reporting and to help operators prioritize action, the vulnerability and reproducer are reflected across independent, high‑quality sources:
  • The NIST National Vulnerability Database (NVD) entry for CVE‑2025‑3001 summarizes the report and assigns the memory‑corruption classification and local attack vector. NVD is a canonical registry that records the CVE metadata.
  • The upstream PyTorch issue (GitHub #149626) contains the minimal reproducer and developer discussion demonstrating the segmentation fault and is used by maintainers to triage and patch the code.
  • Commercial vulnerability trackers (Snyk, Aqua Security, others) have ingested the CVE record, published advisory summaries, and note that no fixed wheel was available in some vendor feeds at disclosure time. Those pages are useful for distribution‑level status mapping and for automated scanning.
  • Distribution security notices (for example Ubuntu’s CVE entry) list package evaluation status and CVSS scoring used by downstream maintainers to decide backports and patches.
Cross‑checking the GitHub repro with NVD and multiple vendor trackers confirms that the problem is real, reproducible, and tracked across the ecosystem. Operators should treat these corroborated facts as actionable signals.

Scope — which builds are affected?​

  • The upstream reports consistently name PyTorch 2.6.0 as the affected version. Public vulnerability records and trackers map the CVE to that release.
  • Whether a particular runtime is exposed depends entirely on the actual PyTorch binary present in that runtime. Container images, prebuilt vendor binaries, or distribution packages may be rebuilt or repackaged with different versions; the only reliable indicator of exposure is the installed torch version on the target host.
  • Vendor attestation practices vary. A missing or not‑found page on a particular vendor portal (for example, an MSRC URL returning “not found”) does not mean the vulnerability is unacknowledged — it often reflects phased publication and inventorying of affected images. Treat vendor pages as one input and verify artifacts host‑by‑host.

Practical detection and verification​

Quick checks to determine if a host or image is carrying an affected PyTorch build:
  • In a running Python interpreter:
  • python -c "import torch; print(torch.__version__)"
  • From a package manager:
  • pip show torch
  • conda list | grep torch
  • In a Docker container:
  • docker run --rm <image> python -c "import torch; print(torch.__version__)"
  • In CI or image scanning pipelines:
  • Inspect Dockerfile and base image manifests for pinned wheel installs (e.g., wheels named torch‑2.6.0‑...whl)
  • Scan images with vulnerability scanners that include the CVE database (Snyk/Aqua/Trivy) and fail builds if a vulnerable wheel is present.
Adopt a fail‑fast CI check that rejects images or build artifacts that report torch==2.6.0 until they are rebuilt with an upstream or vendor patch.
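A minimal fail-fast check of that kind might look like the sketch below; the affected-version set and output format are assumptions for illustration, not an official tool.

```python
from importlib import metadata

# Release named in public trackers for CVE-2025-3001 (assumption: exact match
# on the upstream version; extend this set if vendors name more builds).
AFFECTED = {"2.6.0"}

def is_affected(version: str) -> bool:
    """True if an installed torch version string matches an affected release."""
    # Wheels often carry local build tags such as "2.6.0+cpu"; strip them first.
    return version.split("+")[0] in AFFECTED

try:
    installed = metadata.version("torch")
except metadata.PackageNotFoundError:
    print("torch is not installed in this environment")
else:
    status = "VULNERABLE" if is_affected(installed) else "ok"
    print(f"torch {installed}: {status}")
```

In CI, exiting non-zero on the VULNERABLE branch turns this into a build gate.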

Immediate mitigations (short‑term, tactical)​

When a direct patch/binary upgrade is not immediately available or you must triage quickly, adopt the following prioritized mitigations:
  • Avoid calling torch.lstm_cell directly in untrusted contexts. If your service accepts user code or models, do not compile or execute flows that invoke this low‑level API.
  • When handling untrusted inputs, run model code in eager mode rather than in compiled mode or environments that rely on additional native extensions. Eager execution exercises the standard interpreter code path and may avoid the specific native routine that triggers the bug.
  • Sandbox and isolate model execution:
  • Use ephemeral VMs or tightly constrained containers for untrusted workloads with enforced CPU, memory, and wall‑clock time limits.
  • Rate limit model submissions and add quotas to prevent low‑cost, repeated local exploitation attempts.
  • Add regression tests that exercise torch.lstm_cell with corner‑case shapes and values under both eager and compiled modes; fail builds that diverge or crash. These tests can detect regressions or confirm a patched runtime.
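For the sandboxing step, one lightweight (POSIX-only) pattern is to launch each untrusted job in a child process with hard CPU-time and address-space limits; the specific limits and entry point below are illustrative assumptions, not a hardened sandbox on their own.

```python
import resource
import subprocess
import sys

def run_limited(code: str, cpu_seconds: int = 5, mem_bytes: int = 1 << 30) -> int:
    """Run untrusted Python code in a resource-limited child process.

    A crash or runaway loop in the child cannot take down the host worker;
    the limits here are placeholder values.
    """
    def set_limits():
        # Hard CPU-time cap: the kernel kills the child if it exceeds this.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
        # Hard address-space cap bounds the child's memory allocations.
        resource.setrlimit(resource.RLIMIT_AS, (mem_bytes, mem_bytes))

    proc = subprocess.run(
        [sys.executable, "-c", code],
        preexec_fn=set_limits,  # applied in the child, before exec
        timeout=60,             # wall-clock backstop
    )
    return proc.returncode

print(run_limited("print('untrusted job ran inside limits')"))
```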

Medium‑term remediation (when patched wheels are available)​

  • Upgrade to a patched PyTorch release as soon as the upstream project publishes it, or deploy a vendor‑provided backport if your environment relies on distribution packages.
  • Rebuild any containers, base images, or curated runtime artefacts to include the patched wheel; replace image tags in CI/CD and orchestrated deployments.
  • For managed runtimes (e.g., cloud curated images, managed notebooks), subscribe to vendor security advisories and adopt vendor‑issued images that explicitly document inclusion of the patched PyTorch.
  • After patching, run regression tests that include the original repro to confirm the crash no longer occurs under representative loads.
Note: Several vendor trackers reported that, at disclosure time, not all downstream packages had an immediate backport; operators must therefore monitor vendor notifications and rebuild images when official patched wheels are published.
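One way to structure that revalidation is to re-run a corner-case invocation in a child process, so a segmentation fault surfaces as a signal exit code instead of killing the test runner. The tensor shapes below are illustrative stand-ins (hidden size 8, hence 4×8 = 32 gate rows), not the published PoC values, and the check skips cleanly where torch is absent.

```python
import subprocess
import sys
import textwrap

# Child script: attempt a corner-case torch.lstm_cell call in isolation.
CHILD = textwrap.dedent("""
    import sys
    try:
        import torch
    except ImportError:
        sys.exit(42)  # sentinel: torch unavailable, treat as skipped
    x = torch.zeros(0, 4)                          # empty batch dimension
    hx = [torch.zeros(0, 8), torch.zeros(0, 8)]    # (h, c) state pair
    w_ih = torch.zeros(32, 4)                      # 4 * hidden x input
    w_hh = torch.zeros(32, 8)                      # 4 * hidden x hidden
    torch.lstm_cell(x, hx, w_ih, w_hh)
    sys.exit(0)
""")

def lstm_cell_regression() -> str:
    proc = subprocess.run([sys.executable, "-c", CHILD])
    if proc.returncode == 42:
        return "skipped"
    if proc.returncode == 0:
        return "ok"
    # On POSIX, a negative return code means the child died on a signal
    # (e.g. -11 for SIGSEGV); a positive one is an ordinary Python error.
    return "segfault" if proc.returncode < 0 else "error"

print(lstm_cell_regression())
```

A CI gate would fail the build on any result other than "ok".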

Hardening and long‑term operational controls​

  • Enforce image signing and pinned registries to prevent older, vulnerable images from being redeployed accidentally.
  • Treat low‑level framework APIs used by model submission pipelines as a security boundary: restrict who can submit model code that invokes native or internal functions.
  • Add continuous regression checks in CI that compare eager versus compiled outputs for critical operators; divergence or crashes should block promotion.
  • Instrument production model serving with health checks and telemetry that detect unusual worker crashes, repeated segmentation faults, or unexpected process restarts.
These controls reduce the attack surface and make silent correctness or corruption issues easier to detect and contain.

Risk assessment — how worried should you be?​

  • High‑risk groups:
  • Multi‑tenant model hosting platforms, public notebook services, shared CI/CD systems, and any service that compiles or executes arbitrary user models. These environments have the largest blast radius because an attacker can submit code that invokes the vulnerable API.
  • Environments that process untrusted plugin code or third‑party models, including some model marketplaces and hosted notebooks.
  • Lower‑risk groups:
  • Single‑tenant, air‑gapped, or developer workstations running only trusted code are less likely to be remotely exploited. However, the bug can still cause local crashes and data corruption that affect reliability and trust in experimental runs.
  • Exploitability:
  • The public reproducer demonstrates local crashability with low complexity, so automated tooling could weaponize a denial‑of‑service. Claims of privilege escalation or remote code execution are unverified and should be treated with caution unless a full exploit chain is published and validated.

Why vendor attestations (and missing pages) matter — and what they don’t tell you​

Vendors sometimes publish CSAF/VEX attestations indicating which of their products were validated against a particular CVE. Those attestations are helpful for automation and triage when they exist, but they are inherently scoped to the inventory the vendor completed.
  • A missing or “page not found” response on a vendor vulnerability page does not necessarily mean the vendor is unaware; it can reflect phased publication workflows or an attestation that’s published elsewhere. When a vendor attestation is published (for example, for a particular Linux image), treat it as authoritative for that vendor product — but do not assume all other images or managed runtimes are covered. Always verify the actual binary or container image you run.

Suggested immediate checklist for operations teams​

  • Inventory (hours)
  • Enumerate all containers, images, and hosts that include PyTorch: pip/conda wheels, system packages, and curated images.
  • Use automated scanning to find torch==2.6.0 in images and artifacts.
  • Verify (same day)
  • Run python -c "import torch; print(torch.__version__)" on representative hosts and containers.
  • Mitigate (24–72 hours)
  • Disable or block model submission paths that permit execution of torch.lstm_cell in untrusted jobs.
  • Enforce sandboxing and strict per‑job resource limits.
  • Patch (when available)
  • Upgrade to a vendor‑patched wheel or upstream release and rebuild images.
  • Revalidate patched images with the GitHub repro and CI regression tests.
  • Monitor (ongoing)
  • Alert on repeated segfaults, worker churn, or abnormal compilation failures.
Follow this prioritized plan to reduce immediate risk while preparing for a full remediation cycle.

What remains unverified (and how to treat ambiguous claims)​

  • Remote exploitation: public records indicate a local attack vector. There is no authoritative, public proof that CVE‑2025‑3001 alone enables remote code execution across typical cloud deployments. Treat RCE claims as speculative unless demonstrated with a full exploit chain on a real service.
  • Vendor coverage: do not assume any vendor or managed runtime is unaffected simply because a vendor has not yet published an entry; absence of a vendor notice is not evidence of absence. Confirm the binary version in each artifact you run.

Conclusion — a focused, operationally meaningful risk​

CVE‑2025‑3001 is a tangible memory‑corruption bug in a low‑level PyTorch API that produces deterministic crashes under the right inputs. The vulnerability’s real operational risk derives from environments that execute untrusted model code: multi‑tenant platforms, model marketplaces, and shared CI runners are the most exposed. The mitigation path is straightforward in concept — inventory, avoid the vulnerable code, sandbox untrusted workloads, and upgrade to a patched wheel as soon as it is available — but the practical challenge is the long tail of images and vendor artifacts that may continue to carry PyTorch 2.6.0 until rebuilds complete.
Actionable priorities for operators: find every instance of torch==2.6.0 in your estate, block or sandbox untrusted execution paths that can call torch.lstm_cell, and plan an image rebuild and redeploy strategy to consume patched wheels when they are published. If you rely on vendor‑supplied curated images, subscribe to vendor advisories and require explicit confirmation that a curated image includes the patched PyTorch before redeploying it into production.

Source: MSRC Security Update Guide - Microsoft Security Response Center