CVE-2025-64329: Patch containerd CRI Attach Goroutine Leak DoS

A newly disclosed vulnerability in the containerd CRI server — tracked as CVE-2025-64329 — allows repeated use of the CRI Attach feature to leak goroutines and steadily increase the containerd process’s memory footprint until the host’s memory is exhausted. The issue, reported to the containerd team and fixed in short-cycle releases, poses a practical denial-of-service risk for clusters and hosts that expose Attach functionality to untrusted users. Operators using vulnerable containerd versions should prioritize patching to the fixed releases and apply short-term mitigations where immediate upgrades are not possible.

Background

What is containerd and why this matters

containerd is the widely used, CNCF-hosted container runtime that underpins much of the modern container ecosystem. It implements the Container Runtime Interface (CRI) used by Kubernetes and serves as a critical runtime layer for many container hosting solutions and tooling stacks — including Docker Engine integrations, Kubernetes node runtimes, and many container platforms.
Because containerd runs as a privileged system daemon on compute hosts, any defect that causes sustained resource consumption in containerd can escalate quickly to a host-level outage. The Attach RPC in the CRI allows a user to attach a client (for example, via kubectl attach) to the stdio streams of a container. That functionality is convenient for developers and operators, but it requires careful access control to avoid abuse.

Goroutines and memory in Go services

containerd is implemented in Go, which uses goroutines for concurrent operations. Goroutine leaks occur when goroutines are spawned but never terminate or are blocked waiting on resources that never arrive; over time they accumulate stack and heap memory, raising the overall memory consumption of the process. A steadily growing goroutine set is a classic root cause for memory exhaustion in long-running Go services.

What the vulnerability does (technical summary)

CVE-2025-64329 is a logic/cleanup bug in containerd’s CRI Attach implementation. Under sustained, repetitive Attach calls, the server spawns goroutines that are never signaled to exit; because blocked goroutines cannot be reclaimed by Go’s garbage collector, the goroutine count and resident memory of the containerd process grow steadily. The practical result is that an attacker (or an unprivileged user with the ability to call Attach repeatedly) can consume large amounts of memory on the host, causing degraded node performance or full node failure.
Key technical points:
  • The defect lies in the CRI Attach code path: the RPC used to bind a client’s stdio to a running container. Repeated Attach operations can leak goroutines.
  • The impact is primarily availability — the leak consumes memory on the host, producing a denial-of-service (DoS) condition if left unchecked.
  • The vulnerability does not modify container images, steal secrets, or directly escalate confidentiality or integrity risks; it is an availability risk with potentially disruptive downstream effects for running clusters.

Affected versions and remediation

The containerd project produced short-term releases to address the leak. Operators should consider these the canonical remediation path.
Patched containerd versions (minimum safe versions):
  • 1.7.29
  • 2.0.7
  • 2.1.5
  • 2.2.0
If your environment is running a containerd release older than those listed above, it is vulnerable and should be updated. Note that downstream distributions and appliance vendors may release their own backported patches or advisories — check your platform vendor for packaged updates.
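For fleet-wide triage, a small helper like the following (hypothetical tooling, not part of containerd) can classify reported version strings against the minimum safe releases listed above. Tracks not in the table are conservatively flagged as unpatched so they surface for manual review.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Minimum patched patch level per supported major.minor track.
var fixed = map[string]int{"1.7": 29, "2.0": 7, "2.1": 5, "2.2": 0}

// isPatched reports whether a containerd version string such as
// "1.7.28" or "v2.1.5" meets the minimum safe release for its track.
// Unknown tracks return false (a deliberately conservative assumption)
// so they are flagged for manual review.
func isPatched(version string) bool {
	parts := strings.SplitN(strings.TrimPrefix(version, "v"), ".", 3)
	if len(parts) != 3 {
		return false
	}
	track := parts[0] + "." + parts[1]
	// Drop any pre-release/build suffix such as "-beta.1".
	patch, err := strconv.Atoi(strings.SplitN(parts[2], "-", 2)[0])
	if err != nil {
		return false
	}
	min, ok := fixed[track]
	if !ok {
		return false
	}
	return patch >= min
}

func main() {
	for _, v := range []string{"1.7.28", "1.7.29", "2.0.6", "2.1.5", "2.2.0"} {
		fmt.Printf("%-7s patched: %v\n", v, isPatched(v))
	}
}
```

Feed it the output of `containerd --version` from each host and anything reporting false goes on the upgrade list.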

Exploitability and severity — clarifying the risk

This vulnerability is classified as a medium-severity availability issue in the containerd advisory. Important operational facts about exploitability:
  • Attack Type: Local or privileged API-driven exhaustion. The Attach RPC can be invoked by users who have access to the Kubernetes API with payloads that request Attach, e.g., via kubectl attach or other CRI clients.
  • Privileges: There is inconsistency in public severity metadata reported by different sources about whether special privileges are required to trigger the leak. Some assessments classify the attack vector as “local/no privilege,” while others indicate that the attacker needs permission to call Attach (which is typically controlled by Kubernetes RBAC).
  • User Interaction: Not required once the attacker can issue Attach RPCs programmatically; repeated/automated Attach calls are sufficient.
  • Impact: Availability — destructive memory growth can lead to host instability, kubelet failure, eviction of pods, or node crash.
Because third-party sources differ on the exact privilege requirement, the conservative and practical stance for ops teams is to assume that the vulnerability is exploitable by any principal that can issue Attach requests to the CRI (for example, users who can run kubectl attach or have equivalent API access). That means RBAC misconfigurations or overly permissive admission policies can make this easy to exploit.

How to detect if you’re being affected (indicators)

A few operational signals point to this problem in production:
  • Rising memory usage of the containerd process (resident set size) that correlates with Attach traffic spikes.
  • Increasing goroutine count in the containerd process over time.
  • Repeated Attach or exec operations in audit logs proximate to the memory growth.
  • Containerd crash loops or node-level OOM (Out-Of-Memory) events, often accompanied by kubelet or systemd logs showing memory pressure.
  • If containerd exposes debug or pprof endpoints in your environment (only if deliberately enabled), collecting a goroutine or heap profile will show stacks that indicate stuck goroutines in the Attach code path.
Practical checks:
  • Check containerd version:
  • containerd: run containerd --version or ctr version, or check your distro package manager to confirm the installed package version.
  • Monitor memory and goroutine counts:
  • Observe the containerd process via top/ps or systemd cgroups; watch RES/RSS across time.
  • If pprof or debug endpoints are available (e.g., /debug/pprof), request the goroutine profile to see leak candidates (do not enable pprof in exposed production networks; perform this in secured admin contexts).
  • Inspect Kubernetes audit logs for frequent Attach/Exec requests from a particular principal or set of principals.

Short-term mitigations (while you patch)

If you cannot upgrade immediately, several mitigations reduce the attack surface or limit impact:
  • Restrict Attach permissions with RBAC: Remove or limit the cluster roles that permit exec/attach if those permissions are not required for broad sets of users or service accounts. Prefer narrowly scoped RBAC policies.
  • Admission controller to deny Attach: Implement an admission policy (ValidatingAdmissionWebhook, OPA/Gatekeeper, Kyverno) that denies or controls access to pods/attach operations for untrusted namespaces or users. This is the workaround suggested by the containerd advisory.
  • Limit containerd process memory with systemd:
  • Use systemd’s MemoryMax setting for the containerd.service unit to put an upper bound on memory consumption (keeping in mind that hard caps can cause process OOMs rather than graceful degradation).
  • Increase observability & alerting:
  • Add an alert for anomalous memory growth in containerd or for steady goroutine count increases if you can expose metrics.
  • Throttle Attach operations at the API layer:
  • Place quota or rate-limiting rules around the Kubernetes API server for exec/attach-type endpoints where your API gateway or management tooling supports it.
  • Revoke or rotate credentials for suspicious principals:
  • If you spot repeated Attach abuse from a service account or user, temporarily revoke access until you can fully investigate and patch.
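For the systemd memory cap mentioned above, a drop-in unit fragment is the least invasive way to apply it. The values here are placeholders; size them well above containerd's normal working set on your hosts so the cap only trips on runaway growth, not routine load.

```ini
# /etc/systemd/system/containerd.service.d/memory-limit.conf
[Service]
# Soft ceiling: the kernel applies memory pressure above this point.
MemoryHigh=6G
# Hard ceiling: exceeding this triggers the OOM killer on containerd.
MemoryMax=8G
```

Apply with `systemctl daemon-reload` followed by `systemctl restart containerd`. Remember the caveat from the list above: a hard cap trades host-wide memory exhaustion for an OOM-kill of containerd itself, so pair it with restart policies and alerting rather than treating it as a fix.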
These mitigations are intended to reduce exposure and to buy time to apply the proper updates. They are not substitutes for installing the patched containerd release.

Patching and operational rollout guidance (recommended sequence)

Follow a controlled, auditable update process:
  • Inventory:
  • Identify all nodes and hosts running containerd. Include Docker Desktop instances, Linux hosts, VM scale sets, and managed node pools.
  • Verify versions: containerd --version, ctr version, docker info (for Desktop/Engine integrations), and package manager status (apt, yum, zypper, etc.).
  • Check vendor packages:
  • For packaged containerd (distribution-provided), check vendor advisories and install distribution-supplied security updates when available.
  • Schedule rolling upgrades:
  • Patch worker nodes in waves, draining workloads and ensuring PodDisruptionBudgets and cluster capacity are respected.
  • Update container runtimes:
  • Upgrade to one of the fixed releases: 1.7.29, 2.0.7, 2.1.5, or 2.2.0 (or later).
  • For machines running Docker Desktop or other bundled runtimes, update the product to a vendor patch release that contains the fixed containerd build.
  • Post-upgrade verification:
  • Confirm containerd restarts cleanly, perform smoke tests (pull/run containers), and monitor memory/goroutine metrics for stability.
  • For managed clusters (EKS/AKS/GKE):
  • Check your cloud provider’s advisory for when managed nodes are patched. If you manage your own node pools, follow the same rollout guidance above.

Detect, investigate and recover: practical commands and checks

  • Check containerd version:
  • containerd --version
  • ctr version
  • docker info | grep -i containerd
  • Inspect process memory:
  • ps aux --sort=-rss | grep containerd
  • systemctl status containerd.service
  • Collect goroutine/heap profiles (only in a secured admin context):
  • Access pprof endpoints if enabled: curl http://127.0.0.1:<pprof-port>/debug/pprof/goroutine
  • Use go tool pprof or analyze the stack dumps to identify the code path accumulating goroutines.
  • Restarting containerd as a short-term recovery:
  • systemctl restart containerd.service
  • Note: a restart reclaims memory but is not a fix; if Attach calls resume, the leak will reappear until patched.

Wider ecosystem implications (Docker Desktop, Kubernetes, and cloud services)

  • Docker Desktop: Docker Desktop may embed or interact with containerd components; Desktop users should confirm whether their installed version ships a vulnerable containerd and apply Desktop updates from the vendor.
  • Kubernetes: Any Kubernetes cluster that uses containerd as the node runtime (common across many distros and vendors) should be considered in-scope. The Kubernetes API and RBAC model means that misuse can originate from principals with exec/attach rights; tighten those controls.
  • Managed Kubernetes: Managed services (AKS, EKS, GKE) typically roll out platform patches; administrators should check their provider advisories and node pool images for the restart/patch schedule.
  • Linux distributions: Many Linux distros and appliance vendors will ship containerd as a packaged binary; operators should prioritize vendor patches in OS updates.

Why this is important for Windows and Windows-focused admins

While containerd is primarily a Linux runtime, Windows ecosystem impacts are not negligible:
  • Docker Desktop on Windows can include containerd components (for example, when using containerd as the image store or where Docker Engine integrations exist). Windows developers using Docker Desktop should confirm their Desktop version and vendor advisories.
  • Teams operating hybrid clusters (Linux worker nodes, Windows nodes, or Windows developers using WSL2) can be indirectly affected if a vulnerable node drains cluster capacity or causes outages in shared control-plane services.
  • Windows Server and container tooling teams that coordinate with Linux node teams need to factor this into incident response, because node-level memory exhaustion on Linux workers can impact Windows workloads scheduled in the same cluster control plane.

Long-term fixes and recommendations (beyond the patch)

  • Enforce least privilege: Ensure RBAC rules are minimal; avoid granting broad exec/attach permissions to service accounts and user roles.
  • Harden the API: Use admission controllers and policy engines to centrally manage and enforce attach/exec policies across the cluster.
  • Resource governance for control-plane daemons: Consider cgroup limits and systemd resource policies to prevent a single process from consuming all host memory.
  • Observability: Export Go runtime metrics and monitor goroutine counts, heap allocations, and non-GC memory metrics for containerd or other critical Go daemons. Create alerts for anomalous trends rather than one-off thresholds.
  • Test in staging: Test upgrades and patch rollouts in staging environments that mirror production cluster scaling and attach/exec patterns to catch regressions early.
  • Disable debug endpoints by default: Ensure pprof or other debug endpoints are not exposed to untrusted networks; enable them only for secured admin investigations.
  • Upstream contribution and QA: For teams contributing to containerd-based code, adopt stricter resource lifecycle review and goroutine lifecycle checks during code review and testing.
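The observability recommendation above hinges on alerting on trends rather than one-off thresholds. A minimal sketch of that idea (sampling the local Go runtime for demonstration; a real deployment would scrape containerd's exported metrics instead):

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// risingTrend reports whether every successive sample is at least
// `slack` above the previous one: a crude sustained-growth detector
// that tolerates normal jitter instead of firing on a single spike.
func risingTrend(samples []int, slack int) bool {
	if len(samples) < 2 {
		return false
	}
	for i := 1; i < len(samples); i++ {
		if samples[i] < samples[i-1]+slack {
			return false
		}
	}
	return true
}

// sampleGoroutines collects n samples of the current goroutine count,
// spaced `interval` apart.
func sampleGoroutines(n int, interval time.Duration) []int {
	out := make([]int, 0, n)
	for i := 0; i < n; i++ {
		out = append(out, runtime.NumGoroutine())
		time.Sleep(interval)
	}
	return out
}

func main() {
	// Simulate Attach abuse: leak one blocked goroutine every few ms.
	stop := make(chan struct{})
	go func() {
		for {
			select {
			case <-stop:
				return
			default:
				go func() { <-stop }() // blocked "leaked" goroutine
				time.Sleep(5 * time.Millisecond)
			}
		}
	}()
	samples := sampleGoroutines(5, 20*time.Millisecond)
	fmt.Println("samples:", samples, "rising:", risingTrend(samples, 1))
	close(stop)
}
```

The same windowed comparison works on any monotonic-growth signal (RSS, heap-in-use, goroutine count) and is far less noisy than a fixed ceiling, which either fires too late or pages on routine bursts.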

Critical analysis — strengths, weaknesses, and risk posture

Strengths:
  • The containerd project produced a quick advisory and patch releases across supported minor tracks, which gives administrators clear upgrade targets.
  • The vulnerability’s primary impact is confined to availability; it does not indicate remote code execution or data exfiltration, limiting the worst-case confidentiality/integrity impact.
Weaknesses and concerns:
  • The vulnerability highlights the risk of runtime leaks in system-level Go services. Goroutine lifecycle mistakes are non-trivial bugs that can have outsized operational ramifications, and they often emerge only under sustained or stress testing.
  • Inconsistent metadata across advisories about privilege requirements and CVSS vectors creates uncertainty for operators and raises the chance of under- or over-reacting. Teams should assume the more restrictive posture and treat Attach rights as sensitive.
  • Many real-world Kubernetes environments have broad RBAC roles or permissive admission policies that allow developers or CI systems to use exec/attach freely. That makes this class of vulnerability easier to exploit internally.
Risk posture recommendation:
  • Treat CVE-2025-64329 as an operational availability emergency for clusters where Attach is widely accessible or in environments with many developer accounts or automated systems that use exec/attach. Patch as soon as practical; where patches cannot be applied immediately, apply the mitigation layers described above.

Practical playbook (concise 1–2–3 remediation)

  • Inventory & verify:
  • Identify all hosts and deployments running containerd and confirm versions.
  • Patch:
  • Upgrade to containerd 1.7.29 / 2.0.7 / 2.1.5 / 2.2.0 or later as appropriate. Use vendor-supplied packages when possible.
  • Harden & monitor:
  • Restrict Attach permissions via RBAC/admission control, implement systemd MemoryMax, enable observability for memory and goroutine metrics, and alert on deviations.

This vulnerability is a timely reminder that even mature, widely used runtime components can harbor resource-leak bugs with outsized operational impact. The combination of a clear remediation path, practical mitigations, and well-understood detection methods makes this an incident that can be managed with good patch hygiene and principled access controls. Operators should treat containerd upgrades as high priority and harden Attach/exec access in their clusters to prevent repeat incidents of this kind.

Source: MSRC Security Update Guide - Microsoft Security Response Center