Linux Kernel BPF Redirect skb Scrubbing Fix CVE-2025-37959

The Linux kernel received a targeted fix for CVE-2025-37959, a BPF-related packet-scrubbing bug in which socket buffer (skb) metadata from one network namespace could be preserved and misapplied after a bpf_redirect_peer redirection. The behavior broke container networking (notably in Cilium-managed clusters) and created a moderate integrity and availability risk for certain datapath and encryption scenarios.

Background

The vulnerability centers on the interaction between the Linux Berkeley Packet Filter (BPF) datapath helper bpf_redirect_peer and the kernel representation of packets, the skb (socket buffer). When a BPF program uses bpf_redirect_peer to forward a packet to a peer device that lives in a different network namespace (netns), the kernel previously failed to scrub the skb properly. As a result, kernel-internal skb extensions, meaning per-packet metadata such as XFRM (IPsec) state, could be carried across the namespace boundary and seen by the receiving namespace.
This mismatch matters because netns boundaries are a primary kernel mechanism for isolating network stacks for containers and other lightweight virtualization constructs. Scrubbing is the kernel’s usual safeguard when a packet is handed off between namespaces or through virtual Ethernet (veth) pairs; it removes or resets per-namespace state so the receiving stack processes a clean packet. The reported bug left that state intact on the bpf_redirect_peer path. The fix scrubs it, with the exception of two fields that are intentionally preserved.
Multiple distribution and vulnerability tracking entries described the issue and the upstream kernel fix: maintainers altered the skb handling path used by bpf_redirect_peer so that the skb is scrubbed in a way similar to a standard netns switch (with the notable exception that skb->mark and skb->tstamp were not zeroed). The fix was accepted into stable kernel trees and rolled into vendor security updates.

Overview: what CVE-2025-37959 actually does

  • Vulnerability vector: local BPF program using bpf_redirect_peer to redirect packets between devices belonging to different network namespaces.
  • Root cause: the skb was not scrubbed on netns-targeted bpf_redirect_peer redirects; skb extensions (for example XFRM state) were preserved.
  • Observable symptom: packets decrypted by the host’s IPsec/XFRM stack carried host-side XFRM state into the container netns, causing the container-side XFRM policy checks to mismatch and drop packets unexpectedly.
  • Practical impact: availability/regression of container networking (dropped packets, failed connections) and potential cross-namespace leakage of skb-internal metadata (integrity/confidentiality of internal kernel state — though not an arbitrary data exfiltration vulnerability as typically defined).
  • Severity tier: generally assessed as moderate because exploitation requires local control of BPF programs or involvement of BPF-enabled datapath components and because the symptom is primarily broken behavior (packet drops) rather than guaranteed data disclosure.

Technical deep-dive

What is bpf_redirect_peer?

bpf_redirect_peer is a BPF helper that redirects a packet to the peer of the device identified by a given ifindex, for example the container-side end of a veth pair. The namespace switch happens from ingress to ingress without passing through the CPU's backlog queue, which is why the helper is often used to implement optimized forwarding for container endpoints. The helper is powerful because it lets an in-kernel program reroute packets without a round trip to user space.

The skb and skb extensions

The kernel’s skb is the central structure representing a network packet in the Linux networking stack. Over years of development it has accumulated a good deal of per-packet metadata, some of it in dedicated skb extensions and some in ordinary fields, such as:
  • XFRM (IPsec) session/state used for decryption/re-encryption
  • GRO/GSO-related bookkeeping
  • skb->mark (used by iptables/conntrack/routing policies)
  • timestamp fields (skb->tstamp)
  • security or netfilter metadata
  • per-packet control blocks (skb->cb)
When a packet transitions between network namespaces through a virtual device pair (e.g., veth), the kernel typically calls a scrub routine (skb_scrub_packet()) to reset or zero fields that should not survive the switch. That keeps the receiving namespace’s policy and state evaluation correct.

What went wrong

When a packet was redirected using bpf_redirect_peer to a device that belonged to a different netns, the code path failed to perform the same level of scrubbing that a standard netns switch performs. Crucially, the XFRM state (the metadata set when an IPsec packet is decrypted) remained attached to the skb and traveled into the container netns. When the packet reached the receiving namespace, the XFRM policy checks used the leftover host-side XFRM state. Because the container had no matching policy, the kernel dropped the packet. The debugging traces published by multiple advisories showed that the number of active skb extensions remained nonzero across the redirect and into the container-side XFRM checks, and that the resulting kfree_skb_reason value attributed the drop to XFRM policy evaluation.

The fix

Upstream kernel developers applied a patch that explicitly scrubs the skb during the bpf_redirect_peer netns redirection path. The scrub mirrors what veth-based netns transitions already did: clear the relevant skb extensions so the receiving namespace sees a sanitized packet. The patch intentionally leaves skb->mark and skb->tstamp untouched. That decision preserves certain use cases relying on packet marking and timestamps during redirection, but it is important for operators to be aware those two fields are preserved.
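
The behavior before and after the patch can be illustrated with a small toy model. This is only a sketch in Python, not kernel code: ToySkb, scrub_for_peer_redirect and container_policy_check are hypothetical names invented for illustration, and the policy check is a deliberately simplified stand-in for the kernel's XFRM policy evaluation.

```python
from dataclasses import dataclass, field, replace


@dataclass
class ToySkb:
    """Hypothetical, simplified stand-in for struct sk_buff (not kernel code)."""
    payload: bytes
    mark: int = 0
    tstamp: int = 0
    extensions: dict = field(default_factory=dict)  # e.g. {"xfrm": "host-sa"}


def scrub_for_peer_redirect(skb: ToySkb) -> ToySkb:
    """Model of the patched redirect: drop extensions, keep mark and tstamp."""
    return replace(skb, extensions={})


def container_policy_check(skb: ToySkb, container_policies: set) -> bool:
    """Accept unless the packet carries XFRM state that no local policy matches."""
    xfrm_state = skb.extensions.get("xfrm")
    return xfrm_state is None or xfrm_state in container_policies


# A packet that the host has just decrypted: it still carries host XFRM state.
decrypted = ToySkb(b"dns-query", mark=0x2A, tstamp=123456,
                   extensions={"xfrm": "host-sa"})

# Unpatched path: the skb crosses into the container netns unscrubbed.
unpatched_ok = container_policy_check(decrypted, container_policies=set())

# Patched path: extensions are cleared first, mark and tstamp survive.
patched = scrub_for_peer_redirect(decrypted)
patched_ok = container_policy_check(patched, container_policies=set())

print(unpatched_ok, patched_ok, hex(patched.mark), patched.tstamp)
```

In this model the unscrubbed packet is rejected by a container that has no policies at all, while the scrubbed packet passes and still carries its original mark and timestamp, which mirrors the trade-off described above.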

Who is affected and which kernel versions are patched

The issue affects kernels that included the bpf_redirect_peer path behavior prior to the upstream patch. Vendor and distribution advisories indicate the problem was identified and fixed in kernel stable release branches; the patch was backported to several lines and delivered as updates.
  • The CVE was disclosed in May 2025; vendors incorporated the fix into stable/release branch kernels and issued package updates.
  • Fixed kernel versions shipped to distributions include backports into 6.x stable lines and vendor-specific builds. At the distribution level, fixes appear in 6.1 series stable updates and later stable branches after the patch was merged.
  • Some enterprise and cloud distributions applied vendor-specific versioned updates (for example, vendor kernel packages built on 6.1.x branches were updated to include the patch).
Because packaging and backport policies differ across distributions, administrators must check their distribution’s security advisories and kernel package changelogs to confirm presence of the fix. If a distribution backport is not yet available, running a kernel from upstream stable that contains the patch or applying the vendor-provided update are the recommended routes.
Caveat: specific package names, release numbers, and update identifiers vary between distributions and cloud vendors; operators should consult their vendor advisories for precise version numbers applicable to their systems.

Real-world impact: why Cilium users noticed this first

The initial problem report demonstrated a practical failure mode observed in Cilium, the popular eBPF-based CNI (container network interface) and network datapath. Cilium relies heavily on BPF programs and performs various redirections between host and container netns. In setups where packets were decrypted by host IPsec (XFRM) and then forwarded into container namespaces via bpf_redirect_peer, the preserved XFRM state caused container-side policy evaluation to fail and drop traffic. This manifested as interrupted flows, failed DNS requests, or other application-layer failures inside containers — a clear availability/functional impact that affected operational clusters and drew attention.
Cilium maintainers and users already track a number of edge-case interactions between IPsec/WireGuard and the datapath; this bug exemplified one such interaction where kernel behavior diverged from the expected netns-switch semantics and broke forwarding logic in higher-level datapath tools.

Attack surface and risk analysis

Attack prerequisites

  1. Local or delegated control of a BPF program or an environment that installs BPF programs that use bpf_redirect_peer. This typically means administrative or container-runtime privileges to load BPF code, or exploiting software that accepts user-supplied BPF programs.
  2. Topology with netns cross-redirect — the redirection must target a device in another network namespace.
This vulnerability is not a remote unauthenticated exploit; it requires code execution in the kernel datapath or privileged operations to install BPF programs. For that reason, the exploitability is constrained compared with remote code execution CVEs.

Potential impacts

  • Availability: packet drops and broken connectivity for containers or virtualized network stacks when bpf_redirect_peer is used with cross-netns targets, especially in IPsec or encrypted stacks.
  • Integrity/Confidentiality of kernel state: internal skb metadata being preserved across netns boundaries can expose kernel-internal state to a different namespace context. While this does not automatically translate to arbitrary kernel memory disclosure, it is a violation of isolation expectations and could have unexpected side effects in complex stacks.
  • Operational surprise: systems and CNIs that assume netns boundaries clean state may fail in subtle ways, complicating debugging and incident response.

Exploitation likelihood

Public vulnerability databases and distribution advisories classify the issue as moderate, and empirical risk indicators (such as EPSS-style probability scores reported by third-party trackers) suggested a low short-term exploitation probability — largely because exploitation requires local ability to install or control BPF programs and because the immediate impact observed was packet loss rather than direct host compromise.
Nevertheless, the problem had an immediate availability impact for containerized workloads that use the affected redirection pattern; operators running Cilium or other BPF-driven CNIs should consider that severity from an operational standpoint, even if the security escalation risk is low.

Mitigation and recommended actions

Operators and administrators should treat this issue as a kernel-level datapath bug that should be corrected by updating the kernel. Suggested actions:
  1. Patch the kernel — apply the vendor-distributed security update that contains the bpf_redirect_peer skb-scrubbing patch. This is the authoritative and recommended fix.
  2. If an immediate vendor kernel update is not available:
    • Temporarily avoid BPF programs or CNI configurations that call bpf_redirect_peer across netns boundaries where practicable.
    • If using Cilium or similar CNIs, consult upstream and vendor guidance for Cilium versions or configuration flags that work around the problem (for example, avoiding netns-cross redirects that depend on bpf_redirect_peer until kernels are updated).
  3. Audit BPF program installs — ensure that only trusted BPF programs are loaded, and limit BPF installation privileges to trusted operators and runtimes.
  4. Monitor cluster networking behavior — packet drops, unusually high XFRM policy rejections, or unexpected kfree_skb_reason counters can indicate the issue in environments that have not yet been patched.
  5. Plan scheduled maintenance — because the fix is kernel-level, patch windows and reboots may be required; plan for maintenance to roll out kernel updates across hosts in a controlled fashion.
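
For the monitoring step, one low-effort signal is the set of XFRM counters the kernel exposes in /proc/net/xfrm_stat (present when the kernel is built with XFRM statistics support). The sketch below parses that file's name/value lines and reports which counters grew between two snapshots. The helper names are hypothetical, the sample values are illustrative rather than measurements, and treating XfrmInNoPols as the counter of interest is an assumption that should be checked against your own drop traces.

```python
from pathlib import Path


def parse_xfrm_stat(text: str) -> dict:
    """Parse 'Name <whitespace> value' lines into a {name: int} mapping."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def increased(before: dict, after: dict) -> dict:
    """Return only the counters that grew, with the size of the increase."""
    return {name: after[name] - before.get(name, 0)
            for name in after if after[name] > before.get(name, 0)}


def read_xfrm_stat(path: str = "/proc/net/xfrm_stat") -> dict:
    """Read the live counters; returns {} if the file is not available."""
    p = Path(path)
    return parse_xfrm_stat(p.read_text()) if p.exists() else {}


# Illustrative sample snapshots, not taken from a real host.
sample_before = "XfrmInError             \t0\nXfrmInNoPols            \t3\n"
sample_after = "XfrmInError             \t0\nXfrmInNoPols            \t41\n"

delta = increased(parse_xfrm_stat(sample_before), parse_xfrm_stat(sample_after))
print(delta)
```

On a live host, two calls to read_xfrm_stat() taken some interval apart can be passed to increased() in the same way. Note that the counters are per network namespace, so the read has to happen inside the namespace where the drops are suspected.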

Workarounds and kernel backport guidance

  • Many distributions backported the fix into long-term stable kernel series. If operators use a distribution kernel branch that has not yet received the backport, they can either:
    • Install the upstream stable kernel release that includes the patch, or
    • Apply vendor-supplied kernel packages that include the security backport.
  • For environments that cannot reboot quickly, consider moving sensitive workloads to hosts already patched, or defer operations that rely on bpf_redirect_peer until a maintenance window.
Note: Some operators may consider compensating config changes inside Cilium (or other CNIs) to avoid the redirection pattern; while possible in the short term, the long-term fix remains a kernel-level update.

What this patch means for BPF developers and network engineers

  • BPF programs that rely on redirect helpers must account for the fact that the kernel will now scrub skb internal state during certain cross-netns redirects. This brings bpf_redirect_peer semantics closer to conventional netns switching via veth devices.
  • The deliberate preservation of skb->mark and skb->tstamp in the applied fix can be relevant for metrics, QoS, and debugging workflows — developers who used these fields across netns boundaries should validate intended behavior post-patch.
  • The patch sets a precedent: netns boundary crossing should be treated as an explicit sanitization point, and BPF programs should not rely on host-internal skb extension state being present in the target namespace.

Critical evaluation: strengths and possible weaknesses of the fix

Strengths

  • The fix addresses the root cause by bringing bpf_redirect_peer behavior in line with existing netns-switch semantics, thereby eliminating surprising cross-namespace metadata leakage.
  • It resolves a real, observable operational failure — specifically, packet drops caused by leftover XFRM state — which improves reliability for containerized workloads that use IPsec or similar features.
  • The change is narrowly scoped to the redirection path and did not broadly alter unrelated BPF behavior, reducing the surface area for regressions.

Potential risks and trade-offs

  • The patch intentionally preserves skb->mark and skb->tstamp. Preserving these fields avoids breaking some existing use cases, but it leaves the door open for subtle interactions if those fields were being used as a vehicle for cross-namespace signaling — an uncommon but possible pattern.
  • Kernel patches that alter packet-handling behavior in the datapath are always candidates for regressions in high-throughput or pathologically complex setups; operators should test the updated kernel under representative workloads.
  • The change assumes that scrubbing the subset chosen is the correct balance between isolation and compatibility. If future use cases require preservation of additional fields, further refinement or API-level changes may be necessary.

How to detect whether you were affected

  • Check kernel package changelogs and the presence of the security update in your vendor repository.
  • Inspect network monitoring and kernel counters for unusual kfree_skb_reason events indicating XFRM or policy-based drops.
  • If running Cilium or a similar BPF CNI: verify whether you observed sudden pod-to-pod connectivity failures, DNS failures, or intermittent drops in environments where host-side IPsec decryption occurs before forwarding to containers.
  • Review BPF program logs and any metrics that indicate use of bpf_redirect_peer or redirection paths across netns.

Longer-term implications for container networking and BPF

This CVE is a clear reminder that as kernel subsystems grow more interconnected (BPF, XFRM/IPsec, namespaces), subtle state-handling bugs can manifest as operational outages in layered systems such as Kubernetes. The BPF ecosystem’s power (in-kernel programmability at scale) increases the importance of carefully reasoning about how kernel metadata flows across isolation boundaries.
The incident reinforces several long-term best practices:
  • Keep kernel and CNI components updated; kernel-level fixes are often the only full resolution for datapath bugs.
  • Treat BPF program installation and privilege as an attack surface; constrain who can install BPF code.
  • Drive observability into kernel datapath layers so operators can link packet-drop symptoms to underlying causes quickly.

Practical checklist for administrators (actionable items)

  1. Inventory hosts that run BPF-enabled CNIs (Cilium, etc.) and list their kernel versions.
  2. Consult your distribution or cloud vendor security advisory page to confirm whether the kernel package includes the bpf_redirect_peer skb-scrub patch.
  3. Schedule kernel updates for affected hosts and plan for restarts.
  4. For clusters using IPsec (host-side decryption) + BPF redirection into container namespaces, test connectivity pre- and post-update in a staging environment before mass rollout.
  5. Audit BPF program sources and limit BPF installation privileges to trusted operators.
  6. After patching, validate that previously observed drops (for example the XfrmInNoPols counter in /proc/net/xfrm_stat, or similar counters) have returned to baseline.
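
The inventory step can be roughed out with a short script that compares each host's kernel release string against the first fixed patch level for its stable series. This is a sketch under stated assumptions: the host names, release strings and the threshold in first_fixed are placeholders and must be replaced with values from your vendor's advisory. Distribution kernels often carry backports under an unchanged upstream version, so a flagged host means "check the changelog", not "definitely vulnerable".

```python
import re


def parse_kernel_release(release: str) -> tuple:
    """Extract (major, minor, patch) from a `uname -r` style string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not match:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def needs_review(release: str, first_fixed_by_series: dict) -> bool:
    """True if the release is below the fixed patch level or the series is unknown."""
    major, minor, patch = parse_kernel_release(release)
    fixed_patch = first_fixed_by_series.get((major, minor))
    if fixed_patch is None:
        return True  # unknown series: consult the vendor advisory by hand
    return patch < fixed_patch


# Placeholder threshold (hypothetical): fill in from your vendor's advisory.
first_fixed = {(6, 1): 999}

# Hypothetical inventory, e.g. collected by running `uname -r` on each host.
inventory = {
    "node-a": "6.1.100-generic",
    "node-b": "6.1.1000-cloud",
    "node-c": "5.15.0-91-generic",
}

flagged = sorted(host for host, rel in inventory.items()
                 if needs_review(rel, first_fixed))
print(flagged)
```

Defaulting unknown series to "needs review" is a deliberate choice here: it errs toward a manual changelog check instead of silently passing a host the script knows nothing about.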

Conclusion

CVE-2025-37959 represents a targeted, moderate-risk datapath bug arising from inconsistent skb scrubbing when using bpf_redirect_peer to cross netns boundaries. The real-world symptom—container-side packet drops in IPsec scenarios—made this an important operational issue for BPF-based CNIs like Cilium. The upstream kernel patch restores expected netns boundary semantics by scrubbing skb extensions during such redirects, and vendors have backported the fix into stable kernel trees.
For operators, the fix is straightforward in concept but operationally significant: deploy the patched kernel and validate container networking. For developers and BPF users, the event underscores the importance of treating netns boundaries as security and correctness boundaries and expecting the kernel to sanitize internal packet metadata when crossing them. The recommended response is simple: prioritize kernel updates for affected hosts, review BPF program deployment policies, and validate CNI behavior in a controlled environment after applying the fix.

Source: MSRC Security Update Guide - Microsoft Security Response Center
 
