A race in the Linux bonding driver's IPsec offload path was closed this year after maintainers fixed a null-pointer dereference in bond_ipsec_offload_ok that could let a local, low‑privilege user crash an affected host — the patch corrects the function’s return type and adds an explicit check for an active slave before dereferencing internal pointers.
Background / Overview
The Linux bonding driver provides link aggregation and failover for multiple physical interfaces presented as a single logical device. In many environments (cloud images, enterprise appliances, and virtualized hosts), bonding is used to improve resilience and throughput. One advanced capability that interacts with bonding is hardware IPsec offload: certain NICs can perform ESP encryption/decryption in hardware and signal the kernel via the XFRM/xfrmdev offload callbacks that packets have already been processed. The kernel exposes that functionality through the XFRM offload hooks and relies on per‑device callbacks such as xdo_dev_offload_ok() to verify whether a given packet can safely use offload.
The vulnerability tracked as CVE‑2024‑44990 sits at the intersection of these subsystems: bonding’s IPsec management code and the XFRM offload fast path. The upstream stable patch notes summarize the fix succinctly: maintainers added a guard to confirm that an active slave exists before dereferencing pointers, and also corrected the function’s return type to the proper boolean.
What went wrong: technical root cause
The offload model and where the race appears
When an xfrm state (IPsec Security Association) is offloaded for a bonded interface, the bonding driver keeps track of offloaded SAs and redirects them to the current active slave (the real underlying device that is forwarding traffic). To migrate SAs when the active slave changes, the driver runs a three‑step sequence: unoffload from the old slave, flip the active slave pointer, then offload to the new slave. This sequence is correct in principle but must be synchronized carefully with the networking fast path that consults the offload state while transmitting or receiving packets.
The vulnerable window occurs when a fast‑path function asks the device whether it supports offload for a particular packet — a function that expects the offload state to carry a valid pointer to the active slave’s netdevice context. If, due to a timing/race condition during an active‑slave change, that pointer is observed as NULL and the code dereferences it, the kernel will hit a null‑pointer dereference and oops. In other words, the offload fast path assumed a pointer was valid while a concurrent change could make it temporarily NULL. The upstream patch forces a check for the active slave before dereference and ensures the function returns a boolean as expected.
Symptoms you may see
- Kernel oopses or panics with traces mentioning bonding and xfrm/ipsec functions (for example, messages referencing bond_ipsec_add_sa_all or bond_ipsec_offload_ok in the call trace).
- A sudden crash or machine reboot correlated with bonding slave failover or administrative reconfiguration of the bond device.
- Availability loss on hosts that rely on bonding with IPsec offload enabled; the vulnerability's primary impact is on availability (DoS), not confidentiality or integrity.
Affected scope and severity
Multiple vulnerability databases and vendor trackers rate CVE‑2024‑44990 as Medium in CVSS terms with a base score in the mid‑5 range, reflecting a local attack vector that leads to availability loss rather than data theft. The issue requires local, low‑privileged access (or the ability to trigger bonding state changes), and usually the offending paths require that IPsec hardware offload is both present and in use. That significantly narrows practical attack surface compared with a remote memory‑corruption defect.
Vendor and distribution trackers list this fix across many mainstream kernel trees and downstream advisories — Ubuntu has issued USNs for kernels that include the upstream stable commits, Red Hat mapped the fix into multiple errata, SUSE lists affected kernel packages, and cloud Linux vendors such as Amazon have ALAS entries describing fixed package revisions. If your images use vendor kernels that are older than the fixed stable revisions, you should plan for remediation.
Key practical points about impact:
- Attack complexity is rated low: winning the race takes some effort, but the trigger surface is local and the required state is relatively reachable on systems configured for bonding + IPsec offload.
- Privileges required are low — a local unprivileged user that can manipulate networking on the host or provoke bond state changes may be sufficient in some setups.
- The primary consequence is availability/DoS (kernel oops -> crash), not confidentiality or code execution.
What the upstream patch changes
The upstream change was twofold:
- A defensive check added to bond_ipsec_offload_ok so the code verifies there is an active slave before dereferencing the pointer used to route offloaded SAs. This prevents the fast path from following a NULL pointer if the bond is mid‑transition.
- A type correction: another stable commit fixed the function’s return type to boolean (the offload-ok functions are expected to return true/false), reducing the risk of subtle logic errors or mismatches with callers. Both commits were authored and signed off by bonding maintainers and merged into the stable trees referenced by downstream vendor patches.
These fixes are small and targeted, as is typical of null‑pointer defense patches, but they close a race that previously allowed a legitimate code path to dereference a pointer that was momentarily unset.
Cross‑checking and vendor confirmations
Multiple independent registries and vendor advisories document the same root cause and the same remediation:
- The upstream stable kernel log shows the bonding commit message and the purpose of the change.
- OSV and other CVE aggregators list the CVE and map it to stable commit references and downstream advisories, giving concrete package versions that include the fix.
- Distribution trackers (Red Hat Bugzilla, SUSE security advisories, Amazon ALAS and Ubuntu USNs) list the CVE and provide vendor packages or image revisions that are patched. That confirms the upstream fix was merged and backported across multiple major trees.
For organizations that rely on Microsoft’s published product attestations (for example, Azure Linux inventories), be aware that vendor attestations are product‑scoped; Microsoft’s attestations for Azure Linux and related artifacts are useful where they exist but do not absolve you from verifying every kernel image you run. A number of internal audits and forum posts have discussed Microsoft’s attestation approach when Linux kernel CVEs are shipped in cloud images.
Detection and hunting guidance
Because this vulnerability manifests as a kernel oops, detection is principally log and telemetry driven.
- Search syslog/journalctl for kernel oops traces mentioning bonding, bond_ipsec, xfrm, ipsec, or “making interface the new active one” — those phrases appear in reported traces. Example hunt queries:
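- grep -i -E "bond(_|ing)|bond_ipsec|xfrm|ipsec|making interface the new active one" /var/log/kern.log (or /var/log/messages on RHEL‑family systems; the binary journal under /var/log/journal has to be read through journalctl, not grep)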
- journalctl -k | grep -iE 'bond|ipsec|xfrm'
- Alert on kernel OOPS/PANIC messages correlated with administrative network changes (if your change management system logs interface reassignments or bond reconfigurations, correlate those events).
- For SIEM and EDR: create an analytic rule for kernel message patterns that include “bond0: (slave” plus “bond_ipsec” or xfrm stack traces. Hunting that pattern will surface both exploit attempts and benign operational incidents that hit the race.
If you run live kernel instrumentation (kgdb, crash utilities, perf), capture oops vmcore dumps and search for call traces that include bond_ipsec_add_sa_all, bond_ipsec_offload_ok, or xfrm-related functions; those are the most likely to reveal the problem.
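The hunt described above can be sketched as two small filters over kernel log text. This is a minimal sketch, not a vetted detection rule: the function names are hypothetical, and the patterns are taken only from the symbols and the failover phrase quoted in this article.

```shell
#!/bin/sh
# Hypothetical helpers: both read kernel log text on stdin.

# Print lines that name the bonding IPsec functions seen in reported traces.
bond_ipsec_hits() {
    # "|| true" keeps the exit status clean when nothing matches.
    grep -E 'bond_ipsec_(offload_ok|add_sa_all)' || true
}

# Print bond failover messages, useful for correlating a crash with a slave change.
bond_failover_events() {
    grep -F 'making interface the new active one' || true
}

# Count lines that reference the bonding IPsec functions.
bond_ipsec_hit_count() {
    bond_ipsec_hits | wc -l | tr -d ' '
}
```

On a systemd host, one way to feed these is `journalctl -k --no-pager | bond_ipsec_hit_count`; a non-zero count is a prompt to inspect the surrounding trace, not proof of exploitation.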
Practical mitigations and immediate workarounds
If you cannot update kernels immediately, these mitigations can reduce risk or buy time. None are as good as installing vendor‑supplied patches, but they are practical stopgaps.
- Disable IPsec hardware offload on NICs used in bonds until patched. Many distributions and network stacks allow toggling the ESP offload feature on a device. For example, verify NIC capabilities with ethtool and disable the specific hardware offload flags if your vendor supports that control. The kernel docs and vendor guides show how ESP offload is expressed via NETIF_F_HW_ESP feature bits; consult your NIC vendor for the precise ethtool or driver commands to turn off ESP offload.
- If the bond is in active‑backup mode and does not require hardware offload, consider temporarily switching to a configuration mode that does not rely on slave‑level offload (or remove the offload flag) during maintenance windows.
- For cloud images: use the vendor’s patched public images or apply vendor-supplied CVE errata. Several cloud vendors and distributions have published fixed kernel package versions and image revisions; prioritize those images for workload migration.
- Constrain local access: because the attack vector is local, tighten access controls to prevent unprivileged users from manipulating network interfaces or triggering bond state changes. Harden container runtimes and isolate administrative tooling to trusted principals.
Remember: these are temporary mitigations. The only comprehensive fix is updating to a kernel that contains the upstream commit or a vendor backport.
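As a rough illustration of the offload stopgap, the sketch below turns a list of slave interface names into ethtool commands for review. It assumes the driver exposes the esp-hw-offload feature and allows it to be toggled (some drivers report it as fixed); the function name is hypothetical, and the commands are printed, not executed, so an operator can inspect them first.

```shell
#!/bin/sh
# Hypothetical dry-run helper: reads one interface name per line on stdin
# and prints the ethtool command that would disable ESP hardware offload.
esp_offload_disable_cmds() {
    while read -r slave; do
        # Skip blank lines so no malformed command is emitted.
        if [ -n "$slave" ]; then
            printf 'ethtool -K %s esp-hw-offload off\n' "$slave"
        fi
    done
}
```

For example, `printf 'eth0\neth1\n' | esp_offload_disable_cmds` prints one command per slave. Whether the change takes effect for already-offloaded SAs depends on the driver, so verify with ethtool -k afterwards.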
Patching: recommended versions and where to get fixes
Upstream and downstream repositories show the fix incorporated into the stable kernel trees, and major vendors mapped the change into their security advisories and errata:
- Upstream stable kernel commit history records the bonding fix and the return‑type correction on the stable branches.
- Distributions with vendor advisories: Red Hat (RHSA), SUSE, Debian, Ubuntu (USN), and Amazon Linux (ALAS). OSV and distribution advisories map CVE IDs to fixed package versions and image revisions. Use your distribution’s security update channels to identify the exact kernel package revision you need.
- If you maintain long‑term support (LTS) kernels or custom backports, the upstream stable patch is straightforward and small; kernel maintainers have included it in stable backports that can be merged into LTS trees. The patch set changed only a handful of lines and corrects the offload function’s check and type.
Action checklist for ops teams:
- Inventory all hosts that use bonded interfaces and record whether they have any NICs with ESP (IPsec) hardware offload enabled. Use ethtool -k <iface> and check the esp-hw-offload and esp-tx-csum-hw-offload features.
- Consult your distribution’s security advisories (USN, RHSA, SUSE-SU, ALAS, etc.) and identify kernel package updates that list CVE‑2024‑44990 as resolved.
- Schedule patch windows and plan image rollouts for cloud VM fleets; prefer vendor‑supplied kernel images that include the backport.
- If immediate patching is impossible, disable IPsec offload or restrict ability to flip bond slaves until patched.
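The inventory step in the checklist above could be scripted along these lines. This is a sketch under stated assumptions: bonds are listed under /proc/net/bonding with "Slave Interface:" lines, ethtool is installed, and the feature of interest appears as esp-hw-offload in ethtool -k output. The function names are hypothetical.

```shell
#!/bin/sh
# Print slave interface names from /proc/net/bonding/<bond> style text on stdin.
bond_slaves() {
    sed -n 's/^Slave Interface: *//p'
}

# Succeed (exit 0) if `ethtool -k` output on stdin reports esp-hw-offload as on.
esp_offload_on() {
    grep -Eq '^[[:space:]]*esp-hw-offload:[[:space:]]*on'
}

# Walk every bond on this host and report slaves with ESP offload enabled.
report_bonded_esp_offload() {
    for bond in /proc/net/bonding/*; do
        # Skips the unexpanded glob when the host has no bonds.
        [ -r "$bond" ] || continue
        for slave in $(bond_slaves < "$bond"); do
            if ethtool -k "$slave" 2>/dev/null | esp_offload_on; then
                echo "${bond##*/} $slave esp-hw-offload=on"
            fi
        done
    done
}
```

Calling `report_bonded_esp_offload` on a host prints one line per bonded slave with the feature enabled; empty output means either no bonds or no enabled ESP offload. The parsing is split from the host walk so the two filters can be checked against saved output.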
Threat modeling: who should worry most
This vulnerability is primarily an availability concern, so organizations that need high uptime and run network configurations that combine bonding with IPsec hardware offload should treat the fix as higher priority:
- Cloud providers and tenants who use bonding to present multiple NICs to VMs and rely on hardware IPsec acceleration for VPN termination.
- Network appliances and routers built on Linux where bonding is used for redundancy and offload is used to improve throughput.
- High‑availability clusters where a kernel oops on a single node can cascade to failovers, service disruptions, or automated scaling jitter.
Organizations that do not use bonding, do not enable IPsec offload, or run kernels that vendors have already patched for this CVE are at minimal risk from this specific defect.
Why the fix matters beyond this single CVE
Null‑pointer dereferences and race conditions in networking code are a recurring class of operational hazards: when they fire they take the whole host down, and they can be triggered by benign administrative actions (link flaps, interface reconfigurations). The bonding driver’s IPsec offload code represents a complex interaction between kernel networking state and device hardware capabilities; small, timing‑sensitive assumptions, like “real_dev is always set”, can break when asynchronous events occur. Adding explicit checks and correcting type contracts (e.g., a function that should return bool) reduces the surface for future regressions and makes reasoning about the code safer for maintainers and reviewers. The community’s rapid backporting of this fix into stable trees and vendor kernels illustrates that small, well‑targeted patches can eliminate a crash primitive without large architectural rewrites.
Final recommendations — prioritization and timeline
- Prioritize patching for hosts that match both of these conditions: (a) use Linux bonding and (b) rely on IPsec hardware offload (NETIF_F_HW_ESP). If both are true, classify the host as high priority for updates.
- For cloud fleets, prefer vendor images and kernel packages that list CVE‑2024‑44990 as resolved; distribution advisories give the exact package revisions to deploy.
- If you cannot immediately patch, disable ESP/crypto offload on bonded slaves or restrict local interface management to a small set of trusted administrators until the fix can be applied.
- Add a detection rule in your logging platform to alert on OOPS traces referencing bonding/ipsec/xfrm and investigate any matches promptly.
Conclusion
CVE‑2024‑44990 is a textbook example of how small races at subsystem boundaries — here between the bonding driver and the XFRM IPsec offload fast path — can produce outsized operational impact. The fix is simple, clear, and widely backported: check that an active slave exists before dereferencing internal pointers, and ensure the helper function returns the correct boolean type. For organizations that enable IPsec offload on bonded interfaces, the path to safety is straightforward: inventory, prioritize, and apply vendor kernel updates or disable the offload until patched. Multiple upstream and downstream sources document the issue and the remedy, and vendors across the ecosystem have published advisories and backports to close this availability surface.
Source: MSRC
Security Update Guide - Microsoft Security Response Center