The Linux kernel received a surgical but consequential fix in May 2025 for a memory‑corruption bug in the Broadcom NetXtreme‑E network driver. The patch, titled "bnxt_en: Fix out‑of‑bound memcpy() during ethtool -w", addresses a defect that can produce KFENCE‑detected memory corruption when administrators retrieve firmware coredumps.
Source: MSRC Security Update Guide - Microsoft Security Response Center
Background / Overview
Broadcom’s NetXtreme‑E (driver: bnxt_en) provides Ethernet support for many data‑center NICs. As part of normal diagnostics, administrators use ethtool to collect firmware coredumps from the device (ethtool -w). In that code path the driver asks firmware for a list of coredump segments; firmware DMAs the data into host memory, and the driver copies it into kernel buffers. A mismatch between the DMA length reported by firmware and the size of the driver’s allocated buffer allowed an out‑of‑bounds copy in certain kernel versions: a classic buffer overflow in kernel space. The upstream remedy caps the memcpy length so the driver never copies more data than the buffer holds, and slightly adjusts the coredump buffering logic to simplify correctness.
The bug was assigned CVE‑2025‑37911 and cataloged across multiple vulnerability trackers and distribution advisories. Upstream fixes landed in the stable kernel trees, and individual distributions issued backports and kernel updates. Distributors that carried the affected trees, including Ubuntu, Debian, SUSE and others, incorporated patches or published advisories mapping the CVE to package updates.
What exactly went wrong — a technical deep dive
Where the corruption occurs
The failure occurs during the bnxt driver’s coredump retrieval path. The steps are:
- The driver requests the firmware’s coredump segment list via the HWRM_DBG_COREDUMP_LIST firmware command.
- The firmware responds with a segment list that is DMA‑mapped into host memory and reports the length of that DMA buffer.
- The driver allocates an in‑kernel buffer (commonly referenced as info->dest_buf) sized according to the number of segments it expects.
- The code then copies the DMA‑backed segment list into info->dest_buf using memcpy without enforcing a hard upper bound based on the actual allocated size.
- If the firmware‑reported DMA length is larger than the allocated buffer, memcpy overruns the destination and corrupts adjacent kernel memory — KFENCE (Kernel Electric Fence) can and did detect this during instrumentation and testing.
Why a simple memcpy caused a kernel‑level problem
In kernel drivers, the combination of DMA, firmware‑reported lengths, and userland‑triggered diagnostics is fragile: firmware can return lengths that don’t match driver expectations, and DMA data can be trusted only if the driver rigorously bounds‑checks every length. A memcpy in kernel context has no safeguards: it will overwrite whatever follows the destination if passed a bad length. The fix is therefore defensive: compute the buffer length deterministically and cap any copy at that safe maximum. Upstream applied exactly that defense, moving the buffer‑length adjustment earlier and clamping the copy length so the memcpy cannot overrun the allocated destination.
Impact assessment: availability, exploitability, and worst‑case outcomes
Availability impact (DoS risk)
This is chiefly an availability risk: a successful trigger can corrupt kernel memory and cause instability or a kernel panic, which amounts to a denial‑of‑service (DoS) for the host. Multiple vendors’ advisories classify the practical impact as an availability concern; testing with KFENCE produced memory corruption messages and BUG traces that are consistent with system crashes or persistent degraded behavior. The upstream narrative and distribution advisories emphasize the DoS potential rather than immediate remote code execution.
Required privileges and attack surface
Ethtool operations that retrieve firmware coredumps generally require elevated privileges (root or CAP_NET_ADMIN). That limits remote public exploitation but does not eliminate risk:
- Local privileged users, or attackers who have already obtained administrative credentials, can trigger the vulnerable path.
- In multi‑tenant or managed hosting environments, a compromised tenant with sufficient privileges inside a VM or container could potentially cause a host or service outage if they can invoke ethtool against a host NIC or an exposed device node.
- Insider threat or misconfiguration scenarios (over‑permissive sudo rules, container escape) expand risk.
Exploitation and control
As of public records around the disclosure, there was no confirmed in‑the‑wild exploitation and no public proof‑of‑concept demonstrating reliable privilege escalation or arbitrary code execution from this specific defect. Published analyses describe a memcpy overrun that leads to corruption or a crash rather than guaranteed code execution. That said, kernel memory corruption defects can sometimes be escalated: an attacker who can reliably manipulate the memory layout and the values written could, in principle, craft a more serious exploit. Upstream and distributors therefore treat the issue seriously and assigned it standard CVE tracking.
What the upstream fix does
The upstream patch set makes two straightforward changes:
- Adjust the coredump buffer handling so the buffer length adjustment used for HWRM_DBG_COREDUMP_RETRIEVE applies earlier in the logic, covering both LIST and RETRIEVE commands uniformly.
- Cap the length passed to memcpy so copying cannot exceed the allocated info->dest_buf length; extraneous DMA data beyond that cap is ignored because it contains no useful information for the coredump consumer.
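The effect of the cap can be illustrated with a small user‑space model. This is a sketch in Python, not the kernel code: the function name copy_segment_list, the 16‑byte arena, and the "NEIGHBOR" marker are all hypothetical stand‑ins for the driver's destination buffer and whatever object happens to follow it in memory.

```python
# User-space model of the copy step. NOT the bnxt_en source.
# copy_segment_list and the arena layout are hypothetical names for illustration.

def copy_segment_list(arena: bytearray, dest_off: int, dest_len: int,
                      dma_data: bytes, clamp: bool) -> int:
    """Copy dma_data into arena[dest_off : dest_off + dest_len].

    clamp=False trusts the firmware-reported length (models pre-fix behaviour).
    clamp=True caps the copy at the destination size (models the fix).
    Returns the number of bytes copied.
    """
    copy_len = len(dma_data)
    if clamp:
        copy_len = min(copy_len, dest_len)
    arena[dest_off:dest_off + copy_len] = dma_data[:copy_len]
    return copy_len


def demo(clamp: bool) -> bytes:
    # 8-byte destination followed by 8 bytes standing in for a neighbouring object.
    arena = bytearray(b"\x00" * 8 + b"NEIGHBOR")
    # The "firmware" reports 12 bytes, 4 more than the destination holds.
    copy_segment_list(arena, 0, 8, b"A" * 12, clamp)
    return bytes(arena[8:])


if __name__ == "__main__":
    print(demo(clamp=False))  # neighbour partly overwritten
    print(demo(clamp=True))   # neighbour intact
```

In the unclamped run the four excess bytes land in the neighbouring region; in the clamped run they are simply dropped, which mirrors the upstream reasoning that the extra DMA data carries no useful information.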
Distribution and vendor response — where patches landed
Multiple distributor trackers and stable trees recorded the CVE and associated patches. Examples include:
- Stable kernel commits and stable tree updates that include the bnxt_en coredump fixes; the patch titles are explicit and were merged into various stable branches.
- Distribution advisories and backport lists (Debian, Ubuntu, SUSE, Red Hat) reference CVE‑2025‑37911 and offer specific package or kernel updates. Several distro trackers included the fix in 5.15‑series backports and newer kernels where bnxt_en maintenance is active.
- Cloud vendor/OS images (Azure Linux, Amazon Linux scanning tools) flagged the advisory and provided remediation guidance or kernel updates where applicable. Note that vendor CVSS scoring and classification sometimes differ between databases; Amazon Linux’s internal analytic sheet labeled the issue “Important” with a CVSS 7.0 in one of their listings, while NVD‑aggregated feeds assigned somewhat different base scores in other listings. That divergence is common and reflects differing scoring contexts.
Detection: what to look for in logs and monitoring
The bug is noisy when KFENCE is enabled: search dmesg and system logs for diagnostic lines similar to:
- "BUG: KFENCE: memory corruption in __bnxt_get_coredump"
- Call stacks including __bnxt_get_coredump, ethtool_get_dump_data, __dev_ethtool
Beyond direct KFENCE traces you can also watch for:
- Unexpected kernel panics correlated to ethtool usage.
- Ongoing dmesg churn or repeated NIC resets after attempted coredump retrievals.
- Elevated rate of crash reports or service failures on hosts with Broadcom NICs and affected kernels.
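A log search for these indicators could be sketched as follows. This is a minimal example, assuming the kernel log has already been captured as text (for instance from dmesg or journalctl -k); the function name find_bnxt_kfence_hits and the 20‑line look‑ahead window are arbitrary choices for illustration, and the patterns are only the trace fragments quoted above.

```python
import re

# Patterns come from the trace fragments quoted in this article; extend as needed.
KFENCE_RE = re.compile(r"BUG: KFENCE: memory corruption in __bnxt_get_coredump")
STACK_FUNCS = ("__bnxt_get_coredump", "ethtool_get_dump_data", "__dev_ethtool")


def find_bnxt_kfence_hits(log_text: str) -> list:
    """Return one record per KFENCE report line, listing which of the
    stack functions of interest appear in the lines that follow it."""
    lines = log_text.splitlines()
    hits = []
    for i, line in enumerate(lines):
        if not KFENCE_RE.search(line):
            continue
        # Hypothetical look-ahead: scan the next 20 lines for the call stack.
        window = lines[i + 1:i + 21]
        seen = [f for f in STACK_FUNCS if any(f in w for w in window)]
        hits.append({"line_no": i + 1, "line": line.strip(), "stack_funcs": seen})
    return hits
```

Feeding the function the saved output of dmesg returns an empty list on a clean host; any non‑empty result is worth an alert, since the report line itself names the driver function.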
Practical mitigation and patching guidance
Immediate steps for administrators and operators:
- Patch promptly: apply the vendor or distribution kernel update that includes the bnxt_en coredump fix. This is the only complete remedy. Check your distribution advisory and kernel version mapping, then schedule the update during a maintenance window.
- Limit access to ethtool: until patched, restrict running ethtool -w and other NIC diagnostic commands to trusted administrators. Enforce least privilege and audit sudoers for any broad allowance of networking commands.
- Control privileged access: ethtool reaches the driver through socket ioctl or netlink calls rather than through a /dev node, so ensure that only authorized users and workloads hold CAP_NET_ADMIN, particularly in multi‑tenant settings or where untrusted workloads run.
- Monitor kernel logs: create alerts for KFENCE messages and bnxt_en call stacks — these are reliable indicators the driver path was triggered and may be corrupting memory.
- Consider temporary mitigations: for environments where immediate patching is impossible, consider temporarily blacklisting bnxt_en, or any vendor‑documented setting that prevents coredump retrieval if one exists for your kernel. Understand that blacklisting a NIC driver removes network functionality on the affected interfaces and is disruptive. Always test these options in staging.
Recommended sequence:
- Identify affected hosts and kernels (inventory your fleet by kernel version and loaded modules).
- Schedule and apply upstream or distro kernel updates that contain the patch.
- Harden access to NIC diagnostic operations and monitor logs for indicators described above.
- Where patching is delayed, isolate affected hosts from sensitive workloads or restrict administrative operations that could trigger ethtool coredumps.
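The inventory step can be sketched per host as below. This is a minimal example, assuming a Linux host where /proc/modules is readable; the helper names module_loaded and host_report are hypothetical. It will not detect a bnxt_en driver that is built into the kernel rather than loaded as a module, and it does not know which kernel releases contain the fix, so the reported release still has to be checked against your distribution's advisory.

```python
import platform
from pathlib import Path


def module_loaded(proc_modules_text: str, name: str = "bnxt_en") -> bool:
    """True if `name` is the module name (first field) on any /proc/modules line."""
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


def host_report() -> dict:
    """Collect the running kernel release and whether bnxt_en is loaded."""
    path = Path("/proc/modules")
    text = path.read_text() if path.exists() else ""
    return {"kernel": platform.release(), "bnxt_en_loaded": module_loaded(text)}


if __name__ == "__main__":
    print(host_report())
```

Matching on the first field only avoids false positives from lines where bnxt_en merely appears in another module's dependency column.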
Incident response playbook for suspected exploitation
If you observe KFENCE or bnxt_en coredump corruption messages in production:
- Immediately gather forensic artifacts: dmesg, journalctl output, kernel oops logs, and timestamps of ethtool invocations.
- Identify the user or process that called ethtool (audit logs can help; enable syscall auditing for ioctl and ethtool operations where appropriate).
- Isolate the host if the corruption appears to cause instability; migrate critical services off until the host is patched.
- If the event coincides with unexpected privilege escalations, consider full forensic analysis because kernel memory corruption can be a precursor to more serious exploitation attempts.
- Remediate by applying the kernel patch and rebooting into the patched kernel; validate the absence of repeated KFENCE traces post‑patch.
Wider lessons for driver and firmware interactions
CVE‑2025‑37911 is instructive beyond the immediate DoS risk. It underscores several broader engineering realities:
- Firmware and driver contract boundaries are delicate. Firmware can legitimately report lengths that differ from driver expectations; drivers must treat firmware lengths as untrusted and perform bounds checking on any copy or mapping that follows.
- Diagnostic paths are high‑risk. Features that expose internal state (coredumps, debug interfaces) often combine DMA, user triggers and complex buffering — making them natural places for subtle bugs.
- Small, defensive fixes (move a clamp earlier; cap the copy length) are both effective and low risk. That’s why stable kernels and distributions backported the changes rather than undertaking invasive rewrites.
Discrepancies in scoring and how to prioritize
Vulnerability databases sometimes disagree on base scores and exploitability. For CVE‑2025‑37911, you’ll see different CVSS values reported by different aggregators and vendors; this is not unusual. What matters operationally is:
- The required privilege (local CAP_NET_ADMIN/root makes this less likely to be remotely weaponized).
- The impact class (kernel memory corruption → DoS, potential for privilege escalation if paired with other conditions).
- Whether your fleet includes the exact affected driver/kernel combos and Broadcom NIC hardware.
Final analysis — strengths of the fix and residual risks
Strengths:
- The upstream fix is minimal, focused, and correctness‑oriented: clamp the memcpy and adjust buffer calculation order. That makes the patch safe to backport and low risk for regressions.
- Distribution and stable kernel maintainers responded promptly, which reduces the window for abuse.
- The defect is straightforward to detect when KFENCE is enabled (it emits clear diagnostics), so monitoring and triage are practical and quick to implement.
Residual risks:
- The attacker model still includes local privileged accounts; organizations with lax admin controls remain exposed until patched or mitigated.
- Memory corruption bugs are intrinsically dangerous: in theory they can be chained into privilege escalation, especially if an attacker can shape memory layout on a target host. No public PoC was known at disclosure, but absence of evidence is not evidence of absence. Treat this with caution and act accordingly.
- Firmware‑driver discrepancies are a recurrent class of issues. Administrators should not assume a single fix removes all bnxt_en risk — continue to track driver and firmware updates from Broadcom and your distro vendor.
Practical checklist for administrators (quick reference)
- Inventory: list hosts with Broadcom NetXtreme‑E NICs and affected kernels.
- Patch: apply the kernel updates that include the bnxt_en ethtool fix during the next maintenance window.
- Restrict: until patched, restrict ethtool and CAP_NET_ADMIN usage to trusted users.
- Monitor: alert on KFENCE/bnxt_en traces and ethtool‑related kernel oops.
- Forensics: collect dmesg, journalctl and audit logs if you suspect exploitation.
- Validate: after patching, reboot into the patched kernel and verify logs show no further KFENCE messages for bnxt_en.
Conclusion
CVE‑2025‑37911 is a concise, real‑world example of how a small mismatch between firmware‑reported DMA lengths and driver buffer allocation can escalate into a kernel memory corruption that threatens system availability. The remedy is equally concise: cap copy lengths and simplify the coredump buffering logic. That pragmatic approach allowed stable kernels and distributors to backport the fix quickly. Administrators should treat the advisory as operationally important: inventory affected hosts, apply vendor updates, and tighten control over diagnostic tools like ethtool until the patch is deployed. Monitoring for the distinctive KFENCE call stack and acting on those alerts will materially reduce the risk of a disruptive outage.