Linux Kernel Fix: MSCC Ocelot NULL Pointer Crash in LAG

ChatGPT · Saturday at 6:00 AM

The Linux kernel received a targeted fix for a crash in the MSCC Ocelot driver that could be triggered when adding an interface to a Link Aggregation Group (LAG). Tracked as CVE-2026-22982, it is a local denial-of-service vulnerability rooted in a NULL pointer dereference in the driver's aggregation logic. (nvd.nist.gov)

Background / Overview

The bug affects the ocelot networking driver (the net: mscc: ocelot subsystem), which supports switch silicon from Microsemi (MSCC), now part of Microchip. The issue arises in the routine that computes and programs per-aggregate PGIDs (port group IDs) when link aggregation groups are present. Under specific conditions the driver dereferences a NULL port pointer while iterating over ports that may participate in a bond (LAG), which can cause a kernel oops and system crash. The vulnerability was entered into public vulnerability databases on January 23, 2026 and discussed in upstream kernel developer threads in late 2025. (nvd.nist.gov)
This is primarily an availability-impact issue (denial-of-service) caused by a NULL pointer dereference. It is classified as a local attack vector (the attacker must be on the host or have the ability to manipulate local network configuration) and requires only low privileges in the vulnerability scoring used by distributions that have evaluated it. Multiple vendor and distribution trackers (including NVD and SUSE) list the fix and provide CVSS assessments. (nvd.nist.gov)

Technical root cause: how the crash happens

Two front-ends, different behaviors

The ocelot driver has two front-ends that interact with the common library code differently. The DSA-based front-end (felix_vsc9959.c) registers every switch port, including unused ones, so the aggregation code never sees a NULL entry. The other front-end (drivers/net/ethernet/mscc/ocelot_vsc7514.c) can leave unused switch ports as NULL pointers in the driver's port array. That discrepancy is central: the aggregation routine assumed that every port it iterated over was valid, so with the vsc7514 front-end a NULL entry is dereferenced when the code reads port->bond or related fields. (spinics.net)

ocelot_set_aggr_pgids: iteration without a safety check

The vulnerable path is inside the function that sets aggregation (LAG) PGIDs. The code iterates over the physical port indices, fetches each entry from the driver's port array, and computes bond masks and the PGID register writes. A later pass over the ports accessed members of the port structure without first verifying that the pointer was non-NULL, and that missing check is the direct cause of the NULL pointer dereference and kernel panic. Upstream patches make the check explicit: they add pointer validity checks and adjust the control flow to skip unprobed or unused ports. (spinics.net)

Affected systems and scope

The issue is in the upstream Linux kernel driver code for the MSCC Ocelot family; the NVD and distribution trackers list affected kernel version ranges and stable branches. The NVD entry was created on January 23, 2026 and lists kernel.org patch references. Distributions that ship kernels built from affected trees may carry the vulnerable code until they pick up the stable fixes. (nvd.nist.gov)
Distribution security pages indicate the issue appears across multiple kernel series and is relevant for several vendor kernels and downstream products. SUSE’s advisory, for example, lists a moderate severity rating and provides CVSS v3/v4 assessments indicating Availability impact as the primary concern. Distributors that track upstream stable kernel commits published updates or flagged packages pending a fix at the time of advisory publication. (suse.com)
The attack vector is local: an adversary must be able to manipulate network interfaces or be able to trigger the 'add interface to LAG' operation on the affected host. While the CVSS/assessment data classifies Privileges Required as Low, adding interfaces or changing bonding typically requires NET_ADMIN capabilities or administrative privileges on most systems; consequences vary depending on platform and configuration. Administrators of devices that expose local network management to less-privileged users (e.g., certain embedded systems or management service daemons) should be particularly cautious. (nvd.nist.gov)

How upstream fixed it

Upstream maintainers and contributors addressed the root cause by adding explicit pointer checks and making the aggregation path robust against uninitialized or unprobed ports. The fix mirrors the remediation applied earlier for a similar lan966x NULL-pointer crash, which the upstream commit references, and applies the same defensive pattern to ocelot_set_aggr_pgids. The patch and its discussion appear on the kernel mailing lists and in stable commit references; reviewers asked for a proper Fixes: tag and for the patch to be sent to the correct net maintainers. (spinics.net)
The practical code-level change is straightforward: before dereferencing a port pointer or accessing port->bond, check whether the port pointer is non-NULL (and often whether port->bond itself is non-NULL) and continue the loop if the entry is unset. This prevents the crash without changing the intended LAG programming behavior for valid, present ports. (spinics.net)
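The guard described above can be illustrated with a small Python model. This is a sketch under stated assumptions, not the kernel code: Port, lag_masks_unguarded, and lag_masks_guarded are hypothetical names, None stands in for a NULL entry in the driver's port array, and a bond is represented by a plain string.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Port:
    """Hypothetical stand-in for a per-port driver structure."""
    bond: Optional[str] = None  # name of the LAG this port belongs to, if any


def lag_masks_unguarded(ports: List[Optional[Port]]) -> Dict[str, int]:
    """Models the pre-fix pattern: every array entry is assumed valid."""
    masks: Dict[str, int] = {}
    for index, port in enumerate(ports):
        bond = port.bond  # fails when the entry is None (the NULL case)
        if bond is not None:
            masks[bond] = masks.get(bond, 0) | (1 << index)
    return masks


def lag_masks_guarded(ports: List[Optional[Port]]) -> Dict[str, int]:
    """Models the fixed pattern: skip unused entries and ports without a bond."""
    masks: Dict[str, int] = {}
    for index, port in enumerate(ports):
        if port is None or port.bond is None:
            continue
        masks[port.bond] = masks.get(port.bond, 0) | (1 << index)
    return masks


# Port 2 is unused, as the vsc7514 front-end may leave it; port 3 has no bond.
ports = [Port("bond0"), Port("bond0"), None, Port()]
```

With this array, lag_masks_guarded(ports) yields a mask with bits 0 and 1 set for bond0, while lag_masks_unguarded(ports) raises AttributeError at index 2, which is the Python analogue of the kernel oops.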

Risk analysis: who should worry, and why

Immediate impact: Denial-of-service (kernel crash)

The primary risk is system instability and a local kernel panic when a specific interface operation occurs. For servers, appliances, or embedded devices where uptime matters, an attacker or a misbehaving management process that can add interfaces to a LAG can force a crash and reboot — producing a denial-of-service condition. Distribution vulnerability records and vendor trackers list Availability impact as high for this CVE. (suse.com)

Exploitability: constrained, but real in many environments

This is not a remote, unauthenticated kernel RCE issue; it is a local kernel crash driven by specific netlink/driver operations. However, in modern complex deployments the notion of "local" is porous: orchestration agents, container runtimes, management daemons, remote management channels (IPMI-like), or multi-tenant environments can expose enough capability to trigger the defective code. Environments that grant NET_ADMIN to less-trusted components (for example, container platforms, misconfigured management agents, or network control frameworks) are at elevated risk. Administrators should assume that if an attacker or buggy agent can manipulate LAGs or interfaces, they can cause a crash. (nvd.nist.gov)

Long tail: vendor kernels and embedded devices

Because the buggy code lives in an area tied to specific switch silicon front-ends, many embedded devices, network appliances, and vendor-provided kernels may be affected and not receive immediate patches. These vendors often maintain their own kernel trees or backport fixes unevenly; SUSE's advisory, for example, lists product states and indicates pending fixes or planned package updates. Systems that are difficult to upgrade (appliances, closed-box network devices) may remain exposed longer. (suse.com)

Practical detection and triage steps

If you manage Linux machines that might use the MSCC Ocelot driver or Microsemi/Microchip Ocelot switch front-ends, follow these steps to detect and triage:

Check your running kernel version:
Run: uname -r
Cross-check kernel version against vendor/security advisories and the NVD/SUSE CPE lists to determine if your kernel series falls inside the affected ranges. NVD and distribution trackers list version ranges and stable commit references. (nvd.nist.gov)
Confirm whether the driver is present:
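Look for loaded modules with lsmod | grep ocelot (this will not show a driver built into the kernel image), inspect /sys/bus/platform/drivers for an ocelot entry, or, if you build your own kernel, check whether the sources under drivers/net/ethernet/mscc/ are part of your build.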
On systems built into appliances or embedded boards, confirm with vendor documentation whether Ocelot-based switch front-ends are used. (spinics.net)
Search kernel logs for signs of ocelot-related oopses:
Use journalctl -k or dmesg | grep -i ocelot to find stack traces that mention ocelot_set_aggr_pgids, ocelot_write_rix, or other ocelot symbols. A NULL pointer dereference will often leave an oops trace that identifies the function. (nvd.nist.gov)
Audit who can manage network configuration:
Confirm which services, containers, or users have NET_ADMIN or equivalent capabilities. Reduce the attack surface by limiting NET_ADMIN to trusted administrative processes only.
In multi-tenant or orchestrated environments, check orchestration agents:
Ensure container runtimes, orchestration controllers, or management agents are not granted interface/bonding capabilities unless required.
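
The log search in the steps above can be sketched as a small Python helper. Everything here is an assumption for illustration: find_ocelot_oops is a hypothetical name, the symbol list contains only the two functions named in the text, and the NULL-dereference phrases vary by architecture and kernel version, so the pattern may need adjusting.

```python
import re
from typing import List

# Symbols named in the text; extend the tuple as needed.
OCELOT_SYMBOLS = ("ocelot_set_aggr_pgids", "ocelot_write_rix")

# Phrases typical of NULL-dereference reports. The exact wording differs
# between architectures and kernel versions, so this pattern is an assumption.
NULL_DEREF = re.compile(r"NULL pointer dereference|Unable to handle kernel", re.I)


def find_ocelot_oops(log_text: str, window: int = 40) -> List[str]:
    """Return lines naming an ocelot symbol within `window` lines of a NULL-dereference report."""
    lines = log_text.splitlines()
    hits: List[str] = []
    for start, line in enumerate(lines):
        if not NULL_DEREF.search(line):
            continue
        # Look at the report line and the trace lines that follow it.
        for candidate in lines[start:start + window]:
            if any(sym in candidate for sym in OCELOT_SYMBOLS) and candidate not in hits:
                hits.append(candidate)
    return hits
```

Feed it the text captured from journalctl -k or dmesg. An empty result only means no matching trace was found in that text, not that the kernel is unaffected.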

Recommended mitigations and hardening

Apply vendor or distribution kernel updates that include the ocelot fix as soon as they are available. Upstream patches were committed and distribution trackers include fixes or updated kernel packages; updating to a kernel that contains the stable patch is the definitive remediation. Refer to your vendor’s security advisory for the exact package and version to install. (nvd.nist.gov)
If immediate kernel updates are impractical (embedded appliances, constrained systems), consider the following temporary mitigations:
Restrict NET_ADMIN capabilities to a minimal set of trusted administrators and management processes.
Disable or avoid performing LAG/bond configuration operations on affected hardware, or postpone them until the system is updated.
On systems that expose network management via web-based or remote management interfaces, harden access controls and firewall management-plane access.
For appliances with vendor support, contact the vendor to confirm whether their shipped kernel includes the Ocelot fix or whether an updated firmware/kernel will be provided. (suse.com)
Consider runtime mitigations:
Use kernel hardening tools and monitoring to detect repeated oopses and to alert on ocelot-specific kernel traces.
In orchestrated environments, enforce admission policies that prevent pods or containers from requesting NET_ADMIN unless explicitly necessary.
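
To support the NET_ADMIN audit, the following Python sketch decodes the effective capability mask that Linux exposes as the CapEff line in /proc/<pid>/status. The helper names has_net_admin and cap_eff_from_status are hypothetical; the bit number 12 for CAP_NET_ADMIN comes from the Linux capability definitions.

```python
CAP_NET_ADMIN = 12  # bit number from the Linux capability definitions


def has_net_admin(cap_eff_hex: str) -> bool:
    """True if the hexadecimal CapEff mask has the CAP_NET_ADMIN bit set."""
    return bool(int(cap_eff_hex, 16) & (1 << CAP_NET_ADMIN))


def cap_eff_from_status(status_text: str) -> str:
    """Extract the CapEff value from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return line.split(":", 1)[1].strip()
    raise ValueError("no CapEff line found")
```

On a Linux host, has_net_admin(cap_eff_from_status(open("/proc/self/status").read())) reports whether the current process holds the capability; running the same check against the status files of container or agent processes gives a quick inventory of what could reach the LAG configuration path.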

Why this class of bug keeps recurring (analysis)

Driver complexity and multiple front-ends for the same silicon family make these bugs more likely. The Ocelot library and its front-ends were implemented to support different platform integration models (the DSA framework versus a platform-specific Ethernet front-end). Those integration styles produce different invariants: the DSA front-end registers all ports and marks unused ones as DSA_PORT_TYPE_UNUSED, whereas the vsc7514 front-end leaves port entries as NULL for unused ports.
When shared library code assumes a consistent invariant (e.g., that the ports array contains valid pointers wherever a mask says a bit is set), and a frontend violates that invariant, boundary conditions are missed. The symptom is that a later code path dereferences a pointer that the earlier code path assumed would always be valid — precisely what happened in ocelot_set_aggr_pgids. The patch approach — add defensive pointer checks — is pragmatic and correct, but the root cause is architectural complexity and subtle differences in frontend behavior. (spinics.net)
This class of defect also underlines the importance of:

Rigorous static/dynamic analysis in driver code paths that manipulate arrays of pointers indexed by hardware port numbers.
Clear documentation and coding standards for driver front-ends to ensure consistent invariants.
Comprehensive code review and maintainer coordination: upstream reviewers on the kernel mailing lists pointed to missing Fixes: tags and encouraged proper maintainer CC lists to ensure the fix lands in the correct subtrees. (spinics.net)

What maintainers and vendors did (and should have done)

Upstream contributors submitted a concise patch to check pointer validity and skip unused ports. Reviewers asked for process compliance (Fixes: tag, correct CC lists) and for a clean commit message tying the fix to a prior commit to make stable backporting straightforward. Those are standard and important steps for kernel tree hygiene: they make it easier for stable maintainers to backport fixes across multiple stable branches and for distribution packagers to track the fix. (spinics.net)
Vendors should:

Track upstream stable commits and ensure proper backporting to the kernel series they ship.
Clearly publish advisories and package versions that include the fix.
For appliances and embedded devices with longer maintenance lifecycles, provide firmware or kernel updates and explicit mitigation guidance. SUSE and other trackers show the status for distro kernels; administrators of vendor appliances must request status or updates directly when distributors do not manage the appliance kernel. (suse.com)

Recommended action checklist for administrators

Immediately assess whether your systems are running kernels from the affected series (use uname -r) and whether the Ocelot driver is present on those systems. (nvd.nist.gov)
Prioritize patching servers and appliances that expose local management or NET_ADMIN rights to non-administrative agents. (suse.com)
If a vendor-supplied appliance is in use, contact the vendor to confirm whether they have applied the ocelot fix or will provide an update. (suse.com)
Harden access to local network configuration operations: remove unnecessary NET_ADMIN capabilities from containers and restrict management-plane access. (spinics.net)
Monitor logs for stack traces referencing ocelot symbols; if detected, treat as an actionable sign of a crash and schedule immediate remediation. (nvd.nist.gov)

Final assessment and outlook

CVE-2026-22982 is a clear example of a targeted kernel driver defect that produces a high-visibility operational impact (kernel crashes) despite a constrained attack vector. The fix is straightforward and landed upstream in the form of defensive pointer checks and small control-flow corrections; however, the real-world risk depends heavily on local privilege models and whether orchestration or management components can exercise the vulnerable code path.
For enterprise and cloud operators, the steps are familiar: verify exposure, apply vendor or distribution kernel updates that include the upstream fix, and harden the environment to reduce local interface-management capabilities for untrusted processes. For embedded device users and networking appliance operators, the attention should shift to vendor coordination: because such devices often run vendor kernels, confirming that the vendor will ship an update or provide a mitigation is essential.
The wider lesson here is that hardware abstraction front-ends and driver libraries need consistent invariants and careful review. When front-ends diverge in behavior, defensive checks in shared code are necessary — and maintainers must ensure those checks get backported promptly to all supported stable branches.
Administrators: treat this as a local, availability-impact vulnerability that is straightforward to remediate by patching, but potentially easy to trigger in environments that expose network management to less-trusted components. Patch promptly, restrict NET_ADMIN, and monitor for ocelot-related kernel oops messages until all affected systems have been updated. (nvd.nist.gov)

Quick reference (summary)

CVE: CVE-2026-22982 — kernel crash in net: mscc: ocelot when adding interface under a LAG. (nvd.nist.gov)
Root cause: NULL pointer dereference in ocelot_set_aggr_pgids when frontend leaves unused port pointers NULL. (spinics.net)
Impact: Denial-of-service / kernel panic (Availability). Attack vector: Local (requires ability to change network interfaces / LAG membership). (suse.com)
Fix: Upstream patch adds pointer checks and defensive logic; distribution vendors are issuing kernel updates or backports. Update kernels or apply vendor packages. (spinics.net)

Conclusion: CVE-2026-22982 is a narrowly scoped but operationally important kernel defect. The fix is in upstream kernels and in distributor trackers; administrators should prioritize patching and reduce the local attack surface for network-management operations until all affected systems are updated. (nvd.nist.gov)

Source: MSRC Security Update Guide - Microsoft Security Response Center
