Mozilla’s latest stable release shipped an ambitious privacy-first AI feature — on-device tab grouping — but early adopters say the convenience comes with an unwelcome cost: runaway CPU use, fan noise, and faster battery drain on laptops. What began as small community reports has grown into a broader conversation about on-device inference, model formats, and how far browsers should push local AI before it becomes a performance burden. Mozilla’s official notes describe automatic tab grouping as a progressive rollout, but the implementation details — an isolated “inference” process, ONNX-based models stored locally, and a shared runtime built on Transformers.js — explain both why the feature works and why it can be heavy on certain systems. (mozilla.org, firefox-source-docs.mozilla.org)

Background / Overview

Mozilla introduced tab groups to Firefox over several releases and, with Firefox 141, added an AI-powered option that can suggest groupings and generate names automatically. The promise is simple: when you run dozens of tabs for research or projects, a local model will cluster related tabs and reduce manual organization. The privacy angle is central — Mozilla emphasizes that the processing happens on-device and does not send your browsing data to external servers. The feature is being rolled out gradually to users so Mozilla can gather feedback and refine behavior. (mozilla.org, omgubuntu.co.uk)
Under the hood, Firefox ships a dedicated ML runtime component that uses Transformers.js and the ONNX runtime to run compatible models inside an isolated inference process. Models are cached and managed locally (IndexedDB is used as the storage mechanism), and Firefox provides tooling so that models can be prepared from Hugging Face artifacts and made compatible with the runtime. That architecture is the reason you may see an extra Firefox process labeled “Inference” in about:processes or in the Windows Task Manager — it’s how Mozilla isolates model inference from content and UI code. (firefox-source-docs.mozilla.org)
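To make that architecture concrete, here is a minimal, self-contained sketch of what local inference through Transformers.js looks like in ordinary JavaScript. It is illustrative only: the pipeline call and caching behavior are standard Transformers.js, but the model name is an arbitrary Hugging Face example, not the model Firefox ships, and Firefox’s internal runtime wraps this machinery inside its own isolated process.

```js
// Illustrative sketch of on-device inference with Transformers.js; this is not
// Firefox's internal code. "Xenova/all-MiniLM-L6-v2" is an example embedding
// model from Hugging Face, not the model Firefox downloads for tab grouping.
import { pipeline } from "@huggingface/transformers";

// The first call downloads the ONNX weights and caches them locally;
// subsequent runs reuse the cache instead of re-downloading.
const embed = await pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2");

// Embedding a batch of tab titles is the kind of CPU-bound pass an
// "Inference" process performs when it evaluates open tabs.
const tabTitles = [
  "Best hiking trails near Oslo",
  "ONNX Runtime performance tuning",
  "Hiking boots review 2025",
];
const embeddings = await embed(tabTitles, { pooling: "mean", normalize: true });
console.log(embeddings.dims); // e.g. [3, 384]: one 384-dimensional vector per tab
```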

What users are reporting — the symptoms and the scope​

The complaints​

  • Laptops running warm and fans spinning at full speed shortly after opening Firefox or loading multiple tabs.
  • Noticeable battery life reduction on portable devices while Firefox is idle or doing light browsing.
  • One or more Firefox processes showing unexpectedly high CPU use; community reporters found an entry called “Inference” in about:processes that sometimes spiked.
  • Some users reported instability when trying to kill the inference process directly. (tomshardware.com, opensourceforu.com)

How widespread is the problem?​

The reports are currently clustered in community channels — Reddit threads and tech sites that pulled together anecdotal evidence from early adopters. Mozilla’s release notes and subsequent patch notes do not list a targeted fix for inference-related CPU spikes in the initial dot releases, and independent testing from several outlets found the feature is rolled out progressively (so not every user sees it at the same time). Because the rollout is gradual, some observers who test the same Firefox version won’t reproduce the issue simply because the automatic grouping feature wasn’t enabled for their profile. That makes the reports noisy and hard to quantify centrally. (mozilla.org, tomshardware.com)
Important caveat: specific CPU percentages quoted in forum posts and social media are anecdotal and environment-dependent. They’re useful for flagging an issue but not proof of a universal defect. Where precise numbers are cited (for example, an “Inference” process hitting very high percentages), treat those as user reports, not official telemetry. (tomshardware.com)

Why Firefox’s design choices make this both powerful and potentially expensive​

On-device inference: privacy by design, CPU by necessity​

Mozilla intentionally runs models locally for privacy — titles, descriptions, and metadata from open tabs are processed on-device and not uploaded to Mozilla. That’s a clear privacy win compared with browser features that perform queries in the cloud. But local inference means the user’s CPU (or GPU) must do the work. On modern desktop hardware with many cores and robust cooling, occasional inference tasks are barely noticeable. On a thin-and-light laptop with an efficient but thermally-limited CPU, sustained inference workloads can raise core temperatures, trigger fan curves, and shorten battery runtime. The trade-off is explicit: privacy + responsiveness for local inference vs. resource cost on weaker hardware. (mozilla.org, firefox-source-docs.mozilla.org)

Transformers.js + ONNX runtime: portability — and a weighty runtime​

Mozilla’s ML stack in Firefox is built around Transformers.js and the ONNX runtime. ONNX is a widely adopted, interoperable format for neural-network models, and ONNX Runtime is a mature engine with acceleration options on many platforms. Firefox uses ONNX because it provides broad model compatibility and ecosystem tooling (including web-targeted runtimes), but ONNX models can still be large and compute-intensive — especially if they’re not aggressively quantized or optimized for the target hardware. Mozilla’s documentation and engineering notes show that models compatible with the runtime are expected to ship ONNX weights at multiple quantization levels; the model conversion path is based on Transformers.js conversion scripts. (firefox-source-docs.mozilla.org, onnxruntime.ai)
Because the runtime loads Transformers.js and the ONNX WASM engine into a separate inference process, there’s overhead in starting and executing models. The design isolates inference (a security and stability benefit) but also means a visible process exists that can occupy significant CPU cycles while it runs a clustering or naming pass over many tabs. (firefox-source-docs.mozilla.org)
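For intuition about why that pass gets expensive as tab counts grow, here is a deliberately naive grouping sketch over tab embeddings. It is not Mozilla’s algorithm (the sources above don’t spell out the actual clustering logic), but even this simple greedy approach does pairwise similarity work that grows quickly with the number of open tabs.

```js
// Naive greedy grouping of tab embeddings by cosine similarity. This illustrates
// the shape of the workload, not Firefox's actual clustering code.
// `embeddings` is assumed to be an array of plain number arrays, one per tab.
function cosine(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function groupTabs(embeddings, threshold = 0.6) {
  const groups = []; // each group is a list of tab indices; index 0 is the seed
  embeddings.forEach((vector, tabIndex) => {
    const match = groups.find((g) => cosine(embeddings[g[0]], vector) >= threshold);
    if (match) match.push(tabIndex);
    else groups.push([tabIndex]);
  });
  return groups;
}
```

With dozens of tabs, the embedding pass plus this kind of comparison work is exactly the sort of burst that shows up as a CPU spike in the inference process.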

Technical verification: what Mozilla actually ships and how it works​

  • Firefox’s official release notes for version 141 document the feature as a local AI model that “identifies similar tabs, automatically organizes them into groups, and even suggests group names,” and the notes confirm the rollout is progressive. (mozilla.org)
  • The Firefox Source Docs describe the Firefox AI Runtime as an experimental runtime built on Transformers.js and the ONNX runtime, and detail a workflow for converting models (from Hugging Face) into ONNX weights in multiple quantized forms for use in Firefox. The docs explicitly mention a toggleable preference (browser.ml.enable) to try the runtime in Nightly and show how to exercise the engine. That documentation explains model storage (IndexedDB) and the separate inference process model. (firefox-source-docs.mozilla.org)
  • Mozilla’s product blog describes the WebExtensions-facing inference APIs and reaffirms the choice to keep inference local, explaining that models are downloaded once (shared across origins) and cached to avoid redundant downloads. The blog post notes that the runtime is accessible to WebExtensions and is intended to let extensions run offline ML tasks without server-side calls. That architecture both enables powerful local features and widens the ways local CPU, memory, and battery can be consumed. (blog.mozilla.org)
Taken together, these sources validate that: (1) Firefox runs local models in a distinct inference process, (2) those models are ONNX artifacts converted with Transformers.js tooling, and (3) model files are cached locally (in IndexedDB, surfaced through the Add-ons Manager) and shared across contexts. Those facts explain users’ observations and justify the privacy claims — while also explaining how a misbehaving model or runtime could spike CPU. (mozilla.org, firefox-source-docs.mozilla.org, blog.mozilla.org)

Practical advice: how to diagnose and mitigate high CPU / battery drain​

Below are concrete steps to check whether Firefox AI is the culprit and how to switch it off if it’s hurting your battery or performance.

Quick checks (diagnose)​

  • Open about:processes in Firefox to inspect per-process CPU and memory usage; look for a process labeled “Inference” or anything high and disproportionate compared with web content. about:processes shows Firefox’s internal process types and can help isolate inference work. (tomshardware.com)
  • Use Windows Task Manager (or Activity Monitor on macOS, top/htop on Linux) to correlate Firefox sub-processes with system-wide CPU and power usage. Windows Task Manager’s “Power usage” and “Power usage trend” columns are useful for spotting energy impact.
  • If the inference process spikes when you open lots of tabs, try closing groups of tabs and watch for the CPU to drop; this helps confirm the workload is the local ML pass. (opensourceforu.com)
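If you prefer a terminal view to cross-check what about:processes shows, a small script like the one below can help on Linux or macOS. It only lists OS-level Firefox processes sorted by CPU (the friendly “Inference” label is visible only inside about:processes), and it assumes Node.js and a standard ps are available.

```js
// List Firefox-related processes sorted by CPU (Unix-like systems with `ps` only).
// about:processes remains the authoritative place to see the "Inference" label.
import { execSync } from "node:child_process";

const output = execSync("ps -eo pid,pcpu,pmem,comm", { encoding: "utf8" });
const firefoxRows = output
  .split("\n")
  .filter((line) => /firefox/i.test(line))
  .sort((a, b) => parseFloat(b.trim().split(/\s+/)[1]) - parseFloat(a.trim().split(/\s+/)[1]));

console.log("  PID  %CPU %MEM COMMAND");
firefoxRows.forEach((row) => console.log(row));
```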

Disable smart tab grouping (two methods)​

A. Via Settings (when the option is visible)
  • Open Settings (about:preferences) → Tabs section → uncheck “Use AI to suggest tabs and a name for tab groups” if available. This toggles the UI opt-in for smart tab suggestions for users in the progressive rollout. (Not all users have this toggle while the feature is staged.) (askvg.com)
B. Via about:config (advanced, works regardless of UI rollout) — verified prefs:
  • Type about:config and accept the risk. (askvg.com)
  • Search for each of the following preferences and toggle it to false:
      • browser.tabs.groups.smart.enabled — turns off the smart tab suggestions. (askvg.com)
      • browser.ml.enable — disables the Firefox ML runtime more broadly (when present). (firefox-source-docs.mozilla.org)
      • browser.ml.chat.enabled — disables the chat features tied to the ML runtime, if the pref is present. (tygocover.com)
  • Restart Firefox to ensure the change takes effect.
These keys are documented in community help pages and Firefox docs and serve as an immediate way to prevent the inference process from running in normal browsing workflows. Use caution: flipping hidden prefs can change behavior in ways some users may not expect, but the toggles above are specifically provided for the ML runtime and smart tab features. (askvg.com, firefox-source-docs.mozilla.org)
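For users who already maintain a user.js file, the same toggles can be pinned there so they are reapplied at every startup; remove the lines (and reset the prefs) to revert. The pref names are the ones listed above; whether each one exists on your install depends on your Firefox version and rollout status.

```js
// user.js in the Firefox profile directory; these prefs are applied on every startup.
// Same toggles as the about:config steps above. Remove the lines to revert.
user_pref("browser.tabs.groups.smart.enabled", false); // smart tab grouping suggestions off
user_pref("browser.ml.enable", false);                 // Firefox ML runtime off (where present)
user_pref("browser.ml.chat.enabled", false);           // ML chat features off (if present)
```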

Remove downloaded models (if present)​

Starting with work that landed around Firefox 140, the Add-ons Manager (about:addons) can list and remove locally cached AI models. Deleting a model removes the binary payload the runtime would load; the feature itself remains, but without the model the automatic suggestions cannot run until the model is re-downloaded. Check Add-ons → On-Device AI or similar entries for model management and remove large models if disk or memory is a concern. (blog.nightly.mozilla.org, omgubuntu.co.uk)

Other general mitigations​

  • Run Firefox in Troubleshoot Mode (disables extensions) to verify the issue is intrinsic to Firefox’s runtime and not a third-party extension. (minitool.com)
  • Reduce “Content process limit” (Settings → Performance → uncheck “Use recommended performance settings” → set content process limit lower) to lower the number of renderer processes Firefox uses; this can reduce overall memory pressure. (wintips.org)
  • Update Firefox to the latest patch release; Mozilla releases security and stability fixes frequently and may roll fixes for ML-related regressions into minor updates. Keep an eye on the official release notes. (mozilla.org)

Balanced analysis: strengths, trade-offs, and risks​

Strengths and why Mozilla’s approach is defensible​

  • Privacy-first on-device inference: by keeping tab metadata and inference local, Mozilla reduces the privacy risk of cloud-based analysis. That’s a differentiator compared with some products that rely on server-side AI. For privacy-conscious users this is fundamentally attractive. (mozilla.org)
  • Interoperable and extensible stack: using ONNX and Transformers.js means Firefox can leverage a broad set of existing models and conversion tools, which accelerates feature development and third-party extension capabilities. The ONNX runtime also offers acceleration paths for hardware that support it. (firefox-source-docs.mozilla.org, onnxruntime.ai)
  • Model transparency and user control: Mozilla’s work to expose models in about:addons and allow users to remove models is a strong UX and privacy move — it surfaces the otherwise-hidden model binaries and metadata, and it gives users agency over what resides on their machine. This is a far better UX than buried, opaque caches. (blog.nightly.mozilla.org)

Risks and downsides​

  • Hardware variability and battery impact: the single biggest practical risk is that any on-device model will behave differently across a wide range of user hardware. High-end desktops will barely notice; thin laptops with low TDP CPUs will show dramatic battery and thermal effects. Progressive rollouts reduce the blast radius but don’t eliminate per-device pain. (tomshardware.com)
  • Model format and optimization constraints: the reliance on ONNX is a pragmatic engineering choice but not a panacea. Community voices suggesting lighter-weight formats (for example, GGUF/other binary formats optimized for small inference runtimes) have merit in some local-LLM contexts; however, switching formats has engineering costs and compatibility trade-offs. Claims that ONNX is the single cause of poor performance are simplistic — model size, quantization level, runtime paths, and scheduling policies all matter. Treat format complaints as a useful signal, not an immediate proof. (firefox-source-docs.mozilla.org, onnxruntime.ai)
  • User experience fragmentation: progressive rollouts plus about:config toggles plus extension-based access create a situation where different users on the same release may have dramatically different experiences. For enterprise or support teams, that complicates troubleshooting and makes a single official stance or instruction set trickier to promulgate. (mozilla.org)
  • Perceived bloat and trust: some long-term users see core browsers adding on-device AI as scope creep, and the perception of bloat is amplified when that AI consumes resources unexpectedly. Even if the model is local and privacy-preserving, poor performance can erode trust faster than good security messaging can rebuild it. (opensourceforu.com)

What Mozilla — and browser makers generally — should and could do​

  • Ship clearer opt-outs: a single global “disable AI features” toggle in Settings, with a clear explanation of what it disables (runtime, models, suggestions), would make this feature less surprising for users. The present mix of Settings, about:config flags, and Add-ons Manager model removal creates unnecessary friction. (askvg.com, blog.nightly.mozilla.org)
  • Smarter throttling for laptops: implement a device-aware scheduler that throttles inference runs when on battery, under a thermal threshold, or on low-powered CPUs (think energy-aware scheduling). This would preserve the feature for desktops while protecting laptops. There’s precedent for platform-aware behavior in other system services — the ML runtime should be similarly conservative; a rough sketch of what such a gate could look like follows this list. (firefox-source-docs.mozilla.org)
  • Default to lightweight models on low-power devices: detect device class and download a low-quantization or smaller model variant by default; keep larger models optional and user-initiated. Mozilla’s model conversion pipeline already supports multiple quantization levels — leverage that to make the default experience kinder to low-end hardware. (firefox-source-docs.mozilla.org)
  • Transparent telemetry and opt-in diagnostics: with explicit user consent, collect anonymized telemetry on inference CPU and memory patterns to help prioritize fixes. Early community reports are helpful, but actionable telemetry across a representative fleet is what allows engineering teams to prioritize real regressions vs. outliers. (Telemetry should be opt-in and clearly documented.) (mozilla.org)
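As a thought experiment, the device-aware throttling suggested above could be as simple as a gate that defers background inference on low-core machines running on battery. The sketch below is hypothetical: Firefox does not expose such a hook, navigator.getBattery() is not universally available, and hardwareConcurrency is only a coarse proxy for device class.

```js
// Hypothetical energy-aware gate for background inference; this illustrates the
// recommendation above and is not an API Firefox provides.
async function shouldRunInferenceNow() {
  const fewCores = (navigator.hardwareConcurrency ?? 4) <= 4;

  // The Battery Status API is not available everywhere; treat "unknown" as plugged in.
  let onBattery = false;
  if ("getBattery" in navigator) {
    const battery = await navigator.getBattery();
    onBattery = !battery.charging;
  }

  // Defer heavy passes on low-core machines that are running on battery.
  return !(fewCores && onBattery);
}
```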

Quick checklist: What you should do right now if Firefox 141 is burning your battery​

  • Check for an update and install the latest Firefox build. (mozilla.org)
  • Inspect about:processes to confirm an “Inference” process is active. (tomshardware.com)
  • Disable smart tab grouping via Settings if available, or use about:config to set browser.tabs.groups.smart.enabled = false. Restart Firefox. (askvg.com)
  • If you still see model files or want to free disk/memory, open about:addons and remove any “On-Device AI” models shown in the Add-ons Manager. (blog.nightly.mozilla.org)
  • If the problem persists, run Firefox in Troubleshoot Mode to rule out extensions, and consider lowering content process limits. (minitool.com, wintips.org)

Final verdict — practical, not rhetorical​

Mozilla’s addition of smart, local tab grouping in Firefox 141 is a principled move: it attempts to combine helpful automation with a privacy-first architecture. The engineering choices — a separate inference process, Transformers.js + ONNX runtime, IndexedDB model caching, and add-on model management — are coherent and technically defensible. Those choices also explain the core UX tension: local AI gives privacy and power but consumes local compute and battery.
At this stage the discussion is less about whether on-device AI is a good idea and more about execution: defaults, device-aware scheduling, transparent controls, and nimble optimization matter immensely. Users experiencing hot devices and poor battery life have concrete workarounds — disabling the smart grouping feature via Settings or about:config and removing local models via about:addons — but the broader product lesson is that performance-sensitive features need conservative defaults, especially when rolled out progressively across varied hardware.
Monitor Mozilla’s release notes and incremental fixes, use the about:config toggles if you need an immediate stopgap, and expect the browser to continue iterating on on-device AI — but with a clearer set of user controls and device-aware throttles, this feature can evolve from a nuisance for some users into a genuinely useful tool for serious tab-hoarders and researchers alike. (mozilla.org, firefox-source-docs.mozilla.org, tomshardware.com)

Source: Windows Report, “Firefox AI slammed for hogging CPU and draining battery”
 
