Geekom A9 Max AI Review: A Compact Powerhouse for On‑Device AI

The Geekom A9 Max AI arrives in the small‑form‑factor ring as an unapologetically powerful mini PC — and a recent hands‑on test that ran the DeepSeek model locally exposed both the promise and the practical limits of packing modern on‑device AI into a palm‑sized chassis.

Background / Overview

The Geekom A9 Max AI is built around AMD’s new Ryzen AI 9 HX 370 APU, pairing Zen 5 CPU cores with an RDNA 3.5 iGPU and a dedicated XDNA NPU. The vendor positions the platform as a compact desktop replacement capable of serious local AI inference — advertising up to 80 TOPS of combined AI acceleration and a dedicated NPU rating commonly reported around 50 TOPS in independent coverage. Configurations ship with DDR5 SO‑DIMMs (commonly 32 GB in retail SKUs and expandable up to 128 GB), dual M.2 PCIe 4.0 storage slots, Wi‑Fi 7, dual 2.5 GbE ports and modern I/O including USB4 and HDMI 2.1. Several mainstream reviews and product pages confirm this combination of high‑density compute and dense connectivity as the A9 Max’s defining characteristic.
At its common street price point near $999, the A9 Max represents an aggressive value proposition: far more compute per cubic centimeter than typical mini‑PCs, and a platform explicitly optimized for local AI tasks (from Copilot+ experiences to on‑device LLM inference). That positioning is part hardware and part marketing — and reality sits somewhere between the two. Independent hands‑on articles and vendor materials line up on the headline specs and price, but they also warn that the measurable AI performance you’ll see depends heavily on software stack, OS, model choice and thermals.

What the ZDNET test did — and why it mattered

A recent ZDNET hands‑on put the A9 Max through a particularly interesting practical test: the reviewer installed Ollama on the machine, downloaded a distilled DeepSeek model (deepseek‑r1:8b) and compared inference against their larger System76 Thelio desktop running Linux. The test measured response time and output quality for a simple, comparable prompt and observed two striking results:
  • The A9 Max produced an answer more quickly on the same distilled 8B DeepSeek model (about 1:08 vs 2:42 on the Thelio), but the A9 Max answer was shorter, exhibited poor grammar and contained more factual errors.
  • When pushed to run a substantially larger model (gpt‑oss:120b), the A9 Max completed inference — but only after nearly five minutes, with loud fan activity and obvious thermal stress.
Those findings show a realistic trade‑off: the A9 Max can run local language models, and it does so with impressive wall‑clock throughput on smaller models, but the quality, depth and stability of the result depend on model size, the runtime environment and OS overhead. The reviewer also highlighted the real‑world friction of shipping the machine with Windows 11, arguing that much of the A9 Max’s headroom would be unlocked under Linux. The broad technical takeaway: the hardware is capable, but software choices and model characteristics determine whether that capability translates into useful local AI.
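For readers who want to repeat this kind of side‑by‑side test on their own hardware, the sketch below times a single prompt against a locally running Ollama server and pulls out Ollama’s own token accounting. It is a minimal illustration rather than ZDNET’s exact methodology: the prompt is invented, the model tag matches the distilled build mentioned above, and it assumes Ollama’s default REST endpoint on port 11434 with the model already pulled.
```python
# Minimal timing probe for a local Ollama server (assumed default port 11434).
# The prompt is illustrative; the model tag assumes `ollama pull deepseek-r1:8b`
# has already been run. Requires the `requests` package.
import time
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "deepseek-r1:8b"   # distilled 8B variant discussed above
PROMPT = "Explain the difference between RAM and VRAM in three sentences."

start = time.time()
resp = requests.post(
    OLLAMA_URL,
    json={"model": MODEL, "prompt": PROMPT, "stream": False},
    timeout=600,  # large models on integrated hardware can take minutes
)
resp.raise_for_status()
data = resp.json()
elapsed = time.time() - start

# Ollama reports eval_count in tokens and eval_duration in nanoseconds.
tokens = data.get("eval_count", 0)
gen_seconds = data.get("eval_duration", 0) / 1e9
print(f"Wall clock: {elapsed:.1f}s, generated {tokens} tokens")
if gen_seconds:
    print(f"Generation rate: {tokens / gen_seconds:.1f} tokens/s")
print(data["response"][:400])  # first chunk of the answer for a quality check
```
Running the same script on two machines (or two operating systems) gives a like‑for‑like wall‑clock and tokens‑per‑second comparison, which is roughly what the ZDNET test did informally.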

Technical verification: what independent sources confirm

Multiple independent outlets and vendor materials corroborate the A9 Max’s core hardware story:
  • Geekom’s product listings and press materials advertise the Ryzen AI 9 HX 370, Radeon 890M graphics, dual NVMe slots, a claimed 80 TOPS of aggregate AI performance and expansion to 128 GB of DDR5 via SO‑DIMMs. Those vendor statements are reinforced by several product pages and by European and US retailers that list the same headline specs and show the same I/O suite.
  • Reputable reviewers — including Tom’s Hardware, Windows Central and Notebookcheck — tested the platform and confirmed the 12‑core, 24‑thread Ryzen AI 9 HX 370 silicon, the Radeon 890M (RDNA 3.5) iGPU performance profile and the presence of a dedicated XDNA NPU. Those write‑ups also independently note the advertised TOPS figures (with some nuance on whether the 80 TOPS number refers to a combined CPU/GPU/NPU ceiling while the NPU alone is commonly cited around 50 TOPS).
These cross‑checks validate the crucial hardware claims: the A9 Max is a genuine high‑end mini PC and is designed with local AI acceleration as a central feature.

DeepSeek, Ollama and the realities of local model inference

What does it mean in practice to run DeepSeek or similarly large models locally on this kind of silicon? Community testing and reporting since DeepSeek’s public emergence show several consistent behaviors worth understanding before you buy:
  • Distilled DeepSeek variants (7B–8B) are commonly distributed in local model repositories (e.g., Ollama) and are far more tractable on CPU‑heavy or integrated‑GPU systems. Users repeatedly report that these reasoning‑style models emit a chain of thought before the final answer, so total output length (and therefore apparent latency per query) varies widely from run to run. That partly explains why the A9 Max could appear faster to first completion while producing a shorter, lower‑quality answer: different runtimes and quantization settings dramatically change both token counts and token‑generation rates.
  • Larger models (tens of billions of parameters) are technically feasible on the A9 Max but will be slow and thermally demanding without a discrete, CUDA‑compatible GPU. Community benchmarks show that even capable integrated platforms can complete very large models, but at the cost of minutes‑long inference times and noisy cooling profiles. ZDNET’s experience (nearly five minutes for gpt‑oss:120b) sits within the expectations reported by other independent testers.
  • The operating system and runtime matter. Benchmarks and user reports indicate that Ollama and other inference stacks often run measurably faster on Linux than on Windows in comparable configurations; a mix of lower system overhead, different driver stacks and runtime scheduling explains why a Linux install can unlock 10–25% or more throughput. The ZDNET reviewer’s observation that Linux yields a far snappier local LLM experience is consistent with community experience.
In short: yes, you can run DeepSeek‑family models on the A9 Max. Expect meaningful variability in speed and quality depending on which distilled vs full model you use, how the model was quantized, whether you run on Windows or Linux, and how long you are willing to tolerate high fan speeds.
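To make that distinction concrete, a small comparison script along the lines below reports total completion time, output length and tokens per second for each model tag passed on the command line, so raw speed is not conflated with answer length. It assumes Ollama’s default REST endpoint and that the tags you pass have already been pulled; the prompt and file name are illustrative.
```python
# compare_models.py -- run the same prompt against several locally pulled
# Ollama model tags and separate total time from generation rate, e.g.:
#   python compare_models.py deepseek-r1:8b gpt-oss:120b
import sys
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"
PROMPT = "Summarize the trade-offs of running an LLM locally, in one paragraph."

for model in sys.argv[1:]:
    r = requests.post(
        OLLAMA_URL,
        json={"model": model, "prompt": PROMPT, "stream": False},
        timeout=3600,  # allow several minutes for very large models
    )
    r.raise_for_status()
    d = r.json()
    total_s = d["total_duration"] / 1e9   # nanoseconds -> seconds, incl. load
    gen_s = d["eval_duration"] / 1e9
    tokens = d["eval_count"]
    print(f"{model:30s} total {total_s:7.1f}s  "
          f"output {tokens:5d} tokens  {tokens / gen_s:5.1f} tok/s")
```
A model that “finishes sooner” may simply have written less; the tokens‑per‑second column is the fairer measure of the hardware itself.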

Strengths: where the A9 Max actually excels

  • Density of compute — For a box the size of the A9 Max, the Ryzen AI 9 HX 370 + Radeon 890M + XDNA NPU deliver exceptional raw capability. This makes the machine one of the most powerful mini PCs available in the mainstream price band.
  • Local AI enablement — The combination of NPU and high‑frequency memory lets the A9 Max accelerate common Copilot+ and local LLM inference scenarios that historically demanded much larger rigs. For developers and privacy‑concerned users who want models running on‑prem, that’s significant.
  • Upgradeable RAM and dual NVMe — The use of SO‑DIMMs (not soldered) and two PCIe 4.0 NVMe slots gives real expandability. For local AI work, being able to scale RAM and storage without replacing the machine is a rare and valuable feature in a mini PC.
  • Modern I/O — USB4, HDMI 2.1, Wi‑Fi 7 and dual 2.5 GbE ports make the machine flexible for multi‑monitor, fast networking and external accelerators or storage. This helps it behave like a compact workstation rather than a mere HTPC.

Weaknesses and risks to weigh

  • Windows 11 out of the box — The A9 Max ships with Windows 11 in many retail SKUs. While Windows is familiar, it consumes more RAM and system overhead than a lean Linux install, and community testing suggests Linux often yields better inference throughput for local LLM workloads. If you plan to use Ollama/LM Studio or multi‑GB model stacks heavily, factor in the time and skill required to reimage and validate Linux drivers.
  • Noise under load — Several independent reviews and the ZDNET reviewer observed loud fan activity under sustained AI or VM load. In a small chassis, aggressive cooling is necessary, and that produces audible noise levels that may be unacceptable in quiet offices or studio spaces. Plan for acoustic mitigation if silence matters.
  • Model fidelity vs speed — Faster wall‑clock results on smaller distilled models don’t necessarily equal better answers. The ZDNET case where a larger desktop produced a richer, more accurate answer despite being slower highlights that quality is decoupled from raw tokens per second. Users focused on content accuracy, reasoning or multi‑step chain‑of‑thought outputs should prioritize model selection and runtime tuning over absolute speed. (Reviewer observation, supported by community model behavior.)
  • No discrete CUDA GPU — For workflows that depend on CUDA‑accelerated toolchains (many high‑performance LLM and diffusion pipelines), the integrated AMD iGPU and NPU are helpful but not full replacements for an RTX‑class GPU with extensive CUDA ecosystem support. If your workload is GPU‑first and CUDA‑dependent, a small desktop with an Nvidia dGPU may still be the better choice.
  • Marketing vs measured TOPS — Vendor TOPS claims are useful for marketing but tricky to translate into real‑world application throughput. Some outlets report 80 TOPS as an aggregate ceiling while others isolate the NPU at ~50 TOPS; how much of that TOPS budget applies to your chosen model depends on framework support, operator coverage and quantization. Treat vendor TOPS as an indicator, not a guarantee.

Practical recommendations: getting the most from an A9 Max

If you’re considering the A9 Max for local AI work or as a compact workstation, follow these practical steps to maximize value and minimize surprises:
  1. Pick your OS deliberately.
    • If local LLM inference and maximum tokens/sec matter, plan for a Linux install (Ubuntu/Pop!_OS/Fedora), test drivers and verify Ollama or LM Studio behavior before committing; a minimal smoke‑test sketch follows this list. Community evidence shows measurable gains on Linux in many cases.
  2. Use distilled or quantized models for frequent interactive work.
    • Distilled 7B–8B DeepSeek variants are far more responsive for chatty workflows; reserve 30–120B models for batch or asynchronous tasks where latency is acceptable. Community benchmarks and reviews confirm this trade‑off.
  3. Watch thermals and acoustics.
    • If the machine sits in a shared workspace, expect fan noise under load. Consider placing it on a ventilated shelf, using a noise‑damping stand, or testing an eco‑profile for quieter everyday use. Notebookcheck and Windows Central flag fan speed as notable under heavy loads.
  4. Allocate hardware for virtual machines and local servers carefully.
    • If running VMs or dev containers, reserve enough cores and RAM for the host and for inference tasks. The ZDNET reviewer’s VirtualBox experiment showed the A9 Max handled a guest Ubuntu instance well, but the host OS and VM allocation influence perceived performance. (Reviewer observation.)
  5. Validate software compatibility before you buy.
    • If you need particular drivers or libraries (ROCm for some AMD stacks, or specific CUDA‑only toolchains), confirm availability and community support — particularly if you plan to use AMD GPU features beyond basic iGPU display or NPU paths. Community discussions show mixed experiences depending on drivers and distributions.
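As a complement to step 1, the following smoke test (a sketch, assuming Ollama’s default endpoint and an illustrative test model tag) confirms that the server answers, lists the models already pulled and completes one short generation, so driver or runtime problems surface before you settle on an OS image.
```python
# Post-install smoke test for a local Ollama setup. Assumptions: default
# endpoint on port 11434; the test model tag is an example and may need to
# be swapped for whatever small model you keep pulled locally.
import requests

BASE = "http://localhost:11434"
TEST_MODEL = "deepseek-r1:8b"

# 1. Is the server reachable, and which models are already pulled?
tags = requests.get(f"{BASE}/api/tags", timeout=10)
tags.raise_for_status()
models = [m["name"] for m in tags.json().get("models", [])]
print("Ollama reachable; local models:", models or "none pulled yet")

# 2. Does a short generation complete without driver or runtime errors?
if any(TEST_MODEL in name for name in models):
    r = requests.post(
        f"{BASE}/api/generate",
        json={"model": TEST_MODEL, "prompt": "Reply with the single word: ok",
              "stream": False},
        timeout=300,
    )
    r.raise_for_status()
    seconds = r.json()["total_duration"] / 1e9
    print(f"Generation OK in {seconds:.1f}s")
else:
    print(f"Pull a small test model first, e.g. `ollama pull {TEST_MODEL}`")
```
Run it once on the stock Windows image and again after any Linux reimage; matching results mean the runtime survived the switch, and the timing line gives a first hint of any throughput difference.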

Use cases where the A9 Max makes sense

  • Privacy‑sensitive local LLMs — Developers or creators who need on‑device inference for confidentiality (legal, health, corporate IP) will appreciate the A9 Max’s ability to host distilled models without sending data to external APIs.
  • Compact creator workstation — For editing, web development and lighter 3D or video tasks (1080p workflows), the iGPU and fast storage produce an excellent small‑footprint workstation.
  • Edge AI prototypes and demos — Researchers and product teams building demos that must run locally can use the A9 Max to prototype Copilot+ style features and on‑device content generation without provisioning cloud instances.
  • Secondary or travel workstation — Its size and I/O make it a valid portable desktop replacement for road warriors who need real compute but lack space for a full tower.

Where to be cautious: who should look elsewhere

  • If you need consistent, interactive response times on 30B+ models in production, a machine with a discrete CUDA GPU and a larger VRAM pool will deliver better ROI.
  • If acoustic silence is a hard requirement for your workspace, the A9 Max under sustained load will be audible.
  • If your stack relies on CUDA‑only libraries and cannot be migrated to AMD or CPU/NPU‑friendly frameworks, the A9 Max could impose integration friction.

Final assessment

The Geekom A9 Max AI is a remarkable engineering exercise: it squeezes high‑end Zen 5 CPU cores, RDNA 3.5 graphics and a dedicated NPU into a palm‑sized chassis while exposing a new set of tradeoffs that are emblematic of the current mini‑PC and local‑AI era. The ZDNET reviewer’s DeepSeek test captures that tension perfectly — the machine is capable of surprising bursts of capability, but practical utility depends on software choices, the model you select, and how willing you are to lean into Linux and runtime tuning.
For buyers who want a powerful mini PC that supports on‑device AI experiments, multimedia workloads and an extremely small footprint, the A9 Max is one of the best value plays around the $999 mark. For those whose workflows are tied to CUDA‑first production stacks, or who require whisper‑quiet operation under sustained load, a different form factor may still serve better.
The A9 Max is not a magic bullet that removes the engineering work of running LLMs locally; rather, it is the most compelling hardware platform so far that makes doing that work practical on a desktop‑sized budget. The next step for prospective owners is to decide whether they’ll accept the software work — OS switching, model selection and runtime tuning — required to turn that hardware promise into consistently useful local AI.

Source: ZDNET, “I ran DeepSeek on this mini PC from Amazon, and the results surprised me”
 
