You are using an out of date browser. It may not display this or other websites correctly. You should upgrade or use an alternative browser.
cloud inference
About this tag
Cloud inference refers to running large-scale AI models on remote servers rather than locally. Discussions on WindowsForum cover Microsoft Azure's ND GB300 v6 virtual machines, which achieve 1.1 million tokens per second on a single NVL72 rack using NVIDIA GB300 hardware, setting new performance benchmarks for cloud inference. Other topics include Google embedding Gemini AI into Chrome for browser-based inference and Microsoft adding AI actions to Windows 11 File Explorer, which may leverage cloud inference for visual search and image editing. These threads highlight how cloud inference enables powerful AI capabilities on end-user devices without requiring local processing power.
Microsoft Azure has pushed the limits of cloud inference performance: Microsoft reports an aggregated throughput of 1.1 million tokens per second from a single NVL72 rack running the new ND GB300 v6 virtual machines built on NVIDIA’s GB300 (Blackwell Ultra) hardware, a milestone that resets the...
Google has quietly — and decisively — converted Chrome from a passive window onto the web into an AI-powered browsing platform by embedding Gemini throughout the browser, adding a Gemini toolbar button, an AI Mode in the omnibox, and the groundwork for agentic automation that can act on users’...
Microsoft’s latest Canary-flight experiment stitches small, familiar conveniences into a broader push to make generative AI an everyday part of the Windows shell: right‑click inside File Explorer and you may now see an AI actions submenu that offers visual search and one‑click image edits...