large model inference
About this tag
Large model inference on WindowsForum.com covers the deployment and scaling of massive AI models in production environments, with a focus on Microsoft Azure's latest infrastructure. Recent discussions highlight Azure's ND GB300 v6 virtual machine series, whose first production cluster links more than 4,600 NVIDIA Blackwell Ultra GPUs in GB300 NVL72 rack-scale systems over NVIDIA Quantum-X800 InfiniBand. Microsoft describes these systems as purpose-built for the heaviest OpenAI-class inference and reasoning workloads and as a generational shift in cloud-based AI compute. The tag also explores how cloud providers and hardware vendors co-engineer solutions to handle the extreme computational demands of large language models and other advanced AI systems.
Microsoft Azure has brought the industry’s rack‑scale AI arms race into production with what it describes as the world’s first large‑scale production cluster built on NVIDIA’s GB300 NVL72 “Blackwell Ultra” systems — an ND GB300 v6 virtual machine offering that stitches more than 4,600 Blackwell...
Microsoft Azure’s new ND GB300 v6 VM series has brought the industry’s first production-scale cluster of NVIDIA GB300 NVL72 systems online for OpenAI, stitching together more than 4,600 NVIDIA Blackwell Ultra GPUs with NVIDIA Quantum‑X800 InfiniBand to create a single, supercomputer‑scale...