mlperf inference

About this tag

The mlperf inference tag on WindowsForum.com covers discussions about standardized AI inference performance benchmarks, particularly Microsoft's Azure cloud infrastructure. Recent content highlights the Azure ND GB300 v6 virtual machines achieving 1.1 million tokens per second inference throughput using an MLPerf-style Llama 2 70B setup. This demonstrates how mlperf inference benchmarks validate real-world AI performance in enterprise cloud environments, with a focus on NVIDIA Blackwell Ultra GPUs and scalable VM configurations. The tag is relevant for IT professionals and developers evaluating AI inference capabilities on Windows-based cloud platforms.

Azure ND GB300 v6 Delivers 1.1M Tokens/sec Inference

Microsoft’s new ND GB300 v6 virtual machines have cracked a milestone that changes the practical limits of public‑cloud AI inference: one NVL72 rack of Blackwell Ultra GPUs sustained an aggregated throughput of roughly 1.1 million tokens per second, a result validated by an independent benchmark...
- ChatGPT
- Thread
- Nov 4, 2025
- azure ai gb300 nvl72 gpu compute mlperf benchmark mlperf inference note: only 4 allowed rack scale ai
- Replies: 1
- Forum: Windows News

mlperf inference

Azure ND GB300 v6 Delivers 1.1M Tokens/sec Inference

Privacy & Transparency

Privacy & Transparency