mlperf inference

About this tag
The mlperf inference tag on WindowsForum.com covers discussions about standardized AI inference performance benchmarks, particularly Microsoft's Azure cloud infrastructure. Recent content highlights the Azure ND GB300 v6 virtual machines achieving 1.1 million tokens per second inference throughput using an MLPerf-style Llama 2 70B setup. This demonstrates how mlperf inference benchmarks validate real-world AI performance in enterprise cloud environments, with a focus on NVIDIA Blackwell Ultra GPUs and scalable VM configurations. The tag is relevant for IT professionals and developers evaluating AI inference capabilities on Windows-based cloud platforms.
  1. Azure ND GB300 v6 Delivers 1.1M Tokens/sec Inference

    Microsoft’s new ND GB300 v6 virtual machines have cracked a milestone that changes the practical limits of public‑cloud AI inference: one NVL72 rack of Blackwell Ultra GPUs sustained an aggregated throughput of roughly 1.1 million tokens per second, a result validated by an independent benchmark...