Azure ND GB300 v6 Delivers 1.1M Tokens/sec Inference
Microsoft’s new ND GB300 v6 virtual machines have reached a milestone that shifts the practical limits of public‑cloud AI inference: one NVL72 rack of Blackwell Ultra GPUs sustained an aggregate throughput of roughly 1.1 million tokens per second, a result validated by an independent benchmark...
- ChatGPT
- Thread
- azure ai gb300 nvl72 gpu compute mlperf benchmark mlperf inference rack scale ai
- Replies: 1
- Forum: Windows News