Microsoft’s public pledge to “have mainly Microsoft silicon in the data center” is not rhetoric — it’s a strategic pivot with clear technical, economic, and competitive consequences for Azure customers, hardware partners, and the AI industry at large. Kevin Scott, Microsoft’s CTO, set the tone at a CNBC fireside chat, saying Microsoft will prioritize its own accelerators where it makes sense and will design entire systems — network, cooling, and chip — to optimize for generative AI workloads.
Background
Microsoft’s decision to shift a greater share of AI workloads from third‑party GPUs to in‑house accelerators is the culmination of multiple forces: immense and growing demand for inference and training capacity; hyperscaler economics that prize performance per dollar above all; supply‑chain realities around GPUs; and the opportunity to build vertically integrated stacks that trade off peak raw performance for overall system efficiency. Microsoft is no longer experimenting at the edges — it already ships an Arm‑based Cobalt CPU series and the first‑generation Maia 100 AI accelerator in Azure, and has signaled roadmaps for follow‑on Maia silicon.

At the heart of Microsoft’s argument is simple economics: once a cloud operator controls its own silicon, it can tune hardware and software together, reduce third‑party margins, and — crucially — design data‑center racks and cooling to match the chip’s thermal and networking profile. Kevin Scott emphasized that this systems‑level ownership gives Microsoft the “freedom to make the decisions” required to optimize compute for the workload.
What Microsoft has built so far
Maia 100: a practical first step
Microsoft’s first‑generation accelerator, Maia 100, debuted as a vertically integrated system: silicon paired with custom server boards, racks, interconnect, and a software stack designed to make model hosting efficient at hyperscale. The Maia 100 chip is a large, reticle‑sized SoC built on TSMC’s N5 process with CoWoS‑S packaging, and Microsoft engineers designed the entire rack‑level system — including closed‑loop liquid cooling and a bespoke Ethernet‑based fabric — to squeeze density and efficiency from the silicon.

Technically, Maia 100’s published datapoints position it as a pragmatic, inference‑focused accelerator rather than a raw performance king:
- Die area: ~820 mm²; process and packaging: TSMC N5 with CoWoS‑S.
- Memory: 64 GB HBM2e with aggregate bandwidth around 1.8 TB/s.
- Peak dense BF16 tensor performance: ~0.8 petaFLOPS (800 teraFLOPS).
- Provisioned TDP: designed up to 700 W, typically provisioned at 500 W in racks.
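Taken together, those published figures already say a lot about how Maia 100 will behave under load. The following back‑of‑the‑envelope sketch uses only the numbers above (real utilization depends on kernel quality, batch size, and model shape) to show roughly how much arithmetic a workload must do per byte of memory traffic before the chip becomes compute‑bound rather than memory‑bound:

```python
# Back-of-the-envelope roofline math from Maia 100's published figures.
# Assumptions: 0.8 petaFLOPS dense BF16 and 1.8 TB/s HBM2e bandwidth, both
# taken from the datapoints above; real kernels land below these peaks.

PEAK_BF16_FLOPS = 0.8e15      # 0.8 petaFLOPS
MEM_BANDWIDTH_BPS = 1.8e12    # 1.8 TB/s

# Ridge point: FLOPs per byte of HBM traffic needed to saturate compute.
ridge_point = PEAK_BF16_FLOPS / MEM_BANDWIDTH_BPS
print(f"Compute-bound above ~{ridge_point:.0f} FLOPs per byte moved")  # ~444

# Single-stream LLM decode sits far below that threshold: roughly 2 FLOPs per
# parameter per token, while every BF16 parameter (2 bytes) must be streamed
# from HBM, i.e. about 1 FLOP per byte. Decode throughput is therefore set by
# memory bandwidth and utilization, not by the headline FLOPS number.
```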
Cobalt and security silicon
Parallel to Maia, Microsoft has invested in Cobalt, its Arm‑based custom CPU line for general‑purpose cloud tasks, and in platform security ASICs designed to accelerate cryptography and establish silicon roots of trust across Azure. Cobalt positions Microsoft to optimize VM hosts and control plane services at the chip level, while security silicon can materially reduce latency and attack surface for critical cryptographic operations. Microsoft’s public material and partner statements (including from Arm) confirm Cobalt’s role in Azure and the company’s broader silicon strategy.

Why Microsoft wants to move more AI workloads onto Maia and other in‑house silicon
- Price‑performance at scale: Hyperscalers buy by the megawatt and the rack. Even modest improvements in $/peak‑useful‑FLOP or $/inference‑token compound into tens or hundreds of millions of dollars saved at scale (a back‑of‑the‑envelope sketch follows this list). Microsoft calls this price‑performance metric its north star.
- Supply assurance and diversification: Relying only on external GPU vendors creates procurement risk when demand spikes. Owning a proportion of capacity hedges supply constraints and reduces exposure to single‑vendor backlogs.
- System‑level optimization: When a company controls chip, thermal design, and interconnect, it can optimize throughput and density in ways closed ecosystems cannot — for example, the Maia rack‑level liquid cooling and bespoke fabric.
- Specialization for targeted workloads: Not all AI tasks need Nvidia‑class FP16/FP8 throughput. Many inference scenarios and medium‑sized models can run cost‑effectively on domain‑specific accelerators that provide higher utilization and lower operating cost.
- Strategic leverage: Owning silicon is a powerful bargaining chip in vendor relationships and creates options for product differentiation across cloud services and Copilot/Copilot Studio integrations.
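To make the first bullet concrete, here is a rough illustration of how a modest efficiency gain compounds at fleet scale. Every dollar and power figure below is a hypothetical placeholder, not a Microsoft number:

```python
# Hypothetical illustration of how small efficiency gains compound at fleet
# scale. Every figure here is an assumed placeholder, not a Microsoft number.

fleet_power_mw = 300            # assumed AI fleet power budget (megawatts)
cost_per_mwh_usd = 80           # assumed blended electricity cost ($/MWh)
hours_per_year = 24 * 365

annual_power_bill = fleet_power_mw * cost_per_mwh_usd * hours_per_year
print(f"Annual power bill: ${annual_power_bill / 1e6:.0f}M")

# Suppose in-house silicon delivers the same useful work for 10% less energy
# (equivalently, 10% more tokens per provisioned megawatt):
efficiency_gain = 0.10
annual_saving = annual_power_bill * efficiency_gain
print(f"Saving from a {efficiency_gain:.0%} efficiency gain: "
      f"${annual_saving / 1e6:.0f}M per year")
# Even before hardware capex is counted, a single-digit-percent gain is worth
# tens of millions of dollars annually -- the "north star" logic above.
```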
How Maia stacks up against GPUs (a technical reality check)
Compare Maia 100’s numbers — 0.8 PFLOPS BF16, 64 GB HBM2e, 1.8 TB/s memory — with contemporary NVIDIA H100/H200 or AMD MI300‑class GPUs and the differences are blunt: GPUs retain a commanding edge in raw tensor throughput and memory bandwidth. NVIDIA’s H100 and H200 family deliver roughly 1–2 petaFLOPS of BF16/FP16 tensor throughput (dense versus sparse, with more again at FP8) and substantially higher memory bandwidth (H100 PCIe/SXM variants sit in the multiple‑TB/s region depending on configuration, with H200 higher still), which is why GPUs still dominate training workloads where peak throughput and breadth of software support matter.

But peak FLOPS are not the only metric that matters for hyperscalers. For inference on many practical LLMs, end‑to‑end latency, utilization, power draw, and cost per useful token are often more important. Maia’s system integration — the custom rack power distribution, MX data format, and Triton integration — can improve effective utilization and reduce total cost of ownership for specific models and deployment patterns. Microsoft’s blog materials explicitly frame Maia as a systems play rather than an attempt to beat GPUs on raw metrics alone.
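The sketch below shows why memory bandwidth and pricing, rather than peak FLOPS, tend to set inference cost for decode‑heavy workloads. Only the Maia 100 bandwidth comes from the published datapoints above; the GPU bandwidth, hourly prices, and the 70B model size are illustrative assumptions:

```python
# Rough single-stream decode ceiling for a bandwidth-bound LLM.
# Maia 100 bandwidth is from the published figures above; the GPU bandwidth,
# hourly prices, and model size are illustrative assumptions, not quotes.

def decode_tokens_per_sec(params_billion: float, bandwidth_tbs: float,
                          bytes_per_param: float = 2.0) -> float:
    """Upper bound on tokens/sec when every BF16 weight is streamed from HBM
    once per generated token (batch size 1, KV-cache traffic ignored)."""
    bytes_per_token = params_billion * 1e9 * bytes_per_param
    return bandwidth_tbs * 1e12 / bytes_per_token

MODEL_B = 70  # a 70B-parameter model, chosen purely for illustration

for name, bw_tbs, usd_per_hour in [
    ("Maia 100 (1.8 TB/s)", 1.8, 5.0),     # bandwidth from published specs
    ("H100-class (~3.3 TB/s)", 3.3, 10.0),  # assumed bandwidth and price
]:
    tps = decode_tokens_per_sec(MODEL_B, bw_tbs)
    usd_per_million_tokens = usd_per_hour / (tps * 3600) * 1e6
    print(f"{name}: ~{tps:.0f} tok/s ceiling, "
          f"~${usd_per_million_tokens:.2f} per million tokens")
# Batching, quantization, and KV-cache reuse lift real throughput well above
# this single-stream floor, but the ranking is still driven by bandwidth,
# utilization, and price -- not by peak tensor FLOPS.
```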
The roadmap and its fragility: delays, design changes, and talent churn
Microsoft’s ambitions run up against the realities of chip development. Reports in mid‑2025 indicated mass production of the follow‑on Maia design (internally codenamed Braga / Maia 200) slipped into 2026, attributed to design changes, staff turnover, and added feature requests from partners including OpenAI. Those delays highlight a critical risk: building competitive silicon at hyperscaler scale is hard, slow, and personnel‑intensive. Microsoft’s first‑gen Maia achieved limited production and demonstrated the systems approach, but subsequent generations must close the compute and bandwidth gap with NVIDIA/AMD to achieve the aim of “mainly Microsoft silicon” in the datacenter. These are not speculative problems — they are the hard engineering realities reported by industry outlets and insiders.

This fragility manifests in several practical ways:
- Long development cycles lengthen time to market and allow competitors to widen performance leads.
- Staff churn, particularly among experienced chip designers, disrupts schedules and introduces risk to complex timing‑ and validation‑sensitive subsystems.
- Requests to add functionality from major customers (OpenAI and others) can destabilize design schedules if requirements shift mid‑cycle.
Competitive landscape: Amazon, Google, and the economics of proprietary accelerators
Microsoft is late relative to AWS and Google in building custom AI silicon at scale. Amazon launched its Trainium and Inferentia families years earlier and has aggressively expanded Trainium2/Trainium3 plans; Anthropic and other large model providers have aligned training and inference workloads to AWS hardware where the economics suit them. Amazon has also publicly invested billions in Anthropic and worked with customers to build enormous Trainium clusters for both training and fine‑tuning.

Google, long an early innovator in custom silicon via TPUs, continues to evolve TPU architectures aimed at a mixture of training and inference. Google’s TPU pods and the latest Ironwood generation emphasize shared memory, optical interconnects, and very large memory capacity, and Google has demonstrated multi‑pod scaling and new system topologies that emphasize model‑scale efficiency. In other words, Google and Amazon have already proven the utility of custom hyperscaler silicon for both internal workloads and select external customers.
This is why the likely market outcome is not wholesale GPU replacement, but a multi‑modal datacenter:
- Hyperscalers will continue to run massive GPU pools because many customers require them and because GPUs still lead in raw throughput and ecosystem support (CUDA, cuDNN, libraries).
- Proprietary ASICs (Maia, Trainium, TPU) will grow as a share of internal workloads and for customers whose models can be ported to those stacks cost‑efficiently.
- Interoperability layers (Triton, ONNX, MX formats) will be decisive: the easier it is to move models between hardware backends, the faster proprietary silicon adoption will grow.
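As a concrete illustration of that interoperability point, exporting a model to ONNX is the kind of step that decouples it from any single hardware backend: the exported graph, not the framework code, is what downstream runtimes load. A minimal sketch using a toy PyTorch module follows; the model, names, and shapes are illustrative placeholders, and real LLM exports involve far more work around custom ops, precision, and sharding:

```python
# Minimal sketch: export a toy PyTorch model to ONNX so it can be served by
# whichever backend (GPU runtime, Maia-class runtime, CPU) supports ONNX.
# The model and shapes are illustrative placeholders, not a production LLM.
import torch
import torch.nn as nn

class TinyClassifier(nn.Module):
    def __init__(self, dim: int = 768, classes: int = 10):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.GELU(),
                                 nn.Linear(dim, classes))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

model = TinyClassifier().eval()
example_input = torch.randn(1, 768)

# Export a backend-neutral graph; serving stacks consume the .onnx file.
torch.onnx.export(
    model, example_input, "tiny_classifier.onnx",
    input_names=["features"], output_names=["logits"],
    dynamic_axes={"features": {0: "batch"}, "logits": {0: "batch"}},
)
print("Exported tiny_classifier.onnx")
```

The export itself is the easy part; as the next section argues, validating numerics and performance on the new backend is where the real cost sits.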
Software and ecosystem friction: the real adoption bottleneck
Hardware without a software ecosystem is a paperweight. NVIDIA’s grip on the AI market is bolstered not just by FLOPS but by nearly two decades of software investment: CUDA, cuDNN, cuBLAS, optimized kernels, and a rich third‑party ecosystem. Microsoft’s Maia effort includes an SDK, Triton compatibility, and the MX data format to ease model portability — but porting, validating, and optimizing production LLMs across a new ISA and runtime is nontrivial.

Practical headwinds include:
- Model portability: Moving large models to new accelerators often requires re‑engineering kernels, testing numerics (mixed precision, quantization; see the sketch after this list), and re‑optimizing sharding strategies.
- Performance tuning: Achieving cost‑effective inference requires performance engineering — something customers and third‑party model providers must invest in.
- Toolchain robustness: Debuggers, profilers, and observability tooling need to be as mature as the GPU stack to avoid developer friction.
- Customer inertia: Enterprises and ISVs already invested in GPU‑based pipelines will be reluctant to migrate unless savings or performance improvements are clear and low‑risk.
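The numerics headwind is checkable with very little code. The sketch below quantizes a toy weight vector with one shared scale per small block, which is broadly the idea behind block‑scaled microscaling (MX‑style) formats, and measures the reconstruction error; it is a simplified stand‑in, not Microsoft’s actual MX specification:

```python
# Simplified numerics check for block-scaled low-precision weights.
# This mimics the general idea behind MX-style block scaling (one shared
# scale per small block of values); it is not the actual MX specification.
import numpy as np

def quantize_blockwise(w: np.ndarray, block: int = 32, bits: int = 8):
    """Quantize a 1-D weight vector to signed ints with one scale per block."""
    qmax = 2 ** (bits - 1) - 1
    pad = (-len(w)) % block
    blocks = np.pad(w, (0, pad)).reshape(-1, block)
    scales = np.abs(blocks).max(axis=1, keepdims=True) / qmax
    scales[scales == 0] = 1.0                      # avoid divide-by-zero
    q = np.clip(np.round(blocks / scales), -qmax, qmax)
    return q, scales, pad

def dequantize_blockwise(q, scales, pad, n):
    return (q * scales).reshape(-1)[:n]

rng = np.random.default_rng(0)
w = rng.normal(scale=0.02, size=4096).astype(np.float32)  # toy weight tensor

q, scales, pad = quantize_blockwise(w, block=32, bits=8)
w_hat = dequantize_blockwise(q, scales, pad, len(w))

rel_err = np.linalg.norm(w - w_hat) / np.linalg.norm(w)
print(f"Relative reconstruction error: {rel_err:.4%}")
# A migration checklist would run per-layer checks like this, then validate
# end-to-end accuracy on held-out prompts before committing to a new format.
```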
Business implications: what this means for Azure customers, hardware partners, and the market
For Azure customers the short‑to‑medium‑term reality is straightforward: expect Microsoft to continue offering NVIDIA and AMD GPUs alongside its own Maia accelerators. Enterprises that need the absolute top‑end throughput or rely on third‑party GPU‑tuned stacks will continue to choose GPU instances. But for customers with latency‑sensitive inference, tightly controlled deployment patterns, or those willing to adapt to new runtimes, Maia‑class offerings could become materially cheaper per useful result.

For hardware partners like NVIDIA and AMD, Microsoft’s move ratchets up strategic pressure but is not an existential threat. Microsoft remains a major buyer: shifting some internal workloads to Maia can free GPU capacity for paying customers, a dynamic that benefits NVIDIA/AMD in the near term. In the longer term, sustained in‑house silicon growth could reduce total GPU volumes purchased by Microsoft, pressuring vendor revenues — though the same vendors will continue to sell to other cloud providers and enterprise customers.
For the industry as a whole, a few systemic effects are likely:
- Greater hardware diversity in data centers, with TPU/Trainium/Maia coexisting with GPUs.
- Faster innovation cycles for data‑center system design as hyperscalers compete on integrated racks, cooling, and interconnect.
- Tighter coupling between cloud platform capabilities and proprietary models and services (e.g., Copilot, Azure AI Foundry), which creates lock‑in for some customers but also differentiated performance options.
Risks and warning flags
Microsoft’s pivot is strategically rational, but it is not without real risks. Here are the headline warnings:

- Execution risk: The reported delay of the next‑gen Maia series into 2026 demonstrates how hard it is to execute on advanced accelerators. Delays slow the cost‑improvement trajectory and give competitors time to widen their performance lead.
- Economic tradeoffs: If Maia variants fail to reach competitive memory bandwidth or interconnect efficiency compared with the latest GPUs, the promised $/useful‑work benefit may not materialize for many LLM workloads.
- Ecosystem friction: Developers and ISVs face migration costs. If Microsoft cannot prove substantial TCO gains, customers will default to the familiar GPU path.
- Talent and retention: High turnover in chip teams erodes IP continuity and schedule predictability — a problem explicitly reported in the Maia follow‑on development cycle.
- Vendor relations: Microsoft’s partial self‑sufficiency could strain commercial dynamics with NVIDIA and AMD, potentially leading to less favorable pricing or slower supply in adversarial scenarios — though the counterargument is that Microsoft’s purchases still matter enormously to GPU vendors.
Practical takeaways for WindowsForum readers and IT decision‑makers
- If you need absolute peak performance for model training today, GPUs remain the right choice. Top‑end GPUs still lead in raw tensor FLOPS and memory bandwidth, and they enjoy the richest tooling and third‑party support.
- If you run large volumes of inference, especially for mid‑sized LLMs or latency‑sensitive services, Azure Maia options may become an attractive, lower‑cost alternative. Evaluate Maia offerings on cost‑per‑inference and token latency, not just raw FLOPS (a small comparison helper follows this list).
- Plan for a hybrid future. Expect cloud architectures to include GPUs, proprietary ASICs, and specialized CPUs. Design portability into model deployment pipelines now: containerize dependencies, standardize on Triton/ONNX where feasible, and measure TCO across candidate backends.
- Watch the software stack maturity. The speed at which Microsoft eases model porting (via SDKs, Triton compatibility, MX formats) will determine how quickly customers can shift workloads.
- Factor procurement and SLAs into decisions. In a market with dynamic hardware supply, contractual guarantees on capacity and performance will be as important as per‑unit pricing.
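For the evaluation point above, a tiny helper makes the comparison concrete: feed it measured throughput and contracted instance pricing for each candidate backend and compare cost per million tokens alongside latency. All figures below are made‑up placeholders showing the shape of the calculation, not real Azure prices or benchmark results:

```python
# Tiny TCO helper: compare candidate backends on cost per million tokens.
# All throughput and price figures here are made-up placeholders -- substitute
# your own benchmark results and contracted Azure pricing.
from dataclasses import dataclass

@dataclass
class Backend:
    name: str
    usd_per_hour: float        # contracted instance price
    tokens_per_sec: float      # measured end-to-end, for *your* model
    p95_latency_ms: float      # measured tail latency at target load

    def usd_per_million_tokens(self) -> float:
        return self.usd_per_hour / (self.tokens_per_sec * 3600) * 1e6

candidates = [
    Backend("GPU instance (placeholder)", usd_per_hour=12.0,
            tokens_per_sec=2400, p95_latency_ms=180),
    Backend("Maia instance (placeholder)", usd_per_hour=7.0,
            tokens_per_sec=1500, p95_latency_ms=210),
]

for b in sorted(candidates, key=lambda b: b.usd_per_million_tokens()):
    print(f"{b.name}: ${b.usd_per_million_tokens():.2f}/M tokens, "
          f"p95 {b.p95_latency_ms} ms")
# Choose on $/M tokens *and* whether p95 latency meets the SLA -- the cheapest
# backend is irrelevant if it misses the latency budget.
```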
Outlook: a pragmatic hybrid, not an overnight revolution
Microsoft’s intent to move a majority of its internal AI workloads onto its own silicon is credible and consequential. It reflects a reasonable hypothesis: that vertical integration — chip, rack, cooling, software — can yield meaningful economic advantages for a hyperscaler that does enormous volumes of AI work. Yet the transition is neither simple nor inevitable. GPUs will remain essential for training and for customers who require the GPU software ecosystem. Proprietary accelerators will grow in importance for targeted workloads, but adoption will be incremental, tied to the pace of hardware improvement, software portability, and demonstrated TCO advantages.

If Microsoft can reliably iterate Maia‑class designs, maintain staff continuity, and deliver a developer experience that minimizes migration friction, the company can meaningfully reduce dependence on external GPUs over time. But the next 12–24 months — including the timing and performance of Maia follow‑on chips and how effectively Microsoft integrates them into the Azure stack — will determine whether the balance of power shifts more permanently. For the moment, the most likely outcome is a multi‑backend ecosystem where GPUs and hyperscaler ASICs coexist, each optimized for different slices of the AI workload pie.
Microsoft’s public statements are clear and the rationale compelling: control the stack, optimize systems, and chase price‑performance. The practical path forward, however, requires execution across silicon design, software tooling, personnel retention, and ecosystem compatibility. The industry will be watching to see whether Maia’s successors close the gap on raw performance while preserving the systems advantages that made Microsoft build its own chips in the first place.
Source: theregister.com Microsoft aims to swap AMD, Nvidia GPUs for its own AI chips