Microsoft Plans to Run Mostly Its Own Chips in AI Data Centers

Microsoft’s CTO Kevin Scott told a packed audience at Italian Tech Week that Microsoft’s long-term plan is to run “mainly Microsoft chips” in its AI data centers — not out of ideology but for control: control of cost, latency, and system design. That short sentence captures a sweeping shift in how one of the largest cloud providers intends to balance third‑party partnerships with in‑house engineering, from silicon and racks to cooling and networking. The comment is the latest public confirmation that Microsoft is pursuing a vertically integrated approach — shipping custom CPUs and AI accelerators, experimenting with radical cooling, and re‑architecting datacenters — while simultaneously wrestling with the practical limits of capacity, supply chains, and timelines.

Background

Why this matters now

The AI compute landscape is defined by three simultaneous pressures: exploding demand from generative AI services, the economics of inference at scale, and physical limits on how much power a single rack or chip can dissipate. Microsoft’s statements and engineering moves are a play to manage all three variables at once. Designing a purpose‑built chip, customizing the rack and network, and improving cooling are not separate projects — they are system choices that can materially change price/performance and time‑to‑market for new features embedded in Windows, Microsoft 365 Copilot, Bing, and Azure AI services.
Microsoft’s public roadmap (and the surrounding reporting) shows three pillars in action:
  • Build first‑party silicon (Azure Cobalt 100 CPU and Azure Maia 100 AI accelerator).
  • Design the rest of the system (custom racks, power distribution, networking fabrics).
  • Invest in thermal innovations (in‑chip microfluidic cooling with Corintis) to unlock higher power density.
These are not theoretical bets — they are reflected in product launches, datacenter builds, and executives’ public remarks. The central question for customers and IT leaders: can Microsoft make these components deliver lower cost and better latency, and do so reliably at hyperscale?

Microsoft’s chip strategy: pragmatic, not doctrinaire

“We’re not religious about what the chips are”

Kevin Scott’s message was pragmatic. Microsoft continues to buy and deploy GPUs from Nvidia and accelerators from AMD where they deliver the best price‑performance today. Yet the stated long‑term goal — to be running mostly Microsoft‑designed chips — is explicit and strategic: reduce supplier concentration risk, optimize the full stack for Microsoft workloads, and control economics for inference-heavy services. That pragmatic stance underpinned Scott’s comment that the company “will literally entertain anything” to secure capacity, but that the long-term aim is Microsoft silicon.
This mirrors industry trends where hyperscalers pursue both partnerships and internal engineering: buy best‑in‑class GPUs today, while building differentiated hardware for the workloads they expect to own tomorrow. The rationale is familiar:
  • Cost at volume: internal silicon amortizes differently at hyperscale, particularly for high‑volume inference (a rough amortization sketch appears below).
  • Latency and integration: custom accelerators can be paired with networking and racks to reduce end‑to‑end latency.
  • Supply diversification: own silicon hedges against supplier shortages or sudden price swings.
However, designing chips is neither fast nor cheap; Microsoft has already acknowledged both engineering complexity and delayed timelines for next‑generation parts.
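To make the “cost at volume” point above concrete, the sketch below amortizes an accelerator’s purchase price and energy draw over its serviceable lifetime. Every figure is an assumption chosen purely for illustration, not a Microsoft number; the takeaway is only that at hyperscale utilization, per‑inference cost is dominated by capex amortization and energy, which are exactly the levers first‑party silicon and denser cooling aim to pull.

```python
# Back-of-the-envelope amortization sketch. Every figure below is an assumed,
# illustrative number (not a Microsoft figure); the point is to show which
# terms dominate per-inference cost at hyperscale utilization.

accelerator_capex_usd = 20_000      # assumed all-in cost of one accelerator plus its share of the rack
lifetime_years = 4                  # assumed depreciation window
utilization = 0.60                  # assumed average fleet utilization
inferences_per_second = 50          # assumed sustained serving throughput
power_kw = 0.7                      # assumed average draw per accelerator
electricity_usd_per_kwh = 0.08      # assumed blended energy price

busy_seconds = lifetime_years * 365 * 24 * 3600 * utilization
lifetime_inferences = busy_seconds * inferences_per_second

capex_per_inference = accelerator_capex_usd / lifetime_inferences
energy_per_inference = power_kw / (inferences_per_second * 3600) * electricity_usd_per_kwh

print(f"lifetime inferences : {lifetime_inferences:,.0f}")
print(f"capex per inference : ${capex_per_inference:.8f}")
print(f"energy per inference: ${energy_per_inference:.8f}")
```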

The hardware in question: Azure Maia and Azure Cobalt

Maia 100 — what Microsoft announced and what it can plausibly deliver

Microsoft announced the Azure Maia 100 AI accelerator at Ignite in November 2023. Official materials and subsequent technical posts describe a monolithic die built on TSMC’s 5 nm node with roughly 105 billion transistors, large on‑die SRAM, multi‑HBM stacks, and a design optimized for high‑throughput, low‑precision tensor math (the MX formats promoted by Microsoft). Microsoft positioned Maia as part of an end‑to‑end systems strategy — chip, server, rack, and software — intended to increase efficiency for the kinds of models Microsoft runs across Copilot and Azure OpenAI Service.
Public technical claims reported by press and Microsoft include:
  • 105 billion transistors on a 5 nm monolithic die.
  • Memory bandwidth and custom interconnects intended to support large model sharding at rack scale.
  • Performance figures quoted internally or in marketing: on the order of thousands of teraflops for low‑precision MX formats (company claims like 1,600 teraflops MXInt8 and 3,200 teraflops MXFP4 have been reported in trade press). Those numbers align with Microsoft’s marketing for Maia but should be treated as vendor figures until independent benchmarks appear.
This is an important practical nuance: Microsoft’s Maia 100 is real and deployed internally, but vendor performance claims require independent verification before they can be accepted as universal truths. The architecture and systems approach are credible; the exact, repeatable TFLOPS comparisons to competitor parts will need validation in third‑party testing.
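For readers unfamiliar with the MX formats referenced above, the short sketch below illustrates the general idea behind block‑scaled (microscaling) number formats: a group of values shares a single power‑of‑two scale while each element is stored as a small integer. This is a simplified, generic Python illustration, not Microsoft’s MX specification and not how Maia’s hardware actually implements it.

```python
import numpy as np

def mx_quantize(block: np.ndarray, elem_bits: int = 8):
    """Quantize one block of floats to a shared power-of-two scale plus
    low-bit integers. A generic illustration of block-scaled formats,
    not Microsoft's MX specification or Maia's hardware behavior."""
    qmax = 2 ** (elem_bits - 1) - 1                      # e.g. 127 for 8-bit elements
    max_abs = float(np.abs(block).max())
    # Choose the shared exponent so the largest element fits the integer range.
    shared_exp = int(np.ceil(np.log2(max_abs / qmax))) if max_abs > 0 else 0
    scale = 2.0 ** shared_exp
    ints = np.clip(np.round(block / scale), -qmax, qmax).astype(np.int8)
    return shared_exp, ints

def mx_dequantize(shared_exp: int, ints: np.ndarray) -> np.ndarray:
    return ints.astype(np.float32) * (2.0 ** shared_exp)

block = np.random.randn(32).astype(np.float32)           # one 32-element block
exp, q = mx_quantize(block)
err = float(np.abs(block - mx_dequantize(exp, q)).max())
print(f"shared scale 2^{exp}, max reconstruction error {err:.5f}")
```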

Cobalt 100 CPU — an Arm‑based cloud CPU for general work

Alongside Maia, Microsoft launched the Azure Cobalt 100, an Arm‑based general‑purpose CPU designed for cloud workloads (128 cores, Neoverse N2 lineage). Microsoft positions Cobalt to run everything from control‑plane services to real‑world user workloads in Teams and Azure SQL, reducing dependence on third‑party x86 CPUs and providing a more integrated host for Maia accelerators. Expect incremental but meaningful improvements in price/performance for many workloads with Cobalt.

Maia 200: a pain point and timeline reality check

In June 2025, reporting surfaced that Microsoft had pushed back mass production of its next‑generation Maia chip (reportedly codenamed Braga, later to be called Maia 200) into 2026. The delay was attributed to design changes, staff turnover, and integration issues. This is a familiar pattern in custom silicon programs: ambitious schedules meet real‑world engineering complexity. Microsoft’s roadmap therefore contains both immediate, deployed Maia 100 capacity and an aspirational follow‑on whose volume availability has shifted.
The practical implication for customers and partners is clear: Microsoft will continue to rely on Nvidia and AMD accelerators in the near term while expanding Maia 100 and other capacity. The promised future of “mainly Microsoft chips” is contingent on the Maia 200 roadmap, manufacturing yield, and whether the expected price/performance advantages materialize once amortized at hyperscale.

Cooling the problem: microfluidics and the Corintis partnership

Why cooling is now a first‑order design decision

At the power densities modern AI accelerators create, cooling is not an afterthought — it’s a gating factor. Traditional cold plates and air cooling approach practical limits as power per chip rises and rack densities increase. Microsoft’s labs have been testing in‑chip microfluidic cooling, etching microscopic coolant channels into or adjacent to the silicon to extract heat where it is produced rather than fighting thermal resistance from the outside. The company says prototype systems can remove heat up to three times better than state‑of‑the‑art cold plates and reduce peak silicon temperature rise by around 65% under test workloads. Those results are lab‑scale but promising, and Microsoft has publicly partnered with Swiss startup Corintis to accelerate the move from lab to production.
Multiple outlets have reported Microsoft’s bio‑inspired channel geometries (venation patterns) and AI‑driven topology optimization for the coolant pathways. The potential benefits are significant:
  • Increase rack density without proportionally increasing heat rejection infrastructure.
  • Enable higher sustained TDPs per accelerator (burst or continuous) and support future 3D stacked chip designs.
  • Reduce overall data center water consumption with closed‑loop designs, according to Microsoft lab claims.
But there are hard engineering and supply‑chain hurdles: reliability over thousands of in‑service servers, leak detection and mitigation, manufacturing yield at microfluidic precision, long‑term coolant chemistry and maintenance regimes, and standards development across the industry. Early test results are encouraging; large‑scale deployment will take time, validation, and ecosystem coordination.
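To see why a claimed threefold improvement in heat removal matters, the sketch below applies the standard relationship between power, thermal resistance, and temperature rise (dT = P × R_th). The resistance values and temperature limits are assumptions for illustration, not published Microsoft or Corintis figures; the point is that cutting junction‑to‑coolant thermal resistance by a factor of three roughly triples the power a device can dissipate within the same temperature budget.

```python
# Illustrative thermal headroom calculation, dT = P * R_th.
# All numbers are assumptions for illustration; none are published
# Microsoft or Corintis figures.

coolant_temp_c = 35.0            # assumed coolant supply temperature
junction_limit_c = 95.0          # assumed silicon temperature ceiling
headroom_c = junction_limit_c - coolant_temp_c

r_th_cold_plate = 0.060          # K/W, assumed junction-to-coolant resistance of a conventional cold plate
r_th_microfluidic = r_th_cold_plate / 3.0   # the "3x better heat removal" claim, applied literally

for label, r_th in [("conventional cold plate", r_th_cold_plate),
                    ("in-chip microfluidics", r_th_microfluidic)]:
    max_power_w = headroom_c / r_th          # highest power that stays under the temperature ceiling
    print(f"{label:>24}: ~{max_power_w:,.0f} W sustainable per device")
```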

The capacity crunch: scale, cost, and a fast‑moving market

Microsoft’s capacity reality

Executives at Microsoft, including Kevin Scott, have publicly acknowledged a “massive crunch” in available compute since the boom in generative AI following ChatGPT’s launch. Microsoft said it had “stood up” more than 2 GW of data center capacity in the preceding 12 months, a figure used internally and cited on earnings calls as the company rapidly expanded infrastructure — even as it later rebalanced lease commitments in some markets. Those buildouts and adjustments reflect both raw demand for capacity and portfolio optimization between owned and leased facilities.
Across the industry, cloud providers have been buying or pre‑leasing huge swathes of capacity. The scale of these commitments is one reason Microsoft and peers are looking for any available lever — own silicon, higher density cooling, and tighter integration — to reduce per‑inference costs as volumes explode.

Financial consequences beyond engineering

The compute land‑grab has a financing angle. Other cloud vendors have struck enormous deals to secure capacity — for example, reports in 2025 described OpenAI’s multi‑year, multi‑billion‑dollar cloud commitments shifting toward firms like Oracle, prompting those firms to raise or prepare to raise capital. In Oracle’s case, press coverage described a multi‑year contract worth tens of billions of dollars and a subsequent large debt issuance (an $18 billion bond offering in late 2025), with analysts estimating multi‑year financing needs in the tens of billions for companies taking on major infrastructure risk against future AI revenue. These public financing moves illustrate that the capacity story is not just technical; it is a capital markets challenge as well.

Strategic analysis — strengths, constraints, and risks

Strengths in Microsoft’s approach

  • Systems engineering focus: Microsoft is explicitly designing chip + rack + network + cooling as a single optimization problem. When done right, system co‑design can unlock order‑of‑magnitude improvements in price‑performance for targeted workloads.
  • Vertical leverage across product stack: Microsoft can embed in‑house chips into Copilot, Windows features, and Azure services — capturing both product differentiation and long‑term economics.
  • Scale and purchasing power: Microsoft’s global datacenter footprint, procurement relationships, and capital base allow it to invest where others cannot and to absorb early production inefficiencies while refining designs.

Constraints and executional risks

  • Chip development is slow and brittle: Custom silicon programs are complex. Maia 200’s production delay is a prime example: design changes, staffing churn, and simulation instability pushed volume‑production schedules out. Those delays mean reliance on Nvidia and others will continue for years.
  • Manufacturing and yield risk: A 105‑billion‑transistor monolithic die is difficult to manufacture at high yield. That drives per‑unit cost and can delay scale deployment.
  • Operational and maintenance risk for microfluidics: Embedded microchannels change server serviceability paradigms. Long‑term reliability, leak mitigation, and supply chain for microfluidic manufacturing must be proven at scale before production fleets can rely on them.
  • Vendor, partner and commercial tension: Building first‑party alternatives inevitably complicates partnerships. Microsoft’s strategic relationship with OpenAI remains important; diversification must be managed carefully to avoid undermining critical partner relationships.

Competitive and market implications

Microsoft’s push reshapes the competitive landscape in several ways:
  • Hedging and orchestration: Enterprises will increasingly use multi‑vendor model stacks — first‑party Microsoft models when latency and costs matter, partner or frontier models when peak capability is needed.
  • Rising barriers to entry: Custom silicon + custom cooling + custom racks favors hyperscalers and capitalized incumbents, raising the economic entry bar for challengers.
  • New vendor opportunities: Companies that can provide manufacturing, cooling, or software orchestration for mixed model stacks stand to benefit. The microfluidic ecosystem, for instance, creates a new supplier category outside traditional cold‑plate makers.

What Windows admins and IT architects should do next

  • Monitor model routing and governance controls in Microsoft services. Expect administrators to need fine‑grained policies that choose which models (Microsoft's MAI family vs partner models) process sensitive or regulated data.
  • Reassess capacity planning assumptions. If Microsoft’s microfluidic and Maia roadmaps pay off, higher rack densities and different cooling architectures could change co‑location economics and power provisioning strategies.
  • Plan for heterogeneous model stacks. Design observability and cost attribution tools now (see the sketch after this list): routing requests to different models will produce disparate performance and billing profiles.
  • Pilot where latency matters. Organizations with millisecond SLAs (voice services, live transcription, real‑time assistants) should pilot Microsoft’s in‑house options as they become available and compare end‑to‑end latency against third‑party models.
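As a concrete starting point for the routing, governance, and cost‑attribution work described in the list above, the sketch below shows one shape such a policy layer could take. The model names, prices, and flags are entirely hypothetical and do not correspond to any Microsoft API or price list; the sketch only illustrates tagging each request with the model that served it, an estimated cost, and a measured latency so heterogeneous stacks stay observable.

```python
from dataclasses import dataclass
import time

# Hypothetical policy layer. The model names, prices, and flags below are
# invented for illustration; this is not a Microsoft API or price list.

@dataclass
class ModelTarget:
    name: str
    cost_per_1k_tokens: float    # assumed billing unit
    handles_sensitive: bool      # governance flag set by internal policy

TARGETS = [
    ModelTarget("first-party-inference", cost_per_1k_tokens=0.002, handles_sensitive=True),
    ModelTarget("partner-frontier", cost_per_1k_tokens=0.010, handles_sensitive=False),
]

def route(sensitive: bool, need_peak_capability: bool) -> ModelTarget:
    """Pick a target under simple governance and cost rules."""
    allowed = [t for t in TARGETS if t.handles_sensitive or not sensitive]
    if need_peak_capability:
        return max(allowed, key=lambda t: t.cost_per_1k_tokens)   # most capable allowed (price as a crude proxy)
    return min(allowed, key=lambda t: t.cost_per_1k_tokens)       # cheapest allowed

def call_with_attribution(prompt: str, sensitive: bool, peak: bool) -> dict:
    target = route(sensitive, peak)
    start = time.perf_counter()
    # ... the actual model call would happen here ...
    latency_ms = (time.perf_counter() - start) * 1000
    est_tokens = len(prompt.split())                              # crude token estimate
    return {"model": target.name,
            "latency_ms": round(latency_ms, 3),
            "est_cost_usd": round(est_tokens / 1000 * target.cost_per_1k_tokens, 6)}

print(call_with_attribution("summarize this regulated document", sensitive=True, peak=False))
```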

Verifiable claims, uncertainties, and cautions

  • Verifiable: Kevin Scott publicly stated Microsoft aims to use mainly its own chips in the future while continuing to rely on other vendors where they are best today. This was reported by CNBC and reiterated by Microsoft execs in public forums.
  • Verifiable: Microsoft’s Maia 100 and Cobalt 100 announcements (November 2023) and core technical descriptions are published in Microsoft materials and echoed in trade press. The published transistor count (≈105 billion) and 5 nm process node appear in Microsoft’s blog and follow‑up technical posts.
  • Verifiable: mass production of Maia 200 (codenamed Braga) was reported in June 2025 to have slipped into 2026; that delay reflects real product‑schedule risk.
  • Verifiable: Microsoft’s microfluidic cooling prototypes and partnership with Corintis, and the lab claims of up to 3× heat removal and ~65% peak temp reduction, are described in Microsoft materials and covered by multiple outlets; these are promising lab results but not yet proven at hyperscale. Treat lab figures as preliminary until demonstrated in production fleets.
  • Unverifiable / needs independent confirmation: Absolute TFLOPS performance comparisons between Maia 100 and competitor chips should be treated cautiously. Vendor numbers are useful signals but not substitutes for vendor‑agnostic benchmark reports.

Bottom line

Microsoft’s public roadmap and executive remarks show a clear strategic bet: the company intends to own more of the AI compute stack in pursuit of cost control, latency reductions, and product integration. That ambition has concrete manifestations today — Azure Maia 100 and Azure Cobalt 100 in production, system redesigns for racks and networks, and ambitious thermal research with partners like Corintis. At the same time, realistic constraints are visible: next‑gen chips have slipped timelines, cooling breakthroughs are still at prototype scale, and capacity economics remain a live debate across the cloud industry.
For enterprise architects and Windows administrators, the near term will remain hybrid: continue to rely on proven accelerators for the hardest training jobs while testing Microsoft’s first‑party inference options where they reduce latency and cost. Over the medium term, the shape of datacenter hardware — from chip to coolant — looks set to change, and Microsoft’s choices will materially affect the economics and performance of AI services available to Windows and Azure customers.

Quick reference: what to watch next

  • Maia 200 (Braga) production milestones and any published third‑party benchmarks.
  • Microsoft’s public field trials and reliability data for in‑chip microfluidic cooling.
  • Azure product announcements that differentiate Copilot/365 features powered by MAI family models vs partner models.
  • Datacenter capacity disclosures and leasing activity — signs Microsoft is increasing owned versus leased capacity.
  • Financial market moves tied to infrastructure contracts (bond issuance, analyst debt estimates) that reflect the capital intensity of the buildout.

Microsoft’s declaration — to “absolutely” prefer its own chips in the long run — is an architecture of intent, not a finish line. Achieving that vision requires successful chip deployment, reliable high‑density cooling, and the orchestration of racks and networks at gargantuan scale. If Microsoft succeeds, customers should see faster Copilot interactions, different Azure pricing dynamics, and a new generation of datacenter architecture. If execution slips, the company will continue to rely on best‑of‑breed third‑party accelerators while buying time to refine its silicon and systems. Either way, the era of treating chips as commodity components in the cloud is ending: the winners will be those who can align silicon, systems, and software into a predictable, economic whole.

Source: Data Center Dynamics, “Microsoft CTO claims company will mainly use its own AI chips in the future”
 
