Microsoft’s Foundry Agent Service has just widened its model catalog to include a broad set of third‑party and open‑weight models — from Anthropic’s latest Claude variants to DeepSeek’s R1 family, Meta’s Llama variants, and xAI’s Grok — and those models are exposed through Foundry’s SDK, API and the Foundry Agent Playground so developers can pick the best engine for each agentic workload.
Background
Microsoft has been positioning Azure AI Foundry and the Foundry Agent Service as a multi‑vendor orchestration layer: a place to host, route and govern models at enterprise scale while exposing agent‑centric runtime features (streaming, tool calling, multimodality, grounded retrieval and automatic model routing). This move is part of a broader strategy to give enterprises model choice — enabling teams to trade latency, cost and reasoning capability per workflow rather than being locked to a single provider. Foundry’s catalog is organized to support different deployment and compliance needs: some models are sold and supported directly by Azure (with Microsoft product terms and SLAs), while partner/community models are available under the Foundry catalog, deployable as managed compute or serverless endpoints depending on provider policy. The platform also advertises orchestration features such as model routing, model cards, RBAC, auditing, and integration with existing Microsoft governance (Entra, Defender, Purview).
What changed: the new models and what they mean
A mosaic of specialist strengths
Microsoft’s update added models intended to cover a wide spectrum of agent workloads:
- Frontier reasoning and long‑horizon tasks: DeepSeek variants (DeepSeek‑R1‑0528, DeepSeek‑V3‑0324, V3.1) and xAI’s Grok family (Grok‑4 and task‑tuned variants). These are aimed at multi‑step reasoning and large‑context tasks.
- Anthropic Claude family: Claude‑Opus‑4‑1 for high‑complexity coding and extended reasoning; Claude‑Sonnet‑4‑5 for balanced multimodal/agentic workflows; Claude‑Haiku‑4‑5 for low‑latency interactive scenarios. Anthropic’s models bring long context windows and agent tooling support.
- Open‑ecosystem and FP8‑optimized options: Llama‑4‑Maverick‑17B‑128E‑Instruct‑FP8 for cost‑efficient inference and OpenAI’s new open‑weight releases (gpt‑oss‑120b and gpt‑oss‑20b) for transparency and local/edge deployment options.
- High‑throughput, task‑tuned variants: Grok‑4‑fast‑reasoning, Grok‑4‑fast‑non‑reasoning and other task‑specialized variants are designed to trade some capability for throughput and cost.
Open weights + proprietary mixes
A notable shift is the presence of both closed‑weight commercial models and open‑weight alternatives — OpenAI’s gpt‑oss family is now available in Foundry alongside other community models. Microsoft and OpenAI’s public materials describe gpt‑oss‑120b as a 120B‑parameter, reasoning‑focused model optimized to run efficiently on a single 80GB GPU and intended to be API‑compatible with standard Responses / chat APIs. The smaller gpt‑oss‑20b targets local/edge scenarios with ~16GB VRAM requirements. Microsoft’s documentation highlights deployment options that include serverless and managed compute for these models. Caveat: while vendors publish capability claims and training cutoffs, some architectural numbers reported in secondary coverage (e.g., precise parameter counts or internal training recipes for third‑party models) are vendor‑reported or investigative; treat them as vendor claims unless independently validated. Several community threads mention DeepSeek parameter counts and raw architecture claims — those should be tested against provider system cards before being treated as fact.
How Foundry surfaces these models: developer and operational features
SDK, API and Foundry Agent Playground
Microsoft exposes the new catalog across three primary developer surfaces:
- SDKs (Python, TypeScript, C#) for embedding models in applications,
- REST APIs for production integration and serverless calls,
- Foundry Agent Playground for interactive comparisons, experimentation and rapid prototyping.
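As a rough illustration of the API surface, the sketch below assembles an OpenAI‑compatible chat request body of the kind these endpoints accept. The endpoint URL, deployment name and field values are assumptions for illustration, not confirmed Foundry specifics; send the payload with your own HTTP client plus an auth header.

```python
import json

# Hypothetical endpoint; real Foundry resource URLs differ per deployment.
FOUNDRY_ENDPOINT = "https://example-resource.services.ai.azure.com/models/chat/completions"

def build_chat_request(model: str, user_prompt: str, stream: bool = False) -> dict:
    """Assemble an OpenAI-compatible chat request body (illustrative)."""
    return {
        "model": model,   # a catalog model name, e.g. "grok-4" (illustrative)
        "messages": [
            {"role": "system", "content": "You are a concise enterprise assistant."},
            {"role": "user", "content": user_prompt},
        ],
        "stream": stream,  # enable token streaming for interactive UX
    }

body = build_chat_request("grok-4", "Summarize last quarter's incidents.", stream=True)
payload = json.dumps(body)  # POST this to FOUNDRY_ENDPOINT with an Authorization header
```

The same body shape works whether the call goes through the REST API directly or through an SDK wrapper, which is what makes swapping catalog models per workload cheap.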
Streaming, tool calling and multimodality
Foundry Agent Service provides:
- Streaming responses for low‑latency interactive UX,
- Flexible tool calling (OpenAPI, file search, code execution) so agents can interact with enterprise systems and compute safely,
- Multimodal inputs in many partner models (image+text) for document intelligence and chart/diagram analysis.
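Tool calling in this style typically means handing the model a JSON schema describing each tool it may invoke. The sketch below declares one tool in the widely used function‑calling schema and adds a minimal least‑privilege argument check; the tool name and its parameters are hypothetical, and real deployments would also use Foundry's OpenAPI‑described tools mentioned above.

```python
# Hypothetical enterprise tool declared in OpenAI-style function-calling schema.
lookup_order_tool = {
    "type": "function",
    "function": {
        "name": "lookup_order",
        "description": "Fetch order status from the order-management system.",
        "parameters": {
            "type": "object",
            "properties": {
                "order_id": {"type": "string", "description": "Internal order identifier."},
            },
            "required": ["order_id"],
        },
    },
}

def validate_tool_call(tool: dict, arguments: dict) -> bool:
    """Minimal least-privilege check: reject calls with missing or unexpected arguments."""
    schema = tool["function"]["parameters"]
    allowed = set(schema["properties"])
    required = set(schema.get("required", []))
    supplied = set(arguments)
    return required <= supplied and supplied <= allowed
```

Validating model‑proposed tool calls against the declared schema before executing them is one small, concrete piece of the runtime hardening discussed later in this article.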
Grounded retrieval and Foundry IQ
Foundry’s knowledge and retrieval layer (Foundry IQ) is designed to automate retrieval‑augmented workflows so agents can access enterprise knowledge stores, plan multi‑step retrievals and iterate over evidence rather than returning hallucinated free‑form answers. The platform combines connectors for Microsoft 365, Fabric/OneLake and third‑party repositories and wraps retrieval with access controls and policy filtering.
Automatic model routing
A marquee operational feature is model routing: a runtime service that selects the best model for a given request based on configured policies (Cost‑first / Balanced / Quality), live benchmarks, and constraints such as data residency or SLA. Foundry’s router exposes a single endpoint while arbitrating across the catalog, enabling dynamic tiering and shadow testing. Vendor‑reported numbers show the router can materially reduce costs and latency in some workloads; as always, enterprises should validate on their own prompts and datasets.
How this compares to Google Vertex AI and the hyperscaler landscape
Google’s Vertex AI has been evolving in a similar direction: Model Garden, Partner and Open model catalogs, and Vertex Agent/Agent Engine tooling let Google surface Gemini and partner models (Anthropic, Mistral, DeepSeek and Llama variants) in a single runtime. Both clouds now offer multi‑vendor catalogs and agent SDKs, but there are meaningful differences in vendor exclusivity, billing and hosting terms.
- Google’s Vertex generally hosts Google’s own models (Gemini family) under Google‑native SLAs and also provides partner/open models in its Model Garden, but some provider models remain exclusive to their native clouds.
- Microsoft’s Foundry emphasizes multi‑model orchestration under Azure governance and bills certain partner models via Azure consumption constructs, which can simplify enterprise procurement for customers already committed to Azure. The result is a similar breadth of model choice to Vertex in many respects, with differences driven by exclusive commercial placements and technical hosting agreements.
Enterprise implications: benefits and what to watch
Benefits
- Better cost‑fidelity matching: Teams can route routine queries to compact, cheap models while reserving frontier models for high‑value reasoning tasks, reducing overall inference spend.
- Single control plane for governance: Foundry integrates identity (Entra), telemetry (OpenTelemetry) and policy enforcement (Defender/Purview), which simplifies compliance and auditing for agent fleets.
- Faster prototyping to production: Playgrounds + SDKs + serverless endpoints let developers iterate quickly and then switch to production endpoints with governance intact.
- Choice reduces vendor lock‑in risk: Multi‑vendor catalogs give procurement and platform architects bargaining leverage and operational flexibility.
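The cost‑fidelity matching idea above can be sketched as a toy routing policy. The model names, per‑token prices and capability scores below are invented for illustration; Foundry's real router arbitrates on its own live benchmarks and configured policies rather than a static table like this.

```python
# Illustrative catalog: (model, $ per 1M tokens, capability score 0-1).
# Prices and scores are made up for the sketch, not vendor figures.
CATALOG = [
    ("claude-haiku-4-5", 1.0, 0.60),
    ("claude-sonnet-4-5", 3.0, 0.80),
    ("claude-opus-4-1", 15.0, 0.95),
]

def route(required_capability: float) -> str:
    """Cost-first policy: pick the cheapest model that clears the capability bar."""
    eligible = [m for m in CATALOG if m[2] >= required_capability]
    if not eligible:
        raise ValueError("no model meets the capability requirement")
    return min(eligible, key=lambda m: m[1])[0]

routine = route(0.5)    # routine query lands on the cheapest tier
frontier = route(0.9)   # frontier reasoning escalates to the premium tier
```

Even this toy version makes the trade visible: raising the capability bar multiplies the per‑request price, which is exactly why routing routine traffic downward reduces overall inference spend.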
What to watch / risks
- Operational complexity: Supporting many model families increases the complexity of testing, evaluation and monitoring. Each model has distinct hallucination modes, latency profiles and tooling quirks; without disciplined benchmarking, routing can route to the wrong model for the job.
- Vendor claims vs. independent benchmarks: Many performance claims (e.g., throughput, percent gains in latency/cost, benchmark results) come from vendor tests. Independent, workload‑specific validation is essential before trusting those numbers in production.
- Data governance and contract nuance: Not all partner models are hosted the same way; some rely on provider‑hosted endpoints (which affects residency and contract terms) while others are Azure‑hosted. Admins must examine data processing, retention and MACC/contract implications for each model.
- Security surface area growth: More models + more tool integrations means more attack surface for prompt injection, credential misuse, or unintended data exfiltration. Foundry’s runtime protections and Defender integrations help, but they require operational investment to be effective.
- Ecosystem concentration and systemic risk: Big clouds hosting many frontier models increases efficiency but concentrates risk around a few large channels (compute, model IP, SLAs). That concentration can accelerate capability availability but also centralize outages, policy errors or regulatory scrutiny.
Practical guidance for IT and platform teams
- Start with a model inventory and an objective testing plan: catalog candidate models and define success metrics (accuracy, hallucination rate, latency, cost per 1,000 requests).
- Run shadow routing experiments: use Foundry’s model router to shadow traffic to candidate models for 2–4 weeks; measure differences in quality and cost before shifting production traffic.
- Define data processing and residency policies per model: flag models that use provider‑hosted endpoints vs. Azure‑hosted deployments; map these to compliance lanes and procurement requirements.
- Harden tool calling: use least‑privilege tool permits, runtime payload scanning and Defender/third‑party runtime hooks to block risky actions before they execute.
- Create an agent lifecycle process: treat agents as directory objects (Entra Agent ID), define owners, and enforce automated deprovisioning and access reviews. Observability and traceability must be part of every agent’s SLO.
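The shadow‑routing step above can be sketched as a small measurement harness: serve users from the incumbent model, mirror each prompt to the candidate, score both offline, and compare quality against cost per 1,000 requests. The model callables, prices and trivial judge below are stand‑ins; in practice the judge would be a rubric or an evaluation model, and the calls would hit real endpoints.

```python
import statistics

def incumbent(prompt: str) -> str:   # stand-in for the current production model
    return "answer-a"

def candidate(prompt: str) -> str:   # stand-in for the model under evaluation
    return "answer-b"

# Illustrative per-request prices in dollars, not vendor figures.
PRICE_PER_REQUEST = {"incumbent": 0.004, "candidate": 0.001}

def shadow_run(prompts, judge):
    """Serve from the incumbent, mirror to the candidate, and collect scores."""
    scores = {"incumbent": [], "candidate": []}
    for p in prompts:
        scores["incumbent"].append(judge(p, incumbent(p)))
        scores["candidate"].append(judge(p, candidate(p)))  # never shown to users
    return {
        name: {
            "mean_quality": statistics.mean(vals),
            "cost_per_1k": PRICE_PER_REQUEST[name] * 1000,
        }
        for name, vals in scores.items()
    }

report = shadow_run(["q1", "q2"], judge=lambda prompt, answer: 1.0)
```

Running this over a few weeks of mirrored traffic gives the before/after evidence the guidance calls for: only shift production traffic once the candidate's quality holds at a clearly better cost per 1,000 requests.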
Technical specs and verifications (what’s been confirmed)
- Microsoft’s Foundry model catalog lists DeepSeek, Grok (grok‑4 and optimized variants), Llama‑4‑Maverick‑17B‑128E and other partner models as globally available under specified deployment types. The Microsoft model pages and blog confirm names and deployment options.
- Anthropic’s Claude‑Opus‑4‑1 and other Claude variants are published with explicit capabilities, context window details and training cutoffs; these models are available via Foundry’s partner/community model channels.
- OpenAI announced gpt‑oss (gpt‑oss‑120b and gpt‑oss‑20b) as open‑weight models and Microsoft Azure has documented gpt‑oss availability and supported capabilities, including context window sizes and deployment notes.
- Foundry Agent Service supports streaming responses, flexible tool calling, multimodal inputs and a model router that can automatically select models based on performance/cost/accuracy tradeoffs; those features are documented in Microsoft’s Foundry blog and product pages.
Competitive & market context: why this matters now
The industry is moving from “single assistant” platforms to multi‑model orchestration and agents as first‑class enterprise services. The hyperscalers are racing to solve these three operational problems simultaneously:
- Provide a broad, curated model catalog so teams can pick the right engine per task.
- Expose agent‑native runtimes with tool integrations, memory and governance.
- Make model selection operationally safe and cost‑predictable via routing, observability and procurement alignment.
Conclusion
The expansion of Microsoft Foundry Agent Service’s model catalog is a milestone for enterprise AI: it formalizes model choice inside a governed runtime, adds partner frontier models (Anthropic, DeepSeek, xAI/Grok) and open‑weight options (gpt‑oss) and wraps them in the agent primitives that enterprises need — streaming, tools, multimodality, retrieval and model routing. These capabilities make it practical to architect agents that combine speed, cost efficiency and deep reasoning while staying inside established governance boundaries. That said, the new landscape raises well‑understood operational challenges: multi‑model testing overhead, verifying vendor claims on real workloads, careful contract mapping for data residency and a sharpened need for runtime protections. Organizations should pilot with a strong measurement plan, use shadow routing before flipping production traffic, and treat agent governance and identity as first‑order problems rather than add‑ons.
If the objective is to move from promising prototypes to reliable, auditable agent fleets, Microsoft’s Foundry Agent Service now gives teams the catalog and enterprise plumbing to try — but success will depend on disciplined benchmarking, hardened tool controls, and a clear mapping of compliance/contract expectations for each model in the catalog.
Source: Techzine Global Microsoft Foundry Agent Service offers a choice of more AI models