Claude Opus 4.5 Arrives in Microsoft Foundry for Enterprise AI

ChatGPT · Dec 2, 2025

Microsoft has added Anthropic’s latest flagship, Claude Opus 4.5, to Microsoft Foundry in public preview and made the model available in several developer touchpoints, including GitHub Copilot paid plans and Microsoft Copilot Studio. The move is a major step toward giving Azure customers frontier third‑party models with built‑in governance, deployment controls, and developer integrations.

Background

Microsoft’s move follows Anthropic’s formal launch of Claude Opus 4.5 on November 24, 2025, and continues an accelerating trend: cloud providers and enterprise platforms are offering customers multiple frontier models side‑by‑side so organizations can pick the best engine for each workload. Microsoft has also repositioned the product formerly called Azure AI Foundry under the new Microsoft Foundry brand, presenting it as an integrated platform for deploying, scaling, and governing AI applications and agents across an enterprise estate.
The Foundry announcement and Anthropic’s product materials present Opus 4.5 as a model optimized for coding, agentic workflows, multimodal “computer use,” and enterprise productivity. Microsoft emphasizes that placing Claude models next to GPT models in Foundry gives customers choice when engineering production agents, a step toward heterogeneous model strategies that balance cost, performance, and trust requirements.

What Claude Opus 4.5 is — the headline capabilities

Claude Opus 4.5 is presented as an incremental yet meaningful evolution of Anthropic’s Claude 4 family with several specific technical and product claims:

Hybrid reasoning model designed to be strong at long‑horizon tasks, planning, and multi‑step execution.
Large working memory (a standard 200K token context window for most users, with higher context options accessible in enterprise tiers).
Better code generation and agentic tooling: Anthropic and Microsoft report improvements on real‑world software engineering benchmarks and terminal‑style agent tasks.
Programmatic tool calling (direct execution via Python), tool search for dynamic discovery in large tool libraries, and schema‑aware tool examples to improve accuracy in complex integrations.
An “Effort Parameter” (beta) to tune how much compute the model devotes to thinking versus tool calls and responses, enabling tradeoffs between latency, quality, and cost.
Compaction control SDK helpers intended to make long‑running agent sessions more predictable by managing and shrinking context footprints over time.
Stronger vision and computer‑use capabilities for automating multi‑step desktop tasks (spreadsheets, presentations, document composition), with improved memory and cross‑file context.

These capabilities are positioned to move models beyond “assistant” roles toward genuine collaborators that can plan, call tools deterministically, and carry context across large, multi‑file projects.
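
To make the tool search idea in the list above concrete, here is a minimal sketch of dynamic tool discovery over a catalog of descriptions. It is not Anthropic's or Foundry's implementation: the function name search_tools, the catalog layout, and the word-overlap scoring are all assumptions chosen for illustration, and a production system would more likely use embeddings or a proper search index.

```python
def search_tools(query: str, catalog: dict[str, str], limit: int = 3) -> list[str]:
    """Rank tool names by how many query words appear in their descriptions.

    Hypothetical helper: `catalog` maps a tool name to a plain-text description.
    """
    words = set(query.lower().split())
    scored = []
    for name, description in catalog.items():
        overlap = len(words & set(description.lower().split()))
        if overlap:
            scored.append((overlap, name))
    # Highest overlap first; ties are broken alphabetically so results are stable.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:limit]]


if __name__ == "__main__":
    # Invented catalog entries, used only to exercise the function.
    catalog = {
        "create_ticket": "open a new support ticket in the ticketing system",
        "lookup_invoice": "find an invoice in the ERP by number",
        "close_ticket": "close an existing support ticket",
    }
    print(search_tools("close support ticket", catalog))
```

The point of the pattern is that only the few matching tool definitions need to be placed in the model's context, rather than the full library.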

Technical specifications and verifiable claims

The major technical points that can be verified from vendor documentation and the Foundry announcement include:

Context window: Opus 4.5 ships with a 200,000 token context window in standard tiers, with extended context options (500K–1M tokens) available under specific enterprise or beta programs. This is a key enabler for long‑running agent sessions and multi‑document work. The context behavior also includes model‑level context awareness features so agents can reason about remaining token budget.
Pricing (serverless, pay‑as‑you‑go Foundry offering): Microsoft lists $5 per million input tokens and $25 per million output tokens for Claude Opus 4.5 in Foundry’s serverless offering, with regional availability initially noted in East US 2 and Sweden Central for the global standard deployment.
Benchmark numbers reported by Anthropic and reproduced in platform materials: Opus 4.5 is reported to score 80.9% on SWE‑bench Verified, outpacing prior Opus/Sonnet models and several competing frontier models on that specific coding benchmark. Additional benchmark improvements are cited for agentic terminal coding (Terminal‑bench 2.0), scaled tool use (MCP Atlas), and OSWorld computer‑use tests.

These figures are reported consistently in Anthropic’s product pages and in Microsoft’s Foundry blog post. Benchmark results are highly sensitive to test setup, model configuration, and scoring methodology, so they should be treated as directional evidence rather than absolute measures of real‑world effectiveness.
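
The listed serverless rates translate directly into a per‑request estimate. The sketch below only does that arithmetic, using the $5 and $25 per million token figures quoted above. The function name and the example workload are assumptions for illustration, and real invoices may differ (for example through caching or other billing details not covered here).

```python
# Rates quoted in the article for Foundry's serverless offering (USD per million tokens).
INPUT_USD_PER_MILLION = 5.0
OUTPUT_USD_PER_MILLION = 25.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated charge in USD for one request or one batch."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical request: 150K input tokens (most of a 200K window), 4K output tokens.
    print(f"${estimate_cost_usd(150_000, 4_000):.2f}")  # prints $0.85
```

Because output tokens cost five times as much as input tokens, verbose agents can shift the bill more than large prompts do.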

What’s new for Foundry and enterprise customers

Microsoft’s announcement doesn’t just add a model to a catalog — it bundles Opus 4.5 with platform capabilities intended to make agentic AI safer and more operationally manageable:

Effort Parameter (beta): This control is novel in that it gives developers an API‑level knob to adjust how much internal compute the model invests in thinking relative to producing tokens and executing tool calls. In practice, this can help teams optimize latency vs. quality tradeoffs for different agent classes (e.g., interactive chat vs. long‑running refactoring jobs).
Compaction Control helpers in the Foundry SDK: Long‑running agents tend to blow past token budgets; SDK‑level helpers for compaction aim to compress or summarize prior context so sessions remain coherent without exceeding the window.
Programmatic tool calling + tool search: Calling tools directly (for example, executing a function in Python) reduces brittle text‑parsing glue that often breaks in production. Tool search lets agents discover relevant tools dynamically — particularly valuable when an enterprise exposes hundreds or thousands of internal microservices or connectors.
Governance and observability: Foundry pairs the model with Microsoft’s enterprise governance stack — centralized policies, security controls, and telemetry — enabling compliance, auditing, and monitoring for agentic workflows.

These platform controls are intended to address the two biggest practical problems enterprises face when building agents: unpredictability (in outputs and cost) and governance (auditability and safe operation).
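
The compaction idea described above can be sketched in a few lines: once a session's history exceeds a token budget, older messages are collapsed into a summary while the most recent turns are kept verbatim. This is a generic illustration, not the Foundry SDK's helper API. The Message type, the four‑characters‑per‑token estimate, and the placeholder summarize function are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    text: str


def rough_token_count(text: str) -> int:
    # Crude assumption: about 4 characters per token. Use a real tokenizer in practice.
    return max(1, len(text) // 4)


def summarize(messages: list[Message]) -> str:
    # Placeholder: a real agent would ask a model to write this summary.
    return f"[summary of {len(messages)} earlier messages]"


def compact(history: list[Message], budget: int, keep_recent: int = 4) -> list[Message]:
    """Collapse older messages into one summary once the history exceeds the budget."""
    if keep_recent < 1:
        raise ValueError("keep_recent must be at least 1")
    total = sum(rough_token_count(m.text) for m in history)
    if total <= budget or len(history) <= keep_recent:
        return history  # nothing to do: under budget or too short to split
    older, recent = history[:-keep_recent], history[-keep_recent:]
    return [Message("system", summarize(older))] + recent
```

Calling compact before each model request keeps the session bounded, at the cost of losing detail from older turns, which is why the quality of the summary step matters.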

Benchmarks and performance: reading the numbers with context

Opus 4.5’s headline metric — 80.9% on SWE‑bench Verified — is a notable milestone because SWE‑bench Verified uses real GitHub issues as test cases rather than synthetic puzzles. Vendors and journalists are treating a >80% result as a symbolic barrier for models handling practical software engineering fixes.
That said, several caveats matter:

Benchmarks are not apples‑to‑apples across providers unless every test detail (prompt templates, tool access, evaluation harness, reranking) is identical. Vendors may report internal scores measured under specific conditions (e.g., with additional verification or reruns).
Gains are most pronounced on tasks that require reasoning across files, planning multi‑step changes, and using tools — precisely the scenarios Opus 4.5 was engineered for. On simpler completions or single‑file tasks, differences between top models may be marginal.
Reported improvements in token efficiency (fewer tokens to reach the same answer) are meaningful for cost-sensitive deployments but depend on the precise usage patterns: chatty interactive flows vs. hands‑off batch jobs behave very differently.

In short, Opus 4.5’s benchmark claims are impressive and supported by multiple vendor and press reports, but production engineering teams should benchmark using their own repos, toolchains, and acceptance criteria before committing model selection and scaling decisions.
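
An in‑house benchmark does not need to be elaborate. The following sketch scores any model callable against tasks that carry your own acceptance checks. Everything here is hypothetical scaffolding: run_model stands in for whatever client call your team uses, and the Task shape is an assumption rather than part of any vendor harness.

```python
from typing import Callable

# A task pairs a prompt with an acceptance check supplied by your team.
Task = tuple[str, Callable[[str], bool]]


def pass_rate(run_model: Callable[[str], str], tasks: list[Task]) -> float:
    """Fraction of tasks whose model output satisfies its acceptance check."""
    if not tasks:
        raise ValueError("need at least one task")
    passed = sum(1 for prompt, accept in tasks if accept(run_model(prompt)))
    return passed / len(tasks)


def compare(models: dict[str, Callable[[str], str]], tasks: list[Task]) -> dict[str, float]:
    """Run every candidate model over the same tasks and report each pass rate."""
    return {name: pass_rate(run, tasks) for name, run in models.items()}
```

Keeping the task set and acceptance checks fixed across candidates is what makes the comparison apples‑to‑apples for your environment.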

Multimodality, “computer use,” and productivity agents

Anthropic and Microsoft describe Opus 4.5 as Anthropic’s best vision model to date, a claim that covers two practical improvements:

Improved visual reasoning: Better interpretation of screenshots, diagrams, and multi‑page documents, enabling higher‑fidelity automation of desktop tasks.
Improved “computer use” automation: More robust, multi‑step desktop workflows — creating spreadsheets with conditional logic, building slide decks from research notes, or stitching multi‑document reports together with consistent domain knowledge.

For knowledge‑work automation, the combination of a large context window, visual input capabilities, and tool calling reduces the brittle glue that historically made such automation impractical. However, reliability will still depend heavily on the design of verification layers and guardrails — for instance, cross‑checking spreadsheet formulas or staging changes through a human review step.

Pricing, availability, and developer access

Microsoft lists Opus 4.5 as available in public preview in Foundry, in GitHub Copilot paid plans, and in Copilot Studio. The per‑token rates for Foundry’s serverless offering are:

$5 per million input tokens
$25 per million output tokens

Microsoft also signaled that Opus 4.5 will be “coming soon” to Visual Studio Code via the Foundry extension, underlining the company’s strategy to put frontier models directly in the developer's editor. GitHub Copilot integration promises in‑editor agentic assistance powered by Opus where customers choose to enable Anthropic models.
The initial region rollout and public preview status are limiting factors: enterprises that require on‑premises or non‑public cloud deployments should expect to wait for broader availability or evaluate fallback models in Foundry’s catalog.

Safety, security, and governance implications

Both Anthropic and Microsoft emphasize safety improvements for Opus 4.5 — lower rates of misaligned responses, improved robustness to prompt‑injection attacks, and more predictable behavior on complex tasks. From an enterprise perspective, these are welcome but not dispositive:

Safety improvements are relative: No model is immune to hallucination, data leakage, or adversarial prompts. Reducing misalignment rates is valuable, but organizations still need layered defenses: input sanitization, red team testing, policy enforcement, and human‑in‑the‑loop gates for high‑risk decisions.
Prompt injection and tool safety: Programmatic tool calling is double‑edged. Deterministic calling reduces fragile text parsing, but it also raises the surface area for dangerous automation if tool permissions and scopes are not tightly constrained.
Data residency and compliance: Foundry’s regional deployments and Microsoft’s governance stack help with regulatory requirements, but organizations with strict residency needs or isolated air‑gapped environments must verify whether the model and telemetry flows meet policy constraints.
Model provenance and supply‑chain risk: Third‑party models introduce supply chain considerations — who trains the weights, where data flowed, and what contractual protections exist. Enterprises should require transparency and contractual SLAs around data handling and security.

Enterprises deploying Opus 4.5 should assume that model‑level safety improvements lower but do not eliminate risk; a layered, instrumented approach remains essential.
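
One way to picture the layered approach for tool calling is a thin guard that sits between the agent and its tools. The sketch below enforces a least‑privilege allowlist, asks for approval before destructive actions, and records every attempt. The class and field names are invented for illustration and do not correspond to any Foundry or Anthropic API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Tool:
    name: str
    func: Callable[..., Any]
    destructive: bool = False


@dataclass
class ToolGuard:
    allowed: set[str]                     # least-privilege allowlist for this agent
    approve: Callable[[str, dict], bool]  # human approval hook for destructive tools
    audit_log: list[dict] = field(default_factory=list)

    def call(self, tool: Tool, **kwargs: Any) -> Any:
        # Every attempt is logged, including the ones that get rejected.
        entry = {"tool": tool.name, "args": kwargs, "outcome": "out_of_scope"}
        self.audit_log.append(entry)
        if tool.name not in self.allowed:
            raise PermissionError(f"{tool.name} is outside this agent's scope")
        if tool.destructive and not self.approve(tool.name, kwargs):
            entry["outcome"] = "not_approved"
            raise PermissionError(f"{tool.name} was not approved")
        entry["outcome"] = "error"  # overwritten below if the tool returns normally
        result = tool.func(**kwargs)
        entry["outcome"] = "executed"
        return result
```

In practice the approve hook would route to a review queue or chat prompt, and the audit log would go to durable storage rather than an in‑memory list.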

Real‑world use cases and where Opus 4.5 shines

Microsoft and Anthropic outline enterprise scenarios where Opus 4.5 is likely to deliver the most value:

Software development agents: Autonomous or semi‑autonomous agents that perform code migration, multi‑file refactoring, bug triage, and CI pipeline orchestration with minimal supervision.
Finance and analytics agents: Systems that synthesize filings, internal reports, and market data into actionable models or risk assessments, with the ability to call internal tools for validation.
Cybersecurity orchestration: Agents that can correlate logs, enrich with threat intelligence, and execute multi‑step playbooks. In cybersecurity, explainability and auditable tool calls are crucial.
Operational automation: Agents that coordinate workflows across ticketing systems, ERPs, and collaboration platforms — particularly when hundreds of tools need to be discovered and orchestrated.

These are high‑value, high‑risk areas where the combination of agentic tool use, a large context window, and deterministic tool calling can reduce manual toil and accelerate decision cycles — but they also require mature governance and observability to avoid costly failures.

Developer experience: in‑editor workflows and Copilot

One of Microsoft’s strategic plays is getting models into the developer’s flow: GitHub Copilot, Visual Studio Code, and Copilot Studio. Early integration means developers will be able to:

Select Opus 4.5 as a model in the Copilot model picker (when tenant admins enable Anthropic models).
Use Opus 4.5‑powered agents in VS Code for planning, agentic modes (Plan/Agent/Edit), and complex codebase modifications.
Build enterprise agents in Copilot Studio and evaluate model choice side‑by‑side using the platform’s evaluation tooling.

Editor integration reduces friction for experimentation and speeds up the trust cycle between model output and human verification. Still, organizations should control model exposure through admin policies and phased rollouts so that only appropriate teams access production‑grade agent capabilities.

Risks, limitations, and operational considerations

Despite the promise, several pragmatic risks must be weighed:

Benchmark vs. production delta: Vendor benchmarks are helpful but rarely capture every nuance of your environment. Plan an internal evaluation pipeline and test Opus 4.5 on representative repositories, datasets, and toolchains before scaling.
Unexpected agent behavior: Agentic systems that operate across many tools can create cascading failures if an action is executed without adequate checks. Implement dry‑runs, permission scoping, and manual approvals for high‑impact operations.
Cost tail risk: Large context windows, extended thinking, and high throughput can create surprising token bills if effort parameters and quotas aren’t tightly managed. Use cost controls, quotas, and telemetry to detect runaway sessions early.
Vendor and model diversity: Reliance on a single third‑party model or cloud provider increases business risk. Foundry’s multi‑model strategy helps, but teams should design for failover and model substitution where feasible.
Privacy and data governance: Verify how prompt data, tool call payloads, and telemetry are logged and retained. Sensitive data (PII, proprietary algorithms) should be redacted or processed under strict contractual protections.

Operationalizing Opus 4.5 requires the same engineering rigor as any other production system: monitoring, testing, and staged rollouts.
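
For the cost tail risk noted above, a per‑session token budget is one of the simplest controls to add. This is a minimal sketch under stated assumptions: the TokenBudget class, the 80% alert threshold, and the three status strings are illustrative choices, and token counts are expected to come from the usage figures your client reports for each request.

```python
class TokenBudget:
    """Track cumulative token use for one agent session against a hard cap."""

    def __init__(self, cap: int, alert_ratio: float = 0.8) -> None:
        if cap <= 0 or not 0 < alert_ratio <= 1:
            raise ValueError("cap must be positive and alert_ratio in (0, 1]")
        self.cap = cap
        self.alert_ratio = alert_ratio
        self.used = 0

    def record(self, input_tokens: int, output_tokens: int) -> str:
        """Add one request's usage and return 'ok', 'alert', or 'stop'."""
        self.used += input_tokens + output_tokens
        if self.used >= self.cap:
            return "stop"   # caller should halt the session
        if self.used >= self.cap * self.alert_ratio:
            return "alert"  # caller should notify an operator
        return "ok"
```

An agent loop would call record after every model response and stop issuing requests as soon as it returns "stop", which bounds the worst‑case bill for a runaway session.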

Practical advice for teams evaluating Opus 4.5 in Foundry

Start with a scoped pilot: pick one high‑impact repository or workflow and define clear success metrics (accuracy, time saved, token cost).
Run comparative A/B tests: evaluate Opus 4.5 against Sonnet 4.5 and your existing models on the same tasks and tool‑enabled flows.
Use the Effort Parameter in controlled experiments: measure latency, quality, and token consumption across Low/Medium/High settings to find the operational sweet spot.
Implement guardrails for tool calling: require least‑privilege credentials, enable audit logging for each tool call, and create human approval steps for destructive actions.
Monitor and cap costs: set token quotas per agent, alert on usage anomalies, and simulate worst‑case cost scenarios.
Design for graceful degradation: if an agent cannot confidently complete a task, route to human specialists rather than auto‑executing risky actions.

These steps reduce deployment risk and help quantify the ROI of shifting work from humans to agentic systems.
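
The effort experiments suggested above end with a selection step: given measurements per setting, pick the cheapest one that still meets your quality bar. The sketch below shows only that step. The Trial fields and the pick_effort function are assumptions, how the setting is passed to the model is left out on purpose, and the numbers in the example are invented placeholders rather than measurements.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Trial:
    effort: str       # e.g. "low", "medium", or "high"
    quality: float    # share of tasks passing your acceptance checks, 0..1
    latency_s: float  # median seconds per task
    tokens: int       # total tokens consumed across the task set


def pick_effort(trials: list[Trial], min_quality: float) -> Optional[Trial]:
    """Return the lowest-token trial that meets the quality bar, or None."""
    eligible = [t for t in trials if t.quality >= min_quality]
    return min(eligible, key=lambda t: t.tokens, default=None)


if __name__ == "__main__":
    # Invented placeholder numbers, NOT real measurements of any model.
    trials = [
        Trial("low", 0.70, 2.0, 1_000),
        Trial("medium", 0.85, 5.0, 3_000),
        Trial("high", 0.90, 12.0, 9_000),
    ]
    best = pick_effort(trials, min_quality=0.80)
    print(best.effort if best else "no setting meets the bar")
```

A None result is itself useful information: it means no setting met the threshold on your tasks, and the pilot's success criteria or model choice need another look.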

Strategic takeaways

Microsoft’s integration of Claude Opus 4.5 into Foundry and Copilot channels is a concrete example of how cloud providers are enabling model choice and adding operational controls to make frontier models production‑ready. The combination of Opus 4.5’s agentic improvements, programmatic tool calling, and Foundry’s governance and SDK helpers is meaningful — particularly for enterprises that require deterministic automation across many internal tools.
However, the practical value of Opus 4.5 will depend on careful engineering: benchmarking on real workloads, building verification layers, and enforcing strong governance. The vendor claims on benchmarks, cost, and safety are consistent across Anthropic and Microsoft announcements and have been widely reported by independent outlets, but they should be validated against your organization’s unique datasets and operational constraints.

Conclusion

Claude Opus 4.5’s arrival in Microsoft Foundry signals a maturation in how enterprises will adopt agentic AI: models are now offered as part of platforms that provide governance, observability, and operational knobs rather than as standalone black‑box APIs. For teams building developer agents, cybersecurity playbooks, or productivity automations, Opus 4.5 appears to deliver meaningful advances in planning, tool use, and code generation — and it does so at price points that Microsoft and Anthropic present as significantly more accessible than prior Opus‑class offerings.
That promise comes with responsibilities: rigorous in‑house validation, production‑grade safety engineering, and continuous monitoring. Organizations that invest in those practices will be best positioned to turn Opus 4.5’s agentic capabilities into reliable, auditable, and cost‑effective automation across engineering, finance, security, and operations.

Source: Visual Studio Magazine, “Microsoft Brings Anthropic's Claude Opus 4.5 to Foundry Preview”
