Microsoft has added Anthropic’s latest flagship, Claude Opus 4.5, to its Microsoft Foundry public preview and pushed the model into several Microsoft developer touchpoints — including GitHub Copilot paid plans and Microsoft Copilot Studio — marking a major step in making frontier third‑party models available to Azure customers with built‑in governance, deployment controls, and developer integrations.
Microsoft’s move follows Anthropic’s formal launch of Claude Opus 4.5 on November 24, 2025, and continues an accelerating trend: cloud providers and enterprise platforms are offering customers multiple frontier models side‑by‑side so organizations can pick the best engine for each workload. Microsoft has also repositioned the product formerly called Azure AI Foundry under the new Microsoft Foundry brand, presenting it as an integrated platform for deploying, scaling, and governing AI applications and agents across an enterprise estate.
The Foundry announcement and Anthropic’s product materials present Opus 4.5 as a model optimized for coding, agentic workflows, multimodal “computer use,” and enterprise productivity. Microsoft emphasizes that placing Claude models next to GPT models in Foundry gives customers choice when engineering production agents — a step toward heterogenous model strategies that balance cost, performance, and trust requirements.
That said, several caveats matter:
It’s important to treat initial region rollout and public preview status as limiting factors: enterprises that require on‑premises or non‑public cloud deployments should plan for a broader availability schedule or evaluate fallback models in Foundry’s catalog.
However, the practical value of Opus 4.5 will depend on careful engineering: benchmarking on real workloads, building verification layers, and enforcing strong governance. The vendor claims on benchmarks, cost, and safety are consistent across Anthropic and Microsoft announcements and have been widely reported by independent outlets, but they should be validated against your organization’s unique datasets and operational constraints.
That promise comes with responsibilities: rigorous in‑house validation, production‑grade safety engineering, and continuous monitoring. Organizations that invest in those practices will be best positioned to turn Opus 4.5’s agentic capabilities into reliable, auditable, and cost‑effective automation across engineering, finance, security, and operations.
Source: Visual Studio Magazine Microsoft Brings Anthropic's Claude Opus 4.5 to Foundry Preview -- Visual Studio Magazine
Background
Microsoft’s move follows Anthropic’s formal launch of Claude Opus 4.5 on November 24, 2025, and continues an accelerating trend: cloud providers and enterprise platforms are offering customers multiple frontier models side‑by‑side so organizations can pick the best engine for each workload. Microsoft has also repositioned the product formerly called Azure AI Foundry under the new Microsoft Foundry brand, presenting it as an integrated platform for deploying, scaling, and governing AI applications and agents across an enterprise estate.The Foundry announcement and Anthropic’s product materials present Opus 4.5 as a model optimized for coding, agentic workflows, multimodal “computer use,” and enterprise productivity. Microsoft emphasizes that placing Claude models next to GPT models in Foundry gives customers choice when engineering production agents — a step toward heterogenous model strategies that balance cost, performance, and trust requirements.
What Claude Opus 4.5 is — the headline capabilities
Claude Opus 4.5 is presented as an incremental yet meaningful evolution of Anthropic’s Claude 4 family with several specific technical and product claims:- Hybrid reasoning model designed to be strong at long‑horizon tasks, planning, and multi‑step execution.
- Large working memory (a standard 200K token context window for most users, with higher context options accessible in enterprise tiers).
- Better code generation and agentic tooling: Anthropic and Microsoft report improvements on real‑world software engineering benchmarks and terminal‑style agent tasks.
- Programmatic tool calling (direct execution via Python), tool search for dynamic discovery in large tool libraries, and schema‑aware tool examples to improve accuracy in complex integrations.
- An “Effort Parameter” (beta) to tune how much compute the model devotes to thinking versus tool calls and responses, enabling tradeoffs between latency, quality, and cost.
- Compaction control SDK helpers intended to make long‑running agent sessions more predictable by managing and shrinking context footprints over time.
- Stronger vision and computer‑use capabilities for automating multi‑step desktop tasks (spreadsheets, presentations, document composition), with improved memory and cross‑file context.
Technical specifications and verifiable claims
The major technical points that can be verified from vendor documentation and the Foundry announcement include:- Context window: Opus 4.5 ships with a 200,000 token context window in standard tiers, with extended context options (500K–1M tokens) available under specific enterprise or beta programs. This is a key enabler for long‑running agent sessions and multi‑document work. The context behavior also includes model‑level context awareness features so agents can reason about remaining token budget.
- Pricing (serverless, pay‑as‑you‑go Foundry offering): Microsoft lists $5 per million input tokens and $25 per million output tokens for Claude Opus 4.5 in Foundry’s serverless offering, with regional availability initially noted in East US 2 and Sweden Central for the global standard deployment.
- Benchmark numbers reported by Anthropic and reproduced in platform materials: Opus 4.5 is reported to score 80.9% on SWE‑bench Verified, outpacing prior Opus/Sonnet models and several competing frontier models on that specific coding benchmark. Additional benchmark improvements are cited for agentic terminal coding (Terminal‑bench 2.0), scaled tool use (MCP Atlas), and OSWorld computer‑use tests.
What’s new for Foundry and enterprise customers
Microsoft’s announcement doesn’t just add a model to a catalog — it bundles Opus 4.5 with platform capabilities intended to make agentic AI safer and more operationally manageable:- Effort Parameter (beta): This control is novel in that it gives developers an API‑level knob to adjust how much internal compute the model invests in thinking relative to producing tokens and executing tool calls. In practice, this can help teams optimize latency vs. quality tradeoffs for different agent classes (e.g., interactive chat vs. long‑running refactoring jobs).
- Compaction Control helpers in the Foundry SDK: Long‑running agents tend to blow past token budgets; SDK‑level helpers for compaction aim to compress or summarize prior context so sessions remain coherent without exceeding the window.
- Programmatic tool calling + tool search: Calling tools directly (for example, executing a function in Python) reduces brittle text‑parsing glue that often breaks in production. Tool search lets agents discover relevant tools dynamically — particularly valuable when an enterprise exposes hundreds or thousands of internal microservices or connectors.
- Governance and observability: Foundry pairs the model with Microsoft’s enterprise governance stack — centralized policies, security controls, and telemetry — enabling compliance, auditing, and monitoring for agentic workflows.
Benchmarks and performance: reading the numbers with context
Opus 4.5’s headline metric — 80.9% on SWE‑bench Verified — is a notable milestone because SWE‑bench Verified uses real GitHub issues as test cases rather than synthetic puzzles. Vendors and journalists are treating a >80% result as a symbolic barrier for models handling practical software engineering fixes.That said, several caveats matter:
- Benchmarks are not apples‑to‑apples across providers unless every test detail (prompt templates, tool access, evaluation harness, reranking) is identical. Vendors may report internal scores measured under specific conditions (e.g., with additional verification or reruns).
- Gains are most pronounced on tasks that require reasoning across files, planning multi‑step changes, and using tools — precisely the scenarios Opus 4.5 was engineered for. On simpler completions or single‑file tasks, differences between top models may be marginal.
- Reported improvements in token efficiency (fewer tokens to reach the same answer) are meaningful for cost-sensitive deployments but depend on the precise usage patterns: chatty interactive flows vs. hands‑off batch jobs behave very differently.
Multimodality, “computer use,” and productivity agents
Anthropic and Microsoft describe Opus 4.5 as the company’s best vision model to date — a claim that covers two practical improvements:- Improved visual reasoning: Better interpretation of screenshots, diagrams, and multi‑page documents, enabling higher‑fidelity automation of desktop tasks.
- Improved “computer use” automation: More robust, multi‑step desktop workflows — creating spreadsheets with conditional logic, building slide decks from research notes, or stitching multi‑document reports together with consistent domain knowledge.
Pricing, availability, and developer access
Microsoft lists Opus 4.5 availability as public preview in Foundry, with entry in GitHub Copilot paid plans and in Copilot Studio. The specific pricing tiers for Foundry’s serverless offering are:- $5 per million input tokens
- $25 per million output tokens
It’s important to treat initial region rollout and public preview status as limiting factors: enterprises that require on‑premises or non‑public cloud deployments should plan for a broader availability schedule or evaluate fallback models in Foundry’s catalog.
Safety, security, and governance implications
Both Anthropic and Microsoft emphasize safety improvements for Opus 4.5 — lower rates of misaligned responses, improved robustness to prompt‑injection attacks, and more predictable behavior on complex tasks. From an enterprise perspective, these are welcome but not dispositive:- Safety improvements are relative: No model is immune to hallucination, data leakage, or adversarial prompts. Reducing misalignment rates is valuable, but organizations still need layered defenses: input sanitization, red team testing, policy enforcement, and human‑in‑the‑loop gates for high‑risk decisions.
- Prompt injection and tool safety: Programmatic tool calling is double‑edged. Deterministic calling reduces fragile text parsing, but it also raises the surface area for dangerous automation if tool permissions and scopes are not tightly constrained.
- Data residency and compliance: Foundry’s regional deployments and Microsoft’s governance stack help with regulatory requirements, but organizations with strict residency needs or isolated air‑gapped environments must verify whether the model and telemetry flows meet policy constraints.
- Model provenance and supply‑chain risk: Third‑party models introduce supply chain considerations — who trains the weights, where data flowed, and what contractual protections exist. Enterprises should require transparency and contractual SLAs around data handling and security.
Real‑world use cases and where Opus 4.5 shines
Microsoft and Anthropic outline enterprise scenarios where Opus 4.5 is likely to deliver the most value:- Software development agents: Autonomous or semi‑autonomous agents that perform code migration, multi‑file refactoring, bug triage, and CI pipeline orchestration with minimal supervision.
- Finance and analytics agents: Systems that synthesize filings, internal reports, and market data into actionable models or risk assessments, with the ability to call internal tools for validation.
- Cybersecurity orchestration: Agents that can correlate logs, enrich with threat intelligence, and execute multi‑step playbooks. In cybersecurity, explainability and auditable tool calls are crucial.
- Operational automation: Agents that coordinate workflows across ticketing systems, ERPs, and collaboration platforms — particularly when hundreds of tools need to be discovered and orchestrated.
Developer experience: in‑editor workflows and Copilot
One of Microsoft’s strategic plays is getting models into the developer’s flow: GitHub Copilot, Visual Studio Code, and Copilot Studio. Early integration means developers will be able to:- Select Opus 4.5 as a model in the Copilot model picker (when tenant admins enable Anthropic models).
- Use Opus 4.5‑powered agents in VS Code for planning, agentic modes (Plan/Agent/Edit), and complex codebase modifications.
- Build enterprise agents in Copilot Studio and evaluate model choice side‑by‑side using the platform’s evaluation tooling.
Risks, limitations, and operational considerations
Despite the promise, several pragmatic risks must be weighed:- Benchmark vs. production delta: Vendor benchmarks are helpful but rarely capture every nuance of your environment. Plan an internal evaluation pipeline and test Opus 4.5 on representative repositories, datasets, and toolchains before scaling.
- Unexpected agent behavior: Agentic systems that operate across many tools can create cascading failures if an action is executed without adequate checks. Implement dry‑runs, permission scoping, and manual approvals for high‑impact operations.
- Cost tail risk: Large context windows, extended thinking, and high throughput can create surprising token bills if effort parameters and quotas aren’t tightly managed. Use cost controls, quotas, and telemetry to detect runaway sessions early.
- Vendor and model diversity: Reliance on a single third‑party model or cloud provider increases business risk. Foundry’s multi‑model strategy helps, but teams should design for failover and model substitution where feasible.
- Privacy and data governance: Verify how prompt data, tool call payloads, and telemetry are logged and retained. Sensitive data (PII, proprietary algorithms) should be redacted or processed under strict contractual protections.
Practical advice for teams evaluating Opus 4.5 in Foundry
- Start with a scoped pilot: pick one high‑impact repository or workflow and define clear success metrics (accuracy, time saved, token cost).
- Run comparative A/B tests: evaluate Opus 4.5 against Sonnet 4.5 and your existing models on the same tasks and tool‑enabled flows.
- Use the Effort Parameter in controlled experiments: measure latency, quality, and token consumption across Low/Medium/High settings to find the operational sweet spot.
- Implement guardrails for tool calling: require least‑privilege credentials, enable audit logging for each tool call, and create human approval steps for destructive actions.
- Monitor and cap costs: set token quotas per agent, alert on usage anomalies, and simulate worst‑case cost scenarios.
- Design for graceful degradation: if an agent cannot confidently complete a task, route to human specialists rather than auto‑executing risky actions.
Strategic takeaways
Microsoft’s integration of Claude Opus 4.5 into Foundry and Copilot channels is a concrete example of how cloud providers are enabling model choice and adding operational controls to make frontier models production‑ready. The combination of Opus 4.5’s agentic improvements, programmatic tool calling, and Foundry’s governance and SDK helpers is meaningful — particularly for enterprises that require deterministic automation across many internal tools.However, the practical value of Opus 4.5 will depend on careful engineering: benchmarking on real workloads, building verification layers, and enforcing strong governance. The vendor claims on benchmarks, cost, and safety are consistent across Anthropic and Microsoft announcements and have been widely reported by independent outlets, but they should be validated against your organization’s unique datasets and operational constraints.
Conclusion
Claude Opus 4.5’s arrival in Microsoft Foundry signals a maturation in how enterprises will adopt agentic AI: models are now offered as part of platforms that provide governance, observability, and operational knobs rather than as standalone black‑box APIs. For teams building developer agents, cybersecurity playbooks, or productivity automations, Opus 4.5 appears to deliver meaningful advances in planning, tool use, and code generation — and it does so at price points that Microsoft and Anthropic present as significantly more accessible than prior Opus‑class offerings.That promise comes with responsibilities: rigorous in‑house validation, production‑grade safety engineering, and continuous monitoring. Organizations that invest in those practices will be best positioned to turn Opus 4.5’s agentic capabilities into reliable, auditable, and cost‑effective automation across engineering, finance, security, and operations.
Source: Visual Studio Magazine Microsoft Brings Anthropic's Claude Opus 4.5 to Foundry Preview -- Visual Studio Magazine