Microsoft’s September update for Copilot Studio pushes the platform from “authoring copilots” toward a full enterprise-grade agent runtime — adding UI automation, richer channel deployment, testing tools, code execution, lifecycle tooling, and tighter admin controls that together make Copilot Studio a serious candidate for production automation across customer support, back‑office, and developer scenarios.
Source: Microsoft What's new in Copilot Studio: September 2025 | Microsoft Copilot Blog
Background
Copilot Studio sits inside the Power Platform as Microsoft’s low‑code environment for building, tuning, and deploying AI agents that operate on tenant data, connectors, and business systems. Over the past year Microsoft has layered governance (Entra Agent ID, Purview, DLP), retrieval grounding, and runtime telemetry into the product to support enterprise rollout. The September wave expands that platform with features targeting three practical gaps: (1) interacting with UIs where APIs don’t exist, (2) shipping agents to end‑user channels and native apps, and (3) giving makers the testing and operational tools needed to scale safely.
What shipped in September: headline changes
- Computer use (public preview): agents can now operate apps and websites with a virtual mouse and keyboard — clicking, typing, and navigating UIs — enabling automation for tasks without APIs. A hosted Windows 365 browser and local registered device support are included, plus templates, credential management, and allow‑list controls.
- WhatsApp channel (GA per Microsoft’s post): makers can deploy agents natively to WhatsApp for customer interactions that support attachments, phone‑number authentication, and enterprise compliance. The other documents we reviewed did not corroborate the GA claim, so validate availability against tenant admin controls and Microsoft’s official channel documentation before planning production rollouts.
- Prompt builder improvements (preview): prompt evaluations to test prompts at scale (bulk upload, auto‑generate cases, real telemetry imports) and built‑in Power Fx formula support for dynamic prompt inputs.
- File groups (GA): organize up to 12,000 locally uploaded files into groups (treated as single knowledge sources) and attach variable‑based instructions for better retrieval guidance. Ungrouping isn’t supported yet; changing a group requires deleting it.
- Component collections & solution export/import (GA): package topics, knowledge, actions, and entities into reusable collections and move them across environments via the Copilot Studio Solution Explorer. This simplifies lifecycle management and reuse.
- End‑user file uploads in conversations (GA): agents can accept files from users and pass content plus metadata into flows, Power Automate, or connectors for downstream processing (summarization, extraction, validation).
- Code interpreter (GA): Python code generation and execution inside agents, with prompt‑level or agent‑level enablement and CRUD operations on Dataverse tables; agents can dynamically generate visualizations and reusable logic.
- Agents Client SDK (text + adaptive cards GA): embed agents into Android, iOS and Windows apps for in‑app multimodal conversations; broader modality support (voice, image, video) is planned.
- MCP connectors in Studio (public preview): one‑click connecting of Model Context Protocol servers to Copilot Studio, with resource (files/images) support to broaden agent inputs.
- Analytics and admin improvements: dedicated Microsoft 365 Copilot Chat environments for Copilot Studio lite agents (clarifies data geography and optional billing/usage reporting), themes and insights for unanswered generative questions (preview), monthly consumption limits and active‑user metrics (GA), and ROI tracking for autonomous runs (GA).
- Copilot Studio Agent Academy: a free, self‑paced curriculum (Recruit level live) to help makers learn agent design, with higher‑level modules planned.
Build richer agent experiences
Computer use — UI automation for the human web
Computer use brings what many automation teams have asked for: the ability for agents to interact with UIs that lack APIs or connectors. Instead of relying solely on connectors and MCP endpoints, an agent can now:
- Click buttons, select menus, and type text with a virtual mouse and keyboard.
- Use built‑in vision and reasoning to adapt when interfaces change.
- Run inside a hosted Windows 365 browser (hosted automation) or target registered local devices (for access to local apps).
- A hosted Windows 365 browser reduces setup complexity for web automations.
- Credential vaulting lets agents log into sites and apps securely during runs.
- Allow‑list controls restrict agent interactions to approved domains and apps.
Caveats and operational notes:
- UI automation increases attack surface and operational brittleness; treat these automations like RPA assets — test aggressively and monitor.
- Credential handling, allow‑lists and hosted vs. local execution choices must be part of a security plan before production rollout.
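To make the allow‑list idea concrete, here is a minimal sketch of a domain check of the kind a team might run in its own tooling before approving a computer use target. It is not Copilot Studio's implementation; the `ALLOWED_DOMAINS` set and the `is_allowed` helper are hypothetical names chosen for illustration.

```python
from urllib.parse import urlparse

# Hypothetical allow-list; real entries would come from your own admin configuration.
ALLOWED_DOMAINS = {"contoso.com", "portal.example.org"}


def is_allowed(url: str, allowed=ALLOWED_DOMAINS) -> bool:
    """Return True only if the URL's host is an allowed domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    # Match on the full host or a dot-delimited suffix, so that
    # "contoso.com.evil.net" and "evilcontoso.com" are both rejected.
    return any(host == d or host.endswith("." + d) for d in allowed)


if __name__ == "__main__":
    for candidate in (
        "https://app.contoso.com/orders",
        "https://contoso.com.evil.net/login",
        "https://evilcontoso.com/",
    ):
        print(candidate, is_allowed(candidate))
```

Matching on a dot‑delimited suffix rather than a plain substring is the design point here: substring checks are a common source of allow‑list bypasses.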
Channels and embedding: meet users where they are
Microsoft positions Copilot Studio as the enterprise‑grade agent platform that can deploy to many channels. Two items matter here:
- WhatsApp channel: Microsoft’s post states the WhatsApp channel is generally available, enabling phone‑based authentication, images/attachments, and enterprise governance parity. Because WhatsApp reaches more than two billion users, native deployment opens broad customer‑facing scenarios (support, order tracking, scheduling). This claim appears in Microsoft’s update; however, the documents we reviewed did not include an independent verification of GA status. Confirm tenant availability and provisioning steps in your admin center before committing to a production project.
- Agents Client SDK: the SDK lets developers embed agents directly inside Android, iOS, and Windows apps, with text + adaptive card conversations generally available now and additional modalities coming. This enables in‑app intelligence without switching contexts and opens opportunities for proactive, context‑rich assistance.
- Validate channel compliance and data residency for any customer‑facing messaging channel (WhatsApp has its own contractual and compliance rules).
- Use adaptive cards and the SDK to standardize in‑app interaction patterns and preserve audit trails.
- Pilot with a narrow, high‑value flow (order tracking, appointment reminders) before scaling.
Authoring, testing, and knowledge management
Prompt evaluations and Power Fx
Prompt quality is the single largest source of unpredictability for agent behavior. The preview of prompt evaluations adds a systematic testing layer:
- Build test sets by bulk upload, auto‑generation, real telemetry, or manual cases.
- Customize evaluation metrics (tone, clarity, keyword matches, structured output compliance).
- Receive accuracy scores and per‑case insights to iterate quickly.
Why this is important: systematic prompt testing reduces hallucination risk, ensures structured outputs meet downstream schema requirements, and helps standardize prompt behavior across environments.
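The evaluation loop described above can be approximated offline. The sketch below is an assumption‑laden stand‑in, not the prompt builder's own scoring: it checks each model output for structured output compliance (valid JSON containing required keys) and keyword matches, then reports an accuracy score. The helper names `evaluate_case` and `accuracy` and the sample outputs are hypothetical.

```python
import json


def evaluate_case(output: str, required_keys, keywords=()) -> dict:
    """Score one output for JSON structure and keyword presence."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        # Output that is not valid JSON fails the structure check outright.
        return {"structured": False, "keywords": False, "passed": False}
    structured = isinstance(data, dict) and all(k in data for k in required_keys)
    text = output.lower()
    keywords_ok = all(k.lower() in text for k in keywords)
    return {
        "structured": structured,
        "keywords": keywords_ok,
        "passed": structured and keywords_ok,
    }


def accuracy(results) -> float:
    """Fraction of cases that passed every check (0.0 for an empty test set)."""
    return sum(r["passed"] for r in results) / len(results) if results else 0.0


if __name__ == "__main__":
    required = ["order_id", "status"]
    outputs = [
        '{"order_id": "A1", "status": "shipped"}',  # valid structure and keyword
        "Your order has shipped",                   # not JSON
        '{"order_id": "A2"}',                       # missing "status"
    ]
    results = [evaluate_case(o, required, keywords=["shipped"]) for o in outputs]
    print(results, accuracy(results))
```

Keeping a per‑case result alongside the aggregate score mirrors the per‑case insights mentioned above and makes regressions easier to trace to a specific input.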
File groups and knowledge scale
File groups (GA) let makers treat collections of locally uploaded files as single knowledge sources, reducing clutter and improving retrieval relevance. Notable limits and behaviors:
- Up to 25 file groups per agent, covering up to 12,000 files.
- Add variable‑based instructions to guide retrieval relevance.
- Grouping is one‑way for now: to change a group you must delete it (ungroup not supported yet).
Component collections and lifecycle
Creating component collections and exporting/importing agents via the Solution Explorer addresses a recurring pain point: moving agents, topics, knowledge, and actions across dev/stage/prod environments. Reusable components speed agent assembly and make governance more tractable by reducing ad‑hoc copies.
Code execution and data operations
Code interpreter (Python) — GA
Code interpreter is now generally available in both Copilot Studio and Copilot Studio lite (Microsoft 365 agent builder). It enables:
- Natural‑language generation of Python actions and editing of generated code in the authoring flow.
- Runtime execution of that Python code inside an agent (visualizations, data transforms).
- CRUD operations on Dataverse tables from prompts (create/read/update/delete).
- Agent‑level enablement: all prompts and actions in an agent can execute Python (suitable for consistent logic across conversations).
- Prompt‑level enablement: enable interpreter per prompt for experimentation or lightweight use.
- Complex data processing, tabular transforms, custom visualizations, and structured output generation become first‑class capabilities inside agents.
- Security and sandboxing become critical: review execution environments, data access rules, and audit trails for code runs.
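For a sense of scale, the snippet below illustrates the kind of small tabular transform an agent might generate for a request such as "total and average order amount by region". It is an illustrative sketch using only the standard library; the row shape and the `summarize_by_region` name are assumptions, and it does not touch Dataverse or any Copilot Studio API.

```python
from collections import defaultdict


def summarize_by_region(rows):
    """Aggregate order amounts into a per-region total and average."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row["region"]] += float(row["amount"])
        counts[row["region"]] += 1
    # Sort regions so the output order is stable across runs.
    return {
        region: {
            "total": round(totals[region], 2),
            "average": round(totals[region] / counts[region], 2),
        }
        for region in sorted(totals)
    }


if __name__ == "__main__":
    sample_rows = [
        {"region": "EMEA", "amount": "120.50"},
        {"region": "EMEA", "amount": "79.50"},
        {"region": "APAC", "amount": "200"},
    ]
    print(summarize_by_region(sample_rows))
```

Even a transform this small reads user‑supplied data, which is why the logging and least‑privilege review noted above applies to generated code as much as to hand‑written code.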
Connectors, MCP, and integration
MCP connectors in Studio (public preview)
Copilot Studio now allows one‑click connection of MCP servers: provide an MCP host URL and Copilot Studio will connect and make MCP resources (files, images) available to agents. This reduces integration friction and helps agents call broader partner/line‑of‑business toolsets.
File uploads from end users
Agents can now accept files directly from end users and pass them, with metadata, to Power Automate, connectors, or downstream flows. This closes an important loop for document‑centric scenarios (claims intake, application processing) and reduces the need for external portals.
Managing and measuring agents at scale
Microsoft added several analytics and admin controls intended to make Copilot Studio manageable in production:
- Dedicated environment for Copilot Studio lite: agents run in a Microsoft 365 Copilot Chat environment that maps data geography and optionally surfaces billing/consumption info in the Environments tab. This gives admins clearer data residency visibility.
- Analytics: themes for generative AI questions (preview), insights on unanswered generative questions (preview), monthly Copilot credit limits shown beside month‑to‑date usage (GA), active user metrics (GA), and ROI analysis for agent runs (GA). These metrics help teams track adoption, surface gaps in knowledge coverage, and quantify business value.
- Agent monthly consumption limits: makers can now view credit limits and usage in Copilot Studio analytics, reducing the need to switch admin consoles.
Security, governance, and the multi‑model landscape
Two parallel themes in September’s updates deserve explicit attention: runtime protection and model choice.
Runtime monitoring and enforcement
Microsoft added near‑real‑time runtime security controls that forward an agent’s planned actions to external monitors (Microsoft Defender, third‑party XDR, or custom endpoints) for an approve/block decision while the agent runs. The public preview flow sends the plan payload (prompt, chat history, tool inputs, metadata) and expects a short‑latency verdict; audit logs record all interactions. This inserts enforcement at the point of action rather than only at design time or via post‑hoc logs.
Operational caveats:
- The platform’s preview semantics report a short decision window (commonly referenced at about one second) and a default‑allow fallback if no response arrives; confirm exact timeout behavior and failure modes in your tenant before enabling sensitive automations.
- Runtime payloads can contain sensitive context — define redaction rules and telemetry retention up front.
- Integrate verdicts with existing SIEM/SOAR playbooks for a consistent incident response.
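The timeout and fallback behavior is worth modeling before relying on it. The sketch below simulates the decision window locally, assuming a monitor is just a callable that returns "allow" or "block"; the `get_verdict` helper, the `on_failure` parameter, and the plan fields are hypothetical and do not represent Microsoft's API. It shows how the same slow monitor produces a different outcome under a default‑allow versus a fail‑closed policy.

```python
import time
from concurrent.futures import ThreadPoolExecutor

VALID_VERDICTS = ("allow", "block")


def get_verdict(monitor, plan, timeout_s=1.0, on_failure="allow"):
    """Ask a monitor for a verdict, falling back if it is slow, fails, or answers badly."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(monitor, plan)
    try:
        verdict = future.result(timeout=timeout_s)
    except Exception:
        # Covers both a missed decision window and an error inside the monitor.
        verdict = on_failure
    finally:
        # Do not block the caller waiting for a slow monitor to finish.
        pool.shutdown(wait=False)
    return verdict if verdict in VALID_VERDICTS else on_failure


def slow_monitor(plan):
    """Stand-in for a monitor that misses the decision window."""
    time.sleep(0.3)
    return "block"


if __name__ == "__main__":
    plan = {"prompt": "refund order 42", "tool_inputs": {"amount": 500}}
    print(get_verdict(slow_monitor, plan, timeout_s=0.05))                      # default-allow
    print(get_verdict(slow_monitor, plan, timeout_s=0.05, on_failure="block"))  # fail-closed
```

Running a simulation like this under realistic latency helps decide which automations can tolerate a default‑allow fallback and which need additional controls.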
Multi‑model Copilot: Anthropic’s Claude models
In late September Microsoft added Anthropic’s Claude Sonnet 4 and Claude Opus 4.1 as selectable model options inside Copilot Studio and the Researcher agent. The move formalizes Copilot as a multi‑model orchestration layer, letting organizations route workloads to the model best suited by capability, cost, and compliance. Anthropic models may run on third‑party clouds (Amazon Bedrock / AWS), so enabling them has cross‑cloud and contractual implications.
Implications for IT and procurement:
- Legal/compliance teams must evaluate Anthropic’s hosting and data handling terms.
- Admins must gate model enablement via the Microsoft 365 admin center; Copilot will fall back to tenant defaults if a model option is disabled.
- Treat model selection as an operational discipline: pilot models against your production prompts and test suites, measure cost, latency, and output quality.
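A model pilot of this kind can be as simple as running one test suite through each candidate and recording latency and quality side by side. The following sketch assumes each model is exposed as a plain callable that takes a prompt and returns text; `compare_models`, the stub models, and the exact‑match scorer are hypothetical placeholders for your own clients and metrics, and cost tracking would need to be added from your billing data.

```python
import time
from statistics import mean


def compare_models(models, cases, score):
    """Run every (prompt, expected) case through each model; report mean latency and score."""
    report = {}
    for name, call in models.items():
        latencies, scores = [], []
        for prompt, expected in cases:
            start = time.perf_counter()
            output = call(prompt)
            latencies.append(time.perf_counter() - start)
            scores.append(score(output, expected))
        report[name] = {
            "mean_latency_s": mean(latencies),
            "mean_score": mean(scores),
        }
    return report


if __name__ == "__main__":
    # Stub callables standing in for real model clients.
    candidates = {
        "model_a": lambda prompt: prompt.upper(),
        "model_b": lambda prompt: prompt,
    }
    suite = [("hi", "HI"), ("ok", "OK")]
    exact_match = lambda output, expected: 1.0 if output == expected else 0.0
    print(compare_models(candidates, suite, exact_match))
```

The sketch assumes a non‑empty test suite; `statistics.mean` raises an error on empty input.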
Risks, limitations, and practical recommendations
Microsoft’s September releases materially raise Copilot Studio’s usefulness, but they also bring measurable operational complexity. Key risks and mitigations:
- Brittleness of UI automation: UI changes break flows. Mitigate with test suites, monitoring, and a process for rapid remediation. Use allow‑lists and credential vaults to reduce blast radius.
- Data exfiltration via runtime hooks: the runtime monitoring payloads can contain sensitive context. Define redaction rules, retention limits, and place monitors inside tenant VNets when required. Test failover modes to avoid silent default allows in critical flows.
- Third‑party model hosting (Anthropic): routing enterprise content to models hosted outside Microsoft‑managed infrastructure changes compliance posture. Coordinate legal, security, and procurement reviews before enabling Anthropic models.
- Execution of arbitrary code (Python interpreter): code execution increases capabilities but raises sandboxing and privilege concerns. Isolate execution environments, enforce least privilege on Dataverse access, and log code runs for forensics.
- Governance drift across environments: component collections and solutions make export easier, but ensure environment‑specific secrets, credentials, and allow‑lists are replaced with environment variables or secure configuration during deployment.
- Start with a scoped pilot (non‑production tenant) and representative prompts + test sets.
- Validate runtime monitor latency, verdicts, and failure modes under load.
- Lock down allow‑lists and credential management for computer use automations.
- Enable model options behind an admin gate and pilot with sampled traffic to measure cost and quality.
- Create prompt evaluation baselines and iterate until accuracy and structure metrics meet SLAs.
- Define audit, retention and redaction policies for plan payloads and code runs.
- Build a blameless incident playbook that ties runtime monitoring, SIEM alerts, and automated rollback for problematic agent runs.
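As one way to approach the redaction item in the checklist, the sketch below walks a plan‑style payload and masks sensitive values before it is stored or forwarded. The field names follow the payload elements mentioned earlier (prompt, chat history, tool inputs), but the exact schema, the `SENSITIVE_KEYS` set, and the `redact` helper are assumptions for illustration; a production policy would cover more data types than email addresses.

```python
import re

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Hypothetical key names whose values should never leave the tenant.
SENSITIVE_KEYS = {"password", "token", "secret"}


def redact(value, key=None):
    """Return a copy of the payload with sensitive keys masked and emails replaced."""
    if key is not None and str(key).lower() in SENSITIVE_KEYS:
        return "[REDACTED]"
    if isinstance(value, dict):
        return {k: redact(v, k) for k, v in value.items()}
    if isinstance(value, list):
        return [redact(v) for v in value]
    if isinstance(value, str):
        return EMAIL.sub("[EMAIL]", value)
    return value


if __name__ == "__main__":
    plan = {
        "prompt": "Email jane@contoso.com the invoice",
        "tool_inputs": {"password": "hunter2", "attempts": 3},
        "chat_history": ["hi", "reach me at bob@example.org"],
    }
    print(redact(plan))
```

Because `redact` returns a new structure rather than mutating its input, the original payload stays available to the agent while only the redacted copy reaches logs or external monitors.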
Conclusion
September’s Copilot Studio updates push the product beyond experimentation into genuine operational territory. Computer use unlocks UI automation where APIs don’t exist; code interpreter brings robust data and visualization capabilities into prompts; prompt evaluations and lifecycle tooling make it realistic to treat agents like software artifacts; and runtime monitoring plus analytics give defenders and makers the tools to operate agents responsibly at scale. These changes close many of the pragmatic gaps that previously slowed enterprise adoption, but they also raise the bar for governance, security, and disciplined rollout plans.
For IT teams and makers, the immediate recommendation is to pilot with high‑value, low‑blast‑radius scenarios (customer order status, appointment scheduling, internal reporting) while exercising the new governance controls: test prompt evaluations, validate runtime monitors, and treat model selection and hosted execution as policies, not convenience toggles. When those disciplines are in place, Copilot Studio’s September additions meaningfully accelerate automation, but only if treated as a platform that requires the same operational rigor as any other critical enterprise system.
