Microsoft’s PowerToys has quietly turned the humble clipboard into a programmable AI surface: the Advanced Paste utility in PowerToys v0.96 now lets users choose where AI-powered clipboard transformations run — on local model hosts such as Foundry Local or Ollama, or on a choice of cloud providers including Azure OpenAI, OpenAI, Google Gemini and Mistral — and it surfaces model selection and clipboard previews in the paste UI so the backend choice is explicit at paste time.
Background
PowerToys started as a grab-bag of power-user utilities and has steadily grown into a lightweight productivity suite for Windows. Advanced Paste originally provided formatting convenience — strip rich text to plain text, convert to Markdown or JSON, and paste OCR’d text from images — but has been extended to offer AI-driven transforms (summarize, translate, rewrite, tone changes, code scaffolding) that run behind the scenes when you paste. With the 0.96 release, Microsoft rearchitected Advanced Paste to be
provider-agnostic, explicitly supporting both cloud endpoints and local on-device runtimes. This change is small in UI footprint but large in implications: the clipboard is transient and often sensitive, so giving users and administrators
the ability to keep inference local or to pick an enterprise-managed cloud endpoint is a practical privacy and governance feature, not a mere convenience. Independent reporting and the official release notes confirm the change; the GitHub release for v0.96 lists multi-provider and local runtime support as headline items.
What changed in Advanced Paste — at a glance
- Multi-provider cloud support: Advanced Paste can be configured to use Azure OpenAI, OpenAI, Google Gemini, Mistral and other cloud endpoints instead of being tied to a single provider.
- Local model/runtime support: You can point Advanced Paste at a local HTTP endpoint — for example, Microsoft Foundry Local or an Ollama server — and have transforms run without leaving the device or LAN.
- UI transparency improvements: The paste dialog now previews the active clipboard item and exposes a model-selection dropdown, making the routing choice visible at paste time.
- NPU / on-device acceleration path: When local runtimes and compatible model variants are available and supported by the device, Windows can offload inference to Neural Processing Units (NPUs) for lower latency and power use. Performance and availability vary by hardware and driver.
- Retention of classic features: Plain-text paste, Markdown/JSON conversion, OCR image-to-text and media transcoding remain, now with the optional AI “Paste with AI” flows routed to the chosen provider.
These items were shipped as part of PowerToys v0.96 and refined in a subsequent hotfix release (0.96.1) that addressed reliability and local-model compatibility issues.
How local model integration works (Foundry Local, Ollama, and the Windows AI stack)
Local runtimes: what PowerToys expects
Advanced Paste does not embed heavy model runtimes itself; instead it treats AI backends as
pluggable providers. Each provider is configured in PowerToys Settings with the runtime endpoint and any required credentials. For cloud providers you supply API keys and endpoint/deployment names; for local providers you point to an HTTP endpoint hosted on the same machine or a LAN server (Foundry Local, Ollama). When you select a local provider in the paste UI, PowerToys sends the formatted prompt to that endpoint and pastes the returned result. This keeps clipboard content on-device when the local runtime is used.
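In rough code terms, that request flow looks like the sketch below. It assumes an OpenAI-compatible chat-completions route, which both Foundry Local and Ollama expose; the URL, port, model name and prompt wording are illustrative, not PowerToys' actual internals.

```python
import json
import urllib.request


def build_transform_request(endpoint: str, model: str,
                            instruction: str,
                            clipboard_text: str) -> urllib.request.Request:
    """Format a clipboard transform as an OpenAI-style chat completion call."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": instruction},
            {"role": "user", "content": clipboard_text},
        ],
    }
    return urllib.request.Request(
        endpoint.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# A local backend (here, Ollama's default port) and a cloud backend differ
# only in the URL and credentials; the request shape is identical.
req = build_transform_request(
    "http://localhost:11434", "llama3.2",
    "Rewrite the following text in a friendly tone.", "Meeting moved to 3pm.")
# urllib.request.urlopen(req) would send it once the local server is running.
```

Because the request shape is the same everywhere, swapping Foundry Local for Azure OpenAI is purely a configuration change, which is what makes the provider-agnostic design workable.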
Foundry Local: Microsoft’s local inference offering
Foundry Local is Microsoft’s on-device inference runtime designed to mirror Foundry/Azure capabilities locally. It is available in preview and provides an OpenAI-compatible API and SDKs so apps can call a local endpoint with a familiar API surface. Foundry Local supports model variants optimized for different hardware and can automatically select an NPU-optimized model when the right driver/hardware is present. Microsoft documents Foundry Local as an on-device solution intended for privacy, low latency and offline scenarios.
Ollama: community-oriented local runtime
Ollama is an open-source local model runtime widely used by enthusiasts and teams for hosting a range of community models. PowerToys can call an Ollama endpoint in the same way it calls Foundry Local; this enables private inference workflows using community model families that are already packaged for Ollama. Ollama’s popularity comes from its simplicity and breadth of supported models.
The NPU path and Phi Silica
Windows’ AI stack includes acceleration pathways for NPUs exposed by platform partners (Intel, Qualcomm, others). Microsoft’s Phi Silica is an NPU-tuned local language model surfaced in the Windows App SDK and Windows AI Foundry; it’s optimized for Copilot+ devices and NPU acceleration. In practical terms: when Foundry Local or another runtime has an NPU-optimized variant of a model, Windows can route small transforms to the NPU for far lower latency and power usage — but this requires the right drivers and model preparation. Microsoft’s docs explicitly note that driver installation and model quantization are prerequisites for NPU acceleration.
Why this matters — performance, cost and privacy trade-offs
Advanced Paste becoming provider-agnostic and local-capable touches three practical axes:
- Performance: Local inference (especially on NPU-equipped hardware) removes cloud roundtrips and yields near-instant transformation results for short operations (summaries, tone changes). This improves interactive workflows where a user expects immediate paste responses.
- Cost: Cloud inference is billable per request or token; routing frequent short transforms to a local model removes recurring per-call charges. For heavy users the cumulative savings can be material, provided the organization is willing to bear model storage, update and management costs.
- Privacy & compliance: Clipboard contents can include credentials, PII, legal text or IP. Local inference keeps those snippets inside the device or corporate LAN, reducing data egress and simplifying compliance scoping. That’s a strong advantage in regulated environments — but it’s only effective if administrators control which providers are allowed and how API keys are managed.
These benefits are real, but they require trade-offs: local models demand storage, compute and update processes; NPUs are powerful but hardware- and driver-dependent; and multi-provider flexibility increases configuration and governance complexity.
Practical security and governance considerations
Advanced Paste’s power introduces new operational responsibilities. Administrators should treat the feature as a configurable data path and apply familiar safeguards.
- API key and secret management: Cloud providers require keys. Store keys in a vault or centralized credential manager. Avoid embedding keys in plain-text configuration files.
- Provider whitelists: Limit which AI backends users can select via policy. For enterprise deployments, document allowable providers and disable ad-hoc provider creation unless approved.
- Audit and monitoring: Enable logging and auditing of Advanced Paste usage where possible (PowerToys logs and endpoint logs). Pilot and monitor for unexpected egress patterns.
- Data classification: Enforce simple rules: highly sensitive text should not be processed by cloud models unless an approved enterprise gateway is used. Prefer Foundry Local or an approved on-prem gateway for regulated data.
- Update and patching: Local model hosts must be patched and kept current. Foundry Local is preview software; administrators should track updates and test before broad rollout.
Failing to manage these points can result in accidental data exposure, unexpected cloud bills, or inconsistent user experiences across the device estate.
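In script form, the whitelist and secret-handling rules above amount to something like the following sketch. The provider names and the environment-variable naming convention are hypothetical; the point is that keys come from a vault-backed environment, never from plain-text configuration.

```python
import os

# Hypothetical approved-provider list, set by policy rather than by users.
APPROVED_PROVIDERS = {"foundry-local", "ollama", "azure-openai-enterprise"}


def get_provider_config(provider: str) -> dict:
    """Reject unapproved backends; pull credentials from the environment."""
    if provider not in APPROVED_PROVIDERS:
        raise PermissionError(f"provider {provider!r} is not on the approved list")
    # Local runtimes need no key; cloud providers read theirs from an
    # environment variable injected by a credential manager at login.
    key = os.environ.get(provider.upper().replace("-", "_") + "_API_KEY")
    return {"provider": provider, "api_key": key}
```

Centralizing this check means an unapproved endpoint fails loudly at configuration time instead of silently exfiltrating clipboard content at paste time.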
How to get started — a practical configuration checklist
- Install PowerToys:
- Use the official GitHub release, Microsoft Store, or winget to install v0.96+ (confirm the release hashes if deploying at scale).
- Enable Advanced Paste in PowerToys Settings:
- Open the PowerToys dashboard and turn on Advanced Paste. Inspect the Model Providers area.
- Decide your provider strategy:
- For privacy-first workflows: plan Foundry Local or Ollama deployment on targeted machines or a local host.
- For hybrid teams: allow one or two approved cloud providers (for example, an Azure OpenAI enterprise deployment) and restrict others.
- Install Foundry Local (optional for on-device inference):
- Foundry Local is available via winget or package downloads; it exposes an OpenAI-compatible API and can auto-select NPU-optimized model variants when supported. Follow the Foundry Local get-started steps and confirm the local endpoint is reachable.
- Configure the provider in PowerToys:
- Add the local endpoint or cloud API credentials in PowerToys > Advanced Paste > Model Providers. Test with a non-sensitive clipboard snippet.
- Test performance and defaults:
- Run representative transforms (summaries, tone changes, small translations) and measure response time. If NPUs are present, validate whether the local runtime is using them (Foundry Local logs and Windows driver pages will show driver load status).
- Roll out with controls:
- Start with a pilot group, gather feedback, and iterate. Use policy to lock the allowed provider list once satisfied.
A checklist like this keeps deployment low-friction while enforcing the governance practices enterprises need.
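The reachability and response-time checks in the steps above can be smoke-tested with the standard library alone. The `/v1/models` route used here is part of the OpenAI-compatible surface that both Foundry Local and Ollama advertise, but verify the exact path against your runtime's documentation; the port below is an example.

```python
import time
import urllib.error
import urllib.request


def endpoint_reachable(base_url: str, timeout: float = 2.0) -> bool:
    """Probe an OpenAI-compatible endpoint's model-list route."""
    try:
        with urllib.request.urlopen(base_url.rstrip("/") + "/v1/models",
                                    timeout=timeout):
            return True
    except (urllib.error.URLError, OSError, ValueError):
        return False


def time_transform(fn) -> float:
    """Measure one transform's wall-clock latency in seconds."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


# With no local runtime installed, the probe fails fast (connection refused)
# rather than hanging; once Ollama or Foundry Local is up, it returns True.
ok = endpoint_reachable("http://127.0.0.1:11434")
```

Running representative transforms through `time_transform` on pilot machines gives concrete latency numbers to compare local, NPU-accelerated and cloud backends before committing to defaults.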
Real-world workflows and examples
- Writer rapidly rewriting email drafts: Copy the draft, invoke Advanced Paste, select a local small model for tone adjustments, and paste. Local inference avoids exposure of potentially sensitive email content and delivers near-instant turnaround.
- Developer code scaffolding: Copy a code stub or prompt and use a cloud-grade model (if approved) for complex completions; use a local model for quick refactors to save cloud credits.
- Field worker with intermittent connectivity: Install Foundry Local on the laptop and run transforms offline where the cloud is unreliable; Foundry Local is explicitly documented as usable without an Azure subscription.
These practical workflows demonstrate how flexibility in provider selection maps to concrete user value.
Limitations, unknowns and risks to watch
- Model quality vs. latency: Small local models or NPU-tuned SLMs (like Phi Silica variants) are optimized for latency and efficiency, but may not match the generative quality of the largest cloud models. Expect a quality/latency tradeoff.
- Hardware and driver fragmentation: NPU acceleration requires vendor drivers and specific model variants. Different devices will have distinct experiences — published TOPS numbers and vendor claims are guidance, not guarantees.
- Operational overhead for local models: Running and maintaining local runtimes (storage, updates, compatibility testing) is extra work compared with cloud services that abstract model maintenance.
- User error and accidental egress: A generous model-selection dropdown is powerful but also increases the chance that a user will accidentally send a sensitive snippet to an undesired cloud provider if the UI defaults are not locked. Clear defaults and whitelists are vital.
- Preview and confirmation limits: The paste preview reduces surprises, but very short-lived clipboard items (passwords, tokens) still require user discipline; tooling alone is insufficient for absolute protection.
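The accidental-egress risk above can be partially mitigated in tooling with a pre-send check that blocks obviously credential-shaped clipboard content from reaching a cloud provider. The patterns below are deliberately crude examples; a real deployment would delegate this to a proper DLP or classification engine.

```python
import re

# Crude, illustrative signatures of credential-like content.
SENSITIVE_PATTERNS = [
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),    # PEM private keys
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),                  # AWS access key IDs
    re.compile(r"(?i)\b(password|passwd|secret)\s*[:=]"), # key=value secrets
    re.compile(r"\beyJ[A-Za-z0-9_-]{10,}\."),             # JWT-like tokens
]


def safe_for_cloud(clipboard_text: str) -> bool:
    """Return False if the snippet looks like it contains a credential."""
    return not any(p.search(clipboard_text) for p in SENSITIVE_PATTERNS)
```

Even a filter this simple converts a silent mistake into a visible refusal, which is the behavior the locked defaults and whitelists aim for.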
Where claims about NPU performance or “instant” inference appear in marketing, treat them as plausible but contingent on specific hardware, drivers and model preparation; independent verification on target devices remains necessary.
Cross-checks and verification (what I confirmed)
- The PowerToys GitHub release notes for v0.96 explicitly list Advanced Paste’s multi-provider support and Foundry Local/Ollama integration as headline items. The release page and subsequent 0.96.1 hotfix entries confirm the timing and some post-release fixes.
- Vendor and independent coverage (The Verge, Windows Central) reported the on-device AI angle and the expanded cloud model options; their reporting aligns with the release notes’ technical summary and the Foundry Local documentation.
- Microsoft documentation for Foundry Local and the Windows AI developer pages describe Foundry Local’s purpose, hardware prerequisites, NPU driver requirements and the OpenAI-compatible API surface, which explains how PowerToys can call local endpoints in a straightforward way.
If a claim looks unusual — for example, a guarantee that every Windows PC will get the same NPU-accelerated experience — treat it cautiously. The Windows AI docs and Foundry Local pages both say model and driver compatibility are prerequisites, not universal guarantees.
Final analysis — strengths, recommended practices, and risks
PowerToys’ Advanced Paste update is a smart, pragmatic step: it brings hybrid AI to an everyday action (paste) in a way that surfaces policy-relevant choice, reduces friction for private inference, and demonstrates how on-device AI can be useful without being intrusive. The UI changes (clipboard preview and model selector) are small but meaningful: transparency matters for ephemeral clipboard actions.
Notable strengths:
- Flexibility: Multi-provider support reduces vendor lock-in and enables choice by policy or cost.
- Privacy option: Local runtime support (Foundry Local, Ollama) gives a clear path to private inference.
- Practicality: The paste UX keeps the workflow short and integrates AI into a natural user action with minimal friction.
Principal risks and responsibilities:
- Governance overhead: Multi-provider freedom must be balanced by administrative controls to prevent accidental data egress or credential misuse.
- Operational cost for local models: Local hosting trades cloud bills for storage, updates and management overhead.
- Fragmented NPU experience: Real-world acceleration depends on hardware/driver/model alignment; enterprises should test on target devices before promising speedups.
Recommended practices for organizations:
- Pilot Advanced Paste with a small user group and approved local runtime or authorized cloud provider.
- Store keys securely, restrict provider creation, and define default provider policies.
- Monitor logs and user feedback; keep Foundry Local and local model catalogs updated under a change-management process.
- Treat Advanced Paste as an integrated feature in desktop security reviews, not just a convenience toggle.
Advanced Paste’s rework in PowerToys v0.96 is evidence of a broader trend: local-first AI tooling that gives users lower-latency, privacy-conscious options while preserving the ability to call larger cloud models where needed. For power users and IT teams, the feature is immediately useful — and for administrators it raises familiar governance questions that can be managed with sensible defaults and a measured rollout. The technical plumbing (Foundry Local’s OpenAI-compatible API, Ollama endpoints, and Windows AI acceleration APIs) is now in place; the next step is careful operationalization at scale.
Source: Neowin
https://www.neowin.net/amp/closer-look-advanced-paste-in-windows-powertoys/