PowerToys 0.96 Advanced Paste Brings Local AI Inference and Multi-Provider Choice


Microsoft's quietly released PowerToys update for Windows 11 has turned one of the suite’s most humble utilities — the clipboard — into a showpiece for the company’s on-device AI strategy, enabling local model inference, multiple cloud-provider options, and a redesigned user experience that prioritizes speed and privacy. The change arrives in PowerToys v0.96 and adds support for running models via Microsoft’s Foundry Local and the open-source Ollama runtime, while also broadening Advanced Paste’s cloud provider support to include Azure OpenAI, Google Gemini, Mistral, and more — giving users a true choice between local, private inference and managed cloud models.

Background

For years, PowerToys has been the lightweight toolkit for Windows power users, offering utilities that restore missing convenience features and save real time on day-to-day tasks. Advanced Paste started as a practical way to paste plain text or convert clipboard formats; today it’s a clipboard-level transformation pipeline that can apply translations, summaries, grammar fixes, format conversions, and OCR — and now it can do many of those operations locally on the PC. The PowerToys 0.96 release explicitly documents the addition of multiple online and on-device AI model providers, and Microsoft’s own documentation outlines how Advanced Paste can be configured to use either cloud API keys or local models hosted in Foundry Local or Ollama. This upgrade arrives at the same moment Microsoft is pushing a broader strategy to make Windows an “AI PC” platform: Copilot+ devices with dedicated NPUs, a Windows AI Foundry for local model hosting, and APIs like Phi Silica for NPU‑tuned local language models. Those platform-level investments are the reason PowerToys can realistically offer on-device AI for clipboard tasks today.

What changed in Advanced Paste (high level)

Advanced Paste in PowerToys 0.96 is no longer a single-provider, cloud-first clipboard helper. The headline shifts are:
  • Multi-provider model support — users can configure and choose from Azure OpenAI, OpenAI, Google Gemini, Mistral, and local hosts like Foundry Local and Ollama.
  • On-device inference — local models can run on-device (including on an NPU when available) to perform translation, summarization, and other transforms without sending clipboard content to cloud APIs.
  • UX improvements — the Advanced Paste window now previews clipboard content and exposes a simplified model-selection drop-down to make the routing choice visible and easy.
These changes transform the clipboard from a passive buffer into a programmable content layer where a single copy → paste action can apply complex transformations in a fraction of a second.

Why the change matters

  • Performance: routing small transforms to a local model or NPU removes cloud roundtrips and reduces latency for interactive paste operations.
  • Cost: local inference avoids per-request cloud API charges. For heavy clipboard transformation use, this matters materially.
  • Privacy: clipboard contents often include sensitive text (emails, tokens, meeting notes). Local inference reduces the surface area for data exfiltration.
  • Flexibility: multi-provider support reduces vendor lock-in and lets users or IT choose providers per policy, cost, or feature needs.

How on-device AI is implemented in PowerToys

Foundry Local and Ollama: local model hosts

PowerToys 0.96 documents support for local model runners: Microsoft’s Foundry Local (part of the Windows AI Foundry ecosystem) and Ollama, an open-source local runtime that can host multiple model families. Users can point Advanced Paste at a local host endpoint instead of configuring a cloud API key, enabling entirely offline or private-local workflows. The Microsoft Learn Advanced Paste page explicitly notes that the AI-powered feature may be powered by an API key for cloud services or by a local model configured in Foundry Local or Ollama.
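To illustrate what "pointing at a local host endpoint" looks like in practice, here is a minimal Python sketch that sends text to Ollama's default REST API on localhost. The model name, prompt wording, and summarization task are illustrative assumptions, not PowerToys internals:

```python
import json
import urllib.request

# Ollama's default local endpoint; nothing here leaves the machine.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model: str, prompt: str) -> dict:
    # Non-streaming request body for Ollama's /api/generate route.
    return {"model": model, "prompt": prompt, "stream": False}

def transform_clipboard(text: str, model: str = "llama3.2") -> str:
    """Illustrative example: ask a locally hosted model to summarize clipboard text."""
    body = json.dumps(
        build_request(model, f"Summarize in one sentence:\n{text}")
    ).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

The key point of the design is that a "provider" reduces to an HTTP endpoint: swapping cloud inference for local inference is a change of URL and credentials, not of workflow.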

NPUs and Phi Silica: where Windows AI ties in

Microsoft’s consumer and developer guidance shows that Copilot+ PCs use NPUs rated at 40+ TOPS (trillions of operations per second) to deliver the fastest local experiences. The Windows App SDK exposes Phi Silica, a local NPU-optimized small language model (SLM) that developers can target through Windows AI Foundry APIs; Phi Silica is tuned for latency and efficiency on Copilot+ NPUs. When a local model is available and the system has NPU resources, Windows can accelerate model inference onto the NPU for dramatic speed and efficiency gains compared with CPU-only runs. This is the same platform-level capability that Advanced Paste now leverages when local inference is selected.

One important caveat: the presence and performance of NPUs vary by device, vendor, and driver. Microsoft’s Copilot+ program defines hardware thresholds and certification gates; not every Windows 11 device can deliver the same on-device AI experience. Treat published TOPS numbers as a device-class benchmark rather than a guaranteed user experience across the entire Windows install base.

The redesigned user experience

PowerToys has not only added model options; it has made them visible and actionable.
  • The Advanced Paste window now displays the current clipboard item (text or image preview) so users can quickly verify what will be transformed before pasting.
  • Model selection is surfaced directly in the paste UI via a compact dropdown, making it simple to switch between a local Foundry/Ollama runtime and cloud providers like Azure OpenAI or Gemini.
  • Quick actions remain: paste as plain text, paste as JSON/Markdown, OCR-image-to-text paste, and the new Paste with AI flows for translation, summarization, tone change, and code scaffolding.
This UX shift is important because clipboard actions are inherently transient: users copy content with the intent to paste immediately. By bringing model selection and a content preview into the same transient UI, Advanced Paste reduces cognitive friction and gives users control without forcing them into the PowerToys settings panel.

Practical workflows: examples that change everyday tasks

  • Copy a paragraph of email text in one language, press the Advanced Paste hotkey, select the local translation model on Foundry Local, and paste the translated text into a reply — no cloud calls, near-instant replies.
  • Copy a meeting transcript snippet and use Advanced Paste to produce a concise action-item summary before pasting into your task manager. Selecting a local or cloud model is a one-click choice.
  • Copy a screenshot containing a table, open Advanced Paste, choose OCR → JSON output, and paste structured data into a spreadsheet or editor. The OCR step runs locally by default and can be combined with an on-device model for clean-up and formatting.
These are small but high-frequency improvements; when done right, they shave minutes off workflows and reduce the temptation to export sensitive snippets to third‑party tools.
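As a rough illustration of the OCR → JSON step described above, the following Python sketch converts whitespace-aligned table text (the kind OCR typically emits) into structured JSON. The column-splitting heuristic is an assumption for illustration, not Advanced Paste's actual parser:

```python
import json
import re

def ocr_table_to_json(ocr_text: str) -> str:
    """Convert whitespace-aligned OCR table text into a JSON array of row objects."""
    lines = [line for line in ocr_text.splitlines() if line.strip()]

    def split_columns(line: str) -> list[str]:
        # Runs of 2+ spaces or a tab usually mark column boundaries in OCR output.
        return re.split(r"\s{2,}|\t", line.strip())

    headers = split_columns(lines[0])                     # first line = column names
    rows = [dict(zip(headers, split_columns(line))) for line in lines[1:]]
    return json.dumps(rows, indent=2)
```

Fed the text `"Name  Qty\nWidget  3"`, this produces a JSON array of objects keyed by the header row, ready to paste into a spreadsheet importer or script.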

Cost, performance and governance implications

  • Cost: organizations that pay for cloud model usage (per-token billing) will see concrete savings if common clipboard transforms are routed to local models. For users who perform many small transformations, on-device inference can be far cheaper than repeated cloud calls. Multiple outlets report that Advanced Paste now supports local runtimes specifically to avoid per-use API credit costs.
  • Performance: local models running on an NPU can dramatically reduce latency and battery impact for repetitive, small-scale transforms (summaries, translations), compared with networked cloud inference. However, the user experience depends on available NPU compute and model size; a powerful on-device NPU will be faster, while CPU-only local inference will still be slower than server-grade models in many cases.
  • Governance and auditability: multi-provider support helps IT teams enforce vendor policies. Enterprises can prefer Azure OpenAI for centrally managed, auditable cloud inference, or opt for Foundry Local/Ollama when data residency and offline use are priorities. The configuration model in PowerToys — which accepts API keys for cloud providers and endpoints for local hosts — makes it possible to bake choice into deployment scripts and Group Policy.
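A back-of-envelope model makes the cost argument concrete. The Python sketch below estimates monthly spend if every clipboard transform were billed per token; the transform volume and the $0.002-per-1K-token price are illustrative assumptions, not any provider's actual rates:

```python
def monthly_cloud_cost(transforms_per_day: int, tokens_per_transform: int,
                       usd_per_1k_tokens: float, days: int = 30) -> float:
    """Rough monthly spend if every clipboard transform hits a metered cloud API."""
    total_tokens = transforms_per_day * tokens_per_transform * days
    return round(total_tokens / 1000 * usd_per_1k_tokens, 2)

# Illustrative numbers only: 200 small transforms a day at ~500 tokens each,
# priced at a hypothetical $0.002 per 1K tokens.
cost = monthly_cloud_cost(200, 500, 0.002)  # a few dollars per user per month
```

Per user the figure is modest, but multiplied across a fleet it is a recurring line item that local inference reduces to zero marginal cost (electricity and hardware aside).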

Security and privacy analysis

Advanced Paste’s move toward local inference strengthens privacy posture by default, but it also introduces new operational considerations.
  • Positive: Local inference reduces the need to send clipboard content to third-party clouds, lowering leakage risk for short-lived sensitive snippets (passwords, personal data). Microsoft’s documentation emphasizes that non-AI features (plain text paste, OCR) run locally by default, while AI-powered transforms require explicit configuration.
  • New risks: users may unknowingly configure a cloud provider as the default and paste sensitive content through it. The visible model dropdown mitigates this risk, but organizational policy and education remain essential. Administrators should audit model provider settings on managed machines and establish default provider controls where required.
  • Supply-chain concerns: using locally hosted models requires vetting — especially when leveraging third-party models via Ollama or community downloads. Organizations should validate model provenance, license terms, and whether models contain embedded copyrighted or otherwise problematic training artifacts. Assume local models are as trustworthy as the sources that supply them; verify before broad deployment.
  • Attack surface: adding local model runners and network listeners (for local host endpoints) can widen the attack surface if not configured securely. Standard hardening — limiting network exposure, applying least-privilege, and monitoring model host processes — is critical.
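For administrators, the provider audit suggested above can be a small script. The sketch below flags off-device defaults in an exported settings file; the JSON keys and provider names are hypothetical, since PowerToys' real settings schema may differ:

```python
import json

# Hypothetical provider classification; verify against the actual settings schema.
CLOUD_PROVIDERS = {"AzureOpenAI", "OpenAI", "Gemini", "Mistral"}
LOCAL_PROVIDERS = {"FoundryLocal", "Ollama"}

def audit_provider(settings_json: str) -> list[str]:
    """Return warnings if the configured default provider would send data off-device."""
    settings = json.loads(settings_json)
    provider = settings.get("advancedPaste", {}).get("defaultProvider", "")
    warnings = []
    if provider in CLOUD_PROVIDERS:
        warnings.append(
            f"Default provider '{provider}' routes clipboard text to a cloud API."
        )
    elif provider not in LOCAL_PROVIDERS:
        warnings.append(f"Unknown provider '{provider}'; verify against policy.")
    return warnings
```

Run against each managed endpoint's exported configuration, a check like this turns the "audit model provider settings" recommendation into something a compliance pipeline can enforce.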

Enterprise deployment considerations

  1. Inventory: identify which machines qualify as Copilot+ (NPU-equipped) and which will rely on CPU-only local inference or cloud providers. Microsoft’s Copilot+ device lists and NPU guidance provide vendor-specific device catalogs and a 40+ TOPS target for device classification.
  2. Policy: define safe defaults — e.g., disallow public cloud providers for clipboard AI on machines that handle regulated data, or require per-user consent and logging. PowerToys’ configuration can be scripted, but IT must integrate it with device management.
  3. Model governance: if Foundry Local or Ollama is used, decide which models are sanctioned and where they are stored. Implement checksum verification and periodic revalidation.
  4. Pilot: run a small pilot measuring latency, CPU/NPU utilization, and user productivity gains before a broad rollout. Collect telemetry on how often users pick local vs cloud models to inform licensing and cost planning.
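Step 3's checksum verification can be as simple as the following Python sketch, which streams a model file through SHA-256 and compares it against an IT-approved catalog; the catalog format is an assumption for illustration:

```python
import hashlib
from pathlib import Path

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file in 1 MiB chunks so multi-gigabyte model weights
    never have to be loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_model(path: str, sanctioned: dict[str, str]) -> bool:
    """True only if the file's digest matches the IT-approved catalog entry."""
    expected = sanctioned.get(Path(path).name)
    return expected is not None and sha256_of(path) == expected
```

Paired with periodic revalidation, this kind of check catches both tampered downloads and silent model swaps before they reach end users.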

Limitations, unknowns, and cautionary notes

  • Not all features scale: heavy reasoning tasks and large-context generation still benefit from cloud models. Advanced Paste’s strengths are short, targeted transformations — translations, short summaries, and format conversions. Pasting a 50-page document for deep summarization is still a cloud-scale job in most practical setups.
  • Availability and gating: Microsoft stages many AI experiences by hardware, account type, and region. Some on-device capabilities are gated to Copilot+ hardware or specific markets, so the experience will not be identical across the installed base. Treat availability as staged rather than universal.
  • Unverified or promotional claims: some broad assertions — for example, that “most organizations (78%) have implemented AI” — require context and a current, peer-reviewed source to substantiate. If you plan decisions around similar statistics, verify the claim against up‑to‑date market research. Until then, treat such percentages as indicative rather than authoritative.
  • Model parity and fidelity: cheaper, smaller local models may produce different quality outputs compared with large cloud-hosted LLMs. Expect occasional differences in tone, translation quality, or code generation accuracy between local and cloud providers. That variance is a function of model family and tuning, not PowerToys itself.

Industry impact and what comes next

PowerToys’ on-device AI in Advanced Paste is a meaningful proof point for a larger industry trend: decentralized/local-first AI for privacy-sensitive, latency-critical tasks. Major platform vendors are pursuing the same path: provide large cloud models for heavy tasks and enable small, efficient local models for high-frequency, low-latency interactions. PowerToys’ implementation is notable because it puts model choice and local routing in the hands of end users, which may accelerate similar features in commercial productivity apps.
Expect the following ripple effects:
  • Third-party productivity tools will feel pressure to add local-model support or flexible provider selection to match user expectations.
  • Enterprises will demand governance features (policyable provider defaults, enterprise model catalogs, telemetry). PowerToys’ modular approach gives a template for how to expose these controls.

Recommendations for power users and IT pros

  • Power users: try Advanced Paste with a local Ollama or Foundry Local instance on a test machine to gauge latency and output quality for your common transforms. Compare outputs with Azure OpenAI or Gemini for fidelity. Use the visible model dropdown to avoid accidental cloud sends.
  • IT teams: pilot Advanced Paste in a controlled group, decide sanctioned model providers and local model lists, and script PowerToys settings deployment. Audit logs and user consent flows are essential when clipboard AI is enabled on endpoints handling regulated data.
  • Developers: examine Windows AI Foundry and Phi Silica APIs if you plan integrations that require the lowest latency or NPU acceleration. Phi Silica is available via Windows AI APIs and is specifically tuned for Copilot+ NPUs.

Final assessment

PowerToys 0.96’s Advanced Paste is a small technical change with outsized implications. By enabling local model inference through Foundry Local and Ollama, and by supporting multiple cloud providers, Microsoft has made the clipboard a pragmatic testbed for the company’s broader on-device AI ambitions. The update demonstrates that useful AI does not always require cloud roundtrips — many high-frequency clipboard transformations are ideal candidates for local, private inference.
Strengths:
  • Practical privacy gains: local inference keeps sensitive snippets on-device.
  • Reduced cost exposure: frequent, low-latency operations can avoid cloud API charges.
  • User control: visible model selection and content preview improve transparency and reduce accidental cloud usage.
Risks and friction:
  • Model governance and supply-chain concerns when using third-party local models.
  • Heterogeneous hardware: not all devices can run NPU-accelerated local models, so IT must plan mixed-fleet strategies.
  • Variable output parity: local models of different sizes and families will produce different results; users must accept potential trade-offs in fidelity for privacy and speed.
Advanced Paste’s update is more than a convenience improvement — it’s a living example of how local AI can be folded into everyday productivity without major overhead. For users and organizations that value privacy, speed, and lower cloud spend, this is a pragmatic and welcome step forward. For others, it’s a reminder that the AI transition will be hybrid: cloud for scale, local for latency and privacy, and choice — exposed plainly in the UI — is the feature that ultimately matters.

Conclusion
PowerToys 0.96 proves that even small system utilities can be an effective proving ground for platform strategy: by adding local model support, multi-provider choice, and a clearer UI, Microsoft has made a powerful argument that on-device AI is not a fringe capability but a practical tool for everyday productivity. The technical plumbing — Foundry Local, Ollama, NPUs, and Phi Silica — is already in place for those who want it, but the rollout remains staggered and dependent on device capabilities and governance choices. The Advanced Paste update is both a productivity feature and a strategic signal: the clipboard may be humble, but its evolution now points directly at a broader future where useful AI is fast, private, and locally controlled.
Source: TECHi Microsoft Supercharges Windows 11 With On-Device AI for Faster, Private Copy-Pasting