Microsoft’s PowerToys Advanced Paste has taken a decisive step toward local-first AI by adding on-device model support and multi-provider flexibility in the 0.96 update, turning a once-simple clipboard helper into a hybrid AI gateway that gives users faster transforms, reduced cloud costs, and stronger privacy guarantees.
Background
PowerToys has long been a collection of small, focused utilities for Windows power users; its Advanced Paste tool started as an efficient way to strip formatting or paste text in different formats. With PowerToys 0.96, Microsoft upgraded Advanced Paste to support multiple cloud providers and local model runtimes, explicitly adding Foundry Local and the open-source Ollama as on-device model hosts. This enables AI-driven clipboard transforms—translate, summarize, rewrite, code scaffold, and OCR post‑processing—to run either through cloud APIs or locally on a machine’s NPU or CPU.

These changes are surfaced in a modest but important UI refresh: Advanced Paste now previews the active clipboard item and exposes a model-selection dropdown in the paste UI so users can see what will be transformed and choose the backend (local or cloud) at the moment of paste. That UI shift is as important as the plumbing underneath because clipboard actions are inherently transient and often contain sensitive information.
What changed in PowerToys 0.96 — at a glance
- On-device model support: Advanced Paste can route AI requests to local model runtimes (Foundry Local and Ollama), allowing inference to happen without leaving the device.
- Multi-provider cloud support: Advanced Paste now supports Azure OpenAI, OpenAI, Google Gemini, and Mistral. This removes the prior single-provider constraint and offers choice for quality, compliance, and cost.
- UX refinements: Clipboard preview and a model-selection dropdown improve transparency and reduce accidental cloud egress.
- Command Palette and PowerRename polish: Alongside Advanced Paste, PowerToys 0.96 brings broader improvements—Command Palette metadata and filtering, and photo-metadata tokens for PowerRename—making the release substantively useful beyond the AI features.
How on-device AI works in Advanced Paste
Local runtimes: Foundry Local and Ollama
Advanced Paste treats model backends as configurable providers. For cloud providers you supply API keys and endpoints; for local providers you point PowerToys at a local runtime URL or host. When a local provider like Foundry Local or Ollama is selected, Advanced Paste sends the prompt to the local endpoint instead of a cloud API, and the transformation result is returned immediately for paste. This makes fully offline workflows possible.
- Foundry Local is Microsoft’s local hosting component within the broader Windows AI Foundry ecosystem and integrates with Windows AI tooling for local model hosting and potential NPU acceleration.
- Ollama is an open-source local model runtime that many community users rely on for hosting a range of model families locally; PowerToys’ support allows Ollama-hosted models to be used without cloud egress.
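As a concrete illustration of what "sending the prompt to the local endpoint" means, the sketch below calls Ollama's documented `/api/generate` HTTP API. The model name `llama3.2` is a placeholder; any locally pulled model would work, and this is an illustrative client, not PowerToys' internal code.

```python
import json
import urllib.request

# Default address where a local Ollama instance serves its HTTP API.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(prompt: str, model: str = "llama3.2") -> dict:
    # "stream": False asks Ollama for one complete JSON response
    # instead of a stream of partial chunks.
    return {"model": model, "prompt": prompt, "stream": False}

def local_transform(prompt: str, model: str = "llama3.2") -> str:
    """Send a clipboard-transform prompt to the local runtime and
    return the generated text. Nothing leaves the machine."""
    data = json.dumps(build_payload(prompt, model)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Calling `local_transform("Summarize: ...")` against a running Ollama instance returns the transformed text with zero cloud egress; Advanced Paste performs the equivalent request internally when a local provider is selected.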
Hardware acceleration and NPUs
When a device has an available Neural Processing Unit (NPU) and the local model is prepared/quantized for that accelerator, the Windows AI stack can offload inference onto the NPU for lower latency and power consumption. Microsoft’s Copilot+ hardware program and Windows AI APIs (including Phi Silica as a local SLM option) underpin that capability; however, actual performance varies widely by OEM, driver, and model. Published TOPS numbers (e.g., device baselines referenced in community coverage) are useful heuristics but should be treated as provisional until validated for a specific model and device.

Why this matters: benefits for everyday use
Speed and responsiveness
Routing small clipboard transforms to a local model or an NPU removes the roundtrip to the cloud and reduces latency perceptibly for interactive paste operations. For workflows that require repeated short transforms (summaries, translations, tone adjustments), local inference provides near-instant results.

Privacy and data locality
Clipboard content often contains sensitive fragments—credentials, internal notes, legal text. Local inference keeps that data on-device, reducing the data‑egress surface and potential obligations under provider logging policies. For regulated environments, this is a fundamental advantage.

Cost savings
Cloud models typically bill per token or per-request; frequent, small transforms can quickly consume API credits. Local models avoid recurring API costs and are attractive for power users who apply many AI transforms daily.

Vendor flexibility
Multi-provider support reduces lock‑in. Teams can choose Azure OpenAI for enterprise-grade controls, Google Gemini where latency or semantics favor it, or local models for cost and privacy—picking the right backend per use case.

Setup and configuration: step-by-step
- Install PowerToys 0.96 from GitHub Releases, Microsoft Store, or winget and confirm package integrity for enterprise deployments.
- Open PowerToys Settings and enable Advanced Paste under the modules list.
- Under Advanced Paste Settings, open Model providers and click Add model. Choose a provider type: OpenAI, Azure OpenAI, Google, Mistral, Foundry Local, or Ollama.
- For cloud providers, enter API keys, endpoint URLs, and model/deployment names as required by your chosen service. For local providers, enter the local runtime endpoint (for example, the HTTP address where Foundry Local or Ollama serves models).
- Back in the Advanced Paste UI (hotkey default: Win+Shift+V), confirm the clipboard preview, choose the desired model provider from the dropdown, and run the “Paste with AI” action (translate, summarize, rewrite, etc.).
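To make the provider model concrete, each entry behaves roughly like the records below. This is a hypothetical illustration, not PowerToys' actual settings schema; the names, fields, and deployment values are invented.

```python
# Hypothetical provider records -- not PowerToys' real settings format.
providers = {
    "ollama-local": {
        "type": "ollama",
        "endpoint": "http://localhost:11434",  # local host: no API key needed
    },
    "azure-prod": {
        "type": "azure_openai",
        "endpoint": "https://example.openai.azure.com",
        "deployment": "gpt-4o-mini",         # Azure uses deployment names
        "api_key_env": "AZURE_OPENAI_KEY",   # resolved from a secret store
    },
}

def is_local(name: str) -> bool:
    """True when the provider's endpoint stays on this machine,
    i.e. clipboard content never crosses the network."""
    host = providers[name]["endpoint"]
    return "localhost" in host or "127.0.0.1" in host
```

The local/cloud distinction in `is_local` is exactly what the model-selection dropdown surfaces at paste time: a local endpoint needs no credentials and produces no egress.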
Practical examples and workflows
- Copy a paragraph in a foreign language, open Advanced Paste, select a local translation model (Foundry Local or Ollama), and paste the translated text—no network required.
- Copy meeting notes and run “Summarize” with a cloud provider for higher-fidelity output, or use a local small model for on-device quick summaries and later refine in the cloud if needed.
- Capture a screenshot containing tabular data, use on-device OCR to extract text, then run a local model to clean and structure the result to JSON for pasting into a spreadsheet. All OCR runs locally by default.
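The final step of that OCR workflow, structuring extracted text as JSON, can be sketched deterministically for simple whitespace-separated tables. A local model would handle messier OCR output; this helper is illustrative only.

```python
import json

def ocr_table_to_json(ocr_text: str) -> str:
    """Convert whitespace-separated OCR output (first line = header)
    into a JSON array of row objects ready for pasting."""
    lines = [ln.split() for ln in ocr_text.strip().splitlines() if ln.strip()]
    header, *rows = lines
    return json.dumps([dict(zip(header, row)) for row in rows])

# ocr_table_to_json("item qty\napples 3\npears 5")
# -> '[{"item": "apples", "qty": "3"}, {"item": "pears", "qty": "5"}]'
```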
Compatibility and operational considerations
Device and model readiness
- Not all Windows devices have NPUs or drivers that reliably accelerate local models. CPU-only local inference is possible but may be slower. Test the user experience on representative hardware before rolling out.
Storage and model management
- Local models require disk space and periodic updates. If multiple users rely on a local model catalog, plan for centralized distribution or use a local host that serves multiple endpoints.
Enterprise deployment and policy
- Enterprises should define sanctioned providers, centralize API key management, and prefer local model catalogs or enterprise gateway models for regulated data. Use configuration management to enforce provider whitelists and avoid shadow AI configurations on endpoints.
Risks, limitations, and governance
Unintentional cloud egress
Providing multiple providers increases the chance a user will accidentally route sensitive clipboard content to a cloud service. The new UI mitigates this risk by surfacing provider choice and clipboard preview, but administrators must still set policy and enforce whitelists to reduce human error.

Credential and key security
Cloud providers require API keys or endpoint credentials. Storing keys insecurely on endpoints or in scripts creates a high-risk failure mode. Enterprises should use secret managers, scoped keys, and least-privilege credentials.

Model fidelity and output variance
Local models—especially small or quantized ones suited for NPUs—will often deliver different output quality compared with large cloud models. That trade-off (privacy and speed vs. fidelity) must be communicated to users so they choose the right backend for the task.

Supply chain and licensing concerns
Running third-party models locally introduces licensing and provenance checks. Organizations must verify model licenses, assess any embedded telemetry or commercial restrictions, and ensure that local models meet corporate compliance standards. This is especially important when using community-hosted models via Ollama or similar runtimes.

Operational overhead for local hosting
Hosting and updating local models brings operational cost—storage, patching, quantization for NPUs, and possible GPU/NPU driver churn. Smaller teams should weigh this against expected cost savings from reduced cloud usage.

Security checklist and rollout guidance for IT
- Pilot on a small set of devices representative of fleet diversity (CPU-only, GPU-accelerated, NPU-equipped).
- Define and enforce a whitelist of allowed model providers and, if possible, configure defaults that prefer local runtimes for regulated workloads.
- Use centralized API key vaults and restrict keys to least-privilege scopes. Rotate keys and audit their usage.
- Require users to test with non-sensitive data and document expected output quality differences between local and cloud providers.
- Monitor disk usage and background service health where local runtimes are deployed to avoid unexpected resource exhaustion or degraded user experiences.
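The whitelist item in the checklist above reduces to a simple audit check. This is a sketch with invented provider names; real enforcement would live in configuration management tooling (Intune, Group Policy, etc.).

```python
# Sanctioned providers for managed endpoints (example names).
ALLOWED_PROVIDERS = {"foundry-local", "ollama-local", "azure-openai-prod"}

def unsanctioned(configured: list[str]) -> list[str]:
    """Return configured providers that fall outside the whitelist,
    e.g. a personal cloud API key added by a user (shadow AI)."""
    return sorted(set(configured) - ALLOWED_PROVIDERS)
```

Running such a check against each endpoint's configured providers surfaces shadow AI setups before they become a data-egress incident.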
Critical analysis: strengths and where Microsoft should be cautious
Microsoft’s approach in PowerToys 0.96 is pragmatic: deliver choice, keep local processing practical, and make the UI transparent. The strengths are clear:
- Practical privacy gains: local inference keeps clipboard snippets on-device, which matters for high-frequency, sensitive workflows.
- Lower operational cloud costs for small transforms: frequent clipboard transforms mapped to local models are cheaper than continuous cloud calls.
- User control and transparency: a visible clipboard preview and provider dropdown reduce accidental cloud sends and increase user agency.
Where Microsoft should be cautious:
- Configuration complexity: multiple providers mean more settings that must be decided, documented, and enforced by IT. Without defaults and enterprise tooling, endpoints could proliferate shadow AI setups.
- Divergent output quality: Local, quantized models will not always match cloud model outputs; users must internalize when local speed is preferable and when cloud fidelity is essential.
- Device heterogeneity: Not every Windows 11 device can deliver the same local AI experience; Copilot+ NPUs and vendor drivers shape the real-world experience, and those hardware thresholds are not universal. Treat NPU TOPS claims as a device-class guideline rather than a guaranteed performance metric.
What to test first (practical checklist for power users)
- Install PowerToys 0.96 on a test device and enable Advanced Paste.
- Configure Ollama or Foundry Local locally and point Advanced Paste at that runtime to verify on-device transforms.
- Compare output and latency for a set of representative transforms (translate, summarize, rewrite) across a local small model and a chosen cloud model. Measure latency, token costs, and fidelity.
- Track resource usage (CPU/GPU/NPU, disk) during typical usage bursts to identify real constraints.
- Document provider defaults and add a short “how we use Advanced Paste” guideline for colleagues to reduce accidental cloud routing.
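The latency comparison in the checklist above can be scripted with a small harness. This is a sketch; `transform` stands for any callable that routes a prompt to the backend under test (for example, a local-runtime client versus a cloud SDK call).

```python
import time
import statistics

def median_latency(transform, prompts, repeats=3):
    """Median wall-clock seconds per call for a transform backend.
    Run once per backend (local vs. cloud) with identical prompts
    so the comparison is apples-to-apples."""
    samples = []
    for prompt in prompts:
        for _ in range(repeats):
            start = time.perf_counter()
            transform(prompt)
            samples.append(time.perf_counter() - start)
    return statistics.median(samples)
```

Pair the latency numbers with a side-by-side fidelity review of the outputs; speed alone does not decide which backend a given workflow should default to.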
Final verdict
PowerToys 0.96 is more than a modest update; it is an explicit demonstration that local-first AI can be practical for high-frequency productivity tasks. By enabling Foundry Local and Ollama, and by broadening cloud provider support, Microsoft made a small but meaningful utility into a flexible hybrid AI surface that addresses latency, cost, and privacy in a single workflow. For individual power users and privacy-conscious professionals, Advanced Paste’s on-device capabilities are an immediate win. For organizations, the release is strategically important—but not plug-and-play: it requires governance, key management, and representative hardware testing.

PowerToys’ evolution in this release illustrates a broader industry trajectory: cloud scale for heavy tasks, local models for frequent, latency-sensitive primitives, and choice as the differentiator. When deployed thoughtfully—with clear policies and pilot testing—Advanced Paste’s new capabilities will save time, protect sensitive data, and reduce cloud spend; when left unmanaged, they risk accidental data egress and operational overhead. The practical power is now in the hands of users and IT teams alike—Microsoft has supplied the plumbing, the rest is organizational discipline.
Source: Tech Edition Microsoft adds on-device AI support to the Advanced Paste tool in Windows 11
Microsoft has quietly turned one of Windows 11’s humblest utilities — the clipboard — into a full-featured, hybrid AI gateway: PowerToys’ Advanced Paste (now in PowerToys 0.96) can run transformations using local AI models hosted via Microsoft’s Foundry Local or the open-source Ollama runtime, or route requests to multiple cloud providers including Azure OpenAI, Gemini, and Mistral.
Background / Overview
Advanced Paste began life as a small but useful PowerToys tool for pasting clipboard content in a chosen format (plain text, Markdown, JSON) and for quick OCR extraction from images. Over successive updates it gained AI-powered transformations — translate, summarize, rewrite, and other content transforms — but until now those operations typically relied on a single cloud API provider. The 0.96 release changes that model: Advanced Paste is now provider-agnostic and explicitly supports both cloud and local runtimes, enabling on-device inference where hardware permits. This shift is small in interface terms but consequential in capability: the clipboard becomes a programmable content layer where a quick copy → paste action can apply a transformation using the backend of the user's choice (local model, Azure, OpenAI, Google Gemini, Mistral, etc.). The new Advanced Paste UI also previews clipboard content and exposes a model-selection dropdown to make routing choices visible at the moment of paste.

What’s new in PowerToys 0.96 — headline changes
- Local model support: Advanced Paste can send inference requests to Foundry Local or Ollama instances running on the same machine or local network, enabling offline or private workflows.
- Multi-provider cloud options: Cloud endpoints are no longer limited to a single vendor. Advanced Paste supports Azure OpenAI, OpenAI, Google Gemini, and Mistral alongside local runtimes.
- UI improvements: The paste window now previews clipboard contents and includes a model-selection dropdown, reducing friction and making the paste action more transparent.
- NPU acceleration path: When local runtimes and compatible models are used, Windows can offload inference to available Neural Processing Units (NPUs) to reduce latency and power usage — provided the device and model support it.
How local AI integration works — Foundry Local, Ollama and the Windows AI stack
Foundry Local and Ollama: runtime options
Advanced Paste treats AI backends as configurable providers. For cloud providers you provide API keys and endpoints; for local providers you configure the local runtime endpoint (for example, a Foundry Local service on localhost or an Ollama instance). When Advanced Paste is set to use a local provider, it sends the transformation prompt to that HTTP endpoint and pastes the returned result — all without sending the clipboard content to an external cloud service.
- Foundry Local: Microsoft’s local hosting component within the Windows AI Foundry ecosystem that integrates with Windows AI tooling and, in appropriate hardware scenarios, can route inference to NPUs.
- Ollama: An open-source local runtime popular in the community for hosting a variety of model families on-device or on a local server; PowerToys’ support lets users point Advanced Paste at an Ollama endpoint.
NPU acceleration and model preparation
Local inference can be accelerated by a device’s NPU when:
- the device provides a compatible NPU and the correct drivers/APIs are present; and
- the chosen model is prepared/quantized for that accelerator.
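Those two conditions imply a simple fallback chain, which can be sketched as follows. This is illustrative routing logic under the assumptions above, not PowerToys' actual implementation.

```python
def pick_backend(has_npu: bool, model_npu_ready: bool, prefer_local: bool) -> str:
    """Route a transform: NPU only when both conditions above hold;
    otherwise fall back to CPU-local inference or, if policy and the
    user prefer it, a cloud provider."""
    if has_npu and model_npu_ready:
        return "local-npu"
    return "local-cpu" if prefer_local else "cloud"
```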
Why this matters — practical benefits
- Privacy by design for clipboard transforms: Clipboard contents are often sensitive (snippets of email, credentials, internal notes). Running inference locally keeps the content on-device and reduces the data-egress surface. This is a concrete privacy win for regulated environments.
- Lower per-use cost for frequent transforms: Local inference avoids per-token or per-request cloud billing for high-frequency clipboard operations like quick summaries or translations. For power users who perform many small transforms, the cost difference can be material.
- Faster interactive response: On-device transforms, and particularly NPU-accelerated runs, cut network roundtrips and often deliver near-instant results for short operations. This makes paste-time transforms feel immediate and unobtrusive.
- Vendor choice and hybrid workflows: Multi-provider support lets users and IT pick the right backend by policy — local for privacy, Azure for enterprise governance and auditability, or Google Gemini/Mistral for specific model capabilities. That flexibility reduces vendor lock-in and supports pragmatic hybrid strategies.
Enterprise and administrative considerations
Advanced Paste’s new capabilities are powerful, but they also introduce governance and operational overhead. Administrators should treat this as a controlled feature rollout rather than a trivial client update.

Key considerations:
- Sanctioned provider lists: Use configuration management to define and enforce which providers (local or cloud) are permitted on managed endpoints. This prevents accidental leakage of sensitive clipboard content to unapproved cloud services.
- API key and credential management: Cloud providers require API keys or managed credentials. Store keys in secure vaults and apply least-privilege scopes. Never bake credentials into scripts or unsecured registry keys.
- Local model distribution and disk usage: Local models consume disk space and may need updates. Decide whether local hosts will be centrally managed (a shared local server) or individually provisioned. Plan storage and update mechanisms accordingly.
- Pilot groups and telemetry: Deploy to pilot groups first, measure latency, disk consumption, and user error rates (e.g., unintended cloud routing), and adjust defaults or UI warnings before broader rollout.
- Audit and logging: Cloud providers usually provide audit trails; for local inference you may need to augment logging at the host level to preserve necessary enterprise telemetry. Confirm legal and compliance implications for data retention and processing.
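Host-level logging for local inference can be as simple as appending JSON-lines records. This is a sketch with invented field names; the key design point is that it records metadata only, never the clipboard content itself.

```python
import json
import time

def log_local_inference(sink, provider, model, prompt_chars, latency_s):
    """Append one JSON-lines audit record to a writable sink.
    Only metadata is retained -- the clipboard text is not logged."""
    record = {
        "ts": round(time.time(), 3),
        "provider": provider,
        "model": model,
        "prompt_chars": prompt_chars,  # length of the prompt, not its text
        "latency_s": round(latency_s, 3),
    }
    sink.write(json.dumps(record) + "\n")
```

Writing one line per transform preserves an auditable trail comparable to cloud provider logs while keeping the privacy benefit of local inference intact.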
Risks and limitations — what to watch out for
1) Unintentional cloud egress
Exposing multiple providers increases the chance a user will send sensitive clipboard content to the cloud by mistake. The UI preview and model selector lower this risk, but IT must enforce whitelists and sensible defaults to avoid human error.

2) Model fidelity vs. privacy trade-offs
Local models — especially compact or quantized models tuned for NPUs — often produce different outputs than large cloud models. Organizations should communicate expected fidelity trade-offs: use local models for quick, private transforms and cloud models for higher-quality polishing.

3) Resource and maintenance overhead
Running local models requires disk space, CPU/GPU/NPU cycles, and a patch/update process. For enterprises, that means additional operational work to manage model catalogs, updates, and distribution.

4) Licensing, provenance and supply-chain checks
Using third-party open-source models locally brings licensing constraints and provenance questions. Enterprises must verify model licenses and the chain of custody for weights before deploying models on employee endpoints. This is especially important for models derived from community weights with mixed licenses.

5) Hallucinations, data leakage and prompt injection
AI models can produce incorrect outputs (hallucinations) and are susceptible to prompt-injection attacks when they process untrusted input. When clipboard content is transformed automatically, the potential for misleading or corrupted pastes exists. Prefer local or enterprise-hardened runtimes for sensitive uses and validate outputs for high-stakes workflows.

6) Heterogeneous hardware and experience
Not every Windows 11 device has an NPU, and the performance envelope for NPUs varies. Organizations with mixed fleets should provide fallback strategies (cloud or CPU-based local models) and manage user expectations.

Step-by-step: configuring Advanced Paste for local models
- Install PowerToys 0.96 from the Microsoft Store, GitHub Releases, or winget and verify package integrity.
- Open PowerToys Settings and enable the Advanced Paste module.
- In Advanced Paste Settings, open Model providers and click Add model. Choose a provider type: OpenAI, Azure OpenAI, Google (Gemini), Mistral, Foundry Local, or Ollama.
- For local providers, enter the local runtime endpoint (for example, the HTTP address where Foundry Local or Ollama serves models). For cloud providers, add API keys, endpoint URLs and model/deployment names as required.
- Use the Advanced Paste hotkey (default Win+Shift+V) to preview the clipboard content, select the model/provider from the dropdown, and run the desired transform (translate, summarize, rewrite, etc.).
Performance expectations and hardware guidance
- NPU-equipped Copilot+ devices: These devices can deliver the fastest local inference when models are pre-quantized and drivers are mature. Expect near-instant results for short transforms. However, the speed gains depend on model size and whether the model is optimized for the particular NPU.
- CPU-only local inference: Will work but is typically slower than cloud inference for larger models and may consume noticeable CPU resources during repeated use. Use CPU-local models for intermittent offline needs only.
- Hybrid strategy: Use local small/efficient models for high-frequency, low-fidelity tasks and cloud models for complex or high-accuracy operations. Multi-provider support makes switching between these strategies trivial at the UI level.
Practical recommendations and best practices
- Default to privacy: On managed endpoints, set the default provider to a vetted local runtime (Foundry Local or Ollama) and require admin approval to add cloud providers. This reduces accidental leakage.
- Use scoped API keys and secret management: For any cloud models, provision keys using enterprise vaults and restrict them to minimal permissions and quota limits. Rotate keys periodically.
- Educate users: Make clear the fidelity trade-offs between local and cloud models; give examples (e.g., local model for quick translation, cloud model for final editorial polish).
- Monitor usage and model output: Capture telemetry (with privacy safeguards) to detect unusual patterns (heavy cloud usage, repeated corrections after paste) and to tune defaults.
- Verify model licensing: Before deployment, confirm that any third-party model weights hosted locally are licensed for your use case, and maintain provenance records for audits.
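The usage-monitoring recommendation above can be prototyped over routing telemetry. This is a sketch: the event shape is assumed, and the threshold would be tuned per organization and paired with the privacy safeguards already noted.

```python
from collections import defaultdict

def heavy_cloud_users(events, threshold=0.5):
    """Flag users whose share of cloud-routed transforms exceeds the
    threshold. events: iterable of (user, backend) pairs, where
    backend is "local" or "cloud"."""
    counts = defaultdict(lambda: [0, 0])  # user -> [cloud_count, total]
    for user, backend in events:
        counts[user][1] += 1
        if backend == "cloud":
            counts[user][0] += 1
    return sorted(u for u, (cloud, total) in counts.items()
                  if cloud / total > threshold)
```

Flagged users are candidates for follow-up: either their defaults need adjusting toward local runtimes, or their workload genuinely requires cloud fidelity and should be documented as such.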
Critical analysis — strengths, business implications, and open questions
Strengths
- Practical privacy posture: Advanced Paste shows how small, everyday workflows can benefit from on-device AI without radically changing enterprise architecture. It’s a meaningful user-facing privacy improvement.
- Lower friction for experimentation: PowerToys provides a low-risk sandbox: IT and power users can experiment with local and hybrid AI workflows before committing to broader model deployments.
- Platform signaling: Microsoft’s move to support Foundry Local and community runtimes like Ollama in a mainstream utility signals a broader platform strategy that embraces hybrid, on-device AI rather than cloud exclusivity.
Risks and open questions
- Governance complexity: Supporting many providers on endpoints increases configuration sprawl and the need for policy enforcement. For regulated industries this is non-trivial.
- Operational costs of local models: Although local inference reduces cloud spend, it shifts cost into storage, model maintenance, and support — hidden costs that are often underestimated.
- Model quality variance: Users may be confused when the same “Paste with AI” action produces different results depending on the configured model. Clear labeling and education are essential.
Unverifiable or speculative claims to treat cautiously
- Any precise financial savings figures, insurance premium reductions tied to local inference, or highly specific NPU performance claims should be treated as provisional until validated with independent benchmarks or direct vendor data. A handful of outlets and analysts have offered optimistic projections and percentages; those are useful as directional context but not as deterministic metrics for planning.
Bottom line and next steps for Windows users and IT teams
Advanced Paste’s move to support local AI models via Foundry Local and Ollama transforms the clipboard into a practical, privacy-aware AI surface on Windows 11. For power users, the change delivers tangible productivity upgrades: instant translations, quick summaries, and one-stroke formatting without cloud latency or per-request bills. For enterprise teams, the feature is promising — but it must be treated as a capability that requires governance, credential management, and operational planning.

Recommended next steps:
- Test PowerToys 0.96 in a controlled pilot group to validate NPU and local-runtime behavior on representative hardware.
- Draft provider policies and define default model/provider settings for managed endpoints.
- Verify licensing and supply-chain provenance for any locally hosted third-party models.
- Educate users about fidelity trade-offs between local and cloud models and set clear UI defaults to minimize accidental cloud egress.
Conclusion: PowerToys’ Advanced Paste has quietly become one of the clearest, lowest-friction proofs that on-device AI is ready for everyday use — and that choice (local vs. cloud) matters as much as raw capability. The update is both an immediate productivity win for users and a strategic milestone in Microsoft’s wider Windows AI roadmap; it is also a reminder that governance, model readiness, and hardware diversity will shape how broadly and safely these features can be adopted.
Source: Mezha Advanced Paste in Windows 11 now supports local AI models
Microsoft has quietly turned one of PowerToys’ smallest utilities — Advanced Paste — into a proving ground for on‑device AI, adding multi‑provider support and explicit local model paths so clipboard transforms can run on your PC (even on an NPU) instead of always phoning home.
For individual power users, Advanced Paste’s new options mean immediate productivity gains and stronger privacy for sensitive copy/paste tasks. For IT teams, it’s an opportunity — and a responsibility — to set sensible defaults, vet providers and pilot the feature carefully before a broad rollout. The clipboard may be tiny, but its evolution in PowerToys points directly to a future where useful AI is fast, private and locally controlled — if organizations are prepared to manage the trade‑offs.
Source: Digital Trends Windows 11 is upgrading its copy-paste experience with more privacy in tow
Background
PowerToys started as a modest collection of productivity tweaks for Windows power users: FancyZones, PowerRename, Peek and a handful of other small tools that restore or add convenience features. Advanced Paste began as a simple clipboard helper — paste plain text, strip formatting, convert rich text to Markdown — but over several releases it evolved into a lightweight clipboard transformation pipeline that can do OCR, format conversions, summaries and other AI‑assisted transforms. With the 0.96 release, Microsoft has amplified that trajectory by making Advanced Paste provider‑agnostic and explicitly supporting local runtimes such as Foundry Local and Ollama. This change arrives at a moment when Microsoft (and other platform vendors) are pushing the idea of Windows as an “AI PC” platform: hardware with NPUs, a Windows AI Foundry for local hosting, and SDKs/APIs tuned for NPU acceleration. PowerToys’ Advanced Paste update is small in surface area but significant as a practical demonstration of that strategy in a widely used, low‑risk utility.

What changed in PowerToys v0.96 — the headline features
Advanced Paste in PowerToys v0.96 introduces three interlocking shifts:
- Multi‑provider model support — You can configure cloud endpoints for Azure OpenAI, OpenAI, Google Gemini and Mistral, instead of being locked to a single backend.
- Local model/runtime support — Advanced Paste can route requests to local hosts such as Foundry Local or Ollama, enabling entirely offline or private local inference on your machine or local network.
- UI and workflow polish — The Advanced Paste window now previews the current clipboard item and exposes a model‑selection dropdown so users can choose the backend at the moment of paste. Quick actions (paste as plain text, paste as Markdown/JSON, OCR, and “Paste with AI” transforms) remain and are easier to control.
How the local option works (Foundry Local, Ollama, NPUs)
Local runtimes and endpoints
Advanced Paste now exposes a configurable list of “model providers” in PowerToys settings. Each provider entry contains the information the runtime needs:
- For cloud providers: API key, endpoint URL, model/deployment name.
- For local providers: a local runtime endpoint (for example, a Foundry Local host or an Ollama server) that PowerToys calls for inference.
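To make the local-provider path concrete, here is a minimal Python sketch of the kind of call a clipboard tool could make to a local Ollama server's generate endpoint. The endpoint URL reflects Ollama's default port, but the model tag and prompt wording are illustrative assumptions; this is not PowerToys' actual implementation.

```python
import json
import urllib.request

# Default Ollama endpoint; inference stays on this machine (assumption:
# an Ollama server is running locally with the model already pulled).
OLLAMA_ENDPOINT = "http://localhost:11434/api/generate"

def build_payload(model: str, task: str, clipboard_text: str) -> dict:
    """Wrap a clipboard transform as a single non-streaming generate request."""
    return {
        "model": model,  # e.g. "llama3.2" -- an example tag, not a requirement
        "prompt": f"{task}:\n\n{clipboard_text}",
        "stream": False,  # one JSON object back instead of a token stream
    }

def transform_locally(model: str, task: str, clipboard_text: str) -> str:
    """POST the transform to the local runtime; clipboard data never leaves the device."""
    payload = json.dumps(build_payload(model, task, clipboard_text)).encode()
    req = urllib.request.Request(
        OLLAMA_ENDPOINT,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Swapping in a cloud provider would mean changing only the endpoint, adding an API key header, and adapting the payload to that provider's schema, which is exactly the flexibility the provider list gives.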
NPUs and acceleration
When a compatible local model and runtime are available, Windows can offload inference to specialized silicon (an NPU) if the device exposes that capability and the runtime supports it. Microsoft’s Copilot+ hardware program and Windows AI Foundry include APIs (e.g., Phi Silica and related toolchains) designed to make it feasible to run small, latency‑sensitive models on NPUs. The practical effect: short transforms like summarization or translation can be much faster and more power‑efficient on NPU‑equipped hardware than on CPU alone. That said, real‑world behavior depends on device hardware, drivers, and whether a model has been prepared/quantized for that specific NPU. Treat published TOPS numbers and vendor claims as device‑class guidance rather than guaranteed user experience.
Why this matters — performance, cost and privacy
- Latency and responsiveness: Local inference cuts cloud round trips. For tiny, high‑frequency clipboard transforms the difference is tangible — near‑instant summaries and rewrites instead of waiting for remote API response times.
- Cost avoidance: Running local inference removes per‑request cloud API charges. If you rely heavily on frequent clipboard transformations, the cumulative cost savings can be meaningful. However, those savings must be balanced against the cost of storage, model management and potential hardware upgrades.
- Stronger privacy posture: Because the clipboard content can be processed on‑device or on a local host, sensitive snippets (passwords, confidential emails, IP) need not leave the organization’s perimeter. That reduces the surface for accidental cloud egress. PowerToys explicitly documents that Advanced Paste can be configured to use cloud API keys or local model hosts.
Critical analysis — strengths and immediate wins
Strengths
- Practical privacy gains. This is one of the clearest, user‑facing examples where local AI materially reduces data egress risk for everyday tasks. The clipboard frequently carries sensitive material; keeping transforms local is a big win for privacy‑minded users and enterprises.
- Choice and vendor flexibility. Multi‑provider support avoids vendor lock‑in and lets organizations route tasks to whichever backend best fits compliance, cost or fidelity requirements.
- Low‑risk sandboxing. PowerToys is a permissive, user‑level app that’s ideal for experimentation. It’s a safe place to explore hybrid AI workflows before committing to broader investments.
Immediate user benefits
- Faster, local summarization and translation of clipboard content.
- Offline capability (useful on airplanes or isolated networks) when local models are available.
- A more transparent paste UI that shows what will be sent to a model and which provider is selected.
Risks, governance and the hidden operational costs
Introducing local model support is not a silver bullet. It shifts some costs and risks rather than eliminating them.
- Model governance and supply chain risks. When teams download and host third‑party model weights (e.g., community LLMs), they inherit licensing, security and provenance responsibilities. Verify licenses and keep provenance records for audits.
- Configuration complexity. Multiple providers mean more configuration work for IT: default providers, allowed endpoints, API key management and DLP rules must all be clarified and enforced. A misconfigured default could inadvertently send sensitive data to a cloud provider.
- Hidden local costs. Local inference avoids per‑request cloud bills but creates new overheads: storage for model weights, patching, quantization pipelines, monitoring, and support for drivers and hardware. For enterprises these operational costs can outstrip the cloud bill reduction if not planned.
- Heterogeneous hardware reality. Not all Windows devices have NPUs or sufficiently capable silicon. Admins must design mixed‑fleet strategies and accept variability in performance and fidelity. Published NPU TOPS numbers are a useful guideline but vary by OEM and driver maturity.
- Output parity and user confusion. Different models produce different outputs. A “Paste with AI” action might produce a terser summary on one model and a more verbose one on another. Clear labeling, education and sensible defaults are essential to avoid surprise behavior.
Enterprise checklist — how to pilot safely
- Install PowerToys v0.96 in a small pilot ring (developer releases, Microsoft Store or winget).
- Review Advanced Paste settings and set a default provider for pilot devices (prefer a vetted local runtime or a vetted cloud endpoint).
- Use scoped API keys and enterprise secret storage for any cloud providers — never embed keys in user profiles.
- Validate DLP tools and content filters to ensure clipboard paste actions are covered (test with common sensitive patterns).
- Verify licensing and provenance for any locally hosted model weights. Maintain an inventory and update process.
- Educate pilot users about fidelity trade‑offs between local and cloud models; provide examples and a short “what to expect” guide.
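The DLP validation step in the checklist can begin with a simple smoke test: scan clipboard text for common sensitive patterns before it is allowed to route to a cloud provider. The pattern list below is a sketch for pilot testing, not a substitute for a real DLP product; the regexes are illustrative examples.

```python
import re

# Illustrative sensitive-content patterns for pilot testing. A real DLP
# deployment would use a much broader, centrally managed rule set.
SENSITIVE_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key": re.compile(r"-----BEGIN (?:RSA )?PRIVATE KEY-----"),
}

def flag_sensitive(text: str) -> list[str]:
    """Return the names of all patterns that match the clipboard text."""
    return [name for name, pat in SENSITIVE_PATTERNS.items() if pat.search(text)]

def allow_cloud_paste(text: str) -> bool:
    """Block cloud routing whenever anything sensitive is detected."""
    return not flag_sensitive(text)
```

Running such checks against representative clipboard samples during the pilot quickly shows whether existing DLP coverage extends to paste-time AI transforms.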
Practical tips for enthusiasts and power users
- Prefer local models for short, repetitive tasks (translate a snippet, summarize an email) and cloud models for tasks that require higher fidelity or specialized knowledge.
- Always use the clipboard preview in Advanced Paste to confirm what will be transformed before you paste.
- If you care about privacy, set the default provider to a local runtime and remove or lock cloud endpoints in your settings.
- Test model performance on your hardware: CPU‑only local inference is workable but slower; NPUs and quantized models deliver the snappiest experience.
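The local-versus-cloud heuristic above can be captured in a tiny routing policy. The task names and length threshold here are illustrative assumptions, not settings that PowerToys itself exposes:

```python
# Sketch of the routing heuristic: short, repetitive transforms go to a
# local runtime; long or high-fidelity tasks go to a vetted cloud model.
# Task names and the character threshold are illustrative choices.
LOCAL_TASKS = {"translate", "summarize", "strip_formatting"}
CLOUD_TASKS = {"code_scaffold", "deep_rewrite"}
MAX_LOCAL_CHARS = 4000  # beyond this, assume a larger cloud model adds value

def pick_provider(task: str, clipboard_text: str) -> str:
    """Route a clipboard transform to 'local' or 'cloud' by task and size."""
    if task in CLOUD_TASKS:
        return "cloud"
    if task in LOCAL_TASKS and len(clipboard_text) <= MAX_LOCAL_CHARS:
        return "local"
    return "cloud"  # default to the higher-fidelity backend when unsure
```

In practice you would encode the same policy as your default provider choice plus per-action overrides in the Advanced Paste settings.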
Unverifiable or easily overstated claims — treat these cautiously
- Any precise cost‑savings estimate depends heavily on your usage patterns, model sizes and whether you account for the cost of hosting and managing local models. Avoid blanket percent‑savings claims without a tailored analysis.
- Specific NPU performance guarantees (TOPS → latency numbers) are device and model dependent. Use vendor benchmarks or run your own microbenchmarks on representative devices before making procurement decisions.
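A microbenchmark of the kind suggested above can be as simple as timing repeated transforms and reporting median and tail latency. The stub function below stands in for a real call to your local runtime or cloud SDK; replace it with the actual call before drawing conclusions.

```python
import statistics
import time

def benchmark(transform, sample_text: str, runs: int = 20) -> dict:
    """Time repeated transforms and report median and p95 latency in ms."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        transform(sample_text)
        latencies.append((time.perf_counter() - start) * 1000.0)
    latencies.sort()
    return {
        "median_ms": statistics.median(latencies),
        "p95_ms": latencies[int(0.95 * (len(latencies) - 1))],
    }

def stub_transform(text: str) -> str:
    """Stand-in for a model call; swap in your runtime's API here."""
    return text.upper()

result = benchmark(stub_transform, "summarize this snippet")
```

Run the same harness against local and cloud backends on representative pilot hardware; the spread between median and p95 is often more informative than the average.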
The user experience: small change, big signal
On the surface Advanced Paste’s UI tweaks are modest — a preview pane and a dropdown — but their effect is disproportionate. Making the provider selection visible at the moment of paste nudges users toward informed decisions and reduces accidental data leakage. It also sends a message that choice (local vs cloud) matters, and that Microsoft expects users and admins to treat model routing as a configurable policy, not an invisible implementation detail.
What’s next — where this could lead
PowerToys’ move is both tactical and strategic. Tactically, it gives users immediate privacy and latency benefits. Strategically, it signals how Microsoft intends to extend hybrid AI across Windows:
- Expect other utilities and system surfaces to adopt model‑provider choice and local runtimes where latency, cost or privacy justify it.
- Enterprises will likely build internal Foundry Local or Ollama hosts as part of secure on‑prem AI stacks, allowing many endpoints to use local inference without embedding heavy model weights on every PC.
- Hardware vendors and OEMs will continue to optimize NPUs and drivers for Windows AI workloads; over time, more devices will make local inference practical for everyday transforms.
Final verdict
PowerToys v0.96 is a pragmatic, well‑scoped update that transforms a humble clipboard utility into a useful, privacy‑aware AI surface on Windows. It demonstrates that on‑device AI can be practical today for many routine tasks: faster, cheaper (in operational terms), and safer when configured correctly. At the same time, it underscores an important shift: running AI locally is not a simple toggle — it requires governance, supply‑chain discipline and an honest accounting of the hidden costs of local model hosting.
For individual power users, Advanced Paste’s new options mean immediate productivity gains and stronger privacy for sensitive copy/paste tasks. For IT teams, it’s an opportunity — and a responsibility — to set sensible defaults, vet providers and pilot the feature carefully before a broad rollout. The clipboard may be tiny, but its evolution in PowerToys points directly to a future where useful AI is fast, private and locally controlled — if organizations are prepared to manage the trade‑offs.
Source: Digital Trends Windows 11 is upgrading its copy-paste experience with more privacy in tow