Microsoft’s Click To Do is the kind of small change that can quietly rearrange how you work: a system-level overlay that turns whatever is on your screen—text, images, even tables—into actionable items you can edit, summarize, or hand off to Copilot, all without leaving the context of your desktop. (pcworld.com)

Background / Overview​

Click To Do began as an offshoot of Microsoft’s earlier projects to make screen content more useful—tools like Windows Recall and expanded screenshot utilities—but it has grown into a standalone, OS-level convenience feature intended to be a fast path to AI-driven actions on Windows 11. The feature is presented as a Copilot-native tool: you summon Click To Do with a shortcut (Windows key + mouse click, Windows + Q, or gestures on touch devices) and the OS captures a screenshot, runs optical character recognition (OCR) and image analysis, and presents context-sensitive actions for the elements it finds. (pcworld.com)
Microsoft positions Click To Do as both pragmatic and privacy-aware: most of the feature’s transformations—like text summarization and rewriting—are handled locally by an on-device small language model called Phi Silica that uses the machine’s NPU (Neural Processing Unit). Actions that require the internet—visual web search, some Copilot chat features, or opening live web results—are explicitly routed out to Microsoft services only when the user chooses those options. Microsoft’s documentation states that temporary files may be created during processing but that Click To Do does not persistently retain screen content on the device. (support.microsoft.com)

How Click To Do works — the mechanics under the hood​

Screen capture + OCR + local model inference​

At a high level, Click To Do flows through three steps:
  • Capture: when invoked, Windows takes a screenshot of the current display area.
  • Analyze: the feature applies OCR and vision models to detect text, images, tables, URLs, email addresses, and other entities.
  • Act: a contextual menu exposes actions—copy, open link, send email, summarize, rewrite, visual search, and app-specific image edits—some executed locally, some sent to cloud services per the user’s choice. (pcworld.com)
The intelligent text actions—for example, Summarize, Create a bulleted list, and Rewrite—are powered by Phi Silica, which Microsoft describes as a compact, efficient small language model (SLM) optimized to run on Copilot+ PCs’ NPU and designed to stream responses locally with low latency and modest power use. This arrangement preserves responsiveness and keeps routine transformations on-device. Microsoft’s support documentation specifically calls out Phi Silica as the engine for these localized text actions. (support.microsoft.com)
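The three-step flow above (capture, analyze, act) can be sketched as a toy Python pipeline. This is a hypothetical illustration, not Microsoft's implementation. The screen capture and OCR stages are replaced by a plain string that stands in for recognized text, entity detection uses simple regular expressions rather than vision models, and the names `analyze`, `actions_for`, and `click_to_do`, along with the entity-to-action mapping, are invented for the example. Only the action labels come from the article.

```python
import re
from dataclasses import dataclass

# Hypothetical stand-ins for the "Analyze" step: simple regular expressions
# instead of Windows' real OCR and vision models.
URL_RE = re.compile(r"https?://[^\s,;)]+")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


@dataclass(frozen=True)  # frozen makes entities hashable dict keys
class Entity:
    kind: str  # "url", "email", or "text"
    value: str


def analyze(ocr_text: str) -> list[Entity]:
    """Analyze: detect recognizable entities in the OCR output."""
    entities = [Entity("url", m.group()) for m in URL_RE.finditer(ocr_text)]
    entities += [Entity("email", m.group()) for m in EMAIL_RE.finditer(ocr_text)]
    entities.append(Entity("text", ocr_text))  # full capture as plain text
    return entities


def actions_for(entity: Entity) -> list[str]:
    """Act: map each entity type to the menu actions it would surface."""
    menu = {
        "url": ["Copy", "Open website"],
        "email": ["Copy", "Send email"],
        "text": ["Copy", "Search the web"],
    }
    return menu[entity.kind]


def click_to_do(ocr_text: str) -> dict[Entity, list[str]]:
    """Capture is simulated: ocr_text stands in for a recognized screenshot."""
    return {e: actions_for(e) for e in analyze(ocr_text)}


if __name__ == "__main__":
    screen = "Contact jane.doe@example.com or see https://example.com/docs today"
    for entity, actions in click_to_do(screen).items():
        print(entity.kind, "|", entity.value, "|", actions)
```

Keeping detection and action mapping separate mirrors the article's description: the overlay first finds elements, then offers context-sensitive actions per element.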

When the cloud is used​

Click To Do explicitly differentiates between on-device and cloud actions. Actions such as Search the web, Visual search with Bing, and Ask Copilot send selected content to online services. The user chooses these actions; they are not performed silently. When such an action is selected, the content is sent to the appropriate server or web provider, and the results are returned in the default browser or within Copilot as appropriate. Microsoft has documented these flows and the temporary storage practices involved. (support.microsoft.com)
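The "you choose to send" model described above can be illustrated with a small dispatcher that records anything leaving the device. This is a hypothetical sketch: the split into `LOCAL_ACTIONS` and `CLOUD_ACTIONS` follows the action names in the article, but the `Dispatcher` class and its `outbound` log are invented for illustration and do not reflect Microsoft's internal routing code.

```python
from dataclasses import dataclass, field

# Hypothetical classification based on the actions named in the article.
LOCAL_ACTIONS = {"Copy", "Summarize", "Rewrite", "Create a bulleted list"}
CLOUD_ACTIONS = {"Search the web", "Visual search with Bing", "Ask Copilot"}


@dataclass
class Dispatcher:
    # Log of every (action, content) pair that left the device in this sketch.
    outbound: list[tuple[str, str]] = field(default_factory=list)

    def run(self, action: str, content: str) -> str:
        if action in LOCAL_ACTIONS:
            # Handled on-device (Phi Silica on Copilot+ PCs); nothing is sent.
            return f"local:{action}"
        if action in CLOUD_ACTIONS:
            # Reached only because the user explicitly picked a cloud action.
            self.outbound.append((action, content))
            return f"cloud:{action}"
        raise ValueError(f"unknown action: {action!r}")


if __name__ == "__main__":
    d = Dispatcher()
    d.run("Summarize", "quarterly notes")
    d.run("Ask Copilot", "what does this chart show?")
    print(d.outbound)  # only the Ask Copilot request appears
```

The point of the design is that no code path sends content out unless a cloud action was the one selected.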

What Click To Do can do today​

Microsoft has expanded the Click To Do menu with a variety of actions that fall into two broad buckets: text actions and image/visual actions.

Text actions (local + cloud)​

  • Copy / Extract text from a screenshot via OCR and drop it into the clipboard.
  • Open website for recognized URLs (opens in your default browser).
  • Send email for recognized email addresses (composes in your default mail client).
  • Summarize selected text (powered by Phi Silica locally on Copilot+ PCs).
  • Rewrite in multiple tones (casual, formal, refine) executed locally by Phi Silica.
  • Create bulleted lists, shorten or expand content, and other small transformations. (pcworld.com) (support.microsoft.com)
When a selection exceeds a threshold (many outlets cite “over 10 words” as the point at which Phi Silica actions appear), the Click To Do menu surfaces these AI actions. The minimum keeps summarize and rewrite options out of the way for short snippets where they add little. At the other end, on-device small models have more limited context windows than cloud-hosted large language models (LLMs), so Microsoft steers users toward short, high-value passages for local inference. (pcworld.com)
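The reported word threshold can be expressed as a simple menu rule. This sketch assumes words are counted by splitting on whitespace and that the AI entries also require a Copilot+ PC. Both assumptions, and the names `text_actions` and `AI_WORD_THRESHOLD`, are illustrative, not Microsoft's documented logic.

```python
# Hypothetical sketch of the reported rule: Phi Silica text actions appear
# only when the selection is longer than 10 words on a Copilot+ PC.
AI_WORD_THRESHOLD = 10  # "over 10 words", as reported by several outlets

BASE_ACTIONS = ["Copy", "Search the web"]
AI_TEXT_ACTIONS = ["Summarize", "Rewrite", "Create a bulleted list"]


def text_actions(selection: str, copilot_plus: bool) -> list[str]:
    """Return the menu entries a text selection would surface in this model."""
    actions = list(BASE_ACTIONS)
    word_count = len(selection.split())  # assumption: whitespace-delimited words
    if copilot_plus and word_count > AI_WORD_THRESHOLD:
        actions += AI_TEXT_ACTIONS
    return actions


if __name__ == "__main__":
    print(text_actions("Meeting moved to 3 pm", copilot_plus=True))
    print(text_actions(" ".join(["word"] * 11), copilot_plus=True))
```

A selection of exactly 10 words stays below the "over 10 words" bar in this model.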

Image and visual actions​

  • Visual Search with Bing (cloud): identify objects, find similar items, or pull web results for items in an image.
  • Blur background / Erase objects / Remove background: application-driven edits that call into Photos, Paint, or other installed apps to perform AI-assisted edits on the selected image region.
  • Open with… / Save / Share: standard file actions surfaced within the overlay for faster handoffs.
These visual edits use the same image processing capabilities Microsoft has been pushing into Photos and Paint—now accessible from a universal overlay so you don’t have to open the app first. That parity is deliberate: Click To Do bundles scattered AI functions into a single, discoverable action surface.

Hardware and compatibility: Copilot+ PCs and the NPU requirement​

Click To Do’s more advanced, local AI capabilities are closely tied to Microsoft’s Copilot+ PC program. A Copilot+ PC is a device certified to host certain on-device AI experiences by meeting a minimum hardware baseline, most importantly a capable NPU. Documentation and ecosystem reporting have highlighted a multi-vendor approach (Qualcomm Snapdragon X series, Intel Core Ultra with NPUs, AMD Ryzen AI series), and OEMs ship PCs that meet the 40 TOPS-class NPU performance needed for reliable local inference workloads. Some features rolled out first on Snapdragon-powered devices, with Intel and AMD machines following as vendor drivers and firmware matured. (windowscentral.com)
That hardware dependency has real-world consequences: while Click To Do’s basic screenshot-and-select actions may appear on more machines, the local Phi Silica-powered text transformations and the lowest-latency experiences are exclusive to Copilot+ PCs that meet the NPU performance bar. Microsoft has said support for other Copilot+ certified silicon families will arrive, but rollout timing has been staged by hardware vendor and region. (theverge.com)
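The two-tier outcome described above can be modeled as a gating check on NPU throughput. This is a deliberately simplified, hypothetical sketch: real Copilot+ certification is a broader hardware baseline, and rollout is also staged by vendor and region. Here only the 40 TOPS figure cited above is modeled, and the `Device` type and `click_to_do_tier` function are invented for illustration.

```python
from dataclasses import dataclass

COPILOT_PLUS_MIN_NPU_TOPS = 40  # the "40 TOPS-class" baseline cited above


@dataclass
class Device:
    name: str
    npu_tops: float  # 0 if the machine has no NPU


def click_to_do_tier(device: Device) -> str:
    """Toy gating rule: full local AI only at or above the NPU baseline."""
    if device.npu_tops >= COPILOT_PLUS_MIN_NPU_TOPS:
        return "full"  # basic actions plus on-device Phi Silica text actions
    return "basic"  # screenshot, OCR copy, and similar basic actions only


if __name__ == "__main__":
    for d in (Device("new NPU laptop", 45), Device("older laptop", 0)):
        print(d.name, "->", click_to_do_tier(d))
```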

Privacy and security: promises, safeguards, and caveats​

Click To Do has been designed with privacy controls and an explicit “you choose to send” model for cloud interactions. Key points from Microsoft’s guidance:
  • Local-first processing: Phi Silica runs on-device on Copilot+ PCs; OCR and many transformations happen locally without leaving the machine. Temporary files may be created in the user’s local temp directory during an action but are not stored permanently by Click To Do. (support.microsoft.com)
  • User-driven cloud actions: Web searches, Bing Visual Search, and "Ask Copilot" send user-selected content to cloud services only when the user invokes those actions. Microsoft documents this behavior and identifies which actions require internet communication. (support.microsoft.com)
  • Diagnostic data and responsible AI checks: Microsoft says Click To Do’s models underwent fairness, security, and privacy assessments and that telemetry used for diagnostics is limited—though specifics on telemetry schemas are not provided in the consumer-facing documentation. (support.microsoft.com)
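The temporary-file behavior in the first point above follows a common pattern: write scratch data under the user's temp directory and delete it when the work is done. The sketch below shows that general pattern with Python's `tempfile.TemporaryDirectory`. It is not how Click To Do is implemented, which Microsoft does not document at this level, and `process_capture` is a hypothetical name whose "processing" is only a file-size check standing in for OCR and inference.

```python
import os
import tempfile


def process_capture(screenshot_bytes: bytes) -> tuple[int, str]:
    """Write a capture to a scratch file, process it, then discard it."""
    # TemporaryDirectory is created under tempfile.gettempdir() and is
    # removed, with everything in it, when the with-block exits.
    with tempfile.TemporaryDirectory(prefix="ctd_") as scratch:
        path = os.path.join(scratch, "capture.png")
        with open(path, "wb") as f:
            f.write(screenshot_bytes)
        size = os.path.getsize(path)  # stand-in for OCR / model inference
    # Returning the (now deleted) path lets callers verify the cleanup.
    return size, path


if __name__ == "__main__":
    size, path = process_capture(b"\x89PNG fake")
    print(size, os.path.exists(path))
```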
Caveats and practical privacy concerns:
  • While Microsoft emphasizes local processing, the line between local and cloud can blur depending on which action is chosen; some users may not realize a particular menu item performs an external request.
  • Enterprise and organizational deployments route cloud actions to Microsoft 365 Copilot (with organizational privacy policies), which differs from consumer Copilot behavior. Administrators should audit and educate users about those differences. (support.microsoft.com)
Privacy critics have been most vocal about the historical context: features that analyze or snapshot screen content can create anxiety about always-on surveillance. Microsoft has tried to reduce this risk by requiring explicit invocation, documenting data flows, and providing switches to disable the feature, but skepticism remains in some quarters. Independent reporting and community discussion suggest that transparency will remain critical for adoption among privacy-conscious users. (pcworld.com)

Real-world UX: how Click To Do changes (or doesn’t change) daily work​

Click To Do is a productivity design play: reduce app switching, cut repetitive typing, and surface the right action at the moment of need. The overlay is useful in three scenarios:
  • Fast triage of screen content: get a quick summary of a long email or webpage paragraph without opening a separate app.
  • Visual identification and shopping support: spot an item in a webpage and jump to Bing Visual Search in one action.
  • Rapid image cleanup: remove background clutter or blur a background directly from the overlay and pass edits to Photos or Paint.
But real adoption hinges on a few practical factors:
  • Accuracy of OCR: performance is strong on clean type but less reliable with stylized fonts, blurry screenshots, or composite web elements. Users report that copy-and-paste still often feels faster for precise extraction tasks. (pcworld.com)
  • Model quality trade-offs: Phi Silica is optimized for small size and low latency. That is great for responsiveness, but it is not a replacement for cloud LLMs when you need deep reasoning, extensive context, or very long rewrites. Expect lower latency but narrower capabilities than cloud-hosted Copilot chat experiences. (pcworld.com)
  • Discoverability vs. interference: the Windows key + click shortcut is convenient for many users but can be triggered accidentally (especially during gaming). Microsoft provides a global setting to disable Click To Do. (support.microsoft.com)

Limitations, risks and where Microsoft still needs to improve​

1. Local model limitations​

Phi Silica is intentionally small. That design trades raw capability for speed and privacy. For complicated editorial tasks, or when you need comprehensive research-like summarization, cloud LLMs remain superior. Users will notice the difference when switching between local Click To Do rewrites and Copilot’s cloud chat responses. (pcworld.com)

2. Fragmented rollout and hardware gating​

Because the full Click To Do experience with on-device AI is gated to Copilot+ hardware, many users will not see the complete capabilities. That fragmentation risks a two-tiered experience where only buyers of premium, NPU-equipped laptops get the full promise. Microsoft has indicated wider hardware support is planned, but the staggered cadence increases confusion. (windowscentral.com)

3. Privacy perceptions and enterprise policy complexity​

Microsoft’s documentation clarifies when cloud calls happen, but user perception can lag behind. Enterprises must understand the distinction between consumer Copilot and Microsoft 365 Copilot, and admins should plan policy, training, and possibly feature gating to align privacy and compliance needs. (support.microsoft.com)

4. Edge cases and accessibility​

OCR and image analyses are imperfect. Accessibility gains are possible (for example, quick text extraction for screen readers), but Microsoft needs to ensure the overlay plays nicely with assistive technologies, which is an ongoing engineering challenge. Recent Insider notes show Microsoft continues to refine Narrator and other assistive features around Copilot visibility, but full parity requires sustained attention.

How to enable, disable, and use Click To Do (practical steps)​

  • Ensure you are on a Copilot+ PC and updated Windows 11 build (Insider Dev/Beta builds show features earlier; stable channels get staged rollouts).
  • To invoke Click To Do:
      • Press and hold the Windows key and click your left mouse button, or
      • Press Windows + Q, or
      • Use the Snipping Tool or Print Screen shortcuts if configured. (pcworld.com)
  • Select the text or image region you want to act on. The overlay will highlight recognized elements and surface suggested actions.
  • Choose a local action (Summarize, Rewrite) to use Phi Silica on-device, or pick a cloud action (Search the web, Visual search) to route content to Microsoft services. (support.microsoft.com)
  • To disable Click To Do globally: go to Settings > Privacy & security > Click to Do and flip the switch to Off. This prevents the shortcuts and gestures from launching the overlay. (support.microsoft.com)
Tips for best results:
  • Use high-contrast, legible text for more accurate OCR.
  • For longer rewrites or deep edits, consider sending content to Copilot or using cloud editors for better contextual understanding.
  • If privacy is paramount, avoid cloud actions and stick to on-device Phi Silica transformations; review enterprise policies if your device is managed by an organization. (support.microsoft.com)

Strategic context: why Click To Do matters for Windows and Microsoft’s AI strategy​

Click To Do is more than a convenience: it’s a visible demonstration of Microsoft’s dual-track AI strategy—local-first AI for latency-sensitive tasks and cloud-backed AI for heavier lifting. By shipping Phi Silica on Copilot+ PCs and making it the default for small, fast transformations, Microsoft aims to normalize quick, low-friction AI touches in everyday workflows while preserving the cloud for broader questions. That architecture is a bet on heterogeneous computing: NPUs in client devices will reduce round trips, improve privacy for many tasks, and open new UX patterns for context-aware assistants. (support.microsoft.com) (windowscentral.com)
This aligns with other Microsoft moves—semantic local search, Recall (in its revised form), and deeper Copilot integration inside File Explorer and system UI—to make AI a first-class element of the OS rather than an add-on app. The risk is that hardware gating slows adoption and that local models will need steady improvement to be perceived as genuinely useful rather than novelty. (windowscentral.com)

Final analysis and verdict​

Click To Do is one of the more coherent, pragmatic AI features Microsoft has introduced into Windows: it’s immediate, discoverable, and solves small but persistent friction points. For users on Copilot+ hardware it can reduce micromanagement—summaries, rewrites, and image fixes become micro-tasks you perform without changing apps. For those on non‑Copilot+ devices, Click To Do will feel like a preview of what’s to come but may lack the responsiveness and local model options that make the feature shine. (pcworld.com)
Strengths:
  • Convenience: one overlay for multiple actions reduces context switching.
  • Local-first privacy model: Phi Silica keeps many transformations on-device.
  • Broad feature set: text, image edits, and Copilot handoffs are available from a single interface. (support.microsoft.com)
Weaknesses and risks:
  • Hardware gating fragments the experience and favors recent, premium devices.
  • Local model limitations mean users will still need cloud LLMs for complex tasks. (pcworld.com)
  • Perception and policy around privacy and telemetry will require continued clarity from Microsoft, particularly in regulated enterprise settings. (support.microsoft.com)
Click To Do is realistic and useful, not revolutionary. It’s the kind of incremental innovation that—if Microsoft continues to refine OCR accuracy, expand NPU support across vendors, and make the privacy model comforting for businesses and consumers—could become a productivity staple. For now, it’s an invitation to try a new way of interacting with your screen: fast, often local, and increasingly capable. (pcworld.com)

Conclusion: Click To Do demonstrates how a well-integrated, context-aware overlay can make AI feel like part of the desktop rather than a separate novelty app. Its success will depend on Microsoft’s ability to broaden hardware support, improve local model capability over time, and sustain transparent privacy practices that reassure both casual users and organizations. If those pieces come together, Click To Do will likely become a quiet but indispensable tool in the Windows workflow. (support.microsoft.com) (windowscentral.com)

Source: PCWorld, “What is Click To Do? Meet Microsoft's next AI headliner for Windows PCs”