Microsoft’s Click To Do is the kind of small change that can quietly rearrange how you work: a system-level overlay that turns whatever is on your screen—text, images, even tables—into actionable items you can edit, summarize, or hand off to Copilot, all without leaving the context of your desktop. (pcworld.com)
Background / Overview
Click To Do began as an offshoot of Microsoft’s earlier projects to make screen content more useful (tools like Windows Recall and expanded screenshot utilities), but it has grown into a standalone, OS-level convenience feature intended to be a fast path to AI-driven actions on Windows 11. The feature is presented as a Copilot-native tool: you summon Click To Do with a shortcut (Windows key + mouse click, Windows + Q, or gestures on touch devices), and the OS captures a screenshot, runs optical character recognition (OCR) and image analysis, and presents context-sensitive actions for the elements it finds. (pcworld.com)
Microsoft positions Click To Do as both pragmatic and privacy-aware: most of the feature’s transformations, like text summarization and rewriting, are handled locally by an on-device small language model called Phi Silica that uses the machine’s NPU (Neural Processing Unit). Actions that require the internet (visual web search, some Copilot chat features, or opening live web results) are routed to Microsoft services only when the user chooses those options. Microsoft’s documentation states that temporary files may be created during processing but that Click To Do does not persistently retain screen content on the device. (support.microsoft.com)
How Click To Do works — the mechanics under the hood
Screen capture + OCR + local model inference
At a high level, Click To Do flows through three steps:
- Capture: when invoked, Windows takes a screenshot of the current display area.
- Analyze: the feature applies OCR and vision models to detect text, images, tables, URLs, email addresses, and other entities.
- Act: a contextual menu exposes actions—copy, open link, send email, summarize, rewrite, visual search, and app-specific image edits—some executed locally, some sent to cloud services per the user’s choice. (pcworld.com)
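The analyze-then-act steps above can be illustrated with a minimal Python sketch. It is not Microsoft's implementation: the real feature uses OCR and vision models, whereas this example assumes the OCR text is already available and uses simple regular expressions to stand in for entity detection. The names `Entity`, `analyze`, `actions_for`, and the `ACTIONS` table are hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns standing in for the real OCR/vision entity detection.
URL_RE = re.compile(r"https?://[^\s)>\]]+")
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


@dataclass(frozen=True)
class Entity:
    kind: str   # "url", "email", or "text"
    value: str


# Illustrative action menu per entity kind, using action names from the article.
ACTIONS = {
    "url": ["Open website", "Copy"],
    "email": ["Send email", "Copy"],
    "text": ["Copy", "Summarize", "Rewrite", "Search the web"],
}


def analyze(ocr_text: str) -> list[Entity]:
    """Detect URLs and email addresses, then treat the whole text as one block."""
    entities = [
        # Strip sentence punctuation that the URL pattern may have swallowed.
        Entity("url", m.group().rstrip(".,;"))
        for m in URL_RE.finditer(ocr_text)
    ]
    entities += [Entity("email", m.group()) for m in EMAIL_RE.finditer(ocr_text)]
    if ocr_text.strip():
        entities.append(Entity("text", ocr_text.strip()))
    return entities


def actions_for(entity: Entity) -> list[str]:
    """Return the contextual actions offered for a detected entity."""
    return ACTIONS[entity.kind]
```

Calling `analyze()` on a captured string and then `actions_for()` on each result mirrors how the overlay surfaces different menus for a link, an address, or a plain text selection.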
When the cloud is used
Click To Do explicitly differentiates between on-device and cloud actions. Actions such as Search the web, Visual search with Bing, and Ask Copilot send selected content to online services. The user chooses these actions; they are not performed silently. When such an action is selected, the content is sent to the appropriate server or web provider, and the results are returned in the default browser or within Copilot as appropriate. Microsoft has documented these flows and the temporary storage practices involved. (support.microsoft.com)
What Click To Do can do today
Microsoft has expanded the Click To Do menu with a variety of actions that fall into two broad buckets: text actions and image/visual actions.
Text actions (local + cloud)
- Copy / Extract text from a screenshot via OCR and drop it into the clipboard.
- Open website for recognized URLs (opens in your default browser).
- Send email for recognized email addresses (composes in your default mail client).
- Summarize selected text (powered by Phi Silica locally on Copilot+ PCs).
- Rewrite in multiple tones (casual, formal, refine) executed locally by Phi Silica.
- Create bulleted lists, shorten or expand content, and other small transformations. (pcworld.com) (support.microsoft.com)
Image and visual actions
- Visual Search with Bing (cloud): identify objects, find similar items, or pull web results for items in an image.
- Blur background / Erase objects / Remove background: application-driven edits that call into Photos, Paint, or other installed apps to perform AI-assisted edits on the selected image region.
- Open with… / Save / Share: standard file actions surfaced within the overlay for faster handoffs.
Hardware and compatibility: Copilot+ PCs and the NPU requirement
Click To Do’s more advanced, local AI capabilities are closely tied to Microsoft’s Copilot+ PC program. A Copilot+ PC is a device certified to host certain on-device AI experiences by meeting a minimum hardware baseline, most importantly a capable NPU. Documentation and ecosystem reporting have highlighted a multi-vendor approach (Qualcomm Snapdragon X series, Intel Core Ultra with NPUs, AMD Ryzen AI series), and OEMs ship PCs that meet the 40 TOPS-class NPU performance needed for reliable local inference workloads. Initially, some features rolled out on Snapdragon-based devices, with Intel and AMD machines following as vendor drivers and firmware matured. (windowscentral.com)
That hardware dependency has real-world consequences: while Click To Do’s basic screenshot-and-select actions may appear on more machines, the local Phi Silica-powered text transformations and the lowest-latency experiences are exclusive to Copilot+ PCs that meet the NPU performance bar. Microsoft has said support for other Copilot+ certified silicon families will arrive, but rollout timing has been staged by hardware vendor and region. (theverge.com)
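The hardware gating described above can be pictured as a simple capability check. The following Python sketch is illustrative only: the 40 TOPS threshold comes from the article, but the `Device` type, the `available_actions` function, and the exact split between basic and local-AI actions are assumptions, not Microsoft's actual logic.

```python
from dataclasses import dataclass

# Copilot+ NPU baseline cited in the article (40 TOPS class).
MIN_NPU_TOPS = 40

# Assumed split between actions that need no local model and those that do.
BASIC_ACTIONS = ["Copy", "Open website", "Send email", "Visual Search with Bing"]
LOCAL_AI_ACTIONS = ["Summarize", "Rewrite", "Create bulleted list"]


@dataclass
class Device:
    name: str
    npu_tops: float  # 0 for a machine without an NPU


def available_actions(device: Device) -> list[str]:
    """Return the actions a device would expose under the assumed gating rule."""
    actions = list(BASIC_ACTIONS)
    if device.npu_tops >= MIN_NPU_TOPS:
        # Only devices meeting the NPU bar get the Phi Silica text actions.
        actions += LOCAL_AI_ACTIONS
    return actions
```

Under this sketch, a machine below the threshold still gets the basic menu, which matches the two-tier experience the article describes.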
Privacy and security: promises, safeguards, and caveats
Click To Do has been designed with privacy controls and an explicit “you choose to send” model for cloud interactions. Key points from Microsoft’s guidance:
- Local-first processing: Phi Silica runs on-device on Copilot+ PCs; OCR and many transformations happen locally without leaving the machine. Temporary files may be created in the user’s local temp directory during an action but are not stored permanently by Click To Do. (support.microsoft.com)
- User-driven cloud actions: Web searches, Bing Visual Search, and "Ask Copilot" send user-selected content to cloud services only when the user invokes those actions. Microsoft documents this behavior and identifies which actions require internet communication. (support.microsoft.com)
- Diagnostic data and responsible AI checks: Microsoft says Click To Do’s models underwent fairness, security, and privacy assessments and that telemetry used for diagnostics is limited—though specifics on telemetry schemas are not provided in the consumer-facing documentation. (support.microsoft.com)
Caveats:
- While Microsoft emphasizes local processing, the line between local and cloud can blur depending on which action is chosen; some users may not realize a particular menu item performs an external request.
- Enterprise and organizational deployments route cloud actions to Microsoft 365 Copilot (with organizational privacy policies), which differs from consumer Copilot behavior. Administrators should audit and educate users about those differences. (support.microsoft.com)
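To make the local-versus-cloud distinction in the points above concrete, here is a minimal Python sketch of a routing table. It is a simplified model based on the article's description, not Microsoft's code: the `Route` enum, the `ROUTES` table, the `cloud_destination` function, and the destination labels are all hypothetical.

```python
from enum import Enum
from typing import Optional


class Route(Enum):
    LOCAL = "on-device"
    CLOUD = "cloud"


# Assumed classification of menu actions, following the article's examples.
ROUTES = {
    "Copy": Route.LOCAL,
    "Summarize": Route.LOCAL,
    "Rewrite": Route.LOCAL,
    "Search the web": Route.CLOUD,
    "Visual Search with Bing": Route.CLOUD,
    "Ask Copilot": Route.CLOUD,
}


def cloud_destination(action: str, managed_device: bool) -> Optional[str]:
    """Return where the selected content would be sent, or None if it stays local."""
    if ROUTES[action] is Route.LOCAL:
        return None
    if action == "Ask Copilot":
        # Organizational deployments are described as using Microsoft 365 Copilot.
        return "Microsoft 365 Copilot" if managed_device else "Copilot"
    if action == "Visual Search with Bing":
        return "Bing"
    return "web search provider"
```

A table like this is also a useful audit aid: an administrator can list every action whose destination is not `None` and decide which ones need user education or policy controls.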
Real-world UX: how Click To Do changes (or doesn’t change) daily work
Click To Do is a productivity design play: reduce app switching, cut repetitive typing, and surface the right action at the moment of need. The overlay is useful in three scenarios:
- Fast triage of screen content: get a quick summary of a long email or webpage paragraph without opening a separate app.
- Visual identification and shopping support: spot an item in a webpage and jump to Bing Visual Search in one action.
- Rapid image cleanup: remove background clutter or blur a background directly from the overlay and pass edits to Photos or Paint.
There are also friction points:
- Accuracy of OCR: performance is strong on clean type, but less reliable with stylized fonts, blurry screenshots, or composite web elements. Users report that copy-and-paste still often feels faster for precise extraction tasks. (pcworld.com)
- Model quality trade-offs: Phi Silica is optimized for small size and low latency. That’s great for responsiveness, but it’s not a replacement for cloud LLMs when you need deep reasoning, extensive context, or very long rewrites. Expect better latency but narrower capabilities compared with cloud-hosted Copilot chat experiences. (pcworld.com)
- Discoverability vs. interference: the Windows key + click shortcut is convenient for many users but can trigger accidentally (especially during gaming). Microsoft provides a global setting to disable Click To Do. (support.microsoft.com)
Limitations, risks and where Microsoft still needs to improve
1. Local model limitations
Phi Silica is intentionally small. That design trades raw capability for speed and privacy. For complicated editorial tasks, or when you need comprehensive research-like summarization, cloud LLMs remain superior. Users will notice the difference when switching between local Click To Do rewrites and Copilot’s cloud chat responses. (pcworld.com)
2. Fragmented rollout and hardware gating
Because the full Click To Do experience with on-device AI is gated to Copilot+ hardware, many users will not see the complete capabilities. That fragmentation risks a two-tiered experience where only buyers of premium, NPU-equipped laptops get the full promise. Microsoft has indicated wider hardware support is planned, but the staggered cadence increases confusion. (windowscentral.com)
3. Privacy perceptions and enterprise policy complexity
Microsoft’s documentation clarifies when cloud calls happen, but user perception can lag behind. Enterprises must understand the distinction between consumer Copilot and Microsoft 365 Copilot, and admins should plan policy, training, and possibly feature gating to align privacy and compliance needs. (support.microsoft.com)
4. Edge cases and accessibility
OCR and image analyses are imperfect. Accessibility gains are possible (for example, quick text extraction for screen readers), but Microsoft needs to ensure the overlay plays nicely with assistive technologies, which is an ongoing engineering challenge. Recent Insider notes show Microsoft continues to refine Narrator and other assistive features around Copilot visibility, but full parity requires sustained attention.
How to enable, disable, and use Click To Do (practical steps)
- Ensure you are on a Copilot+ PC running an updated Windows 11 build (Insider Dev/Beta builds show features earlier; stable channels get staged rollouts).
- To invoke Click To Do:
  - Press and hold the Windows key and click your left mouse button, or
  - Press Windows + Q, or
  - Use the Snipping Tool or Print Screen shortcuts if configured. (pcworld.com)
- Select the text or image region you want to act on. The overlay will highlight recognized elements and surface suggested actions.
- Choose a local action (Summarize, Rewrite) to use Phi Silica on-device, or pick a cloud action (Search the web, Visual search) to route content to Microsoft services. (support.microsoft.com)
- To disable Click To Do globally: go to Settings > Privacy & security > Click to Do and flip the switch to Off. This prevents the shortcuts and gestures from launching the overlay. (support.microsoft.com)
Practical tips:
- Use high-contrast, legible text for more accurate OCR.
- For longer rewrites or deep edits, consider sending content to Copilot or using cloud editors for better contextual understanding.
- If privacy is paramount, avoid cloud actions and stick to on-device Phi Silica transformations; review enterprise policies if your device is managed by an organization. (support.microsoft.com)
Strategic context: why Click To Do matters for Windows and Microsoft’s AI strategy
Click To Do is more than a convenience: it’s a visible demonstration of Microsoft’s dual-track AI strategy, with local-first AI for latency-sensitive tasks and cloud-backed AI for heavier lifting. By shipping Phi Silica on Copilot+ PCs and making it the default for small, fast transformations, Microsoft aims to normalize quick, low-friction AI touches in everyday workflows while preserving the cloud for broader questions. That architecture is a bet on heterogeneous computing: NPUs in client devices will reduce round trips, improve privacy for many tasks, and open new UX patterns for context-aware assistants. (support.microsoft.com) (windowscentral.com)
This aligns with other Microsoft moves (semantic local search, Recall in its revised form, and deeper Copilot integration inside File Explorer and system UI) to make AI a first-class element of the OS rather than an add-on app. The risk is that hardware gating slows adoption and that local models will need steady improvement to be perceived as genuinely useful rather than a novelty. (windowscentral.com)
Final analysis and verdict
Click To Do is one of the more coherent, pragmatic AI features Microsoft has introduced into Windows: it’s immediate, discoverable, and solves small but persistent friction points. For users on Copilot+ hardware it can reduce micromanagement: summaries, rewrites, and image fixes become micro-tasks you perform without changing apps. For those on non-Copilot+ devices, Click To Do will feel like a preview of what’s to come but may lack the responsiveness and local model options that make the feature shine. (pcworld.com)
Strengths:
- Convenience: one overlay for multiple actions reduces context switching.
- Local-first privacy model: Phi Silica keeps many transformations on-device.
- Broad feature set: text, image edits, and Copilot handoffs are available from a single interface. (support.microsoft.com)
Risks:
- Hardware gating fragments the experience and favors recent, premium devices.
- Local model limitations mean users will still need cloud LLMs for complex tasks. (pcworld.com)
- Perception and policy around privacy and telemetry will require continued clarity from Microsoft, particularly in regulated enterprise settings. (support.microsoft.com)
Conclusion: Click To Do demonstrates how a well-integrated, context-aware overlay can make AI feel like part of the desktop rather than a separate novelty app. Its success will depend on Microsoft’s ability to broaden hardware support, improve local model capability over time, and sustain transparent privacy practices that reassure both casual users and organizations. If those pieces come together, Click To Do will likely become a quiet but indispensable tool in the Windows workflow. (support.microsoft.com) (windowscentral.com)
Source: PCWorld What is Click To Do? Meet Microsoft's next AI headliner for Windows PCs