Copilot Vision Text-In / Text-Out Arrives in Windows Insider Preview

Microsoft has begun rolling out a staged Insider preview that finally gives Copilot Vision a full text-in / text-out conversation path so Windows Insiders can type about what Copilot sees and receive text replies in the same chat pane.

Background

Copilot on Windows has been evolving from a sidebar curiosity into a system-level assistant with three interlocking modalities: Voice, Vision, and Actions. Vision initially launched as a voice-first, coached experience that allowed Copilot to analyze a shared window, speak guidance aloud, and in some previews visually highlight UI elements on screen. The new update introduces typed conversations for Vision sessions, broadening where and how the feature can be used.
Microsoft’s official Windows Insider announcement confirms the new capability and specifies the Copilot app package delivering it: Copilot app version 1.25103.107 and higher is being distributed via the Microsoft Store to all Insider channels as a staged rollout.

What changed: the headline features​

  • Text-in / text-out Vision — Start a Vision session by sharing an app or screen, type your questions in the Copilot chat composer, and receive text replies in the same conversation pane rather than spoken feedback.
  • Seamless modality switching — A live session can flip from typed text to voice: pressing the microphone transitions the conversation from text Vision into voice Vision without losing context.
  • Explicit sharing controls and UI feedback — The composer’s glasses (Vision) icon begins the flow; a visible glow confirms which window is being shared; Stop or X ends sharing. The feature intentionally uses permissioned, session-bound sharing so Copilot only sees what the user explicitly shares.
These additions move Vision from a constrained voice-first experiment to a truly multimodal assistant where typing is a first-class way to interact with shared visual context.

How to try it (Insider steps)​

  • Update the Copilot app via the Microsoft Store and confirm the Copilot app version is 1.25103.107 or higher (a quick version-check sketch follows these steps).
  • Open the Copilot composer in the Copilot app (or from the taskbar quick view).
  • Click the Vision (glasses) icon.
  • Toggle off Start with voice to enter the text-in / text-out mode.
  • Select the app window or screen region to share; the selected area glows to confirm what is being shared.
  • Type questions and read Copilot’s textual replies in the same chat pane. Press the Voice (microphone) icon to switch to spoken interaction; use Stop (X) to end sharing.
These steps are verified by Microsoft’s Insider blog and corroborated by reporting from independent outlets covering the preview.
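
For that first step, one way to confirm the installed package version is to query PowerShell's Get-AppxPackage cmdlet from a short Python wrapper. This is a minimal sketch, not an official tool: the package name Microsoft.Copilot is an assumption that may differ on your device (verify it locally, for example with Get-AppxPackage *Copilot*), and the 1.25103.107 minimum comes from Microsoft's announcement.

```python
import subprocess

MINIMUM = (1, 25103, 107)  # Copilot app version required for text-in / text-out Vision

def installed_copilot_version(package_name: str = "Microsoft.Copilot") -> tuple[int, ...] | None:
    """Return the installed Copilot app version as a tuple of ints, or None if not found.

    Assumes the AppX package name is Microsoft.Copilot; adjust if your device
    reports a different name (check with: Get-AppxPackage *Copilot*).
    """
    result = subprocess.run(
        ["powershell", "-NoProfile", "-Command",
         f"(Get-AppxPackage -Name {package_name}).Version"],
        capture_output=True, text=True,
    )
    version_text = result.stdout.strip()
    if not version_text:
        return None
    return tuple(int(part) for part in version_text.split("."))

if __name__ == "__main__":
    version = installed_copilot_version()
    if version is None:
        print("Copilot app package not found.")
    elif version >= MINIMUM:
        print(f"OK: {'.'.join(map(str, version))} meets the 1.25103.107 minimum.")
    else:
        print(f"Update needed: {'.'.join(map(str, version))} is below 1.25103.107.")
```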

Why this matters: practical benefits​

  • Usability in quiet or public spaces. Voice-first interactions are awkward in meetings, shared offices, or on public transit; typed Vision preserves visual context while maintaining discretion.
  • Accessibility and preference. Users with speech impairments or those who prefer typing now get parity of experience when asking about screen content.
  • Productivity continuity. Typing keeps a searchable record directly in the chat pane and enables copy/paste/export flows more naturally than transient voice replies.
  • Flexible workflows. The ability to switch modalities mid-session means users can begin a text-based troubleshooting dialog and switch to voice for a hands-free follow-up without losing context.
These practical improvements make Vision more useful across everyday scenarios and for a wider set of users and environments.

What’s not included (preview limitations and conservatism)​

Microsoft is explicit that some previously demonstrated features are not available in the initial text-mode preview:
  • Visual Highlights — the overlays that visually point to UI elements on screen are not supported in this release. Microsoft says it is exploring how visual cues should integrate with typed conversation flows.
  • Staged availability — this is a server‑side gated rollout; not every Insider or region will see the update simultaneously. Expect uneven availability across channels and devices.
This cautious staging is intentional: Microsoft appears to be validating the permission model and typed flows before restoring feature parity with voice Vision or rolling out broadly.

Technical verification: facts, numbers, and claims checked​

  • The Copilot app package version tied to this preview is 1.25103.107 (minimum). This is confirmed directly by Microsoft’s Windows Insider blog and was repeated by independent reporting.
  • The rollout announcement date and preview details were published by Microsoft in an Insider blog post dated October 28, 2025; the same post describes the toggle flow, the visual glow indicator, and the lack of Highlights in the release. These items appear in Microsoft’s post and in contemporaneous tech coverage.
  • The share flow and UI affordances (glasses icon, Start with voice toggle, mic button, Stop/X control) are described consistently across Microsoft’s blogs and independent coverage, corroborating the user flow described in the update.
Where claims involve telemetry (for example, statements about engagement uplift from voice vs. text), those are Microsoft’s own metrics and are therefore flagged as vendor-sourced; independent verification would require third-party telemetry access and is currently unavailable. Treat such engagement claims as directional until independently verified.

Deep analysis: strengths and momentum​

Multimodality is the correct next step​

The shift to true multimodal interactions — where keyboard, voice, and visual context are equal first-class inputs — aligns with how people actually work. Many tasks require a mix of modalities: glance at a screen, type a focused question, then talk through next steps. The new text-in Vision acknowledges that voice cannot be the only or default path for visual queries, which is an important UX maturation.

Thoughtful permissioned design​

The UI emphasizes explicit selection of windows and a visible glow to indicate what’s being shared, reinforcing the session-bound, opt-in nature of Vision. That explicit consent model reduces the risk of accidental over-sharing and is a design best practice for screen-vision features.

Faster iteration via Store-distributed app packages​

Delivering Copilot feature updates through the Microsoft Store (app packages like 1.25103.107) decouples Copilot’s evolution from OS servicing cycles. This enables faster feature turnover and more rapid fixes based on Insider feedback. It’s a pragmatic engineering choice that benefits both testers and Microsoft’s ability to iterate.

Risks, gaps, and enterprise concerns​

Missing Highlights reduces effectiveness for some workflows​

The absence of visual Highlights in the initial text-mode release reduces the assistant’s ability to point at UI elements during guided tasks. For training or step-by-step troubleshooting, Highlights are a high-value feature; their absence makes some use-cases less practical. Microsoft is iterating on how to integrate visual cues with typed text, but until Highlight parity is restored the experience is asymmetric.

Rollout fragmentation and support complexity​

Staged server-side gating and channel/regional entitlements mean Insiders will see inconsistent behavior. For IT teams piloting Copilot Vision, this fragmentation complicates testing and user training because features won’t be uniformly available across a device fleet. Plan pilots with version checks and targeted user groups.

Privacy and governance questions for enterprises​

Vision’s power — reading on-screen content, extracting text and structure, and turning that content into actionable outputs — raises several governance issues:
  • Data handling and residency: Is visual data ephemeral or routed through cloud services? Microsoft’s design for Vision is permissioned and session-bound, but the processing model (local vs. cloud) can vary depending on device entitlements (Copilot+ PCs) and backend routing. Enterprises must verify where inference and storage occur for their tenant.
  • DLP and audit logs: Current preview documentation does not fully enumerate enterprise-grade DLP or auditing controls for Vision sessions. Before broad deployment in regulated environments, demand clear DLP, logging, and admin controls.
  • Consent clarity: While the UI emphasizes explicit sharing, relying on users to make correct sharing choices is brittle in complex organizational settings. Clear policies, training, and potential admin-enforced gating are necessary.

Dependence on cloud models and hardware entitlements​

Microsoft has signaled a hardware tier, Copilot+ PCs, that offloads latency-sensitive inference to local NPUs. That model improves latency and privacy for some on-device tasks, but it also creates a potential two-tier ecosystem where richer Copilot experiences are privileged on newer hardware. Organizations should weigh the trade-off between local inference performance and broader device support.

Recommendations for Windows Insiders and IT teams​

  • For Insiders and power users:
      • Update the Copilot app and check the About panel for package version 1.25103.107 or higher before testing.
      • Try Vision text sessions on non-sensitive content. Observe the UX (glow, Stop/X, modality switch) and provide feedback via the Copilot app’s Feedback option.
      • Compare text and voice sessions for the same tasks to understand where Highlights or voice coaching matter.
  • For IT teams planning pilots:
      • Start with a small pilot group and a narrow set of use-cases (e.g., support documentation extraction, guided learning).
      • Validate DLP and audit coverage for any content that might be shared with Vision.
      • Confirm processing locality (cloud vs. local inference) for devices in the pilot and map that to privacy policy requirements.
      • Maintain an inventory of Copilot app package versions across devices and gate pilot participants based on confirmed availability; a minimal inventory-gating sketch follows this list.
  • For security and compliance teams:
      • Demand documentation from vendors that details the data flow, retention periods, and admin controls for Vision sessions.
      • Consider conditional policies that restrict Vision use on devices handling regulated data until robust DLP and logging are available.
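
As noted in the IT-team item above, pilot gating depends on knowing which devices already carry a qualifying Copilot package. Below is a minimal inventory-gating sketch, assuming a hypothetical CSV export with hostname and copilot_version columns; substitute whatever fields your endpoint-management tooling actually produces.

```python
import csv

MINIMUM = (1, 25103, 107)  # minimum Copilot app version for the text Vision preview

def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version string such as '1.25103.107.0' into a comparable tuple."""
    return tuple(int(part) for part in text.strip().split("."))

def eligible_devices(inventory_path: str) -> list[str]:
    """Return hostnames whose recorded Copilot app version meets the preview minimum.

    Expects a CSV with 'hostname' and 'copilot_version' columns -- hypothetical
    column names standing in for whatever your inventory export actually uses.
    """
    eligible = []
    with open(inventory_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            version_text = (row.get("copilot_version") or "").strip()
            if not version_text:
                continue  # skip devices with no recorded Copilot version
            try:
                if parse_version(version_text) >= MINIMUM:
                    eligible.append(row.get("hostname", "unknown"))
            except ValueError:
                continue  # skip malformed version strings
    return eligible

if __name__ == "__main__":
    for host in eligible_devices("copilot_inventory.csv"):
        print(host)
```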

Product strategy and where this likely goes next​

Microsoft’s approach — iterating rapidly with Insiders, decoupling Copilot updates through Store packages, and testing consent-first UX patterns — foreshadows a roadmap that will likely include:
  • Restoring visual Highlights and then designing hybrid visual/text interactions so the assistant can both say and point as needed.
  • Expanding Vision to more markets beyond initial region gating, with improved enterprise admin controls and DLP integration.
  • Refining local inference and Copilot+ PC entitlements to balance performance, privacy, and ubiquity across the Windows installed base.
This staged approach reduces risk and gives Microsoft space to harden privacy and governance controls before pushing to mainstream channels.

Final assessment​

The text-in / text-out rollout is an important, pragmatic step for Copilot Vision. It converts a voice-centric preview into a genuinely multimodal feature that respects real-world usage patterns — quiet offices, meetings, accessibility needs, and personal preference. The UI’s explicit permission model and the ability to switch modalities mid-session are well-considered design choices that increase the feature’s utility.
At the same time, the preview exposes real gaps: missing Highlights, staging fragmentation, and unanswered enterprise governance questions. These are not fatal flaws but are precisely the issues a staged Insider preview should surface. Organizations should pilot cautiously, verify DLP and telemetry, and expect Microsoft to iterate based on Insider feedback.
For Windows Insiders, this is the right time to test typed Vision flows on non-sensitive content and provide focused feedback. For IT leaders, it’s time to inventory likely use-cases, align pilots with compliance requirements, and demand clarity on data residency and auditability before enabling Vision across production fleets.
The update is rolling now to Insiders via the Microsoft Store (Copilot app version 1.25103.107+), and the early design choices suggest Microsoft plans to make Vision a flexible, multi-input assistant — provided the company resolves the remaining parity, privacy, and governance issues uncovered in this preview.

Source: Thurrott.com Copilot Vision with Text Input/Output is Rolling Out to All Insider Channels
 

Microsoft has quietly upgraded Copilot on Windows with a major usability shift: Vision now supports text-in, text-out, turning what began as a voice-first screen‑sharing experiment into a truly multimodal assistant that can see your apps and answer typed questions directly in the Copilot chat pane. The change is being distributed as an update to the Copilot app through the Microsoft Store (package version 1.25103.107 and higher) and is rolling out to Windows Insider channels in stages, with Microsoft positioning typed Vision sessions as a permissioned, session‑bound way to share only the windows or apps you choose.

Background and overview

Copilot’s evolution on Windows has accelerated over the past year as Microsoft pivots from experimental features to system-level AI services integrated across Windows 11. Vision — the capability that lets Copilot “see” a shared app, document, or desktop region — originally launched as a voice-first experience that coached users aloud through analysis of on‑screen content. That early design emphasized hands‑free interaction and audible coaching, but it left gaps for scenarios where speaking out loud is impractical: meetings, public spaces, quiet offices, and accessibility contexts where typed responses are preferred.
The text-in/text-out update answers that gap by making typed prompts a first-class modality inside Vision sessions. Instead of being forced into an audible dialogue, users can now share a window or screen and type questions or commands; Copilot replies in text within the same conversation pane. Importantly, the session can still switch modes midstream — tapping the microphone converts a typed Vision session into voice Vision without losing conversational context. Microsoft describes the update as staged and preview-only for Insiders while it gathers feedback.

What this update delivers — the practical details​

How to start a Vision text session​

Microsoft documented a simple composer-driven flow for text Vision:
  • Open the Copilot composer in the Copilot app.
  • Click the glasses icon (Vision) in the composer.
  • Toggle off Start with voice to begin with typed input.
  • Select the app window or your desktop region to share — Microsoft’s UI shows a glow around the selected window.
  • Type your question into the chat composer; Copilot replies in text inside the same pane.
  • Press the mic icon at any time to transition to a voice‑first session and continue the same conversation.
This UI makes modality explicit and reversible: a visible “glow” confirms which window is being shared, and clearly labeled stop controls (Stop or X) end permissioned sharing.

Versioning and distribution​

Microsoft is shipping the capability as part of the Copilot app update (version 1.25103.107 and later) and is delivering it through the Microsoft Store to all Windows Insider channels as a staged rollout. That means not every Insider will see the feature immediately; Microsoft expects to iterate based on early feedback from preview users.

What Copilot can see and process​

When you share a window or desktop with Vision, Copilot analyzes visible content and — in certain Microsoft apps like Word, Excel, and PowerPoint — can use app context beyond the single visible view. In practice that means Copilot can review an entire PowerPoint deck or spreadsheet contextually without forcing the user to manually flip through every slide or cell when asked for a high-level analysis. Microsoft has repeatedly emphasized that Vision sessions are explicitly permissioned and session-bound: Copilot only sees what you explicitly share for that session.

What’s not included (the limitations in this preview)​

Microsoft carefully framed this initial text Vision release as limited in functionality compared with earlier voice Vision experiments.
  • No Highlights overlays for text Vision. Visual “Highlights” — the overlays that point to specific UI elements on‑screen — are not supported when you run Vision in text mode today. Microsoft says it is exploring how to integrate visual cues into typed conversations without compromising clarity or privacy. The absence of Highlights can make tasks that depend on precise visual pointing (for example, following which button to press inside a nested dialog) less intuitive in the text flow.
  • Staged availability. The feature is rolling out in phases to Insiders, so behavior and availability will vary by channel and device while Microsoft collects feedback. Expect a slow ramp rather than a broad immediate release.
  • Enterprise policy visibility. Microsoft documents the session-bound permission model, but enterprise administrators and security teams will need to verify how Vision sessions interact with existing data loss prevention (DLP) policies and endpoint security controls in sealed environments. Microsoft’s blog posts describe the permission model but do not enumerate every enterprise control in this early preview. That leaves a compliance gap for regulated organizations until administrators get detailed deployment guidance.
These omissions are not just feature debts; they shape how usable Vision is in real-world workflows and where Microsoft will need to focus next.

Why this matters: usability, accessibility, and workflow impact​

Accessibility gains​

Making Vision usable with typed input is a meaningful win for accessibility and situational usability. Not everyone can or wants to speak aloud to their PC, and typed interactions expand Copilot’s utility for deaf or hard-of-hearing users, people who prefer screen readers, and professionals in shared or quiet spaces. Microsoft has been enhancing Narrator and image description features in Windows 11 to improve screen-reader experiences, and text Vision aligns with that trend by enabling non‑audible access to visual context.

Productivity and context-aware assistance​

Text Vision broadens the class of tasks Copilot can help with:
  • Reviewing documents and slides without speaking aloud.
  • Asking pointed questions about a screenshot, UI element, or application state.
  • Getting inline, copyable text responses that can be pasted into emails, tickets, or bug reports.
  • Switching seamlessly to voice when a spoken walkthrough becomes necessary.
For knowledge workers, the ability to request a short, textual summary of a complex spreadsheet or a suggested rewrite of a highlighted portion of a document — without dictation or a phone headset — is an immediate productivity win.

The technical and privacy model: permissioned sharing, but questions remain​

Microsoft’s messaging emphasizes session permissioning: Vision only sees what a user explicitly shares during a session. That is an important design principle because it reduces the risk of accidental data exposure. The staged rollout also gives Microsoft time to refine UI affordances that make sharing choices crystal clear.
But permissioned sharing does not eliminate risk. The critical questions for IT teams and privacy officers include:
  • Where does shared content travel? Copilot processes content in Microsoft’s cloud services; depending on settings, parts of that interaction may be used for model improvement unless administrators or users opt out of training data collection. Microsoft exposes controls to prevent data being used for training, but the default experience and enterprise defaults should be explicitly verified before broad deployment.
  • How do DLP and endpoint protections intercept or block Vision sessions? Enterprises will want to ensure that Vision sharing can be blocked or audited under corporate DLP rules, and that there is transparency about what windows/apps are being shared. Microsoft’s early blog posts sketch the permissioned approach, but definitive enterprise controls and telemetry options must be validated in later releases.
  • Local processing vs. cloud processing balance. Microsoft has been pushing local, on‑device AI for several accessibility features (e.g., richer Narrator image descriptions on Copilot+ PCs) to keep sensitive content on device, but Vision’s heavier analysis appears designed primarily for cloud computation where models are more capable. Organizations that insist on full local control should watch the deployment notes closely.
Until Microsoft provides a full enterprise controls matrix for Vision, risk‑conscious organizations should treat the feature as experimental and control access through Insider and staged rollout policies.

Strengths and notable improvements​

  • Multimodal parity: Text Vision closes an obvious gap between voice and text interactions, making Copilot genuinely multimodal. The ability to switch modalities mid-session preserves conversational context and reduces friction in mixed workflows.
  • Faster adoption potential: Typed responses produce copyable output that’s easier to integrate into business workflows (emails, tickets, reports), accelerating real-world adoption where spoken output is less convenient.
  • Clear permission model: Microsoft’s visible “glow” selection and Stop controls surface what’s being shared, which helps users make safer decisions about exposing content during a session.
  • Alignment with accessibility roadmap: The move complements broader Windows accessibility investments — richer image descriptions in Narrator, Click to Do actions, and voice access — reinforcing Microsoft’s commitment to non‑visual and non‑auditory interaction models.

Risks, edge cases and operational concerns​

  • Ambiguity without Highlights. The lack of visual overlays in text Vision hampers scenarios that require precise pointing and step‑by‑step guidance. For example, instructing a user to click a small checkbox inside a nested UI is harder when the assistant can’t visually highlight the exact element within the typed conversation. Microsoft acknowledges this gap and is evaluating how to integrate visual cues into text flows.
  • Privacy and training-use defaults. While users can opt out of having content used to train models, the default settings and enterprise policies need active review. Organizations handling sensitive data should verify training opt-out settings across endpoints and ensure that Copilot privacy controls align with compliance needs.
  • Inconsistent hardware experiences. Microsoft’s broader Copilot+ strategy has resulted in feature disparities across hardware families (e.g., Snapdragon‑based Copilot+ PCs getting some local AI capabilities earlier). That divergence can create an uneven experience for users depending on their hardware vendor and chipset, especially for on‑device vs. cloud processing tradeoffs.
  • Auditability & forensics. Enterprises will want logs and telemetry for Vision sessions: who shared what, when, and whether that content was used beyond the session. Until Microsoft publishes comprehensive telemetry and audit guidance for Copilot Vision, security teams should treat the feature as requiring conservative governance.

Real-world use cases where text Vision excels​

  • Customer support and triage: Agents can share a problematic window and type targeted questions to extract configuration settings or error text that Copilot can summarize into issue tickets.
  • Document reviews: Reviewers can ask Copilot to summarize a slide deck or propose editing suggestions without voice, preserving quiet meeting decorum.
  • Accessibility workflows: Users relying on screen readers or who are non‑verbal can type and receive text replies that integrate with assistive technologies.
  • Software troubleshooting: Developers and QA can share a failing app window and request parsed stack traces, log snippets, or UI element identifications — then paste Copilot’s textual guidance into bug reports or chat threads.
These concrete scenarios highlight why typed Vision is more than a convenience feature; it unlocks new, practical integrations into daily work where voice is inappropriate.

Deployment checklist for IT and security teams​

  • Pilot scope: Start by restricting Vision to controlled Insider rings or a small pilot group that can provide feedback on UX, DLP interaction, and privacy controls.
  • Privacy settings review: Confirm whether Copilot is allowed to send user data for model training and set an enterprise‑wide default if needed. Document opt-out procedures for users.
  • DLP and telemetry validation: Test how existing DLP tools detect and block unwanted sharing during Vision sessions. Evaluate logging and audit trails for compliance.
  • User training: Create short training that teaches users how to use the Vision composer, the meaning of the glowing window indicator, and the steps to stop sharing. Emphasize privacy and data handling best practices.
  • Hardware variance mapping: Identify which devices in the fleet are Copilot+ capable or have optimized local AI features and note the differences in expected behavior for those machines.
This checklist is intentionally conservative: it treats Vision as a new interaction surface that must be governed before broad enterprise adoption.
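
One way to make this checklist operational is to encode it as an explicit pilot policy that a script or management workflow can evaluate per device. The sketch below is purely illustrative: the field names are hypothetical and do not correspond to any Microsoft-provided schema or API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VisionPilotPolicy:
    """Illustrative pilot-gating policy; field names are hypothetical, not a Microsoft schema."""
    minimum_copilot_version: tuple[int, ...] = (1, 25103, 107)
    require_dlp_validated: bool = True          # DLP coverage confirmed for the pilot group
    require_training_opt_out_set: bool = True   # model-training opt-out verified per tenant policy
    allow_regulated_data_devices: bool = False  # keep Vision off devices handling regulated data

def device_may_join_pilot(policy: VisionPilotPolicy, *, copilot_version: tuple[int, ...],
                          dlp_validated: bool, training_opt_out_set: bool,
                          handles_regulated_data: bool) -> bool:
    """Apply the checklist-style policy to one device's recorded attributes."""
    if copilot_version < policy.minimum_copilot_version:
        return False
    if policy.require_dlp_validated and not dlp_validated:
        return False
    if policy.require_training_opt_out_set and not training_opt_out_set:
        return False
    if handles_regulated_data and not policy.allow_regulated_data_devices:
        return False
    return True

if __name__ == "__main__":
    policy = VisionPilotPolicy()
    print(device_may_join_pilot(policy, copilot_version=(1, 25103, 107, 0),
                                dlp_validated=True, training_opt_out_set=True,
                                handles_regulated_data=False))
```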

What to watch next — where Microsoft should invest​

  • Integrated visual cues for text flows. Bringing Highlights or equivalent visual pointers into text sessions is the single most important UX improvement for making text Vision a practical replacement for voice in detailed workflows. Microsoft has called this out as an area of iteration.
  • Enterprise-grade controls and visibility. Detailed guidance for admins on telemetry, DLP integration, and model‑training opt‑outs will determine whether enterprises feel comfortable enabling Vision widely.
  • Local vs. cloud processing choices. Expanding on‑device, private inference capabilities — especially for sensitive corporate data — will broaden Vision’s reach in regulated environments and make the feature more attractive to privacy‑conscious organizations.
  • Consistency across hardware. Narrowing the function gap between Copilot+ devices and mainstream Intel/AMD machines will limit user confusion and make cross‑device collaboration smoother.

Final analysis: measured optimism​

Microsoft’s decision to add text-in, text-out to Copilot Vision is a pragmatic, user-centered improvement that materially increases Copilot’s usefulness in everyday work. It reduces the friction imposed by voice-only flows, brings accessibility and situational usability gains, and produces copyable, integrable textual outputs that fit business workflows.
At the same time, the rollout is an early preview with clear functional limits (notably the lack of Highlights), staged availability, and a need for clearer enterprise controls and telemetry. For organizations, the prudent path is to pilot the feature in controlled environments, validate privacy and DLP implications, and prepare to govern its use while Microsoft iterates.
In short: Vision has become more flexible, and that is strategically important for Copilot’s road to mainstream adoption — but the details of privacy, pointing‑style UI, and enterprise governance will determine whether it becomes widely trusted and useful in production environments.

Microsoft’s staged release through the Microsoft Store makes text Vision easy to try for Windows Insiders, and the change signals that Microsoft intends to make multimodal, permissioned AI a normal part of desktop workflows. Watch for subsequent Copilot app updates that restore visual highlights in typed sessions and for formal guidance on enterprise deployment; those will be the turning points that determine how quickly Vision moves from preview curiosity to everyday tool.

Source: newskarnataka.com https://newskarnataka.com/technolog...-text-in-text-out-for-windows-users/30102025/
 
