Microsoft has begun rolling out a staged Insider preview that finally gives Copilot Vision a full text-in / text-out conversation path, so Windows Insiders can type questions about what Copilot sees and receive text replies in the same chat pane.
Background
Copilot on Windows has been evolving from a sidebar curiosity into a system-level assistant with three interlocking modalities: Voice, Vision, and Actions. Vision initially launched as a voice-first, coached experience that allowed Copilot to analyze a shared window, speak guidance aloud, and in some previews visually highlight UI elements on screen. The new update introduces typed conversations for Vision sessions, broadening where and how the feature can be used. Microsoft’s official Windows Insider announcement confirms the new capability and specifies the Copilot app package delivering it: Copilot app version 1.25103.107 and higher is being distributed via the Microsoft Store to all Insider channels as a staged rollout.
What changed: the headline features
- Text-in / text-out Vision — Start a Vision session by sharing an app or screen, type your questions in the Copilot chat composer, and receive text replies in the same conversation pane rather than spoken feedback.
- Seamless modality switching — A live session can flip from typed text to voice: pressing the microphone transitions the conversation from text Vision into voice Vision without losing context.
- Explicit sharing controls and UI feedback — The composer’s glasses (Vision) icon begins the flow; a visible glow confirms which window is being shared; Stop or X ends sharing. The feature intentionally uses permissioned, session-bound sharing so Copilot only sees what the user explicitly shares.
How to try it (Insider steps)
- Update the Copilot app via the Microsoft Store and confirm the Copilot app version is 1.25103.107 or higher.
- Open the Copilot composer in the Copilot app (or from the taskbar quick view).
- Click the Vision (glasses) icon.
- Toggle off Start with voice to enter the text-in / text-out mode.
- Select the app window or screen region to share; the shared area will glow to indicate it’s being shared.
- Type questions and read Copilot’s textual replies in the same chat pane. Press the Voice (microphone) icon to switch to spoken interaction; use Stop (X) to end sharing.
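For repeated testing across a device fleet, the minimum-version check in the first step can be scripted. A minimal Python sketch: the dotted version format and the 1.25103.107 minimum come from Microsoft’s announcement, while the function names are illustrative, not part of any Microsoft tooling. The key point is to compare version parts numerically, not as strings:

```python
# Compare dotted version strings numerically, not lexically:
# as strings, "1.9.0" would sort AFTER "1.25103.107", which is wrong.
def parse_version(v: str) -> tuple[int, ...]:
    """Turn '1.25103.107' into (1, 25103, 107) for numeric comparison."""
    return tuple(int(part) for part in v.strip().split("."))

# Minimum Copilot app version for the text-in/text-out Vision preview.
MIN_VISION_TEXT_VERSION = parse_version("1.25103.107")

def supports_text_vision(installed: str) -> bool:
    """True if the installed Copilot app meets the preview minimum."""
    return parse_version(installed) >= MIN_VISION_TEXT_VERSION

print(supports_text_vision("1.25103.107"))  # True
print(supports_text_vision("1.25102.999"))  # False (25102 < 25103)
```

Tuple comparison makes “greater or equal” behave field by field (major, then build, then revision), which matches how staged Store packages are versioned.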
Why this matters: practical benefits
- Usability in quiet or public spaces. Voice-first interactions are awkward in meetings, shared offices, or on public transit; typed Vision preserves visual context while maintaining discretion.
- Accessibility and preference. Users with speech impairments or those who prefer typing now get parity of experience when asking about screen content.
- Productivity continuity. Typing keeps a searchable record directly in the chat pane and enables copy/paste/export flows more naturally than transient voice replies.
- Flexible workflows. The ability to switch modalities mid-session means users can begin a text-based troubleshooting dialog and switch to voice for a hands-free follow-up without losing context.
What’s not included (preview limitations and conservatism)
Microsoft is explicit that some previously demonstrated features are not available in the initial text-mode preview:
- Visual Highlights — the overlays that visually point to UI elements on screen are not supported in this release. Microsoft says it is exploring how visual cues should integrate with typed conversation flows.
- Staged availability — this is a server‑side gated rollout; not every Insider or region will see the update simultaneously. Expect uneven availability across channels and devices.
Technical verification: facts, numbers, and claims checked
- The Copilot app package version tied to this preview is 1.25103.107 (minimum). This is confirmed directly by Microsoft’s Windows Insider blog and was repeated by independent reporting.
- The rollout announcement date and preview details were published by Microsoft in an Insider blog post dated October 28, 2025; the same post describes the toggle flow, the visual glow indicator, and the lack of Highlights in the release. These items appear in Microsoft’s post and in contemporaneous tech coverage.
- The share flow and UI affordances (glasses icon, Start with voice toggle, mic button, Stop/X control) are described consistently across Microsoft’s blogs and independent coverage, corroborating the user flow described in the update.
Deep analysis: strengths and momentum
Multimodality is the correct next step
The shift to true multimodal interactions — where keyboard, voice, and visual context are equal first-class inputs — aligns with how people actually work. Many tasks require a mix of modalities: glance at a screen, type a focused question, then talk through next steps. The new text-in Vision acknowledges that voice cannot be the only or default path for visual queries, which is an important UX maturation.
Thoughtful permissioned design
The UI emphasizes explicit selection of windows and a visible glow to indicate what’s being shared, reinforcing the session-bound, opt-in nature of Vision. That explicit consent model reduces the risk of accidental over-sharing and is a design best practice for screen-vision features.
Faster iteration via Store-distributed app packages
Delivering Copilot feature updates through the Microsoft Store (app packages like 1.25103.107) decouples Copilot’s evolution from OS servicing cycles. This enables faster feature turnover and more rapid fixes based on Insider feedback. It’s a pragmatic engineering choice that benefits both testers and Microsoft’s ability to iterate.
Risks, gaps, and enterprise concerns
Missing Highlights reduces effectiveness for some workflows
The absence of visual Highlights in the initial text-mode release reduces the assistant’s ability to point at UI elements during guided tasks. For training or step-by-step troubleshooting, Highlights are a high-value feature; their absence makes some use-cases less practical. Microsoft is iterating on how to integrate visual cues with typed text, but until Highlight parity is restored the experience is asymmetric.
Rollout fragmentation and support complexity
Staged server-side gating and channel/regional entitlements mean Insiders will see inconsistent behavior. For IT teams piloting Copilot Vision, this fragmentation complicates testing and user training because features won’t be uniformly available across a device fleet. Plan pilots with version checks and targeted user groups.
Privacy and governance questions for enterprises
Vision’s power — reading on-screen content, extracting text and structure, and turning that content into actionable outputs — raises several governance issues:
- Data handling and residency: Is visual data ephemeral or routed through cloud services? Microsoft’s design for Vision is permissioned and session-bound, but the processing model (local vs. cloud) can vary depending on device entitlements (Copilot+ PCs) and backend routing. Enterprises must verify where inference and storage occur for their tenant.
- DLP and audit logs: Current preview documentation does not fully enumerate enterprise-grade DLP or auditing controls for Vision sessions. Before broad deployment in regulated environments, demand clear DLP, logging, and admin controls.
- Consent clarity: While the UI emphasizes explicit sharing, relying on users to make correct sharing choices is brittle in complex organizational settings. Clear policies, training, and potential admin-enforced gating are necessary.
Dependence on cloud models and hardware entitlements
Microsoft has signaled a hardware tier, Copilot+ PCs, that offloads latency-sensitive inference to local NPUs. That model improves latency and privacy for some on-device tasks, but it also creates a potential two-tier ecosystem where richer Copilot experiences are privileged on newer hardware. Organizations should weigh the trade-off between local inference performance and broader device support.
Recommendations for Windows Insiders and IT teams
- For Insiders and power users:
  - Update the Copilot app and check the About panel for package version 1.25103.107 or higher before testing.
  - Try Vision text sessions on non-sensitive content. Observe the UX (glow, Stop/X, modality switch) and provide feedback via the Copilot app’s Feedback option.
  - Compare text and voice sessions for the same tasks to understand where Highlights or voice coaching matter.
- For IT teams planning pilots:
  - Start with a small pilot group and a narrow set of use-cases (e.g., support documentation extraction, guided learning).
  - Validate DLP and audit coverage for any content that might be shared with Vision.
  - Confirm processing locality (cloud vs. local inference) for devices in the pilot and map that to privacy policy requirements.
  - Maintain an inventory of Copilot app package versions across devices and gate pilot participants based on confirmed availability.
- For security and compliance teams:
  - Demand documentation from vendors that details the data flow, retention periods, and admin controls for Vision sessions.
  - Consider conditional policies that restrict Vision use on devices handling regulated data until robust DLP and logging are available.
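The inventory-and-gating recommendation above is straightforward to automate. A hedged sketch, assuming IT can export a device-to-version mapping as CSV (e.g., from an endpoint-management report; the `device`/`copilot_version` column layout here is hypothetical):

```python
import csv
import io

# Copilot app minimum for the Vision text preview, per Microsoft's announcement.
MINIMUM = (1, 25103, 107)

def version_tuple(v: str) -> tuple[int, ...]:
    """Parse a dotted version string into integers for numeric comparison."""
    return tuple(int(p) for p in v.strip().split("."))

def eligible_devices(csv_text: str) -> list[str]:
    """Return device names whose Copilot app meets the pilot minimum.

    Expects 'device' and 'copilot_version' columns (hypothetical layout).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["device"] for row in reader
            if version_tuple(row["copilot_version"]) >= MINIMUM]

inventory = """device,copilot_version
DESKTOP-01,1.25103.107
LAPTOP-07,1.25011.90
SURFACE-03,1.25104.2
"""
print(eligible_devices(inventory))  # ['DESKTOP-01', 'SURFACE-03']
```

Because the rollout is server-side gated, meeting the version minimum is necessary but not sufficient; treat the output as the candidate pool and still confirm the feature appears on each pilot device.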
Product strategy and where this likely goes next
Microsoft’s approach — iterating rapidly with Insiders, decoupling Copilot updates through Store packages, and testing consent-first UX patterns — foreshadows a roadmap that will likely include:
- Restoring visual Highlights and then designing hybrid visual/text interactions so the assistant can both say and point as needed.
- Expanding Vision to more markets beyond initial region gating, with improved enterprise admin controls and DLP integration.
- Refining local inference and Copilot+ PC entitlements to balance performance, privacy, and ubiquity across the Windows installed base.
Final assessment
The text-in / text-out rollout is an important, pragmatic step for Copilot Vision. It converts a voice-centric preview into a genuinely multimodal feature that respects real-world usage patterns — quiet offices, meetings, accessibility needs, and personal preference. The UI’s explicit permission model and the ability to switch modalities mid-session are well-considered design choices that increase the feature’s utility.
At the same time, the preview exposes real gaps: missing Highlights, staging fragmentation, and unanswered enterprise governance questions. These are not fatal flaws but are precisely the issues a staged Insider preview should surface. Organizations should pilot cautiously, verify DLP and telemetry, and expect Microsoft to iterate based on Insider feedback.
For Windows Insiders, this is the right time to test typed Vision flows on non-sensitive content and provide focused feedback. For IT leaders, it’s time to inventory likely use-cases, align pilots with compliance requirements, and demand clarity on data residency and auditability before enabling Vision across production fleets.
The update is rolling now to Insiders via the Microsoft Store (Copilot app version 1.25103.107+), and the early design choices suggest Microsoft plans to make Vision a flexible, multi-input assistant — provided the company resolves the remaining parity, privacy, and governance issues uncovered in this preview.
Source: Thurrott.com Copilot Vision with Text Input/Output is Rolling Out to All Insider Channels