Copilot Usage Report 2025: Device and Time Shape AI Companions

Microsoft’s own data now frames a simple, consequential idea: Copilot is not a single-use tool but a device-shaped companion. After analyzing a sample of 37.5 million de-identified Copilot conversations from January through September 2025, Microsoft’s AI research team reports a striking split in behavior—mobile users lean heavily on Copilot for health and personal advice at every hour, while desktop users engage Copilot as a daytime workmate centered on technology and career tasks. The analysis, released as a preprint titled It’s About Time: The Copilot Usage Report 2025, also highlights predictable time-of-day and seasonal rhythms (programming spikes during weekdays, gaming on weekends, a Valentine’s Day relationship surge) and documents a growing appetite for advice-seeking rather than simple information retrieval.

Background

Microsoft’s Copilot ecosystem has expanded rapidly into consumer and productivity surfaces across Windows, Edge, and mobile apps. As Copilot became embedded in search, communication, and task flows, the company’s research arm sought to understand not just what people ask, but when and on what device they ask it. The new Copilot Usage Report samples tens of millions of consumer conversations and classifies them automatically by topic (e.g., Health and Fitness, Technology, Work and Career, Religion and Philosophy) and intent (e.g., information-seeking, advice-seeking, content creation). Enterprise- and school-authenticated traffic was explicitly excluded, meaning the dataset captures consumer patterns on personal devices rather than workplace or education accounts.

This shift from narrow feature telemetry to human-centered behavior research is important. It reframes Copilot as a social technology that sits in users’ daily lives—sometimes as a productivity tool, sometimes as a confidant. Understanding this usage matrix is essential for product design, safety engineering, and policy choices.

What Microsoft Found: High-level takeaways

Three interaction modes

Microsoft frames the findings through three high-level modes:
  • The workday — desktop-centric, business-hour activity dominated by work, technology, and education topics.
  • The constant personal companion — mobile-centric, where health and fitness dominate across all hours.
  • The introspective night — a late-night uptick in personal, philosophical, and reflective topics such as religion and philosophy.

Device is destiny

The clearest headline is the device split. On mobile, the single most common topic-intent pairing was Health and Fitness combined with information-seeking — and critically, that dominance persisted across every hour and every month of the sampled period. On desktop, Technology and Work and Career dominate, with work-related queries overtaking technology during typical business hours (roughly 8 a.m. to 5 p.m.). These patterns strongly suggest a contextual affordance: phones are treated as private, always-available advisors; desktops are treated like collaborative workspaces.

Time and seasonality matter

The sample also surfaces temporal rhythms (a toy aggregation sketched after this list shows how such patterns fall out of a labeled log):
  • Weekdays skew toward programming and productivity topics.
  • Weekends show higher relative volume for gaming and entertainment.
  • Late-night hours show increases in religion, philosophy, and introspective topics.
  • Cultural/seasonal spikes appear: notably, relationship conversations peaked on Valentine’s Day, with a preceding rise in personal growth and wellness queries.
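
A minimal sketch of that aggregation, assuming a pandas DataFrame of labeled conversations; the schema and rows below are invented for illustration, not Microsoft’s data:

```python
import pandas as pd

# Invented log rows; the real analysis runs over millions of
# de-identified, machine-labeled conversations.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2025-02-14 21:00", "2025-02-14 22:30",   # Valentine's evening
        "2025-03-03 10:15", "2025-03-08 15:45",   # weekday vs. weekend
    ]),
    "topic": ["Relationships", "Relationships", "Programming", "Gaming"],
})

# Derive the temporal dimensions the report aggregates over.
df["hour"] = df["timestamp"].dt.hour
df["weekday"] = df["timestamp"].dt.day_name()
df["date"] = df["timestamp"].dt.date

# Topic share within each hour of day.
by_hour = (
    df.groupby(["hour", "topic"]).size()
      .groupby(level="hour").transform(lambda s: s / s.sum())
)

# Daily topic counts expose one-off spikes such as Valentine's Day.
by_date = df.groupby(["date", "topic"]).size().unstack(fill_value=0)

print(by_hour)
print(by_date)
```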

Advice is growing

Beyond topical shifts, Microsoft reports a rise in advice-seeking as an intent label. Users increasingly ask Copilot not just to fetch facts but to weigh options, interpret nuance, and offer next steps. That trend places different demands on response grounding, provenance, and safety.

Methodology: What Microsoft did — and what it didn’t

The dataset and labeling

The paper analyzes a sample of 37.5 million de-identified conversations, covering January–September 2025. Conversations were automatically processed: personally identifiable information was stripped, a summary of the conversation was extracted, and machine classifiers labeled each conversation by topic and intent. Microsoft emphasizes de-identification and automated labeling as privacy-preserving choices.
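
Microsoft has not published pipeline code, but the stages it describes map onto a familiar shape. A minimal sketch under that reading, with regex scrubbing and keyword stubs standing in for the production de-identification and classification models (every function here is hypothetical):

```python
import re
from dataclasses import dataclass

@dataclass
class LabeledConversation:
    summary: str
    topic: str
    intent: str

# Toy PII patterns; a production pipeline would use trained
# de-identification models, not two regexes.
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")
PHONE = re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b")

def deidentify(text: str) -> str:
    return PHONE.sub("[PHONE]", EMAIL.sub("[EMAIL]", text))

def summarize(text: str) -> str:
    # Stand-in for an abstractive summarization model.
    return text[:200]

def classify_topic(summary: str) -> str:
    # Stand-in for the topic classifier (keyword stub).
    return "Health and Fitness" if "symptom" in summary.lower() else "Technology"

def classify_intent(summary: str) -> str:
    # Stand-in for the intent classifier (keyword stub).
    return "advice-seeking" if "should i" in summary.lower() else "information-seeking"

def process(raw: str) -> LabeledConversation:
    clean = deidentify(raw)
    summary = summarize(clean)
    return LabeledConversation(summary, classify_topic(summary), classify_intent(summary))

print(process("Should I worry about this symptom? Write me at jane@example.com"))
```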

Important caveats the report itself notes

  • The report is a preprint and has not undergone independent peer review.
  • The analysis excludes enterprise and school accounts, so it does not describe how Copilot is used under Microsoft 365 / workplace authentication.
  • All topic and intent labels were produced by automated classifiers; there was no human review of the underlying messages. This means conclusions reflect classifier groupings rather than human-coded categories, and classifier biases or labeling errors would shape the findings.

What the public coverage confirms

Independent reporting (GeekWire, Search Engine Journal, and others) aligns with Microsoft’s summaries on the core claims—the device split, the dominance of mobile health queries, the daytime work orientation on desktop, and the growing advice-seeking behavior. Multiple news outlets highlight that Microsoft AI CEO Mustafa Suleyman is named among the research contributors and that Microsoft positioned the findings as an argument for differentiated UX (desktop = information density; mobile = brevity and empathy).

Critical analysis: strengths of the report

Scale and product grounding

The sample size—37.5 million conversations—is large and product-grounded. That scale allows Microsoft to detect patterns across device classes, hours of day, and seasonal events with statistical power that smaller surveys can’t match. The fact that these are real-world consumer interactions (albeit de-identified) gives the findings strong ecological validity for consumer Copilot behavior.

Actionable design implications

Microsoft translates the descriptive findings into prescriptive design ideas: a desktop Copilot should favor high information density and workflow execution, while a mobile Copilot should emphasize empathy, brevity, and personal guidance. That linkage from observed behavior to UX strategy is a pragmatic strength: it ties measurement to product choices.

Privacy-forward processing (claimed)

Microsoft reports automated de-identification and summary-extraction processes intended to minimize exposure of personal data during analysis. If implemented robustly, this approach reduces privacy risk compared with human review. The public emphasis on fully automated processing, with no human eyes on the underlying messages, responds to legitimate privacy concerns.

Critical analysis: limitations, risks, and open questions

Heavy reliance on automated classifiers

The entire taxonomy of topics and intents depends on machine classifiers. But the report provides limited public detail about classifier design, training data, validation metrics (precision/recall), and error rates. Without those metrics, it’s hard to gauge how often a conversation about a health symptom was labeled correctly versus misclassified into another category—or how consistently advice-seeking is identified versus information-seeking. This is a substantial caveat: labeling errors can produce spurious patterns or exaggerate trends. Where classifier performance statistics are absent or limited, treat granular claims with caution.
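
Those missing numbers are standard and inexpensive to produce from a modest human-coded audit sample. A sketch of what such a validation could look like, using scikit-learn and invented labels purely for illustration:

```python
from sklearn.metrics import classification_report, confusion_matrix

# Invented gold labels (human-coded) vs. classifier predictions for a
# tiny audit sample; a real validation set would hold thousands of rows.
gold = ["Health", "Health", "Technology", "Work", "Health", "Technology"]
pred = ["Health", "Technology", "Technology", "Work", "Health", "Health"]

labels = ["Health", "Technology", "Work"]

# Per-label precision, recall, and F1: exactly the numbers the report omits.
print(classification_report(gold, pred, labels=labels, zero_division=0))

# The confusion matrix shows which topics bleed into which.
print(confusion_matrix(gold, pred, labels=labels))
```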

The preprint and lack of peer review

As a preprint, the report hasn’t gone through peer review. That doesn’t invalidate its observations, but it does mean independent researchers haven’t yet stress-tested the methodology, replication strategy, or potential biases. Claims about persistence across all hours or months hinge on model outputs and sampling decisions that require open scrutiny.

Excluding enterprise and education users biases the picture

By design the dataset excludes enterprise-authenticated traffic. This makes the report a strong statement about consumer Copilot behavior but leaves a gap in understanding how Copilot is used in workplaces—where integration with Microsoft 365 and enterprise data could produce very different patterns and risks. For organizations assessing Copilot’s workplace impact, the report should not be taken as representative.

Health queries and the problem of advice

The most consequential single finding is that health dominates mobile use. Users are turning to Copilot for health information and advice, often outside normal clinical oversight. That raises several red flags:
  • Misinformation risk: Hallucination or partial answers can mislead users about symptoms, medication, or treatment options.
  • Liability and trust: Users may grant conversational suggestions higher trust than warranted; Copilot responses carry neither clinician confidentiality nor the legal protections that attach to professional medical advice.
  • Safety engineering gap: The report references higher-level controls but provides little public detail about how Copilot detects, escalates, or constrains health-related conversations (for example, directing a user to seek clinical help, surfacing provenance, or limiting actionable medical directives).

Emotional attachment, advice-seeking, and the “confidant” problem

Microsoft’s own framing—calling Copilot a “vital companion” in some write-ups—acknowledges the social role these systems now play. But social attachment to a conversational agent can produce harmful outcomes when the agent’s responses are flawed, biased, or manipulative. The research signals rising advice-seeking intent, which raises ethical and safety questions that automated content filters alone cannot solve.

What the findings mean for product teams and engineers

Design differentiation by device is now evidence-based

The simple implication for product teams: optimize Copilot interactions differently for mobile and desktop (a minimal sketch follows this list).
  • Desktop: prioritize structured workflows, data extraction, multi-turn reasoning, and integration with file and app contexts.
  • Mobile: prioritize brevity, empathy, clarifying questions, and safe fallback options for sensitive domains like health.
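
Under that reading, device-aware shaping is little more than a per-device policy applied before generation. All names and values below are hypothetical, not Copilot’s actual configuration:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ResponsePolicy:
    max_words: int         # brevity target for generated answers
    tone: str              # steering phrase for the system prompt
    ask_clarifying: bool   # prefer one question over a long answer
    offer_workflows: bool  # surface file/app follow-up actions

# Hypothetical defaults reflecting the report's desktop/mobile split.
POLICIES = {
    "desktop": ResponsePolicy(400, "dense, structured", False, True),
    "mobile": ResponsePolicy(120, "warm, plain-language", True, False),
}

def system_prompt(device: str) -> str:
    p = POLICIES[device]
    parts = [f"Answer in a {p.tone} style, in at most {p.max_words} words."]
    if p.ask_clarifying:
        parts.append("If the request is ambiguous, ask one clarifying question first.")
    if p.offer_workflows:
        parts.append("Offer to carry out follow-up steps in the user's files and apps.")
    return " ".join(parts)

print(system_prompt("mobile"))
print(system_prompt("desktop"))
```

Keeping the policy declarative rather than baked into prompts makes it testable and auditable independently of the model.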

Grounding and provenance must be first-class

Advice-seeking requires transparent grounding (a structural sketch follows this list). When Copilot gives health-related recommendations or interpretations, responses should:
  • State confidence and provenance.
  • Prefer non-directive phrasing for medical or legal topics.
  • Route high-risk queries to certified resources or encourage professional consultation.
  • Log and allow users to view decision trails for agentic actions.
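
Structurally, that amounts to returning an envelope rather than bare text. A sketch under that assumption; the field names are illustrative, not a Copilot API:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class GroundedAnswer:
    text: str                         # the answer itself
    confidence: str                   # e.g. "low" / "medium" / "high"
    sources: list = field(default_factory=list)  # provenance: URLs or doc IDs
    escalation: Optional[str] = None  # pointer to professional resources
    decision_log: Optional[str] = None  # trail ID for any agentic actions taken

def render(ans: GroundedAnswer) -> str:
    lines = [ans.text, f"Confidence: {ans.confidence}"]
    if ans.sources:
        lines.append("Sources: " + "; ".join(ans.sources))
    if ans.escalation:
        lines.append(f"For a high-stakes decision: {ans.escalation}.")
    if ans.decision_log:
        lines.append(f"Actions taken are logged under {ans.decision_log}.")
    return "\n".join(lines)

print(render(GroundedAnswer(
    text="Several common conditions match that description; none can be "
         "confirmed from text alone.",
    confidence="low",
    sources=["doc:clinical-guideline-1234"],  # placeholder identifier
    escalation="consider speaking with a clinician",
)))
```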

Differential guardrails and friction

A one-size-fits-all safety posture won’t suffice. Consider device-aware guardrails, sketched after this list:
  • Stronger friction and explicit consent for mobile health advice (confirming user intent, providing disclaimers).
  • Workflow confirmations for agentic desktop operations (file edits, calendar changes).
  • Privacy nudges and memory controls when Copilot is used for intimate personal topics.
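
The gating itself can be a small pure function from conversation context to friction steps; topics, step names, and thresholds below are illustrative, not Microsoft’s policy:

```python
HIGH_RISK_TOPICS = {"Health and Fitness", "Legal", "Finance"}

def guardrail(device: str, topic: str, intent: str, agentic: bool) -> list:
    """Return the friction steps to apply before answering.
    Topics, step names, and conditions are illustrative only."""
    steps = []
    if topic in HIGH_RISK_TOPICS and intent == "advice-seeking":
        steps.append("show_disclaimer")
        if device == "mobile":
            steps.append("confirm_intent")    # extra friction on mobile
    if agentic and device == "desktop":
        steps.append("confirm_action")        # file edits, calendar changes
    if topic in HIGH_RISK_TOPICS:
        steps.append("offer_memory_opt_out")  # privacy nudge for intimate topics
    return steps

print(guardrail("mobile", "Health and Fitness", "advice-seeking", agentic=False))
# -> ['show_disclaimer', 'confirm_intent', 'offer_memory_opt_out']
```

Returning a list of named steps, rather than acting inline, keeps the policy auditable and easy to adapt per surface or jurisdiction.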

Risks for users and industry — and how to mitigate them

Risk: Overreliance on machine advice

Mitigation:
  • Present Copilot outputs as starting points rather than definitive answers.
  • Add explicit “verify with a professional” signals for high-stakes outputs.
  • Enable easy access to provenance and sources.

Risk: Misclassification and labeling bias

Mitigation:
  • Publish classifier performance metrics and confusion matrices for the major topic and intent labels.
  • Provide external researchers with privacy-safe synthetic or aggregated slices to replicate findings; one such form is an aggregate table with small-cell suppression, sketched below.
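
A sketch of that suppression rule; the k threshold and schema are arbitrary choices for illustration:

```python
import pandas as pd

def safe_slice(df: pd.DataFrame, dims: list, k: int) -> pd.DataFrame:
    """Aggregate conversation counts over the given dimensions and drop
    any cell with fewer than k rows (small-cell suppression)."""
    agg = df.groupby(dims).size().reset_index(name="count")
    return agg[agg["count"] >= k].reset_index(drop=True)

# Toy demo: with k=2, the lone (desktop, Work) cell is suppressed.
demo = pd.DataFrame({
    "device": ["mobile", "mobile", "desktop"],
    "topic": ["Health", "Health", "Work"],
})
print(safe_slice(demo, ["device", "topic"], k=2))
```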

Risk: Privacy and retention

Mitigation:
  • Continue de-identification but publish the de-identification approach and privacy audit outcomes.
  • Offer users clear memory controls and audit logs for data that Copilot retains.

Risk: Regulatory and legal exposure

Mitigation:
  • Define explicit user-facing terms for health, legal, and financial advice with safe fallback flows.
  • Implement human escalation paths for flagged high-risk queries.

What Microsoft (and competitors) should publish next

Transparency and reproducibility would go a long way toward making this report useful for researchers, regulators, and enterprise customers. Practical next steps the vendor community should take include:
  • Publish classifier validation metrics for topic/intent labels (precision, recall, F1 scores).
  • Release an anonymized, privacy-preserving sample (or synthetic dataset) and methodology appendix to enable independent validation.
  • Detail the safety flows for high-risk categories (health, legal, self-harm), including escalation, provenance, and rate-limiting.
  • Clarify the selection and sampling method used to build the 37.5M sample (sampling frequency, cross-device linking, de-duplication).
  • Commission independent audits on the de-identification pipeline and on the impact of label errors on major findings.
These steps would convert a strong internal dataset into a broader public resource that can be relied on for policy and product decisions.

Practical takeaways for Windows and Copilot users

  • Treat Copilot’s health advice as assistive, not authoritative. Confirm medical guidance with clinicians and verified medical resources.
  • Use Copilot memory and consent controls to compartmentalize personal topics if you prefer separation between work and private life.
  • For workplace use, don’t assume consumer Copilot patterns match your enterprise usage. Enterprise traffic was excluded from the study; workplace dynamics and data contexts can create different patterns and risks.

The broader significance: technology mirrors human context

The most important conceptual shift from this research is simple but consequential: AI assistant behavior is context-sensitive not only because of people, but because of devices and time. People adopt the same core model for different social roles—colleague at the desk, confidant in the pocket. That duality matters for UX, for safety engineering, and for regulatory thinking. It also raises an ethical question for designers: when a system becomes both a productivity tool and a personal confidant, how should it negotiate competing expectations of utility, privacy, and emotional care?
Microsoft’s report supplies a large-scale empirical basis for that debate, but it also leaves key verification questions open. The absence of peer review and limited transparency on classifier performance mean the most load-bearing claims—especially the ubiquity of mobile health queries—should be treated as strong signals rather than settled facts, pending further scrutiny.

Conclusion

The Copilot Usage Report 2025 reframes how engineers and product teams should think about conversational agents. The device- and time-driven patterns documented by Microsoft suggest that generic, one-size-fits-all interactions are suboptimal and potentially dangerous for high-stakes domains like health. The report’s scale gives weight to its claims, but the reliance on automated labeling and the preprint status demand cautious interpretation. For end users and enterprises alike, the lesson is practical: treat conversational AI outputs as contextual aids, insist on provenance and guardrails, and press vendors for transparent classifier metrics and safety flows before delegating decisions to an assistant that can feel, to users, like a trusted companion.
Source: Search Engine Journal, “How People Use Copilot Depends On Device, Microsoft Says”