Microsoft’s Copilot has moved from marketing demo to frontline tool trialled inside the largest UK welfare department — and the Department for Work and Pensions (DWP) now says the paid, licensed version of Microsoft 365 Copilot saved civil servants an average of 19 minutes per working day on routine tasks, based on a mixed‑method evaluation of a six‑month pilot that ran from October 2024 to March 2025. (gov.uk)
Background / Overview
The DWP trial tested the licensed Copilot across central office (non‑frontline) staff and distributed 3,549 licences to employees by a mix of volunteers and peer nominations. Fieldwork consisted of two large surveys, one of licence holders (1,716 responses) and one of a comparison group of non‑licence holders (2,535 responses), supplemented by 19 in‑depth interviews and econometric analysis using Seemingly Unrelated Regression (SUR). The DWP evaluation was published on 29 January 2026 and focuses on the licensed, tenant‑integrated Copilot available to Microsoft 365 customers. (gov.uk)

This official DWP estimate sits between other high‑profile government figures: a cross‑government Government Digital Service (GDS) experiment involving roughly 20,000 civil servants reported an average 26 minutes saved per day, published as a ministerial statement on 2 June 2025, while a departmental pilot run by the Department for Business and Trade (DBT) concluded that Copilot delivered mixed effects and did not show clear, department‑level productivity gains overall. (thegovernmentsays-files.s3.amazonaws.com)
What the DWP evaluation actually measured
Study design and sample
- The DWP trial ran from October 2024 to March 2025 and targeted central office business functions (policy, digital, finance, etc.). Frontline Jobcentre colleagues were excluded from the licensed pilot. (gov.uk)
- The evaluation combined quantitative surveys and econometric modelling with qualitative interviews to produce a counterfactual‑style estimate: the SUR models contrasted Copilot users with a stratified comparison group of non‑users and adjusted for demographic factors, job grade, business area, health conditions, and measures of AI interest and prior experience. (gov.uk)
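The covariate‑adjusted comparison at the heart of this design can be sketched in a few lines. This is a single‑equation illustration on synthetic data: DWP's actual SUR jointly modelled three outcomes, and the covariates, effect sizes, and noise levels below are invented purely for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
copilot = rng.integers(0, 2, n)      # 1 = licence holder, 0 = comparison group
grade = rng.integers(1, 6, n)        # job grade covariate (illustrative)
ai_interest = rng.normal(0, 1, n)    # prior AI interest covariate (illustrative)

# Synthetic outcome: a true adjusted effect of 19 minutes/day plus
# covariate effects and noise, so the regression has something to recover.
minutes_saved = 19 * copilot + 2.0 * grade + 3.0 * ai_interest + rng.normal(0, 5, n)

# Design matrix: intercept, treatment dummy, covariates.
X = np.column_stack([np.ones(n), copilot, grade, ai_interest])
beta, *_ = np.linalg.lstsq(X, minutes_saved, rcond=None)
print(f"adjusted Copilot effect: {beta[1]:.1f} minutes/day")
```

Adjusting for covariates in this way removes the bias those observed factors would otherwise introduce, which is the point of DWP's stratified comparison group; it cannot, as the report itself notes, remove bias from unobserved differences between volunteers and non‑volunteers.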
Outcomes and metrics
DWP measured three primary outcomes:
- Task efficiency — self‑reported time saved per day across eight routine tasks, converted from ordinal survey categories into continuous minutes, then modelled with SUR.
- Job satisfaction — a 7‑point Likert measure of overall satisfaction in the last three months.
- Perceived quality of work — a 7‑point Likert measure for output quality.
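One common way to turn ordinal "time saved" survey bands into the continuous minutes a model like SUR needs is midpoint coding. The band labels and midpoint values below are illustrative assumptions, not DWP's actual survey instrument:

```python
# Midpoint coding of ordinal time-saved bands into continuous minutes.
# These bands and midpoints are invented for illustration.
BAND_MIDPOINTS = {
    "none": 0.0,
    "under 10 minutes": 5.0,
    "10 to 30 minutes": 20.0,
    "30 to 60 minutes": 45.0,
    "over an hour": 75.0,
}

def minutes_from_band(band: str) -> float:
    """Map one survey response to a continuous minutes value."""
    return BAND_MIDPOINTS[band.lower()]

responses = ["none", "10 to 30 minutes", "30 to 60 minutes"]
average = sum(minutes_from_band(r) for r in responses) / len(responses)
print(f"average minutes saved: {average:.1f}")
```

The choice of midpoints (especially for the open‑ended top band) is exactly why self‑reported figures can over‑ or under‑state real elapsed time, a caveat discussed later in this piece.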
Where the time savings came from — task‑level breakdown
DWP disaggregated time savings by task. The largest measured effects were for text and knowledge tasks rather than data‑heavy work:
- Searching for existing information or research: 26 minutes saved per day. (gov.uk)
- Writing emails: 25 minutes saved per day. (gov.uk)
- Summarising information or research: 24 minutes saved per day. (gov.uk)
- Producing or editing written materials: ~20 minutes saved per day. (gov.uk)
- Transcribing/summarising meetings: the smallest measured saving at 9 minutes per day. (gov.uk)
How staff used the saved time
DWP’s qualitative interviews reveal how staff redeployed minutes saved:
- Many respondents said the time freed up was reinvested in higher‑value tasks such as project work, strategic planning, or mentoring, rather than simply extending working hours. (gov.uk)
- Users reported improvements in the quality of draft outputs — Copilot assisted with tone, structure, and initial drafts, particularly for emails and briefings — while emphasising the need for human editing where judgement, legal accuracy, or citations are required. (gov.uk)
- Several interviewees described Copilot as a “comfort blanket” that reduced stress and cognitive load when handling paperwork and information overload. (gov.uk)
How DWP’s finding compares with other UK government trials
- The Government Digital Service (GDS) cross‑government experiment — a much larger, cross‑departmental exercise involving ≈20,000 civil servants across 12 organisations — reported a headline saving of 26 minutes per day, along with strong user satisfaction and adoption metrics. That experiment relied heavily on self‑reporting and did not use a formal non‑user comparison group in the same way DWP did. (thegovernmentsays-files.s3.amazonaws.com)
- The Department for Business and Trade (DBT) ran a three‑month pilot with 1,000 licences (Oct–Dec 2024) and published an evaluation on 28 August 2025 that found high user satisfaction but no robust evidence that aggregated time savings translated into measurable productivity gains for the department; some tasks sped up while others slowed because of output quality issues. DBT combined diary studies, telemetry, and observed task timings to reach a more conservative conclusion.
Methodological strengths and limitations — what to believe
Strengths of the DWP evaluation
- Comparison group: unlike some other government studies, DWP explicitly surveyed a stratified comparison group of non‑licence holders, improving causal inference potential. (gov.uk)
- Econometric adjustment: the SUR modelling adjusted for a wide set of covariates including job grade, business area and measures of AI interest/experience, which reduced some self‑selection bias. (gov.uk)
- Mixed methods: combining surveys, interviews, and regression analysis provides both quantification and contextual understanding about how Copilot was used and perceived. (gov.uk)
Important caveats and risks
- Non‑random allocation: licences were distributed via volunteers and nominations, not random assignment, leaving potential for unobserved confounding (people who volunteer for technology pilots tend to be different). DWP acknowledges this and attempts to adjust, but limitations remain. (gov.uk)
- Self‑reported time data: converting ordinal diary responses into continuous minutes risks over‑ or under‑estimating real elapsed time; observed task timing studies often tell a different story than diaries. DBT explicitly flagged the inflationary potential of self‑report measures.
- No pre‑trial baseline: absence of a robust pre‑trial measurement complicates claims about net gains relative to prior working patterns. The DWP report used cross‑sectional comparisons instead. (gov.uk)
- Task heterogeneity: Copilot helps some tasks (summaries, search, drafting) much more than others (complex Excel analyses, novel tasks) — blanket productivity claims therefore overstate nuance. DBT’s pilot found Copilot slowed data‑analysis tasks in places.
- Hallucinations and trust: confident but incorrect outputs (“hallucinations”) remain a real hazard in public sector use, especially where incorrect content could be passed to citizens or used in decision documents. Both DBT and other departments reported hallucination incidents requiring editorial oversight.
Governance, security and training — the operational checklist
Adopting Copilot at scale is not just a procurement question; it’s an organisational transformation problem that touches security, procurement, policy, and professional practice. Key governance considerations surfaced in the DWP report and other departmental evaluations:
- Data protection and acceptable use: explicit policies are needed on what departmental data can be fed into Copilot, with role‑based controls and tenant settings to prevent leakage of sensitive information. (gov.uk)
- Verification workflows: build mandatory human‑in‑the‑loop checks for outputs used in official communications, legal texts, or published decisions. Automated outputs should be treated as drafts rather than final work. (gov.uk)
- Training that is role‑specific: DWP respondents wanted short, practical sessions tailored to the specific tasks they do, not generic demos. Targeted prompts and playbooks are more effective than one‑size‑fits‑all onboarding. (gov.uk)
- Telemetry and ROI measurement: pair self‑reported diaries with observed task timings and telemetry (application calls, action counts) to triangulate real productivity effects. DBT’s mixed approach underlined the value of multiple data streams.
- Accessibility and inclusion: Copilot delivered measurable accessibility benefits for neurodivergent staff and non‑native English speakers by reducing friction in drafting and summarisation — a compelling equity argument to complement efficiency claims. (gov.uk)
- Environmental and cost audits: departments should quantify licence costs, per‑user consumption and any environmental footprint of large model usage before large rollouts; DBT flagged the need for further environmental assessment.
Practical recommendations for IT leaders and programme owners
- Start with tightly scoped pilots that pair users with matched non‑user comparison groups and include both diary and observed timing methods.
- Prioritise text‑heavy business functions (policy drafting, comms, secretariat) where Copilot shows the clearest gains.
- Implement tenant‑level governance from day one: DLP, role controls, audit logging, and a mandatory verification policy for outputs.
- Deliver short, role‑specific training and published prompt libraries for common tasks (email drafts, search prompts, meeting minutes).
- Measure impact holistically: time saved is valuable, but watch for offset costs (rework from poor quality outputs) and track whether time savings convert to higher‑value activities.
- Maintain human oversight for regulated outputs and embed review steps in workflows rather than treating Copilot as a final author. (gov.uk)
The ROI question — can organisations expect to recoup license costs?
The DWP evidence suggests measurable, daily minutes saved for many users — but turning minutes into pounds is not automatic. Licence pricing, the proportion of staff in text‑intensive roles, the degree of managerial buy‑in, and whether time savings are redeployed to revenue‑generating or cost‑saving activities all determine ROI.
- If time savings are reinvested in higher‑value tasks (policy delivery, stakeholder engagement), the organisation may see strategic returns.
- If saved minutes merely reduce stress or marginally shorten email time without changing higher‑order outputs, financial ROI will be modest.
- DBT’s cautious conclusion that time savings did not obviously translate into department‑level productivity demonstrates why finance teams should insist on evidence of reallocation to high‑value outcomes before scaling licences.
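A finance team can stress‑test the ROI argument with back‑of‑envelope arithmetic. Every parameter below is an assumption for illustration (the source quotes neither DWP staffing costs nor Microsoft licence pricing); only the 19‑minute figure comes from the evaluation:

```python
# Back-of-envelope licence ROI. All inputs except MINUTES_SAVED_PER_DAY
# are assumed values for illustration -- substitute real contract figures.
MINUTES_SAVED_PER_DAY = 19         # DWP's headline estimate
WORKING_DAYS_PER_YEAR = 220        # assumed
HOURLY_STAFF_COST_GBP = 25.0       # assumed fully loaded staff cost
LICENCE_COST_GBP_PER_YEAR = 300.0  # assumed; check actual pricing
REALISATION_RATE = 0.5             # assumed fraction of saved minutes actually
                                   # redeployed to higher-value work

gross_value = (MINUTES_SAVED_PER_DAY / 60) * HOURLY_STAFF_COST_GBP * WORKING_DAYS_PER_YEAR
net_value = gross_value * REALISATION_RATE - LICENCE_COST_GBP_PER_YEAR
print(f"gross time value: £{gross_value:.0f}/user/year; net of licence: £{net_value:.0f}")
```

The realisation rate is the variable DBT's findings put in question: if saved minutes dissipate rather than being redeployed, it tends toward zero and the licence becomes a staff‑experience cost rather than a productivity investment.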
Broader implications: AI assistants in public service
Three broader lessons emerge from DWP, GDS, and DBT experiments:
- Nuance beats hype: AI assistants are toolkits for specific pain points, not universal productivity multipliers. Different departments and roles will experience different net benefits. (gov.uk)
- Measurement matters: self‑reporting inflates headlines. The most credible evaluations combine self‑report, telemetry, and observed timing with comparison groups where feasible. (gov.uk)
- Governance is not optional: hallucinations, data sensitivity and quality lapses are real and can offset efficiency gains if unchecked. Robust policies and human verification must be baked into rollouts. (theregister.com)
Final analysis and verdict
DWP’s evaluation is one of the most methodologically conscientious departmental looks at Copilot to date: it balances self‑reported user experience with econometric adjustment against a comparison group to estimate an average daily saving of 19 minutes among central functions, with strongest gains on search, summarisation and email drafting. (gov.uk)

That figure is credible — but not definitive — and should be interpreted alongside the larger GDS headline of 26 minutes (which lacked a control group) and DBT’s more cautious, task‑specific findings that time savings do not automatically equal department‑level productivity improvements. Taken together, the evidence paints a consistent picture: Copilot helps with text and knowledge work, improves draft quality and staff experience in many roles, but is not a silver bullet and requires careful governance, training, and measurement to turn minutes saved into durable value. (thegovernmentsays-files.s3.amazonaws.com)
For IT and digital leaders, the path is clear: pilot with rigour, govern tightly, measure comprehensively, and scale where task fit, cost structure, and verification workflows align. In short: treat Copilot as a powerful productivity‑adjacent tool — and design policy and process so those saved minutes translate into better public service, not just faster drafts. (gov.uk)
Source: theregister.com DWP finds Copilot saves civil servants 19 minutes a day
