A six‑month Department for Work and Pensions trial of Microsoft 365 Copilot — involving 3,549 central‑office staff between October 2024 and March 2025 — recorded modest but measurable productivity gains, reporting an average saving of 19 minutes per user per day, alongside improvements in perceived work quality and job satisfaction. The evaluation, published by the department in late January 2026, found that Copilot was most useful for information retrieval, email drafting and document summarisation; neurodivergent users reported particular accessibility benefits; and staff generally regarded the assistant as a complement to, not a replacement for, professional judgement. The trial sits alongside larger cross‑government experiments that reported even greater time savings, and it exposes the same mix of opportunity and risk that now shapes public‑sector AI policy: clear day‑to‑day efficiency gains tempered by governance, accuracy and procurement questions that must be resolved before wide‑scale rollout.
Background and overview
The DWP trial evaluated the licensed (paid) version of Microsoft 365 Copilot and focused on corporate, central‑office functions — policy, digital, finance, communications and similar roles — deliberately excluding frontline operational colleagues such as Jobcentre staff. The department allocated 3,549 licences to a mixture of volunteers and peer nominations and combined large‑scale surveys with econometric analysis and qualitative interviews to estimate effects on three primary outcomes: task efficiency, job satisfaction, and quality of work output.

Key headline findings from the department’s evaluation include:
- An average time saving of 19 minutes per day across eight routine tasks.
- 73% of users reporting improved quality of outputs.
- 65% of users reporting greater job fulfilment.
- Evidence of statistically significant increases in perceived work quality (+0.49 on a seven‑point scale) and job satisfaction (+0.56 on a seven‑point scale) compared with a non‑user comparison group.
Trial design, methodology and what the numbers mean
How the DWP measured impact
The DWP used a mixed‑methods approach:
- Two surveys: one of Copilot users (1,716 responses) and a stratified comparison group of non‑users (2,535 responses).
- Econometric regression techniques to adjust for observed differences and estimate net effects on efficiency, satisfaction and perceived quality.
- Qualitative interviews to illuminate how and why the tool was used, and where benefits or frictions arose.
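The regression‑adjustment step in a mixed‑methods design like this can be illustrated with a small sketch. This is not the DWP's actual model specification; the covariates, sample size and effect size below are synthetic assumptions chosen only to show how a treatment dummy in an OLS regression yields an adjusted estimate of the net effect:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4000

# Synthetic survey data (illustrative only; not the DWP dataset).
grade = rng.integers(1, 6, n)      # observed covariate, e.g. pay grade band
tenure = rng.normal(8, 3, n)       # observed covariate, years in role
copilot = rng.integers(0, 2, n)    # 1 = Copilot user, 0 = comparison group

# Simulate a seven-point satisfaction score with a true effect of +0.56,
# echoing the headline figure reported by the evaluation.
satisfaction = (3.0 + 0.56 * copilot + 0.1 * grade + 0.02 * tenure
                + rng.normal(0, 1, n))

# OLS with an intercept, a treatment dummy and covariates: the coefficient
# on `copilot` is the regression-adjusted estimate of the net effect.
X = np.column_stack([np.ones(n), copilot, grade, tenure])
beta, *_ = np.linalg.lstsq(X, satisfaction, rcond=None)
print(round(beta[1], 2))  # close to the simulated +0.56 effect
```

On synthetic data the recovered coefficient sits near the true +0.56; in the real evaluation the comparison group and covariate adjustment play the same role of netting out observed differences between users and non‑users.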
Interpreting 19 minutes a day
An average saving of 19 minutes per day sounds small in isolation but scales to substantial time across a large workforce. Over a typical working year it adds up to roughly 70 hours, or around nine working days, per person: a meaningful reallocation of staff time if those minutes are consistently redirected to higher‑value tasks.

However, averages mask wide variation:
- Some users reported no time saved at all.
- Others reported far larger gains — in some pilots, a subset of users reported saving 30–60 minutes per day on specific tasks.
- The DWP’s estimate aggregates across roles that differ dramatically in the proportion of time spent on drafting, summarising and retrieving information — tasks Copilot is strongest at automating.
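As a back‑of‑the‑envelope check, the annualised figure falls out of simple arithmetic. The working‑year and working‑day values below are illustrative assumptions, not figures from the evaluation:

```python
# Annualising the headline figure, assuming ~225 working days a year and a
# 7.5-hour working day (illustrative assumptions, not DWP figures).
minutes_per_day = 19
working_days_per_year = 225
hours_per_working_day = 7.5

hours_saved = minutes_per_day * working_days_per_year / 60
days_saved = hours_saved / hours_per_working_day

print(f"{hours_saved:.0f} hours ~= {days_saved:.1f} working days per person per year")
# → 71 hours ~= 9.5 working days per person per year
```

Different assumptions about leave, part‑time working and how consistently the tool is used would move the total, which is one reason per‑role measurement matters before scaling.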
Where the time savings came from
The largest, most consistent savings were reported in:
- Information searches and internal knowledge retrieval.
- Email drafting and communication tasks.
- Document and meeting note summarisation.
Strengths observed in the trial
1. Productivity gains that free human bandwidth
Copilot’s best‑case value appears to be in shifting staff time away from low‑value administration toward more strategic work: planning, project delivery, stakeholder engagement and mentoring. In theory, that shift increases public service impact without adding headcount.

2. Improvements in output quality and tone

A large majority of users reported clearer, better‑phrased outputs. For many people — particularly those involved in frequent written communication — Copilot provided a consistent starting point that saved iteration time and helped standardise tone across the department.

3. Accessibility and neurodiversity benefits

The evaluation highlighted clear benefits for neurodivergent employees: users with ADHD reported better focus in distracting environments; those with dyslexia found the drafting and editing assistance particularly valuable. These are practical, inclusion‑focused gains that align with broader public‑sector accessibility goals.

4. Rapid adoption and intuitive UX for many staff

Most participants reported that Copilot was intuitive and integrated neatly with Office apps they already used. Adoption was often driven by peer demonstrations and manager support rather than formal training — a sign the user experience meets basic expectations.

Key risks, limitations and red flags
1. Accuracy, hallucinations and the need for human oversight
Generative models are prone to confident inaccuracies — hallucinations — and the DWP trial reiterates a simple truth: outputs require editorial oversight. The department explicitly noted Copilot was not used to make decisions or replace staff. Organisations that treat AI outputs as authoritative do so at their peril.

2. Heterogeneous benefits and potential for uneven allocation

Not all roles benefit equally. Policy teams and technical specialists often need nuanced judgment and source synthesis that current Copilot models struggle with. Wider rollout without role‑specific pilots risks over‑promising gains in areas where AI helps little.

3. Governance, data protection and auditability

When AI systems access organisational data, they become vectors for data leakage and compliance lapses. The DWP built an AI security policy and acceptable use guidance; nonetheless, any large‑scale roll‑out must ensure robust logging, traceability of outputs, recordkeeping for decisions, and clear delineation of personal data flows under data‑protection law.

4. Vendor dependence and procurement trade‑offs

The trial used Microsoft’s licensed product. That can deliver enterprise security features and integration, but it also raises procurement questions: licensing cost per seat, contract terms, data residency, model updates and dependence on a single supplier for long‑term capability. These considerations should be part of commercial decisions, not afterthoughts.

5. Workforce anxieties and industrial relations

Even when trials show productivity gains and improved job satisfaction, staff will worry about what “efficiency” means for headcount. The DWP reported that users viewed Copilot as a tool, not a replacement, but perceptions matter. Transparent communication, genuine involvement of unions and clear policy on deployment and redeployment will be critical to avoid friction.

6. National security concerns around foreign tools

The DWP reversed earlier prohibitions on public large language models, but it specifically banned one Chinese‑origin model — referred to in departmental guidance as DeepSeek — reflecting acute national‑security caution. That differential treatment underscores the geopolitical dimension of AI procurement and access policies.

Governance, security and the DeepSeek exception

The DWP updated its Acceptable Use Policy and created an Artificial Intelligence Security Policy to regulate the use of off‑the‑shelf and corporate AI tools. Key practical elements that emerged from the trial and accompanying guidance include:
- Approval pathways for any use that touches sensitive or personal data.
- Board‑level governance to authorise business cases and manage risk assessments.
- Mandatory incident reporting for suspected data leaks involving AI tools.
- Role‑specific restrictions — frontline staff were deliberately excluded from this centrally focused trial.
Cost, licensing and procurement realities
The trial used the licensed Microsoft 365 Copilot — the paid enterprise variant that integrates with organisational data and offers enhanced compliance and security controls. After the trial, a version of Copilot became available more widely across the department.

Practical procurement questions departments should resolve before scaling AI assistants:
- What is the per‑seat licence cost, and how does that compare to expected time savings?
- How will licences be prioritised across functions to avoid inequitable access?
- Which contractual clauses ensure data protection, model behaviour transparency, and the right to audit?
- What exit options exist if the vendor’s roadmap diverges from departmental needs?
Practical recommendations for IT leaders and decision‑makers
Based on the DWP evidence and wider government experience, these pragmatic steps will help public organisations turn pilot results into safe, sustainable practice:
- Establish a cross‑functional AI governance board
- Include representatives from IT security, data protection, legal, HR, policy leads and union reps.
- Authorise use cases, maintain a risk register, and require periodic re‑approval.
- Use staged, role‑specific pilots
- Expand incrementally from corporate admin roles to policy teams and only then to frontline services where data sensitivity is higher.
- Design outcome metrics (time saved, quality scores, error rates) before rollout.
- Mandate human‑in‑the‑loop validation
- For any outputs that affect decisions, require sign‑off or human editing.
- Maintain audit trails for generated content used in official communications or decisions.
- Create targeted training and prompt literacy
- Short, role‑specific modules will be more effective than one‑size‑fits‑all webinars.
- Teach staff how to construct clear prompts, verify outputs, and recognise hallucinations.
- Contract for transparency and resilience
- Procurement should demand information about model updates, data handling, red‑teaming and vulnerability disclosure procedures.
- Include clauses enabling audits and exit strategies.
- Monitor equity and accessibility impacts
- Track how tools affect staff with protected characteristics and ensure that accessibility gains are preserved across rollouts.
- Communicate clearly and involve staff
- Address job‑security concerns head‑on with unions and line managers.
- Publish transparent evaluation reports and next steps so that staff trust the process.
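The audit‑trail recommendation above can be made concrete with a minimal record schema. This is a hypothetical sketch, not a schema from DWP guidance; all field names are illustrative assumptions. Hashing the prompt and output means the log can prove what was generated and reviewed without itself storing sensitive content:

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

# Hypothetical audit record for AI-generated content used in official
# outputs. Field names are illustrative assumptions, not DWP guidance.
@dataclass(frozen=True)
class CopilotAuditRecord:
    user_id: str
    use_case: str           # an approved use case from the risk register
    prompt_sha256: str      # hash only, so the log holds no raw content
    output_sha256: str
    human_reviewed: bool    # human-in-the-loop sign-off before use
    reviewer_id: str
    timestamp: str          # UTC, ISO 8601

def make_record(user_id: str, use_case: str, prompt: str, output: str,
                reviewer_id: str) -> CopilotAuditRecord:
    digest = lambda s: hashlib.sha256(s.encode("utf-8")).hexdigest()
    return CopilotAuditRecord(
        user_id=user_id,
        use_case=use_case,
        prompt_sha256=digest(prompt),
        output_sha256=digest(output),
        human_reviewed=True,
        reviewer_id=reviewer_id,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )

record = make_record("u123", "email-drafting", "Draft a reply to...",
                     "Dear colleague...", "manager-456")
print(json.dumps(asdict(record), indent=2))
```

Records like this, written at generation time and retained under normal records‑management rules, give auditors a traceable link between a prompt, its output and the person who signed it off.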
What the DWP trial does — and does not — prove
The DWP evaluation provides credible evidence that Copilot can reduce time spent on routine tasks and improve perceived output quality in central office roles. It also reinforces that benefits are highly context‑dependent: task composition, access to high‑quality internal data, and the quality of onboarding all matter.

But it does not prove transformational, system‑wide change by itself. Limitations include:
- The trial’s sample and voluntary licence allocation create potential selection bias.
- Longitudinal effects are unknown: will the initial efficiency boost persist once novelty fades?
- The evaluation focuses on perceived quality and time savings rather than hard operational outcomes like error reduction, customer satisfaction or case clearance rates.
- The frontline exclusion means the results cannot be extrapolated to Jobcentre operations or other high‑sensitivity public services without fresh piloting.
The wider public‑sector picture: comparison with cross‑government trials
The DWP’s 19‑minute estimate sits below the 26‑minute average reported in a separate government‑wide experiment that covered more than 20,000 staff across departments. Reasons for the gap include:
- Different time windows and sample compositions.
- Variations in the tasks measured (some pilots captured presentation drafting and coding tasks where Copilot can deliver larger savings).
- Shorter trials sometimes benefit from novelty and concentrated support that inflates early gains.
This layered evidence suggests a two‑track approach for governments: pilot broadly but roll out narrowly and strategically, focusing first on areas where the tool reliably adds value and where governance is straightforward.
Equity, inclusion and the often‑overlooked human element
The DWP finding that neurodivergent staff gained disproportionate benefit is a reminder that digital inclusion should be a primary rather than peripheral objective for public‑sector AI procurement. Accessibility gains are genuine and should shape deployment priorities.

At the same time, the human element remains central:
- AI can reduce cognitive load and task friction, improving staff wellbeing.
- It can also create new types of low‑value oversight work — for example, editing AI drafts — which needs recognition in role design and performance measures.
- Effective change management matters: managers who model use and clarify expectations were critical drivers of adoption in the DWP trial.
Conclusion: measured optimism, not uncritical rollout
The DWP’s Copilot trial offers compelling, empirically grounded reasons to use generative AI as a productivity multiplier in central office functions. The combination of time savings, higher perceived output quality, and accessibility benefits makes a persuasive case for wider but managed adoption.

That case is conditional. It rests on robust governance, contractual safeguards, investment in role‑specific training, and continued independent evaluation — especially before moving into operational or frontline areas where the stakes are higher and data sensitivity acute. IT leaders and policy makers will need to balance speed and scale against auditability, legal compliance and staff trust.
For technology teams, the lessons are straightforward: pilot deliberately, govern comprehensively, train tactically, and measure continuously. If public organisations get those fundamentals right, the minutes saved by assistants like Copilot can translate into meaningful public value; if they do not, the gains risk being ephemeral or worse — offset by new risks to privacy, accuracy and workforce cohesion.
The DWP trial is not the final word on AI in government, but it is an important, evidence‑based chapter. The next chapters must be written with the same care: rigorous evaluation, transparent governance, and a steady insistence that human judgment remains the ultimate arbiter of public service decisions.
Source: PublicTechnology DWP claims 20-minute daily staff gains from Copilot trials