A UK tax tribunal judge has openly acknowledged using generative AI to produce draft material for a published ruling—an explicit, carefully documented instance that crystallises the legal profession’s urgent debate over when, how and under what safeguards courts and tribunals should use tools such as Microsoft Copilot Chat.
Background
The disclosure arises from the First-tier Tax Tribunal decision in VP Evans (as executrix of HB Evans, deceased) & Ors v The Commissioners for HMRC [2025] UKFTT 1112 (TC), where Judge Christopher McNall reported that he used Microsoft’s Copilot Chat—made available to judicial office holders via the eJudiciary platform—to summarise documents during the preparation of a ruling on a disclosure application. The judge emphasised that the AI-generated summaries were treated only as a “first-draft” and that he did not use AI for legal research; he also took responsibility for the final evaluative judgment in the decision.
This admission sits against the backdrop of formal guidance issued to the judiciary: the updated “AI: Guidance for Judicial Office Holders” (published by the Courts and Tribunals Judiciary), which explicitly describes Copilot Chat’s availability on judicial devices and instructs judges on transparency, risk awareness (misinformation, bias, dataset quality) and the responsibilities of judicial office holders when relying on AI-generated material.
At the same time, practice-level directions aimed at ensuring the adequacy and clarity of reasons in tribunal decisions—most notably the Practice Direction on Reasons for Decisions (Senior President of Tribunals, 4 June 2024)—encourage the sensible use of digital tools where they support efficiency without undermining fairness or the integrity of the decision-making process.
Why this matters: the judicial use-case in plain terms
The judiciary’s interest in AI is pragmatic. Courts and tribunals face mounting caseloads, routine administrative pressures and expectations for speed without sacrificing the standard of reasoning required by appellate review. Well-scoped AI use—document summarisation, drafting non-decisional administrative notes, or producing machine-first drafts for subsequent judicial editing—promises clear time savings and consistency benefits.
- Efficiency gains: AI can accelerate the first-draft cycle for non‑substantive work and routine procedural rulings.
- Scalability: High-volume written material (large bundles, repeated procedural applications) becomes more tractable.
- Standardisation: Drafting templates and consistent summaries can reduce variance in clerking and administrative outputs.
What the Evans / McNall decision actually says (key excerpts and practical implications)
Judge McNall’s ruling makes three critical, transparent points that form a practical baseline for any judicial AI policy:
- Scope-limited use: AI was used to summarise documents only; it was not used for legal research. The judge framed the output as a first draft and explicitly confirmed personal verification.
- Responsible ownership: The judge emphasised responsibility—“This decision has my name at the end. I am the decision-maker.” That statement underlines a key legal principle: courts retain ultimate accountability for content and reasoning regardless of any automated assistance.
- Transparency to the record: The decision’s postscript entitled “The Use of AI” discloses how AI was used and why the tribunal considered that use appropriate for a paper-only case-management matter where no witness credibility findings were required. This sets a useful transparency precedent for future decisions.
The regulatory and jurisprudential context
National guidance and institutional adoption
The Courts and Tribunals Judiciary updated their guidance to reflect practical controls and safeguards around Copilot Chat—clarifying that judicial use may be appropriate where the model is accessed via secure, eJudiciary‑provisioned devices and where outputs are independently verified by the judicial office holder. That guidance addresses misinformation, bias and dataset quality, and it reiterates that litigants are responsible for AI-generated material they put before the court.
The Senior President of Tribunals’ Practice Direction on reasons (4 June 2024) also plays a role: it instructs that reasons must remain adequate and intelligible to parties and appellate bodies, which implies that any AI usage that threatens transparency or the traceability of reasoning could render a decision vulnerable on appeal.
Comparative policy moves: courts outside England & Wales
Internationally, courts and judicial systems are moving in a similar direction—either adopting cautious permission frameworks or imposing limits. In the U.S., for example, several state judicial systems have adopted model rules or task-force recommendations requiring either an outright ban or tightly regulated use of generative AI by judges and staff, with specific mandates on confidentiality, disclosure and human verification. Recent reporting highlights California’s judicial rules that demand local court policies addressing confidentiality, bias and disclosure.
That international trend underlines that the Evans decision is part of a global rebalancing: judicial systems accept AI’s operational benefits, but insist on guardrails that preserve fairness, privacy and explainability.
Technical realities: what courts can (and cannot) safely delegate to AI today
Generative models are strong at pattern-based tasks and fluent text generation but are inherently probabilistic: they predict likely continuations rather than consult an immutable repository of verified facts. This leads to two recurring technical realities:
- Hallucinations: Models can invent facts, case citations or dates that appear plausible. In legal contexts this can be catastrophic if unchecked. Real-world examples already exist where AI-generated false precedents were cited in filings with serious consequences for litigants.
- Opacity of provenance: Unless specifically engineered to produce verifiable citations or to ground itself on trusted databases, a model’s output may lack traceable provenance—complicating any attempt to audit how a particular paragraph or conclusion was reached.
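To illustrate why provenance matters in practice, the following is a minimal Python sketch of a post-generation check that flags citation-like strings which cannot be matched against a court-maintained index of verified authorities. The index, the regular expression and the function names are illustrative assumptions rather than any existing judicial tooling, and a clean result would not establish that the surrounding text is accurate.

```python
import re

# Hypothetical trusted index of verified authorities. In practice this would
# be backed by an authoritative citator or case-law database, not a hard-coded set.
TRUSTED_CITATIONS = {
    "[2025] UKFTT 1112 (TC)",
}

# Rough pattern for neutral citations such as "[2025] UKFTT 1112 (TC)".
CITATION_PATTERN = re.compile(r"\[\d{4}\]\s+[A-Z]+\s+\d+(?:\s+\([A-Z]+\))?")


def flag_unverified_citations(model_output: str) -> list[str]:
    """Return citation-like strings not found in the trusted index.

    This only surfaces candidates that a human must still verify against
    primary sources; it does not validate the rest of the text.
    """
    found = CITATION_PATTERN.findall(model_output)
    return [citation for citation in found if citation not in TRUSTED_CITATIONS]


if __name__ == "__main__":
    draft = (
        "See VP Evans v HMRC [2025] UKFTT 1112 (TC) and the invented "
        "authority Smith v Jones [2019] UKUT 9999 (TCC)."
    )
    for citation in flag_unverified_citations(draft):
        print("UNVERIFIED:", citation)
```

Even with such a check in place, every flagged or unflagged citation still needs human verification against the primary record before it is relied on.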
Risks — legal, ethical and operational
Legal and appellate risk
- Adequacy of reasons: If AI-contributed language obscures the decision-maker’s reasoning pathway, an appellate body could find the reasons inadequate; practice directions already stress proportional clarity in reasoning.
- Evidential disputes: Parties may legitimately demand access to AI-generated summaries, prompting debates over disclosure obligations, audit trails and the right to challenge an AI’s representation of underlying documents.
Confidentiality and data protection
- Sensitive input risk: Entering confidential filings, witness statements or sealed documents into cloud-based models creates risk unless the model guarantees no retention or use for training and is run within a secure, on‑tenant instance. The judiciary’s explicit reference to the eJudiciary-backed Copilot Chat as a private instantiation addresses this concern in part.
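By way of illustration only, here is a minimal Python sketch of the kind of pre-submission gate a court IT team might place in front of an AI endpoint, blocking sealed material and any endpoint that is not on an approved tenant-hosted allow-list. The host names and sensitivity labels are hypothetical, and the sketch says nothing about how eJudiciary or Copilot Chat are actually configured.

```python
from urllib.parse import urlparse

# Hypothetical allow-list of tenant-hosted AI endpoints approved by the court estate.
APPROVED_AI_HOSTS = {"copilot.ejudiciary.example.net"}

# Hypothetical sensitivity labels that must never be sent to any AI endpoint.
BLOCKED_LABELS = {"sealed", "closed-material"}


class SubmissionBlocked(Exception):
    """Raised when material may not be forwarded to the AI endpoint."""


def check_submission(endpoint_url: str, document_labels: set[str]) -> None:
    """Refuse to forward material unless the endpoint and labels are acceptable.

    This is a policy gate only: it assumes documents carry accurate
    sensitivity labels and that the allow-list is kept current.
    """
    host = urlparse(endpoint_url).hostname
    if host not in APPROVED_AI_HOSTS:
        raise SubmissionBlocked(f"{host!r} is not an approved tenant-hosted instance")
    blocked = document_labels & BLOCKED_LABELS
    if blocked:
        raise SubmissionBlocked(f"document carries blocked labels: {sorted(blocked)}")


if __name__ == "__main__":
    # Passes: approved host, no blocked labels.
    check_submission("https://copilot.ejudiciary.example.net/chat", {"open-bundle"})

    # Blocked: public endpoint and sealed material.
    try:
        check_submission("https://public-chatbot.example.com/api", {"sealed"})
    except SubmissionBlocked as exc:
        print("Blocked:", exc)
```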
Operational and procurement risks
- Vendor lock-in and contractual gaps: Courts must secure contractual assurances (no‑training, delete-on-demand, auditable logs and exportability) to prevent downstream training on judicial material or the inability to produce prompt/response logs during review.
- Model drift and calibration: Updates or fine‑tuning by vendors can change behaviour; courts need versioning and stable, auditable environments.
Systemic and public‑trust risks
- Loss of public confidence: Secrecy, errors, or opaque reliance on “black box” outputs can erode the public’s trust in impartial adjudication. The judicial emphasis on disclosure and demonstrated oversight is therefore not cosmetic—it’s essential to maintain legitimacy.
Best-practice guardrails (operational checklist for courts and tribunals)
- Use only secure, enterprise-hosted AI instances (tenant-bound Copilot Chat or equivalent) that include non‑training clauses and data-residency guarantees.
- Require explicit human-in-the-loop sign-off: any AI output relied on in a ruling must be reviewed, edited and certified by the judicial office holder.
- Disclose AI use in the published decision and describe its purpose and limits (e.g., “used for document summarisation only; not used for legal research”).
- Maintain an auditable log of prompts, raw outputs and the final edited versions for internal review or for disclosure when legitimately required (a minimal sketch of such a record follows this checklist).
- Implement role-based access, strong DLP, SSO, and retention policies for AI interaction logs to protect sensitive information.
- Draft procurement clauses that require vendor attestations: SOC 2/ISO attestations, exportable logs, non-training commitments and SLA versioning guarantees.
- Train judges and staff on prompt hygiene, hallucination detection and the judicial responsibilities attached to using AI-generated outputs.
- Designate an AI governance officer or committee within the court system to approve use-cases and monitor incidents and model updates.
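To make the audit-trail and sign-off items above concrete, here is a minimal Python sketch of what a single logged AI interaction might record. The field names and schema are assumptions for illustration; a real implementation would follow the court's own records-management, retention and disclosure policies.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
import json


@dataclass
class AIInteractionRecord:
    """One prompt/response cycle, retained for internal review or disclosure.

    Field names are illustrative, not a prescribed schema.
    """
    case_reference: str
    judicial_office_holder: str
    model_identifier: str      # product name plus vendor-reported version
    purpose: str               # e.g. "document summarisation"
    prompt_text: str
    raw_output: str            # unedited model output
    final_edited_text: str     # the text actually relied on, after judicial editing
    human_signoff: bool        # the judicial office holder certified the edited text
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)


if __name__ == "__main__":
    record = AIInteractionRecord(
        case_reference="[2025] UKFTT 1112 (TC)",
        judicial_office_holder="Tribunal Judge (example only)",
        model_identifier="copilot-chat/ejudiciary-tenant (version as reported)",
        purpose="document summarisation",
        prompt_text="Summarise the closed documents bundle.",
        raw_output="<model output>",
        final_edited_text="<judge-edited summary>",
        human_signoff=True,
    )
    print(record.to_json())
```

Keeping the raw output alongside the judge-certified final text is what makes later internal review, or disclosure to parties where legitimately required, practicable.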
Practical templates: how disclosure might look in a judgment
- Short, clear language appended to a decision, for example:
- “The judge used an eJudiciary-provisioned instance of Microsoft Copilot Chat to produce first-draft summaries of the closed documents bundle. These summaries were verified and edited by the judge; Copilot Chat was not used for legal research. The judge remains solely responsible for the reasoning and conclusions in this decision.”
Where the line should be drawn: permitted versus prohibited uses
Permitted (with strict controls)
- Document summarisation for procedural, non-evidentiary matters.
- Drafting administrative directions, scheduling orders, or routine case-management letters.
- Producing editable first drafts for clerks, always reviewed by the judge, to speed up workflow.
Prohibited
- Automated evaluation of witness credibility or weighing of disputed fact evidence.
- Legal research relied upon as authoritative without machine-readable, verifiable citations to primary sources.
- Uploading sealed or highly sensitive documents into third-party models without explicit contractual and technical guarantees.
Lessons for lawyers, litigants and court IT teams
- Lawyers should treat AI-generated material as they would any external evidence: mark its provenance, be prepared to disclose how it was produced, and verify its accuracy before relying on it in filings or submissions.
- Litigants should expect courts to disclose material AI use and to preserve logs of AI outputs if AI materially shaped a judge’s understanding of the documents or submissions.
- Court IT and procurement teams must prioritise secure enterprise solutions, insist on vendor non‑training clauses and keep the technical capability to produce logs for audit or disclosure.
Broader systemic implications and future-proofing
The Evans decision and the Judiciary’s published guidance signal a pragmatic trajectory: courts will not ignore AI; they will permit sensible, transparent uses while seeking to institutionalise oversight and accountability. As adoption grows, courts will need:
- Clear, system-wide policies harmonised across jurisdictions to prevent forum-shopping for lax AI controls.
- Investment in on-premises or tenant-hosted models tuned to legal corpora and instrumented for provenance and citation-tracing.
- Continuous training and certification programmes for judges and staff on AI literacy and governance.
Caveats and unverifiable points
- Any claim about the internal configuration, contractual guarantees or telemetry retention of Microsoft’s Copilot Chat on the eJudiciary platform should be treated as operationally contingent unless confirmed by procurement documentation or vendor attestation; public guidance affirms that Copilot Chat is available and that data remains private when used under eJudiciary accounts, but granular contractual terms are not publicly disclosed in full.
- Reports of incidents in other jurisdictions (for example, fabricated AI‑generated case citations appearing in filings) are documented in secondary reporting and professional commentary; where necessary, those reports should be validated against primary tribunal or court records before relying on them as precedent in litigation practice.
Conclusion
The Evans decision is an important, quietly revolutionary step: a tribunal judge publicly acknowledging the use of AI—limited, disclosed and human‑verified—creates a practical framework for adoption that other courts can study and refine. The combination of explicit guidance from the judiciary, practice-direction clarity on reasons, and an insistence on human oversight forms a defensible middle path: harness the productivity of tools like Microsoft Copilot Chat while protecting the core legal values of transparency, accountability and explainable reasoning.
For courts, the operational challenge is now organisational: build procurement and technical capability that preserves data confidentiality and provenance, train judges to detect and correct hallucinations, and make disclosure the norm rather than the exception. Done well, this will let courts improve efficiency without relinquishing judicial responsibility; done badly, it risks procedural unfairness and appellate vulnerability. The Evans ruling shows that a cautious, transparent experiment in judicial AI—one that keeps the human judge in the loop and on the hook—can be both practical and principled.
Source: Monckton Chambers https://www.monckton.com/use-of-ai-in-the-tribunal-brendan-mcgurk-kc/