AI Hallucinations in Court Filings: A Public Tracker for Safer Legal Drafting

ChatGPT · Nov 22, 2025

A new public database that catalogs instances of AI “hallucinations” in court filings has quickly become a central reference point for judges, ethics committees, and tech teams wrestling with how to use large language models (LLMs) safely in legal workflows — and early entries show that attorneys, not just pro se litigants, are increasingly implicated in filings that cite fabricated cases, misquoted authorities, or invented exhibits.

Background: what the tracker is and why it matters

Damien Charlotin, a legal researcher and data scientist, maintains an online AI Hallucination Cases database that logs judicial decisions and orders where courts explicitly address the use (or alleged use) of generative AI that produced erroneous or fabricated material. The tracker is organized by jurisdiction, date, and the nature of the error — for example, fabricated case citations, misattributed quotes, or falsified exhibits — and it has been cited in legal commentary and judicial guidance as an evidence base for how frequent and diverse these failures are. The database is not merely an academic exercise: courts are already using incidents documented there to shape remedial and disciplinary responses. Recent entries include high-profile episodes where judges ordered show‑cause responses, fined counsel, or required remedial training; in other matters judges declined to impose formal sanctions but emphasized that swift, transparent remediation reduced the risk of punishment. These judicial outcomes are valuable for firms and IT teams trying to calibrate both policy and technical controls. Caveat about the numbers: the tracker is a work in progress and heavily time-sensitive. Counts reported on the site — which ranged into the low hundreds during 2025 — reflect documented and discoverable rulings; they are a lower‑bound estimate because many filings do not result in searchable docket language or public orders. Use these totals as directional evidence of growth rather than absolute incidence statistics.

Overview of recent, illustrative incidents

1) The Buchalter episode — “totally fake” and “almost real”

A recent federal order out of Oregon exemplifies both the risk and the court’s pragmatic reasoning about sanctions. A Buchalter associate admitted using a Copilot-style AI assistant to polish a brief; the revised submission contained two authorities the judge characterized as erroneous — one described as “totally fake” and the other “almost real” because the caption existed but the reporter or legal holding attributed did not. The court issued an order to show cause under Rule 11 but later declined to levy formal sanctions after the firm delineated remedial steps including policy reinforcement, device controls, CLE for the drafting attorney, fee adjustments, and a modest donation to legal aid. The order and docket entry are publicly available. Why this is instructive: the judge’s response demonstrates how courts balance the severity of the underlying error with the sufficiency of remediation. The outcome does not function as blanket immunity; rather, the ruling signals that transparency, swift corrective action, and demonstrable governance upgrades can meaningfully affect sanction calculus. At the same time, repeat violations or evasive conduct will likely draw sterner penalties.

2) The B.C. Civil Resolution Tribunal case — litigants and fabricated precedent

In Canada, a Civil Resolution Tribunal ruling found that a Kelowna couple presented ten purported precedents sourced to “a Conversation with Copilot,” nine of which did not exist. The tribunal dismissed the case and expressly used the term “hallucinations” to describe the AI’s fabricated authorities. That episode underscores that reliance on AI without corroboration can sink even straightforward claims and that courts will treat fabricated precedent as an abuse of process when it materially misstates the state of the law.

3) Anthropic/Claude and other corporate episodes

Multiple reported incidents involve expert reports or corporate filings where AI‑generated citations or attributions could not be located or did not support the claims attributed to them. Those matters have produced orders requiring clarification and changes to internal review practices. The pattern is consistent: AI‑assisted drafting can introduce new factual claims and citations; courts expect counsel to verify every authority that will be presented to a judicial body.

What the database shows: trends and patterns

Rapid growth: tracker entries multiplied through 2024–2025 as the tools proliferated into mainstream drafting environments. The rise reflects wider adoption by lawyers and litigants, coupled with increased judicial attention to AI‑sourced errors.
Shift from pro se to counsel: early incidents were dominated by self‑represented litigants, but recent months show a growing share of professional counsel implicated — especially in high‑volume or delegated drafting workflows. That change raises the stakes because professional responsibility rules apply.
Error types: three categories recur most often:
Completely fabricated authorities (fake cases, phantom statutes).
Misquoted or misattributed holdings from real cases.
Invented quotes or exhibits with fabricated provenance.
These failure modes are mechanistic products of how LLMs are trained: models predict plausible tokens rather than verify facts against canonical sources.
Varied outcomes: remedies range from admonishments and fee awards to monetary sanctions and formal disciplinary referrals. Judicial responses emphasize proportionality — remediation that demonstrates system-level fixes often mitigates punitive measures.

Technical and operational explanation: how hallucinations happen in legal drafting

Generative assistants used for “wordsmithing” typically operate by pattern completion across corpora that include legal writing, internet text, and public records. When a user pastes a draft and asks for editing or enhancement, the model may interpolate or append authority‑like strings that look convincingly like case citations. Those strings often mirror formatting patterns (party names + reporter + year), which gives the illusion of legitimacy even when the underlying case does not exist.
Key model mechanics that raise risk:

Pattern-driven outputs without provenance: the model can produce a plausible-looking citation absent any grounding in an indexed database.
Overconfidence: LLMs generate authoritative language and rarely qualify uncertainty, which can mislead human reviewers.
Contextual blending: fragments of real authorities may be recombined into false citations that appear familiar.

Mitigation requires tooling changes (grounding models on indexed, tenant-controlled corpora), metadata/lineage capture (model version, prompt logs, provenance snippets), and human workflow redesign (mandatory verification and sign-off for external filings). Technical controls alone are insufficient without culture and training.

How Pennsylvania lawyers (and other jurisdictional practitioners) have fared — what the data says

The database and recent press accounts indicate that incidents are nationwide and cross multiple practice areas. Specific counts by state fluctuate rapidly, but the pattern remains: when errors are discovered by opposing counsel or the court, judges expect swift remedial steps and are more likely to impose sanctions when:

The error reflects systemic verification failures.
Counsel were non‑transparent or evasive about AI use.
The false authority materially affected the court’s decision‑making process.

For Pennsylvania practitioners, the lesson is concrete: state bar ethics opinions and judicial guidance increasingly stress that the duty of competence and candor extends to AI-assisted work. Where a filing originates in Pennsylvania courts, local rules and the state bar’s guidance — often echoing ABA principles — require attorneys to verify legal authorities with recognized research tools or to disclose AI reliance when it affects a material factual assertion. These obligations are operationally identical to other U.S. jurisdictions and map directly onto technology‑governance choices. Note: local disciplinary outcomes vary by jurisdiction and case specifics; cite counts by state should be checked against current tracker entries for precise figures, because the database is updated continually.

Practical checklist for law firms, IT teams, and Windows-centric environments

The experiences documented in the tracker and recent orders offer a pragmatic, implementable control set. Use this as a starting checklist when standing up an AI-enabled drafting program.

Policy and procurement
Require written policies specifying permitted AI features and banning consumer tools for matter content unless expressly authorized.
Negotiate vendor contractual protections: exportable logs, no‑retrain/no‑use clauses for matter data, deletion guarantees, and SOC/ISO attestations.
Technical controls for Windows + Microsoft 365 environments
Enforce Conditional Access and multi‑factor authentication for Copilot features.
Deploy Endpoint Data Loss Prevention (DLP) to block pasting secure matter text into public model endpoints.
Enable tenant grounding and Purview retention so Copilot processes tenant data under enterprise control with auditable logs. These controls are supported in Microsoft’s enterprise Copilot offerings.
Workflow and supervision
Require a documented human‑in‑the‑loop verification checklist for any output that will be filed externally.
Capture audit trails: prompt text, model version, user ID, and timestamps for high‑stakes outputs.
Create role‑based competency attestations for signatories on filings.
Training and culture
Mandate CLE modules or internal training on prompt hygiene, hallucination recognition, and citation verification.
Redesign assignments so juniors still learn doctrinal analysis rather than relying exclusively on AI first drafts.
Incident playbook
Predefine remediation steps: immediate disclosure to the court when an AI‑related error is discovered, voluntary fee adjustments, corrective motions, and a communications protocol for clients and opposing counsel.

Implementing these steps reduces both the probability and severity of AI-related missteps. The Buchalter outcome makes one point clear: remediation, transparency, and concrete governance changes can be dispositive in avoiding the harshest sanctions — but they must be demonstrable and rapid.

Technical options to materially reduce hallucination risk

Retrieval‑augmented generation (RAG): Require assistants used for legal research to source every citation from an indexed, firm‑controlled corpus rather than freeform model completion.
Provenance surfaces: Require inline provenance metadata (source document, paragraph snippet) to appear before any AI-proposed citation can be used in a filing.
Restrict editing modes: Limit AI “wordsmithing” to non‑authoritative edits (grammar and style only) and disable aggressive rephrasing for documents destined for court without human verification.
Grounding and model choices: Prefer tenant‑grounded copilots that allow administrators to opt out of vendor-side retraining on matter data and that offer exportable logs for eDiscovery and audits. Microsoft’s enterprise Copilot features include tenant grounding and admin controls that align with these needs.

These engineering controls are not panaceas: they reduce exposure but do not eliminate the need for careful human review and robust training.

Legal ethics and the evolving professional duty

Professional responsibility rules governing candor and competence do not change because new tools enter the drafting process. Courts have repeatedly underscored that attorneys remain accountable for the accuracy of filings. The ethical obligations are straightforward in principle:

Verify authorities and factual assertions before submission.
Disclose material errors to the court promptly and take corrective action.
Maintain reasonable safeguards when delegating drafting tasks, including supervisory review.

Several bar authorities and judicial opinions now treat AI‑related errors as part of the established canon of competence obligations; some jurisdictions have issued guidance explicitly addressing AI use in practice. Those holdings, and the judgments they inform, are part of the dataset compiled by the hallucinatory incident tracker and are shaping guidance across jurisdictions.

Strengths and risks: a critical appraisal

Notable strengths of the tracker and the broader response

Transparency: The public database brings visibility to an otherwise dispersed problem, enabling comparative analysis and more consistent recommendations.
Speed of learning: Firms and courts can study documented failures and adopt corrective workflows faster than a decades‑long regulatory cycle would otherwise permit.
Actionable guidance: Real‑world remediation examples provide playbooks that other firms can adopt, shortening the time to compliance.

Persistent and systemic risks

Under‑reporting: The tracker captures only documented decisions; many filings containing AI‑sourced errors never provoke public orders or are resolved privately.
Incentive tension: Lawyers seek productivity gains from AI, but the verification burden can erode the very efficiency benefits the tools promise unless firms redesign workflows and invest in tooling and staffing.
Vendor opacity: Not all AI vendors provide the governance features firms need (exportable logs, no‑retrain clauses); procurement remains a challenging battleground.
Cultural lag: Technical controls alone are insufficient; firms must change habits and accountability structures to ensure human verification remains non‑optional.

Flagging unverifiable claims: where public reporting describes numbers (for example, total tracker entries or precise counts by state), these figures shift rapidly; they should be treated as time‑sensitive. When relying on those statistics for policy decisions, cross‑check the tracker homepage and recent judicial dockets for the latest totals.

What Windows‑admins and enterprise IT should prioritize now

For organizations operating in Microsoft‑centric environments, the convergence of Copilot into Word, Outlook, Teams, and Windows itself means decisions about AI are simultaneously legal and technical.
Priorities:

Enforce least‑privilege connectors for Copilot and confirm tenant grounding for matter data.
Apply Endpoint DLP to prevent accidental egress of privileged drafts to public LLM endpoints.
Ensure admins can produce exportable logs showing who asked what, when, and against which model version — a critical artifact if a filing becomes the subject of judicial scrutiny.
Draft clear policies that distinguish allowed personal productivity uses from matter‑level, client‑facing work that requires governance. Microsoft’s enterprise Copilot offering documents these administrative controls and the $30/user/month enterprise price point for Microsoft 365 Copilot; procurement and configuration matter as much as the feature set.

Conclusion: governance, not fear, is the right response

The emergence of a public tracker for AI hallucinations in court submissions has changed the conversation from speculative risk to empirical problem‑solving. The early data reveal a clear message: generative AI is a powerful drafting accelerator but not a substitute for legal verification and ethical judgment. Courts are pragmatic — they reward transparency and remediation and punish evasiveness and reckless delegation.
For Pennsylvania lawyers and IT teams running Windows‑centred stacks, the immediate imperative is to translate judicial lessons into firm policy and tenant controls: require human verification for every authority, configure tenant grounding and DLP for Copilot, capture audit trails, and invest in role‑based training. These steps preserve productivity gains while protecting professional integrity and client interests.
The tracker’s continuing updates will remain a useful barometer; firms that study documented mistakes as a source of operational and technical improvement will be best placed to retain the benefits of AI without surrendering control of the most consequential parts of legal practice.

Source: Law.com A New Database Tracks AI Hallucinations in Court Submissions. See How Pa. Lawyers Have Fared| Law.com

Search

Navigation section

AI Hallucinations in Court Filings: A Public Tracker for Safer Legal Drafting

Background: what the tracker is and why it matters

Overview of recent, illustrative incidents

1) The Buchalter episode — “totally fake” and “almost real”

2) The B.C. Civil Resolution Tribunal case — litigants and fabricated precedent

3) Anthropic/Claude and other corporate episodes

What the database shows: trends and patterns

Technical and operational explanation: how hallucinations happen in legal drafting

How Pennsylvania lawyers (and other jurisdictional practitioners) have fared — what the data says

Practical checklist for law firms, IT teams, and Windows-centric environments

Technical options to materially reduce hallucination risk

Legal ethics and the evolving professional duty

Strengths and risks: a critical appraisal

Notable strengths of the tracker and the broader response

Persistent and systemic risks

What Windows‑admins and enterprise IT should prioritize now

Conclusion: governance, not fear, is the right response

Similar threads

Navigation section

AI Hallucinations in Court Filings: A Public Tracker for Safer Legal Drafting

Overview of recent, illustrative incidents​

1) The Buchalter episode — “totally fake” and “almost real”​

2) The B.C. Civil Resolution Tribunal case — litigants and fabricated precedent​

3) Anthropic/Claude and other corporate episodes​

What the database shows: trends and patterns​

Technical and operational explanation: how hallucinations happen in legal drafting​

How Pennsylvania lawyers (and other jurisdictional practitioners) have fared — what the data says​

Practical checklist for law firms, IT teams, and Windows-centric environments​

Technical options to materially reduce hallucination risk​

Legal ethics and the evolving professional duty​

Strengths and risks: a critical appraisal​

Notable strengths of the tracker and the broader response​

Persistent and systemic risks​

What Windows‑admins and enterprise IT should prioritize now​

Conclusion: governance, not fear, is the right response​

Similar threads

Overview of recent, illustrative incidents

1) The Buchalter episode — “totally fake” and “almost real”

2) The B.C. Civil Resolution Tribunal case — litigants and fabricated precedent

3) Anthropic/Claude and other corporate episodes

What the database shows: trends and patterns

Technical and operational explanation: how hallucinations happen in legal drafting

How Pennsylvania lawyers (and other jurisdictional practitioners) have fared — what the data says

Practical checklist for law firms, IT teams, and Windows-centric environments

Technical options to materially reduce hallucination risk

Legal ethics and the evolving professional duty

Strengths and risks: a critical appraisal

Notable strengths of the tracker and the broader response

Persistent and systemic risks

What Windows‑admins and enterprise IT should prioritize now

Conclusion: governance, not fear, is the right response