Over the course of a single news cycle in late December 2025, a decades‑long dispute between a private citizen and a global energy company was reframed not by a press release or a new court filing, but by public experiments with generative AI — and the result is a practical, governance‑first warning about how modern information systems handle contested corporate histories.
Background
John Donovan’s long-running campaign against Royal Dutch Shell is not new. What began as commercial litigation in the 1990s evolved into a meticulously curated network of archival websites, court documents, Subject Access Request disclosures, and commentary hosted under domains such as royaldutchshellplc.com and sister sites. That archive has intermittently seeded mainstream reporting for years; the Donovans’ activities were the subject of a prominent profile in 2009, which documented how a persistent, searchable archive can attract leaks and journalism. A central, verifiable moment in the dispute’s institutional history is the World Intellectual Property Organization (WIPO) decision in Case No. D2005‑0538, in which Shell challenged several of the Donovans’ domain registrations. The WIPO administrative panel denied Shell’s complaint; that decision remains a concrete legal anchor in the saga. What changed in December 2025 was not a fresh court order but a different mechanism of public amplification: Donovan deliberately fed material from his archive into multiple public AI assistants (reported and published interactions cited Grok, Microsoft Copilot, ChatGPT, and Google AI Mode). The divergent outputs — one assistant inventing an emotional causal claim about a death, another correcting it, and a third reporting the social pattern — turned a private quarrel into a governance stress test.
Overview: what the assistants produced and why the differences matter
Grok (xAI): compelling narrative, harmful invention
One assistant — publicly identified in Donovan’s posts and follow‑ups as Grok — produced a fluent, emotionally resonant mini‑biography that included a fabricated causal claim: that Alfred Donovan had died “from the stresses of the feud.” That line contradicted the Donovans’ own public record that Alfred Donovan died in July 2013 at age 96 after a short illness. The claim is a textbook example of hallucination, where a model favours narrative coherence over conservative sourcing. Why it matters: when a model invents precise, emotionally charged facts about real people — especially about causes of death — the output can cause reputational harm, create legal risk, and force downstream remediation by human editors and legal teams.
ChatGPT (OpenAI): corrective counter‑narrative
When the same dossier was presented to another assistant, ChatGPT flagged and corrected the invented cause‑of‑death line, explicitly citing the documented obituary material and urging caution. That public contradiction — one model inventing a fact, another debunking it — became the visible demonstration Donovan used to argue that cross‑model interrogation can reveal hallucinations quickly.
Caveat: relying on “model A will catch model B” is brittle governance. It can occasionally surface errors, but it is not a substitute for primary‑source verification and robust provenance.
Microsoft Copilot: hedged synthesis
Microsoft Copilot’s output in the published experiments was reported as a composed, readable summary that deliberately hedged uncertain claims (phrases like “unverified narrative”) and emphasized caution. That conservative posture produced a more audit‑friendly summary while still being usable — a practical middle ground between dramatic invention and total silence.
Google AI Mode: meta‑observation
Google AI Mode framed the episode at a meta level — describing Donovan’s experiment as a deliberate attempt to pull archival material into multiple assistants and observing the pattern of conflicting outputs. In other words, one assistant described the social process rather than reasserting contested facts. That behaviour highlights a design choice: treat signal gaps and institutional silence as meaningful context.
The evidence base: what is demonstrably verifiable and what remains contested
To evaluate the episode, the materials must be grouped into three categories:
- Documented, court‑traceable artifacts: the WIPO panel decision in Case No. D2005‑0538 is publicly available and is an objective legal record that anchors parts of the domain dispute.
- Mainstream reporting and contemporaneous coverage: outlets including The Guardian profiled the Donovans in 2009, documenting the archive’s influence and the pair’s role as persistent critics. Later reporting (for example a 2013 Observer article) treats the site as an established watchdog and references the death of Alfred Donovan in mid‑2013. These items corroborate the long tail of public attention.
- Archival self‑publication: royaldutchshellplc.com and affiliated sites host a very large quantity of material assembled by John Donovan — scans, SAR outputs, internal memos, redactions, and commentary. The site is the provocation and the evidence bank feeding the AI experiments; it documents the prompts and the public experiment explicitly. Donovan’s own obituary post for Alfred Donovan (July 2013) is on the archive and is therefore a primary source for some claims.
Why the Grok hallucination happened: model incentives and retrieval mechanics
Large language models are trained to predict coherent next tokens. When supplied with partial, emotionally resonant datasets — litigation fragments, redactions, family details — models often prefer a tidy narrative arc that maximizes fluency and plausibility. The result is that:
- Gaps in evidence are filled with plausible‑sounding conjecture.
- Emotional or dramatic fragments in the training or retrieval context increase the probability of dramatic completions.
- Retrieval‑light models (or those lacking conservative grounding heuristics) are more likely to produce assertive statements without citations.
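To make the last point concrete, here is a minimal sketch of the kind of conservative grounding heuristic such retrieval‑light systems lack: a claim is asserted outright only when it can be matched to a retrieved, citable snippet, and is otherwise wrapped in hedging language. The matcher, data structures, and example sources are illustrative assumptions, not any vendor's actual pipeline.

```python
# Sketch of a conservative grounding gate: assert a claim only when a
# retrieved snippet with document-level provenance supports it; otherwise
# emit hedged language. All names here are hypothetical.
from dataclasses import dataclass

@dataclass
class Snippet:
    text: str
    source_url: str  # document-level provenance for the claim

def find_support(claim: str, snippets: list[Snippet]) -> Snippet | None:
    """Naive support check: every content word of the claim must appear in a
    single snippet. Real systems would use an entailment model instead."""
    claim_words = {w.lower().strip(".,") for w in claim.split() if len(w) > 3}
    for s in snippets:
        if claim_words <= set(s.text.lower().split()):
            return s
    return None

def emit(claim: str, snippets: list[Snippet]) -> str:
    support = find_support(claim, snippets)
    if support:
        return f"{claim} [source: {support.source_url}]"
    # No document-level provenance: hedge rather than assert.
    return f"Unverified narrative: {claim}"

snippets = [Snippet(
    text="alfred donovan died in july 2013 aged 96 after a short illness",
    source_url="https://royaldutchshellplc.com/obituary",  # hypothetical URL
)]
print(emit("Alfred Donovan died in July 2013 aged 96.", snippets))
print(emit("Alfred Donovan died from the stresses of the feud.", snippets))
```

Run as written, the documented biographical claim is emitted with its citation, while the fabricated causal claim falls through to the hedged branch — exactly the split between the Copilot and Grok behaviours described above.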
Feedback loops and the danger of machine‑amplified falsehoods
One of the most consequential risks shown by the episode is the feedback loop problem:
- A model produces a confident but false claim (a hallucination).
- The hallucination is published (by a site, social post, or clipping service).
- Other models or scrapers index that output and treat it as corroborating evidence.
- Subsequent generations of models or reprints treat the hallucination as confirmed, amplifying the falsehood.
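A toy simulation makes the loop visible. Below, a naive corroboration metric counts every indexed page that repeats a claim as independent support, so three machine reprints of a single hallucination look better corroborated than one genuine obituary; deduplicating by upstream origin collapses the reprints back to a single source. The page records and origin labels are hypothetical.

```python
# Toy illustration of the feedback loop: a trust heuristic that counts
# "independent" mentions is poisoned once reprints of one hallucination are
# indexed; counting distinct upstream origins restores the true picture.
indexed_pages = [
    {"claim": "died from the stresses of the feud", "origin": "grok-output-1"},
    {"claim": "died from the stresses of the feud", "origin": "grok-output-1"},  # blog reprint
    {"claim": "died from the stresses of the feud", "origin": "grok-output-1"},  # clipping service
    {"claim": "died in July 2013 aged 96", "origin": "donovan-obituary"},
]

def naive_corroboration(claim: str) -> int:
    # Treats every indexed page as an independent source.
    return sum(p["claim"] == claim for p in indexed_pages)

def provenance_aware_corroboration(claim: str) -> int:
    # Counts distinct upstream origins, so machine reprints collapse to one.
    return len({p["origin"] for p in indexed_pages if p["claim"] == claim})

for claim in ("died from the stresses of the feud", "died in July 2013 aged 96"):
    print(claim, "| naive:", naive_corroboration(claim),
          "| provenance-aware:", provenance_aware_corroboration(claim))
```

Under the naive metric the hallucination scores 3 against the obituary's 1; under the provenance-aware metric both score 1, which is the honest state of the evidence.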
What actually changed on the public record?
Donovan’s posts claim that the AI‑sparked controversy prompted a correction to the Wikipedia material on Alfred Donovan’s life status. The sequence is plausible — social attention (human or algorithmic) often influences volunteer editors — but the precise causal chain from a Grok hallucination to an edit cannot be proven from the publicly available materials. Independent confirmation of an editor’s motive or the immediate trigger for the edit is absent. The factual assertion that Alfred Donovan died in July 2013, aged 96, is documented on the Donovans’ sites and is echoed in mainstream reporting dating to 2013; that specific biographical detail is well‑anchored, even if the social dynamics around the correction remain opaque.
Responsibility and allocation of risk
This episode surfaces distributed responsibilities across four stakeholder groups:
- Archive maintainers (campaigners): they have an ethical duty to label provenance quality clearly — court‑filed, SAR output, anonymous tip, redacted memo — and to avoid evocative framings that might be ingested uncritically by models. The Donovan archive is valuable as a research repository but mixes primary and interpretive material in ways that create risk when used without provenance metadata.
- Platform and model providers: they must default to conservative behaviour on sensitive biographical claims (death causation, criminal allegations, medical details). That includes requiring document‑level provenance for any assertive claim about a living (or recently deceased) person and surfacing retrieval snippets and hedging language by default. Microsoft Copilot’s hedged output in this episode demonstrates the practical value of conservative defaults.
- Journalists and editors: AI outputs should be treated as leads, not authoritative sources. Insist on primary documents (court filings, death certificates, reputable obituaries) before repeating or amplifying sensitive claims, and archive AI prompts and tool outputs when they materially informed reporting to ensure auditability.
- Companies and corporate counsel: silence is a strategy — but in the age of machine summarisation, it is not neutral. When corporations decline to clarify contested allegations, that absence becomes a signal that models and the public interpret. Boards should review disclosure policies and consider rapid, factual corrections when appropriate to avoid ceding the narrative battlefield to adversarial archives and algorithmic summarisation.
Practical checklist: concrete steps for each stakeholder
- For AI vendors:
- Require document‑level provenance for sensitive biographical assertions (cause of death, criminal wrongdoing, medical claims).
- Default to hedged, audit‑friendly language when primary sources are absent.
- Preserve prompt and retrieval logs for audit and redress (a sketch of such a record follows this checklist).
- For journalists and editors:
- Treat model outputs as leads, not facts.
- Verify claims with primary sources before publication.
- Archive prompts and AI outputs used in reporting.
- For companies and counsel:
- Reassess “silence” policies; designate a rapid response team to correct demonstrable falsehoods with minimal amplifying effect.
- Consider publishing machine‑readable factual summaries where legal constraints allow (e.g., basic timelines, confirmed settlement facts); see the JSON sketch after this checklist.
- For archive keepers:
- Flag provenance quality clearly.
- Separate primary documents from interpretive summaries.
- Avoid sensational headlines that models might absorb as factual context.
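As referenced in the vendor checklist above, preserving prompt and retrieval logs can be as simple as writing one structured, tamper‑evident record per sensitive answer, capturing the prompt, the retrieved sources the answer relied on, and whether hedging was applied. The schema and field names below are assumptions for illustration, not an existing logging standard.

```python
# Sketch of an audit record for sensitive biographical answers: the prompt,
# retrieval provenance, and hedging decision are stored together, with a
# content hash so auditors can detect post-hoc tampering with the entry.
import hashlib
import json
from datetime import datetime, timezone

def audit_record(prompt: str, retrieved: list[dict], answer: str, hedged: bool) -> dict:
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prompt": prompt,
        "retrieved_sources": retrieved,  # document-level provenance
        "answer": answer,
        "hedged": hedged,                # was conservative language applied?
    }
    record["sha256"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()
    ).hexdigest()
    return record

entry = audit_record(
    prompt="How did Alfred Donovan die?",
    retrieved=[{"url": "https://royaldutchshellplc.com/obituary", "kind": "obituary"}],
    answer="Documented sources say he died in July 2013, aged 96, after a short illness.",
    hedged=False,
)
print(json.dumps(entry, indent=2))
```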
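And for the machine‑readable factual summaries suggested for companies and counsel, a small published timeline of confirmed facts gives retrieval systems something concrete to ingest and cite. The JSON schema below is one plausible shape, drawn from the verifiable anchors in this case, not a standard.

```python
# A minimal machine-readable factual summary: a reviewed timeline of
# confirmed facts, published as JSON so crawlers and retrieval systems can
# ingest corrections directly rather than inferring them from silence.
import json

factual_summary = {
    "subject": "Shell / Donovan domain dispute",
    "confirmed_facts": [
        {"date": "2005", "source": "WIPO administrative decision",
         "fact": "WIPO panel denies Shell's complaint in Case No. D2005-0538"},
        {"date": "2013-07", "source": "published obituary",
         "fact": "Alfred Donovan dies aged 96 after a short illness"},
    ],
    "last_reviewed": "2025-12",
}

print(json.dumps(factual_summary, indent=2))
```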
Legal and ethical guardrails to consider
- Defamation and privacy exposure: any AI‑generated falsehood about living people can create legal exposure for platforms and downstream publishers. Fabricated causation in deaths or crimes is particularly risky.
- Editorial duty of care: archivists and campaigners whose material feeds AI systems carry an ethical responsibility to make provenance explicit and to correct demonstrable errors promptly.
- Platform transparency: regulators and standards bodies should consider provenance and audit‑log requirements for high‑impact conversational systems, particularly in public or semi‑public deployments.
- Research ethics: when adversarial archives are used experimentally to stress models, those experiments should follow a responsible disclosure and harm‑minimisation plan for third‑party individuals referenced.
Why this case is not just one website vs one company
The Donovan–Shell episode is a compact case study of broader, converging trends:
- Archival persistence: long‑running public archives make historical materials discoverable and machine‑ready.
- Adversarial publication strategies: actors can deliberately shape corpora to provoke models.
- Model‑driven smoothing: generative models optimise for narrative completion, which can flatten evidentiary nuance.
- Corporate silence as signal: not replying is itself a communicative act in an ecosystem where machines treat absence as relevant context.
Conclusion: governance, not gadgetry
The December 2025 episode — a low‑cost public experiment that asked multiple assistants the same questions and published the responses — is both provocation and stress test. It demonstrated that modern assistants can produce useful, public‑interest summaries and that they can invent harmful claims when given ambiguous inputs. The corrective moment — one model debunking another — is instructive but fragile: it depends on cross‑model diversity rather than principled provenance engineering.
The pragmatic path forward is governance‑centric:
- insist that AI outputs about real people carry provenance and hedging,
- require archival custodians to label material clearly,
- and ask companies to reconsider silence as an unexamined strategy in an era when machine summarisation can create the narrative.
Key verified references used in this analysis: the WIPO administrative decision in Case No. D2005‑0538, contemporaneous mainstream reporting including Guardian/Observer coverage of the Donovans, and the Donovan archive where the December 2025 experiments were posted. The archive documents the experiment; the WIPO decision and press coverage provide independent anchors for parts of the dispute — but several micro‑level attributions in the archive remain plausible but not independently proven and should be treated with appropriate scepticism. (windowsforum.com posting: AI Hallucinations and the Donovan Shell Archive: A Governance Challenge)