Over the course of a single news cycle in late December 2025, a decades‑long adversarial archive and one of the world’s largest energy companies were reframed not by a fresh filing or press release but by public experiments with generative AI assistants. The episode exposes a practical governance gap at the intersection of archival practice, corporate communications, and model design.
Background
John Donovan’s campaign against Royal Dutch Shell began as commercial litigation in the 1990s and evolved into a curated, searchable set of archival websites (most prominently royaldutchshellplc.com) that aggregate court filings, Subject Access Request (SAR) outputs, internal memoranda, scanned documents and interpretive commentary. That archive has intermittently seeded mainstream reporting and has itself been the subject of legal fights — most visibly a 2005 UDRP challenge by Shell that was denied by a WIPO administrative panel in Case No. D2005‑0538. The late‑December 2025 provocation was methodical rather than accidental. Donovan deliberately fed portions of the archive and a consistent prompt set into multiple public AI assistants — publicly identified in his posts as Grok (xAI), Microsoft Copilot, ChatGPT and Google AI Mode — and then published the divergent outputs as evidence of how models handle contested archival material. The resulting cross‑model disagreement, where one assistant produced a fabricated causal claim about a human death and another corrected it, turned a private quarrel into a governance stress test.
What changed: the December experiment
- Donovan published two staged posts on December 26, 2025 — a rhetorical piece (“Shell vs. The Bots”) and a satirical roleplay (“ShellBot Briefing 404”) — explicitly designed to make archival material machine‑friendly and to force cross‑model comparison.
- The same dossier was submitted to multiple public assistants. Outputs diverged in characteristic ways: one assistant produced a dramatic but unsupported causal claim about Alfred Donovan’s death; another assistant corrected that claim and cited the obituary record; a third framed the incident as a meta‑level observation about the social process of archival amplification.
Overview: the assistants’ behaviours and what they reveal
Grok: narrative-first, low provenance sensitivity
One assistant (publicly attributed to Grok) produced a fluent, emotionally resonant mini‑biography that included the invented causal line that Alfred Donovan “died from the stresses of the feud.” That line conflicts with the Donovans’ own obituary material (Alfred Donovan recorded as dying in July 2013 after a short illness) and is a classic example of hallucination — a model prioritising narrative coherence over documented evidence. The consequence is not merely embarrassment: a machine-generated causal claim about a real person can cause reputational harm and legal exposure if repeated.
ChatGPT: corrective counter‑narrative
When the same dossier was presented to ChatGPT, the assistant challenged the invented cause‑of‑death line and pointed to the documented obituary. That public contradiction created a visible remediation dynamic: one model invented, another debunked. This shows that model diversity can surface errors quickly, but it is brittle as a governance strategy — it works when multiple models are consulted and when at least one is tuned for conservative grounding.
Microsoft Copilot: hedged synthesis and auditability
Microsoft Copilot’s output — as published in Donovan’s transcripts — adopted explicit hedging language (phrases like “unverified narrative”) and structured summarisation. The conservative posture is notable because it produces more audit‑ready prose while preserving usability. Copilot’s behaviour highlights the practical value of default hedging and explicit uncertainty for contested archives.
Google AI Mode: meta‑analysis and process framing
Google AI Mode reportedly responded by stepping back and describing the pattern: that Donovan intentionally framed a cross‑model experiment and that the resulting outputs conflicted. This meta framing treats institutional silence as signal and the social mechanics of the actors as the primary story, an approach that reduces the risk of amplifying fragile factual claims.
Evidence base: what is verifiable and what is not
Three classes of material must be separated when assessing the episode:
- Documented legal records and contemporaneous reporting: the WIPO UDRP decision in Case No. D2005‑0538 is public and dispositive on the 2005 domain dispute; mainstream outlets such as The Guardian profiled the Donovan campaign in 2009 and corroborate the archive’s media impact.
- Archival self‑publication: royaldutchshellplc.com hosts scans, court filings, SAR outputs and the December 26, 2025 posts that narrated and documented the AI experiments. These pages are the provocation and should be treated as primary evidence of Donovan’s intent and methodology — but not as independent confirmation of every claim within.
- Unattributed or redacted material: anonymous tips and redactions appearing in the archive often lack chain‑of‑custody metadata and therefore require independent corroboration before being treated as fact. Generative models, absent provenance metadata, will typically smooth such ambiguity into a readable narrative — and that is the root of the hallucination risk.
Why the Grok hallucination happened (technical mechanics)
Large language models are trained to optimise for probable next tokens conditioned on input. When fed partial archives that mix emotionally salient fragments with legal documents, a coherence‑driven optimiser has a high incentive to produce a tidy, human‑like arc. Absent explicit provenance attachments, conservative heuristics, or disallowed assertions for sensitive categories, models will fill gaps with plausible — but potentially false — details.
Key technical drivers:
- Retrieval ambiguity: an archive that mixes primary records with commentary provides noisy retrieval signals; retrieval‑light models will not reliably distinguish provenance quality.
- Coherence objective: many conversational models favour fluent story completion, which increases the chance of inventing causal links that are rhetorically satisfying.
- Lack of mandatory hedging for sensitive claims: without enforced conservative defaults for claims about causes of death, criminality, or health, models risk asserting specifics that should instead be framed as unverified (a minimal sketch of such a guard follows this list).
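To make that last driver concrete, here is a minimal sketch, in Python, of the kind of conservative‑default guard described above. Every name in it (the SENSITIVE_CATEGORIES set, the SourceDocument record, the gate_claim function, the source‑type list and the confidence threshold) is an illustrative assumption rather than any vendor's actual policy engine; the point is simply that a sensitive claim about a real person is refused unless at least one high‑confidence primary document supports it.

```python
from dataclasses import dataclass
from enum import Enum

# Claim categories about identifiable people that should never be asserted
# without document-level support (an illustrative taxonomy, not a standard).
SENSITIVE_CATEGORIES = {"cause_of_death", "criminal_conduct", "health_condition"}

# Source types treated as primary evidence in this sketch (also an assumption).
PRIMARY_TYPES = {"court_filing", "death_certificate", "obituary", "sar_output"}


class Decision(Enum):
    ASSERT = "assert"   # strong provenance: state the claim and cite the document
    HEDGE = "hedge"     # weak provenance: use "unverified" / "reports suggest" framing
    REFUSE = "refuse"   # sensitive claim with no usable provenance: do not assert


@dataclass
class SourceDocument:
    doc_id: str
    source_type: str         # e.g. "court_filing", "obituary", "anonymous_tip", "commentary"
    chain_of_custody: float  # 0.0-1.0 confidence assigned by the archive maintainer


def gate_claim(category: str, sources: list[SourceDocument]) -> Decision:
    """Decide how an assistant may phrase a claim, given its provenance."""
    primary = [s for s in sources
               if s.source_type in PRIMARY_TYPES and s.chain_of_custody >= 0.8]
    if category in SENSITIVE_CATEGORIES:
        # Sensitive claims need at least one high-confidence primary document.
        return Decision.ASSERT if primary else Decision.REFUSE
    # Non-sensitive claims may be hedged rather than refused when provenance is weak.
    return Decision.ASSERT if primary else Decision.HEDGE


if __name__ == "__main__":
    # An invented cause-of-death line backed only by commentary is refused outright.
    print(gate_claim("cause_of_death", [SourceDocument("blog-1", "commentary", 0.4)]))
    # The same claim backed by a documented obituary may be asserted with a citation.
    print(gate_claim("cause_of_death", [SourceDocument("obit-2013", "obituary", 0.95)]))
```

In a real deployment the hedged and asserted paths would also surface the documents they relied on, which is where the audit logging discussed later comes in.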
The systemic risks exposed
- Feedback loops and information laundering: machine outputs that appear authoritative get scraped and re‑indexed, becoming part of other systems’ training or retrieval sets. A single hallucination can cascade into a distributed falsehood unless provenance is preserved end‑to‑end.
- Reputational and legal exposure: fabricated causal assertions about deaths or crimes increase defamation and privacy risk for both the platform provider and downstream amplifiers. The risk multiplies when outputs are republished without human verification.
- Corporate silence as signal: decades of Shell’s legal containment and limited public responses created a vacuum that adversarial archives sought to fill. In the age of machine summarisation, silence is not neutral — it becomes contextual fuel for models that treat absence as a gap to be filled. That dynamic transforms non‑response into a reputational liability.
- Archive maintenance responsibility: custodians of adversarial archives bear an ethical duty to flag provenance and quality of evidence. Publishing ambiguous materials without metadata makes them easy to weaponise by any actor willing to feed adversarial prompts to public assistants.
Who bears responsibility?
- Archive maintainers and campaigners: must label documents by provenance quality (court‑filed, SAR‑derived, anonymous tip) and avoid speculative framing in headlines that models can ingest as fact.
- Platform and model providers: should default to conservative assertions for sensitive, identifiable subjects — requiring document‑level provenance or a clear “unverified” flag before making causal claims about real people. Preserve prompt and retrieval logs for audit and redress.
- Journalists and researchers: treat model outputs as leads, not as authoritative summaries. Insist on primary documents (court filings, death notices, reputable obituaries) before amplifying serious claims. Archive prompts and assistant outputs used in reporting to preserve an evidentiary chain.
- Corporate counsel and boards: re‑evaluate “silence” policies. Silence can cede the narrative battlefield to adversarial archives and algorithmic narration; maintain rapid response workflows that correct proven falsehoods without inadvertently amplifying them.
Practical recommendations (a checklist)
For AI vendors:
- Require document‑level provenance metadata for assertions about living persons involving causes of death, crimes, or medical claims.
- Default to explicit hedging (e.g., “reports suggest”, “unverified”) when provenance is absent or weak.
- Surface retrieval snippets and logs inline, and preserve prompts/retrieval contexts for at least 90 days for auditability; a sketch of what such an audit record could contain follows this list.
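As one way to read that last vendor item, the sketch below shows what a preserved prompt/retrieval record might contain. The AuditRecord class, its field names, and the retention constant are assumptions made for illustration, not a description of any provider's actual logging format.

```python
import hashlib
import json
import time
from dataclasses import dataclass, field, asdict

RETENTION_SECONDS = 90 * 24 * 60 * 60  # the 90-day auditability window suggested above


@dataclass
class AuditRecord:
    """One prompt/response exchange plus the retrieval context that informed it."""
    prompt: str
    response: str
    retrieved_snippets: list[dict]  # e.g. [{"doc_id": "...", "source_type": "...", "text": "..."}]
    model_id: str
    created_at: float = field(default_factory=time.time)

    def fingerprint(self) -> str:
        # A stable hash so a republished output can be traced back to this exchange.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def expired(self, now: float | None = None) -> bool:
        # Records older than the retention window become eligible for deletion.
        current = now if now is not None else time.time()
        return (current - self.created_at) > RETENTION_SECONDS
```

Whatever the concrete format, the essentials are the same: the prompt, the output, the retrieved sources, and a timestamp that makes the retention policy enforceable.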
For journalists and researchers:
- Treat assistant outputs as leads, never final copy.
- Demand primary evidence before repeating sensitive claims (death certificates, court filings, authoritative obituaries).
- Archive the prompt/output pairs used during research for traceability.
For corporate counsel and boards:
- Reassess the trade‑offs of silence; silence can be interpreted as evidence in the algorithmic age.
- Prepare a rapid, measured correction protocol that reduces additional amplification risk.
- Consider constructive engagement where appropriate to expose provenance gaps in adversarial archives.
For archive maintainers:
- Mark every item with provenance metadata: source type, chain‑of‑custody confidence, redaction notes (a sketch of one possible sidecar record follows this list).
- Avoid sensational headlines that models will treat as factual framing.
- Publish audit trails for SAR‑derived documents (where legally permissible) to reduce misinterpretation risk.
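For the archive‑maintainer items above, a sidecar record published alongside each scanned document is one plausible shape for that metadata. The field names and the example filename below are hypothetical; the sketch only illustrates the three elements the first bullet names, namely source type, chain‑of‑custody confidence, and redaction notes.

```python
import json

# One sidecar record per published document. The field names and the example
# filename are illustrative assumptions, not an existing archival standard.
sidecar = {
    "document": "sar-output-2009-p14.pdf",   # hypothetical filename
    "source_type": "sar_output",              # court_filed | sar_output | anonymous_tip | commentary
    "chain_of_custody": "direct",              # direct | forwarded | unknown
    "redactions": ["third-party names", "account numbers"],
    "editorial_note": "Interpretive commentary appears on the hosting page, not in the scan.",
    "verified_against": [],                    # independent corroborating sources, if any
}

print(json.dumps(sidecar, indent=2, ensure_ascii=False))
```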
Legal and ethical guardrails
- Defamation exposure is real: fabricated causal links about individuals (living or recently deceased) can create legal liability for platform operators and republishers. Conservative defaults reduce downstream legal risk.
- Editorial duty of care: archives that feed AI‑driven reportage share responsibility for general audience outcomes; proactive correction of demonstrable falsehoods is an ethical requirement.
- Regulatory traction: policymakers and standards bodies should consider provenance and audit‑trail requirements for high‑impact conversational systems, including thresholds for when a system must refuse to assert unsupported sensitive claims.
Why this episode matters beyond the Donovan–Shell quarrel
The Donovan–Shell case is a compact, high‑visibility demonstration of larger structural trends: archival persistence, adversarial use of the public record, model‑driven narrative smoothing, and the reputational consequences of institutional silence. Any archive that mixes primary documents with commentary — whether activist, journalistic, or corporate — can be converted into a machine‑ready narrative. Machines will prefer coherence; humans must insist on provenance.
The incident underlines a practical truth: the governance problem here is organizational, not purely technical. Fixes like provenance attachments, hedging defaults, and audit logs are implementable; the real work is embedding them into editorial workflows, corporate policy and regulatory standards.
Conclusion: governance, not gadgetry
The late‑December 2025 episode was a low‑cost experiment that served as a stress test: it showed that modern assistants can summarise contested archives and that they can invent harmful claims when fed ambiguous inputs. The corrective moment — one assistant debunking another — is instructive but fragile because it relies on cross‑model diversity rather than principled provenance engineering. The pragmatic path forward is governance‑centric:
- Require provenance for high‑impact claims.
- Default to conservative language when provenance is weak.
- Preserve prompts and retrieval logs for transparent audit and redress.
- Ask archives to label evidence quality and revise sensational framing.
Source: Royal Dutch Shell Plc .com windowsforum.com posting: AI Hallucinations and the Donovan Shell Archive: A Governance Challenge