A sweeping international study coordinated by the European Broadcasting Union (EBU) and led by public broadcasters has found that four leading AI chatbots — ChatGPT, Microsoft Copilot, Google Gemini, and Perplexity — misrepresent news content in roughly 45 percent of tested responses, with serious sourcing and factual errors pervasive across languages and territories.
Background: why this study matters
News consumption is rapidly shifting from traditional web pages to conversational interfaces and AI assistants. The Reuters Institute’s Digital News Report 2025 shows that while weekly use of AI chatbots for news remains modest globally, about 7% of people already use them weekly for news — and that rises to around 15% among those under 25. That generational adoption curve means errors in AI-driven news can reach an increasingly influential cohort. Public-service media organizations worried about this trend pooled their expertise to assess what AI assistants actually deliver when asked common news questions. Twenty-two broadcasters across 18 countries, working in 14 languages, asked identical news prompts to the four assistants and had experienced journalists blind-review more than 3,000 AI responses for accuracy, sourcing, context, and the ability to separate fact from opinion. The results were stark and systemic.
Key findings at a glance
- 45% of AI responses contained at least one significant problem.
- 31% of responses showed major source problems — missing, misleading, or incorrect attributions.
- 20% of responses contained major factual errors, including hallucinations and outdated information.
- Performance varied by assistant, with Google Gemini flagged as the worst performer on sourcing in this round of testing (the study reports a notably higher rate of sourcing problems in Gemini’s responses, though different outlets cite slightly different percentages). This cross-study variance is noted below.
Overview: methodology and scope
Who ran the tests and how
- The study was coordinated by the European Broadcasting Union (EBU) and led by the BBC, with contributions from 22 public-service media organizations including Deutsche Welle (DW), NPR and others across 18 countries.
- Participating journalists posed a set of shared, topical news questions to four major AI assistants and then evaluated the anonymized responses blind to the provider. Review criteria included accuracy, sourcing, context, editorial judgement, and the ability to distinguish fact from opinion. The blinded review by professional journalists reduces bias and simulates a real-world editorial check.
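The EBU has not published its internal tooling, so the following is only a hypothetical sketch of what the blinding step in such a review could look like: collected answers are stripped of provider identity and shuffled before reviewers score them against the rubric, so brand familiarity cannot color the verdicts. The column names and rubric fields below are illustrative assumptions, not the study's actual schema.

```python
import csv
import random

# Hypothetical input CSV: one row per collected answer, provider still visible.
# Assumed columns: question, provider, response
def build_blind_review_sheet(in_path: str, out_path: str, key_path: str, seed: int = 42) -> None:
    with open(in_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))

    random.Random(seed).shuffle(rows)  # randomize order so providers aren't grouped

    with open(out_path, "w", newline="", encoding="utf-8") as blind, \
         open(key_path, "w", newline="", encoding="utf-8") as key:
        blind_writer = csv.DictWriter(
            blind,
            fieldnames=["item_id", "question", "response",
                        "accuracy", "sourcing", "context", "fact_vs_opinion"],
        )
        key_writer = csv.DictWriter(key, fieldnames=["item_id", "provider"])
        blind_writer.writeheader()
        key_writer.writeheader()

        for i, row in enumerate(rows, start=1):
            item_id = f"R{i:04d}"
            # Reviewers see only the anonymized sheet; the key file mapping IDs
            # back to providers is withheld until scoring is complete.
            blind_writer.writerow({
                "item_id": item_id,
                "question": row["question"],
                "response": row["response"],
                "accuracy": "", "sourcing": "", "context": "", "fact_vs_opinion": "",
            })
            key_writer.writerow({"item_id": item_id, "provider": row["provider"]})
```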
Why the approach is credible
- Unlike single-site or vendor-led tests, this project spanned multiple languages and editorial cultures, applying the same rubric across markets. That breadth is why the EBU and partners describe the failures as systemic rather than isolated. The transparency of methodology (question lists, evaluation taxonomy and example failures) raises the bar for reproducible media-technology research.
What the errors look like — concrete examples
AI assistants did not just make minor phrasing mistakes; they introduced fact-level errors and misattributions with real-world implications.
- In the testing carried out by Deutsche Welle, AI answers contained glaring political errors: systems named Olaf Scholz as German chancellor after Friedrich Merz had taken office, and named Jens Stoltenberg as NATO secretary-general after Mark Rutte had succeeded him. Those mistakes illustrate outdated knowledge and failure to reflect real-time developments.
- The BBC’s February 2025 study earlier found that more than half of AI-generated summaries of BBC content had significant issues, and nearly one-fifth of responses that cited BBC material introduced erroneous facts, altered quotes, or misrepresented dates and numbers. Those earlier findings motivated the broader EBU-led project.
- Across tests, AI assistants sometimes made up URLs, misattributed quotes, or blended opinion and fact without labeling editorializing — problems that can mislead readers who accept conversational answers at face value.
Deep dive: the three primary failure modes
1. Outdated or stale knowledge
Many large language models (LLMs) retain knowledge only up to their training cutoff unless explicitly connected to live data. Even with live web access, models sometimes continue to return stale facts or conflict with the latest authoritative updates. The study repeatedly found that AI assistants lagged on fast-moving political appointments and breaking-event details.
2. Hallucinations and invented details
Hallucination — generation of plausible-sounding but false information — remains a core challenge for generative models. The journalists’ blind reviews flagged numerous instances where the assistant invented a quote, date, or fact that could not be corroborated in the referenced sources. These errors are particularly dangerous when the assistant presents the invention confidently.
3. Sourcing and attribution failures
The most common large-scale defect in the study was sourcing failure: AI assistants either failed to attribute claims, offered incorrect links or citations, or pointed to secondary/syndicated pages instead of original reporting. The EBU found that nearly a third of responses had serious sourcing problems — a direct threat to transparency and traceability in news distribution.
Comparative performance: where each assistant stands
The study and related coverage show variation in performance between systems, but with an important caveat: percentages and rankings differ slightly across publications, and performance fluctuates by question type, language, and timing.
- Google Gemini: Reported as the weakest performer in this EBU-coordinated round, with the highest proportion of responses flagged for sourcing issues. Multiple outlets, including the EBU's own press materials, reference Gemini's significantly higher sourcing-failure rate (summaries put it in the low-70s to mid-70s percent range; that variation reflects different rounding and sample subsets and is flagged further below).
- Microsoft Copilot: Performed inconsistently across studies. The BBC’s earlier study also flagged Copilot among the less reliable systems for certain tasks. Performance often depended on whether the assistant declined uncertain questions or produced a confident but incorrect answer.
- ChatGPT (OpenAI): Continued to register hallucinations and occasional outdated knowledge in these news tasks, even though OpenAI has invested in web access and real-time tools. OpenAI has stated its intention to support publishers and attribution, but the independent reviews show gaps remain.
- Perplexity: Generally cited for stronger citation behavior in prior studies, but still showed significant errors across complex editorial judgment tasks. The platform advertises research modes with higher factuality, but independent journalistic review highlights persistent limitations.
Impacts on trust, democracy, and publishers
Jean-Philippe de Tender, the EBU’s deputy director general, has framed the findings as a trust emergency: when people cannot tell what is factual, public trust in information erodes and democratic participation can be harmed. The study’s authors argue that systemic inaccuracies and misattributions create a cascade risk — audiences misinformed by assistants may then spread inaccurate claims on social media or avoid trusted news outlets entirely.
Publishers face a double bind. On one hand, AI assistants that summarise and reroute attention can reduce direct traffic, undermining monetization. On the other, when assistants misrepresent a publisher’s work, the publisher’s reputation suffers even if the original reporting was accurate. The BBC’s earlier study documented that AI summaries sometimes altered or invented quotes attributed to the BBC, demonstrating how publisher credit and liability become entangled.
Responses from industry, public media, and campaigns
- The EBU and partner organizations have launched a joint campaign called “Facts In: Facts Out”, urging AI companies to ensure faithful handling of news content and to be transparent about sourcing and provenance. The campaign’s demand is succinct: if facts go in, facts must come out.
- Public broadcasters argue regulators should enforce information-integrity laws and that independent, rolling monitoring of AI assistants should be instituted because models and their behaviors change rapidly. The EBU and the participating organizations called for EU and national regulatory action on digital services and media pluralism.
- Tech companies have acknowledged the problems publicly. Earlier statements from OpenAI highlighted efforts to improve attribution and tools for discovering quality content; Google and Microsoft have emphasized iterative improvement and user feedback mechanisms. But the journalist-led audits show that corporate statements have not yet translated into consistently accurate behavior at scale.
Why the problem persists: technical and structural drivers
Several intertwined drivers explain why AI assistants continue to distort news:
- Training vs. live reality: Models trained on static corpora can lag behind fast-moving events; live web access can help but introduces new risks (broken links, stale caches, or scraped paywalled content).
- Objective mismatch: Language models are optimized for plausible text generation and helpfulness signals, not for provable factual correctness or strict source fidelity. That means confidence and fluency can mask inaccuracy.
- Citation hallucination: Models sometimes fabricate plausible-looking citations or supply URLs that do not resolve to the claimed source — a behavior documented both in academic audits and journalist reviews (a simple link-verification sketch follows this list).
- Economic incentives and design choices: Assistants are tuned for seamless, single-turn answers to retain users. That product design can deprioritize cautious language, source linking, or refusal in the face of uncertainty.
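One practical consequence of these drivers is that a surprising share of sourcing failures can be caught mechanically before a human ever reads the answer. The sketch below is a minimal illustration, assuming the `requests` library and a plain-text AI answer as input: it extracts URLs, checks that they resolve, and optionally checks whether a claimed quote actually appears on the cited page. It cannot prove an attribution is correct, only flag ones that are obviously broken.

```python
import re
import requests  # third-party; pip install requests

URL_PATTERN = re.compile(r"https?://[^\s\)\]>\"']+")

def check_citations(ai_answer: str, claimed_quote: str | None = None, timeout: int = 10) -> list[dict]:
    """Flag cited URLs that do not resolve, and optionally check whether a
    quoted phrase appears on the cited page. A heuristic pre-filter, not a
    substitute for editorial review."""
    results = []
    for url in URL_PATTERN.findall(ai_answer):
        entry = {"url": url, "resolves": False, "quote_found": None}
        try:
            resp = requests.get(url, timeout=timeout, allow_redirects=True,
                                headers={"User-Agent": "citation-check/0.1"})
            entry["resolves"] = resp.status_code < 400
            if claimed_quote and entry["resolves"]:
                entry["quote_found"] = claimed_quote.lower() in resp.text.lower()
        except requests.RequestException:
            pass  # unreachable host, timeout, bad TLS, etc. -> stays flagged
        results.append(entry)
    return results

if __name__ == "__main__":
    answer = 'According to https://example.com/report-2025, the minister said "costs fell sharply".'
    for r in check_citations(answer, claimed_quote="costs fell sharply"):
        print(r)
```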
What newsrooms and users can do now
The EBU study’s toolkit and partner guidance identify practical mitigation steps for both publishers and consumers.
- For newsrooms:
- Mark original content clearly with structured metadata to help machine discovery and attribution (see the metadata sketch after this list).
- Publish archival and update summaries so assistants that rely on web content can find authoritative change logs.
- Engage with AI providers through licensing and technical integrations to preserve provenance and traffic.
- For users:
- Treat AI assistant answers as starting points, not definitive reporting.
- Cross-check claims against primary sources, bylines, or official statements.
- Prefer answers that include clear, verifiable citations and avoid sharing unchecked claims.
- For regulators and platforms:
- Require transparency around provenance and give users control or visibility into where the assistant is sourcing news.
- Establish ongoing independent monitoring programs that emulate the EBU approach to detect regressions and systemic failures.
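As an illustration of the structured-metadata recommendation above, the snippet below emits a minimal schema.org NewsArticle block as JSON-LD, the markup most crawlers and assistants look for when attributing a story. The specific values are placeholders, and many CMSs generate this automatically; the point is that headline, publisher, canonical URL, and publication/modification dates travel with the page.

```python
import json

def newsarticle_jsonld(headline: str, canonical_url: str, publisher: str,
                       author: str, published: str, modified: str) -> str:
    """Return a <script> tag carrying schema.org NewsArticle metadata.
    Dates are ISO 8601 strings; all values here are placeholders."""
    data = {
        "@context": "https://schema.org",
        "@type": "NewsArticle",
        "headline": headline,
        "mainEntityOfPage": canonical_url,
        "url": canonical_url,
        "author": {"@type": "Person", "name": author},
        "publisher": {"@type": "Organization", "name": publisher},
        "datePublished": published,
        "dateModified": modified,
    }
    return ('<script type="application/ld+json">\n'
            + json.dumps(data, indent=2, ensure_ascii=False)
            + "\n</script>")

print(newsarticle_jsonld(
    headline="Large study finds AI chatbots distort news",
    canonical_url="https://example.org/news/ai-chatbots-distort-news",
    publisher="Example Public Broadcaster",
    author="Newsroom Staff",
    published="2025-10-22T08:00:00+01:00",
    modified="2025-10-23T09:30:00+01:00",
))
```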
Limitations and verifiability: what to be cautious about
- Reported percentages for specific assistant failures (for example, Gemini’s sourcing-failure rate) vary slightly across press summaries. The EBU press release and several major outlets report Gemini with sourcing issues in the low-70s to mid-70s percent range; minor discrepancies likely reflect rounding, different sample slices, or later corrections. Where numbers differ across summaries, readers should see them as directionally consistent (i.e., Gemini performed materially worse in sourcing) rather than as exact absolutes. This article flags those variances explicitly to avoid overstating precision.
- The study’s results are time-sensitive. AI assistants receive frequent updates, retraining, and rubric changes that can change behavior; the EBU report therefore recommends rolling, independent monitoring rather than one-off audits to keep track of longitudinal performance. Any snapshot of performance should be read as reflecting model behavior at the time of testing.
- Some vendor claims about accuracy in specific modes (e.g., Perplexity’s “Deep Research” accuracy metrics) rely on internal evaluation criteria that are not directly comparable to journalist blind reviews. Independent audits remain the gold standard for editorial accuracy assessments.
A practical checklist for IT pros, moderators, and editors
- Verify conversational answers before quoting in social feeds or newsletters.
- Instrument link and referrer analytics to understand if AI assistants are scraping and aggregating your pages.
- Expose structured metadata (schema.org markup, Open Graph tags, canonical links) to help reduce misattribution.
- Run small-scale, internal blind tests of AI assistants on your own content to detect reproducible failure modes (a minimal harness sketch follows this checklist).
- Negotiate provenance-friendly licenses with AI providers where appropriate to preserve traffic and editorial control.
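For the internal blind-test item above, a harness can be as simple as replaying a fixed question set about your own reporting and logging each assistant's answer with a timestamp, so failures are reproducible and regressions are visible over time. The sketch below assumes you supply your own `ask_assistant` callables (API clients, browser exports, or manually pasted transcripts); the function names and log format are illustrative, not any vendor's API.

```python
import csv
import datetime
from typing import Callable, Dict, List

def run_blind_test(questions: List[str],
                   assistants: Dict[str, Callable[[str], str]],
                   out_path: str) -> None:
    """Replay a fixed question set against each assistant and append raw answers
    to a CSV log. `assistants` maps a label to any callable returning the
    assistant's text; plug in real integrations as available."""
    timestamp = datetime.datetime.now(datetime.timezone.utc).isoformat()
    with open(out_path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for question in questions:
            for label, ask in assistants.items():
                try:
                    answer = ask(question)
                except Exception as exc:  # network errors, rate limits, etc.
                    answer = f"<error: {exc}>"
                writer.writerow([timestamp, label, question, answer])

# Usage: stub callables stand in for real assistant integrations.
if __name__ == "__main__":
    run_blind_test(
        questions=["Who is the current German chancellor?",
                   "Summarise our lead story from today and cite the source URL."],
        assistants={"assistant_a": lambda q: "stub answer A",
                    "assistant_b": lambda q: "stub answer B"},
        out_path="blind_test_log.csv",
    )
```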
The regulatory and industry horizon
Public broadcasters, industry groups and some governments are pressing for more aggressive oversight. The EBU and allied media groups call for regulators to apply existing laws on information integrity, platform responsibility, and media plurality to AI assistants; they also want mandatory transparency for provenance and the right to independent audits. The “Facts In: Facts Out” campaign captures this demand in a simple slogan intended for policy makers and companies alike. At the same time, major AI companies face commercial and technical pressure to improve source fidelity without ruining the conversational experience. Expect continued tension between product convenience and evidence-forward accuracy requirements — a tension that will likely be central to debates over AI governance in the coming 12–24 months.
Final analysis: where we stand and what’s at stake
This cross-border, multilingual audit confirms what journalists and technologists have feared: AI assistants are not yet trustworthy replacements for professional news reporting. The problem is not confined to one product or language — it is systemic, and it affects how people discover, interpret, and share the news.
- Strengths exposed by the study: AI assistants are fast, conversational, and capable of surfacing relevant threads across sources; they can help scale summarization and accessibility functions when used carefully under editorial supervision.
- Persistent risks: hallucinations, outdated facts, and sourcing errors remain common. Those failures are not benign because they mislead users and can amplify disinformation, especially among younger cohorts who are more likely to adopt chat-based interfaces as primary news gateways.
- Pragmatic conclusion: For organizations and users alike, the sensible path is a hybrid approach — use AI for efficiency and discovery, but keep a human editorial checkpoint for accuracy, sourcing, and context before republishing or amplifying AI-generated summaries as fact. Independent, ongoing monitoring must become standard practice to detect regressions as models and connectors are updated.
The path forward is straightforward in principle though complex in execution: improve the fidelity of news ingestion and attribution, embed editorial verification into AI-news flows, and implement independent monitoring and regulatory guardrails so that facts in truly means facts out. The findings from this largest-of-its-kind audit make that urgency unmistakable.
Source: vijesti.me https://en.vijesti.me/amp/783609/Large-study-finds-AI-chatbots-distort-news/