As artificial intelligence transforms how the world accesses, consumes, and interprets news, the integrity of the data fueling these systems becomes inextricably tied to the health of democratic societies. Nowhere is this entanglement more visible than in the Nordics, where state-backed propaganda networks are exploiting emerging loopholes in language models to quietly inject distortions into public discourse with unprecedented scale and subtlety. Recent investigative work by Nordic fact-checking organizations casts new—and troubling—light on the sophisticated playbook behind this phenomenon: LLM grooming, a term that describes intentional attempts to pollute large language models with hostile narratives. By probing popular AI chatbots in Finnish, Swedish, Danish, and Norwegian, researchers show how Russian-state-aligned content still seeps into AI-generated answers, fueling both old and newly-minted propaganda—often without obvious trace or warning.
How LLMs Work—and How They’re Vulnerable
To understand the urgency of this threat, it’s crucial to first outline how today’s leading large language models (LLMs) generate answers. Generally, these systems operate in one of two ways: producing responses based solely on their pre-existing training datasets, or dynamically combining this learned knowledge with real-time web searches to supplement and update their output. While the training phase is highly curated (and often opaque) on the part of companies like OpenAI, Google, and Microsoft, models remain susceptible to two distinct avenues for manipulation:

- Injection during training: If hostile actors can ensure vast quantities of their content are included in the source data for training or fine-tuning, their narratives become part of the LLM’s “core” knowledge, to be repeated on demand with little or no skepticism.
- Real-time search exploitation: If threat actors create or amplify propaganda to dominate search results, then LLMs that consult the open web can unwittingly pick up and relay such content—even material explicitly banned or sanctioned in a region.
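To make the second vector concrete, here is a minimal Python sketch of a generic retrieval-augmented answering loop, not any vendor’s actual pipeline. The `web_search` and `llm_complete` functions are hypothetical stand-ins; the point is the step where unvetted snippets are pasted into the prompt.

```python
# Minimal sketch of the real-time search vector described above.
# `web_search` and `llm_complete` are hypothetical stand-ins for whatever
# search API and model endpoint a chatbot product actually uses.

from dataclasses import dataclass


@dataclass
class SearchResult:
    url: str
    snippet: str


def web_search(query: str) -> list[SearchResult]:
    """Hypothetical search call; in a real product this hits a web index."""
    raise NotImplementedError


def llm_complete(prompt: str) -> str:
    """Hypothetical model call; stands in for any chat-completion endpoint."""
    raise NotImplementedError


def answer_with_search(question: str) -> str:
    results = web_search(question)
    # Vulnerable step: snippets are pasted into the prompt without any check
    # of who published them. If an SEO-optimized propaganda clone ranks for
    # the query, its text becomes part of the model's "evidence".
    context = "\n".join(f"[{r.url}] {r.snippet}" for r in results[:5])
    prompt = (
        "Answer the question using the sources below. Cite the URLs.\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )
    return llm_complete(prompt)
```

Nothing between the search call and the model call asks who published the snippet, which is precisely the gap that an SEO-optimized propaganda clone is built to exploit.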
The Pravda Network—Origins and Methods
The epicenter of the current Russian information operation is the Pravda network, an ecosystem of over 180 internet domains, each tailored to local audiences via machine-translated content and obsessive search engine optimization practices. According to findings by French, Finnish, and American disinformation researchers, the sites are not original publishers. Instead, they aggregate and duplicate thousands of articles from other Kremlin-backed outlets—such as RT, RIA Novosti, Lenta, and Tsargrad TV—many of which are formally sanctioned by the EU for their roles in state-directed information warfare.

The underlying infrastructure reportedly began with the Crimea News operation in 2010, before metastasizing outward, aided by TigerWeb, a Crimea-based IT company with links to Russia’s occupation authorities. By early 2025, the Pravda network targeted 83 countries and regions, focusing on nations supporting Ukraine. An American Sunlight Project report cited by Källkritikbyrån revealed the network published over 20,000 articles across 97 sites in just 48 hours that February—an output only feasible through heavy automation and synthetic translation. URLs are generally formatted to mimic official or localized news outlets, e.g., sweden.news-pravda.com or pravda-fi.com, and frequently shift to evade domain-level blocks.

Why SEO and Automation Make It Dangerous
What sets the Pravda network apart is its dogged optimization for both search visibility and automated ingestion by digital platforms—including LLMs. Sites are designed to:

- Flood the web with massive volumes of content (sometimes over 650 articles/hour per site).
- Target trending topics and breaking news, ensuring maximum inclusion in search caches and archive scrapes.
- Machine-translate articles into target languages with sufficient quality to fool algorithms, if not always human readers.
- Mirror stories banned in the EU, surreptitiously funneling sanctioned narratives past regional restrictions.
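One consequence of this copy-and-translate pipeline is that the same article body surfaces across many domains. The sketch below is a minimal illustration of how such exact mirroring can be spotted, assuming a crawled corpus of `{'domain', 'body'}` records; the input format and function names are illustrative assumptions, not tooling from the report.

```python
# Fingerprint each article body after normalization and group identical
# fingerprints that appear on more than one domain.

import hashlib
import re
from collections import defaultdict


def fingerprint(text: str) -> str:
    """Collapse whitespace and case before hashing, so trivial edits still collide."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_mirrored_articles(articles: list[dict]) -> dict[str, set[str]]:
    """Map each fingerprint to the domains it appears on, keeping only
    bodies republished on more than one domain."""
    seen: dict[str, set[str]] = defaultdict(set)
    for article in articles:
        seen[fingerprint(article["body"])].add(article["domain"])
    return {fp: domains for fp, domains in seen.items() if len(domains) > 1}


# Toy usage: the same story text rehosted on two of the network's domains.
sample = [
    {"domain": "sweden.news-pravda.com", "body": "Story text about a breaking event."},
    {"domain": "pravda-fi.com", "body": "Story  text about a breaking event."},
]
print(find_mirrored_articles(sample))
```

Exact fingerprints only catch verbatim rehosts; machine-translated mirrors of the kind described above would require fuzzier, cross-lingual matching.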
Testing Nordic Chatbots—What the Investigation Found
In April 2025, four Nordic fact-checking organizations (Faktabaari, TjekDet, Källkritikbyrån, and Faktisk) conducted coordinated tests under the EDMO NORDIS consortium, probing the responses of ChatGPT, Google Gemini, and Microsoft Copilot across Finnish, Swedish, Danish, and Norwegian. Using 12 prompts constructed to mirror current and recent Russian-state-aligned disinformation narratives, the team measured whether Nordic-language queries would surface Pravda material or repeat sanctioned claims.

The results offer a nuanced portrait of risk and resiliency. Most of the time, the AI chatbots accurately flagged or steered away from well-known propaganda tropes—especially those debunked widely, such as the canard that Ukraine initiated the war against Russia, or that German federal elections were “stolen”. This suggests that LLM developers are aware of and have made reasonable efforts to immunize against the most salient propaganda topics.
But cracks remain—particularly around less visible, more recent, or locally tailored disinformation.
Case Study: The Danish F-16 Pilot Incident
In January, Russian official channels fabricated a story about a Danish F-16 pilot killed by a missile strike in Ukraine. The Danish Defense Minister openly dismissed this as “a false story… going around in Russian media – probably to discredit Denmark.” Yet when researchers asked Microsoft Copilot in Danish whether a Dane was killed in the attack on Krivoy Rog aerospace school, it responded “yes,” citing the Danish Pravda website as its sole authority. ChatGPT was more cautious, noting conflicting information, but still entertained the possibility of truth.

A similar test in Swedish found Copilot affirming the claim and linking to content that, while based on “officially unconfirmed” reports, originated from large anonymous X accounts and Pravda’s Romanian pages. This illustrates how quickly unverified or hostile narratives can not only appear in search results, but bubble to the surface of AI-produced answers—particularly in less globally-dominant languages where independent sources may be sparser.
When Chatbots Recognize Propaganda
Despite these lapses, the chatbots generally performed well on direct tests of established falsehoods and loaded questions. This appears to be the result of active countermeasures—some narratives are so infamous that their inclusion in LLM training datasets comes with thorough caveats, explanations, or outright refutations. AI models have also made advances in recognizing and skipping over toxic or unreliable sources, especially for high-profile claims.

ChatGPT stood out for occasionally warning users that Pravda was a pro-Russian source, a sign that some contextual awareness exists in system design. By contrast, Gemini and Copilot did not consistently flag their information as potentially originating from propaganda sites, leading to unmediated repetition of disinformation in some circumstances.
“LLM Grooming” and Real-Time Search Loopholes
While training data contamination remains a risk, the report underscores a second attack vector: real-time search integration. Almost all tested AI chatbots, when prompted for recent news from a Pravda network domain, directly quoted headlines and provided links—rarely, if ever, warning that these domains are part of a sanctioned Russian propaganda apparatus. This loophole is troublesome for two reasons:

- Bypassing Sanctions: Because the Pravda network creates new domains at pace, blocklists of known propaganda sites can be circumvented within days.
- Amplifying Unvetted Content: Even in the absence of visible links, AI models may repeat narratives or data points derived from these dubious sources—often without full traceability.
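A toy illustration of the first point: a filter keyed on exact hostnames passes anything registered after the blocklist snapshot was taken. The “fresh” variant below is invented for the example, not a confirmed Pravda domain.

```python
from urllib.parse import urlparse

# Snapshot of hostnames known at blocklist-compilation time.
BLOCKLIST = {"sweden.news-pravda.com", "pravda-fi.com"}


def allowed_by_exact_blocklist(url: str) -> bool:
    """Exact hostname matching: the simplest (and most brittle) defense."""
    return urlparse(url).hostname not in BLOCKLIST


print(allowed_by_exact_blocklist("https://sweden.news-pravda.com/a1"))   # False: known, blocked
# An invented variant registered after the snapshot sails straight through:
print(allowed_by_exact_blocklist("https://nordics.news-pravda.com/a1"))  # True: not on the list
```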
Failure of Current Mitigations
According to Magnus Sahlgren, head of research for Natural Language Understanding at AI Sweden, attempts to blacklist known influence sites are hampered by the sheer velocity of new domain registrations and rampant content rehosting. The data volumes and the rapid propagation across the wider web mean that comprehensive filtering is, for now, functionally impossible.

Moreover, chatbots are observed to give subtly different answers depending on user location, language, time, and even slight rephrasing of a query. This makes comprehensive monitoring for all possible propaganda echoes prohibitively difficult.
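The investigation’s own methodology hints at what partial monitoring looks like in practice: repeat each test claim across languages and phrasings, and flag any answer that cites a watchlisted domain. The sketch below assumes a hypothetical `ask_chatbot` helper that returns an answer together with its cited URLs; it is an illustration, not the consortium’s actual tooling.

```python
# Spot-check harness: claims x languages x phrasings, flagging answers
# that cite domains on a watchlist. `ask_chatbot` is a hypothetical stand-in
# for whichever chatbot API or browser automation is actually used.

from itertools import product

WATCHLIST_MARKERS = ("news-pravda.com", "pravda-fi.com")


def ask_chatbot(prompt: str, language: str) -> dict:
    """Hypothetical call returning {'answer': str, 'cited_urls': list[str]}."""
    raise NotImplementedError


def audit(claims: dict[str, list[str]], languages: list[str]) -> list[tuple]:
    """claims maps a claim id to its prompt phrasings; returns flagged hits."""
    hits = []
    for (claim_id, phrasings), language in product(claims.items(), languages):
        for phrasing in phrasings:
            response = ask_chatbot(phrasing, language)
            flagged = [url for url in response["cited_urls"]
                       if any(marker in url for marker in WATCHLIST_MARKERS)]
            if flagged:
                hits.append((claim_id, language, phrasing, flagged))
    return hits
```

Even a modest matrix of claims, languages, phrasings, and chatbots multiplies into hundreds of queries per run, which is why exhaustive monitoring quickly becomes impractical.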
The Scale and Structure of the Pravda Threat
Pravda’s operational playbook merges brute-force automation with the psychological finesse of tailored targeting. Its content is not only presented in the official language of each country but also peppered across local topics and political debates. According to the Portal Kombat dashboard maintained by CheckFirst, up to 30 percent of articles on Finland’s Pravda clone originated from Russian entities sanctioned by the EU—yet this detail is almost never disclosed when chatbots relay their content.

The French agency Viginum, which first exposed the network’s scale in early 2024, noted at least 193 coordinated sites in Europe alone. DFRLab and CheckFirst researchers subsequently identified thousands of Pravda hyperlinks embedded not just in AI chatbot responses, but in Wikipedia, X (formerly Twitter) Community Notes, and other digital knowledge hubs. Such cross-platform commingling supercharges the long tail effect: misinformation, once seeded, can propagate and fester for years.
Global Expansion and Copycat Risks
Both the American Sunlight Project and Voice of America have raised alarms that Russia’s model could soon be emulated by other authoritarian actors. A 2024 investigation showed Google’s Gemini repeating Chinese official propaganda in Mandarin, and NewsGuard’s March 2025 assessment found English prompts returning Pravda disinformation 33% of the time across ten leading LLMs.

Because AI models prioritize the most widely available content in each language, and because state-controlled propaganda often dominates the digital landscape in less open societies, the risk of AI-generated answers defaulting to such narratives is especially pronounced for queries in Russian and Chinese.
Downsides, Dilemmas, and Unresolved Risks
While many Nordic-language tests found chatbots generally resistant to the most egregious forms of disinformation, several enduring vulnerabilities remain:

- Recency & Niche Vulnerability: Chatbots are more likely to slip up with newly invented or obscure propaganda that hasn’t yet been “blacklisted” or robustly countered in public data.
- Language Asymmetry: English, being the largest LLM training language and Pravda’s primary focus, may be more contaminated than small local languages. However, smaller languages are at risk precisely because there are fewer alternative sources; disinformation can stand unchallenged.
- Transparency Failures: None of the three AI companies offered transparency on their underlying data, policies, or efforts. Without clear statements on what is being filtered or why, public confidence in AI fact-checking remains undermined.
- Adaptive Adversaries: As soon as chatbots begin to block or deprioritize known Pravda domains, new ones are spun up, or content is rerouted through secondary aggregators. The cat-and-mouse game shows no sign of ending.
Critical Analysis—What Works, What’s Failing
Notable Strengths
- Recognition of Major Narratives: LLMs are generally robust against long-debunked disinformation, especially with core training that includes a diversity of vetted sources.
- Increased Fact-Checking on High-Profile Topics: Headline events, such as atrocities in Bucha or the origins of the Ukraine war, trigger more careful responses, sometimes citing reputable organizations like the UN.
Major Weaknesses
- Rapid Propagation Loopholes: The speed with which Pravda and similar sites clone, translate, and rehost content outpaces blacklist controls and model update cycles.
- Opacity Toward Users: Most chatbots do not consistently disclose when their information comes from “problematic” or sanctioned media, even when linking directly.
- Content Default and Language Gaps: In languages with less diverse media ecosystems, chatbots are structurally more likely to echo propaganda by default.
Broader Implications
- Erosion of Democratic Resilience: Automated repetition of sanctioned or hostile narratives—particularly cloaked in the “neutral” voice of AI—threatens trust in both digital information and AI systems.
- Policy and Regulatory Catch-Up: EU sanctions are sidestepped with technical ease; “platform governance” must now include LLM developers, not just traditional social media companies.
- Fertile Ground for Copycats: As this model has proven effective and cheap for Russia, other states (and non-state actors) are likely to adopt similar tactics, leveraging LLMs’ open-ended appetite for digital content.
Looking Ahead—What Must Be Done
Confronting the penetration of weaponized propaganda into AI systems—and particularly into multi-language environments—is a complex challenge without a single technical fix. The following steps, supported by cross-industry and governmental analysis, are required for meaningful progress:

1. Greater Transparency
AI companies must provide clear disclosures about their training sources, real-time search protocols, and active measures to detect and filter sanctioned or propaganda-driven content. This includes periodic public reports and more responsive communications with fact-checking bodies.

2. Dynamic Blacklisting & Source Vetting
Technical defenses must be continually updated, using lists maintained by trusted governmental, academic, and NGO partners. Models should be able to recognize not only root domains, but also subtle variants, mirrors, and rehosts (see the sketch after this list).

3. User-Facing Warnings and Contextualization
Wherever possible, LLM interfaces should flag potentially unreliable or sanctioned sources, giving users critical context before accepting or sharing information.

4. Human-in-the-Loop Monitoring
High-risk topics—especially in languages and regions where AI is the main source of news—demand hybrid oversight, with both automated and human interventions to screen, review, and counter emerging narratives.

5. Shared Fact-Checking Infrastructure
Partnerships like EDMO NORDIS represent an essential precedent for ongoing multinational monitoring and response. Broader, real-time databases of known disinformation (including links, source patterns, and content features) can be integrated into LLM inference and search modules, serving both as guardrails at answer time and as anti-bias measures during training.
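As a rough illustration of points 2 and 3 (and of how a shared watchlist from point 5 could plug into an answer pipeline), the sketch below matches cited URLs by registrable domain and name pattern rather than exact hostname, and attaches a user-facing warning to anything flagged. The watchlist contents, pattern list, and helper names are assumptions for illustration, not any vendor’s implementation.

```python
from urllib.parse import urlparse

WATCHLIST_DOMAINS = {"news-pravda.com", "pravda-fi.com"}   # maintained by trusted partners
WATCHLIST_PATTERNS = ("pravda",)                           # catches simple rebrands/variants


def registrable_domain(url: str) -> str:
    """Crude eTLD+1 approximation (last two labels); a real system would
    consult the Public Suffix List instead."""
    host = (urlparse(url).hostname or "").lower()
    return ".".join(host.split(".")[-2:]) if host else ""


def vet_citation(url: str) -> dict:
    """Flag a cited URL against the shared watchlist and prepare a user-facing note."""
    domain = registrable_domain(url)
    flagged = domain in WATCHLIST_DOMAINS or any(p in domain for p in WATCHLIST_PATTERNS)
    return {
        "url": url,
        "flagged": flagged,
        "warning": ("This source appears on a watchlist of pro-Kremlin propaganda "
                    "domains; treat its claims with caution.") if flagged else None,
    }


print(vet_citation("https://sweden.news-pravda.com/2025/article"))  # flagged: True
print(vet_citation("https://www.reuters.com/world/"))               # flagged: False
```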
Conclusion

The infiltration of Russian propaganda—epitomized by the sprawling Pravda network—into the fabric of AI language models threatens to undermine both digital trust and democratic norms. While the leading AI chatbots in the Nordics largely resisted the worst disinformation in this recent investigation, the persistence, adaptability, and technical savvy of hostile actors means that no response can be considered final. As language models become the first stop for information queries—from global geopolitics to local crime—maintaining the integrity of their sources, transparency in their answers, and resilience against manipulation is both a technical and societal imperative.

AI-driven news is here to stay. Ensuring that it serves, rather than subverts, the public interest will require not just smarter algorithms, but also sustained vigilance, international cooperation, and a bedrock commitment to skepticism and transparency at every level of the digital ecosystem.
Source: Källkritikbyrån, “Så infiltrerar ryska propagandasidor AI-chattbottar i Norden” (“How Russian propaganda sites infiltrate AI chatbots in the Nordics”)