
When a widely shared photograph of a Philippine lawmaker surfaced online this month, many users did what comes naturally now: they asked an AI assistant to verify it — and the assistant said it was real, even though the image had been created by an AI and later traced to its creator. This episode is not an isolated glitch but a pattern: modern multimodal chatbots frequently fail to recognise images generated by the very models that power them, exposing a fundamental blind spot in how today’s AI systems handle truth, provenance and visual forensics.
Background
The immediate spark was a viral image purportedly showing Elizaldy Co, a former Philippine lawmaker entangled in a multibillion‑peso flood‑control corruption probe. Online sleuths asked a mainstream search‑AI mode whether the photo was authentic; the assistant replied that it appeared genuine. AFP fact‑checkers later traced the image to a web developer who created it “for fun” with an image generator linked to Google’s systems, and who subsequently labeled the post “AI generated” to stem the spread. The misclassification by the assistant — and several analogous errors documented by journalists and academic researchers — has renewed scrutiny of whether AI assistants are fit to act as first‑line verifiers for news images.

This problem is not theoretical. A broad, journalist‑led audit by the European Broadcasting Union and the BBC found that roughly 45% of AI answers to news queries contained at least one significant issue, with sourcing failures and outdated or incorrect assertions commonplace; Google’s Gemini, in that review, carried a particularly high proportion of sourcing problems. Columbia University’s Tow Center also tested seven chatbots on a set of photojournalist images and found they failed to reliably identify provenance or detect manipulations. Together, these studies show the failures are systemic — spanning vendors and languages — and consequential for public information flows.

Why multimodal assistants get this wrong
Generative training vs. forensic detection
At a high level, the mismatch is architectural and objective‑driven. Large language models (LLMs) and their multimodal extensions are trained to predict tokens or pixels that look plausible, not to measure provenance or detect fabrication. That optimisation favours fluency and plausibility, not evidentiary certainty. Visual encoders paired with LLMs are tuned to translate images into useful language — “a man holding a flag in a crowd” — but they are not systematically trained to surface the microscopic artifacts or statistical fingerprints that forensic detectors look for. In short: generators are trained to mimic reality; most assistants are trained to describe it.

Training data and label gaps
Many training corpora mix real photographs and synthetic images without clear provenance labels. When a model sees both as valid examples of “photo,” it internalises a blended distribution where generated and authentic images are not separated cleanly. Without explicit detection supervision — datasets that label images by generator type, post‑processing steps, or provenance — a model cannot reliably learn the telltale signals forensic models use to distinguish fakes. Independent auditors have repeatedly pointed to this training mismatch as a structural weakness.

Product design incentives
Vendors tune assistants to be helpful and conversational. The product objective often prizes an answer that reads confident and useful over one that hedges or refuses. That design reduces the chance the assistant will say “I don’t know,” even when the evidence is thin. Equally important: many assistant pipelines reconstruct short prose answers from retrieval and synthesis steps; when the synthesis stage dominates, provenance can be omitted or misrepresented — producing a polished but unsupported verdict. The result: an answer that sounds authoritative without carrying the forensic work that would justify it.

Case studies: where the blind spot shows up
1) The Philippine image of Elizaldy Co
- What happened: A photograph purporting to show fugitive ex‑lawmaker Elizaldy Co in Portugal circulated widely. Users consulted a mainstream AI mode to check authenticity; the assistant judged it real. AFP’s fact‑checking traced the image to a web developer who acknowledged generating it with a Google‑linked image tool (the creator said the tool used was known colloquially as Nano Banana). The image amassed more than a million views before the author updated the post to mark it as AI‑generated.
- Why it matters: The image intervened in a highly charged political story where appearance — being seen abroad — changes public perception. A misclassification by an assistant transformed a generated image into what many took as corroborating evidence about a high‑profile figure’s whereabouts. This is precisely the vector of harm regulators and newsrooms fear: fast, viral visuals that confirm narratives and push them through social networks before human verification can catch up.
2) Staged protest imagery from a regional flashpoint
- What happened: During protests in Pakistan‑administered Kashmir, a fabricated image showing men marching with flags and torches circulated. AFP’s analysis attributed the image to Google’s Gemini generation pipeline. Both Google’s Gemini and Microsoft’s Copilot were reported to have assessed the image as genuine. Researchers argued that when a generated image replicates the visual cues of a real protest — lighting, composition, symbolic props — surface reasoning treats those cues as proof rather than as potential synthetic signals.
- Why it matters: Political violence and protest imagery are emotionally salient — they drive engagement and rapid sharing. Generated scenes that look authentic can push false narratives or provoke escalation before correction. When assistants mislabel such images, they act as accelerants rather than brakes.
3) The Tow Center verification test
Columbia University’s Tow Center for Digital Journalism ran a controlled test: seven chatbots (including ChatGPT, Perplexity, Grok, Gemini, Claude, and Copilot) were asked to verify ten images taken by photojournalists and to identify location, date and source. Across 280 image‑query interactions, only 14 met the standard of correct provenance identification — and every model made mistakes, sometimes mislabeling real professional photographs as AI‑generated. The Tow Center documented examples of fabricated provenance reports, invented tool use, and confident but incorrect assertions. This academic test underscores that visual verification remains a challenge for general‑purpose assistants.

The detection arms race: why a single tool won’t fix it
Detection is not a one‑off engineering problem — it’s an ongoing duel between generators and detectors.
- Adversarial robustness: Quick changes in generator architectures, post‑processing (upscaling, compression), or even small edits can evade detectors trained on older patterns. Attackers can fine‑tune a model or post‑process outputs specifically to defeat a given detector.
- False positives and trust erosion: A detector tuned too aggressively risks flagging authentic, historically valuable photographs as synthetic. Overzealous detection can reduce trust in legitimate journalism and suppress legitimate content.
- Model drift: Both detectors and generators evolve. Detectors require continuous retraining on fresh samples to remain effective; otherwise, they lag as new generator variants emerge. A minimal monitoring sketch follows this list.
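To make the drift problem concrete, the following Python sketch shows one way a team might monitor a detector. It assumes a hypothetical detector callable (not any specific product) that labels an image as "synthetic" or "real", and it simply flags the detector for retraining when accuracy on a freshly labelled batch falls below the level measured at deployment.

```python
# Hypothetical drift check: re-score a detector on a freshly labelled batch of
# images and flag it for retraining when accuracy drops well below the level
# measured at deployment time. The detector itself is assumed to exist
# elsewhere; here it is just a callable returning "synthetic" or "real".
from typing import Callable, Iterable, Tuple

def drift_check(detector: Callable[[bytes], str],
                fresh_batch: Iterable[Tuple[bytes, str]],
                baseline_accuracy: float,
                tolerance: float = 0.05) -> bool:
    """Return True if the detector's accuracy has degraded and it should be retrained."""
    total = correct = 0
    for image_bytes, true_label in fresh_batch:
        total += 1
        if detector(image_bytes) == true_label:
            correct += 1
    if total == 0:
        return False  # nothing to evaluate this cycle
    current_accuracy = correct / total
    return current_accuracy < baseline_accuracy - tolerance
```

The threshold and tolerance here are arbitrary placeholders; the point is that detector accuracy has to be re‑measured continuously against new generator output, not assumed from a one‑off evaluation.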
Consequences for newsrooms, platforms and users
Newsrooms and fact‑checkers
AI tools remain valuable for journalists — they can surface geolocation clues, suggest lines of inquiry, and speed triage. But the consensus across audits is clear: assistants are tools for leads, not substitutes for verification. Human fact‑checkers, trained in OSINT techniques, remain essential, especially for images that could change political narratives or public safety decisions. The Tow Center and EBU/BBC studies both emphasise the role of dedicated human workflows and institutional checkpoints.

Platforms
Several major platforms have scaled back human fact‑checking programs or shifted responsibility to community moderation models, increasing reliance on automated tools or user notes. That rollback raises the stakes: if automated assistants and lightweight community measures fail, misinformation may spread unchecked. Policy choices that reduce professional fact‑checking capacity create a vacuum that unreliable assistants are ill‑equipped to fill.

Ordinary users and Windows power users
Surveys show people are increasingly using AI modes as their first port of call for verification. That behaviour change — seeking instant authoritative judgement from an assistant — means errors are amplified. For individual users, resharing an assistant’s confident but false verdict can catalyse viral spread. The practical consequence: users must treat assistant verifications as provisional and follow a checklist before amplifying sensational images.

Practical checklist: how to verify suspicious images (for Windows users, moderators, IT teams)
Use AI as a triage tool but follow this human‑centred workflow before sharing or acting on a high‑impact image.
- Run reverse image searches on multiple engines (Google Lens, TinEye, Yandex).
- Inspect metadata and EXIF where available (a minimal Python sketch follows this list) — but treat stripped or altered metadata as suspicious.
- Check for matching reporting from reputable outlets; prefer original reporting over syndicated copies.
- Examine visual cues: inconsistent shadows, anatomical oddities, repeated textures, unnatural reflections.
- Geolocate visible signage, license plates, or landmarks; use solar‑position and shadow analysis for time‑of‑day checks.
- Use specialised forensic detectors as one input — but combine their output with manual inspection and source checks.
- Retain an audit trail: save the original file, record queries run in assistants, and log steps used to verify provenance.
- For platform moderators and community managers: require a second human approval or a “verified” tag before reposting images flagged as suspicious.
- For IT managers and enterprise teams: do not rely on consumer assistants for official verification; consider enterprise models with provenance controls and logging.
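As a starting point for the metadata step above, here is a minimal sketch using the Pillow library; the file name suspect.jpg is a placeholder. Absent or sparse EXIF is a signal for closer scrutiny, not proof of generation, since many platforms strip metadata on upload.

```python
# Minimal EXIF triage sketch using Pillow (pip install Pillow).
# "suspect.jpg" is a placeholder path; adapt it to your own workflow.
from PIL import Image
from PIL.ExifTags import TAGS

def summarise_exif(path: str) -> dict:
    """Return a {tag_name: value} dict of whatever EXIF data survives in the file."""
    with Image.open(path) as img:
        raw = img.getexif()
        return {TAGS.get(tag_id, str(tag_id)): value for tag_id, value in raw.items()}

if __name__ == "__main__":
    tags = summarise_exif("suspect.jpg")
    if not tags:
        print("No EXIF found: metadata stripped or never present; escalate to manual checks.")
    else:
        for name in ("Make", "Model", "DateTime", "Software"):
            print(f"{name}: {tags.get(name, '<missing>')}")
```

Treat the output as one input among several: a plausible camera make and model does not prove authenticity, and a missing metadata block does not prove generation.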
Technical and product remedies vendors should prioritize
Vendors can reduce risk without paralysing product usefulness by adopting a layered product architecture:
- Build a dedicated forensic sub‑system: separate the verifier from the assistant so that forensic checks use models trained specifically for detection, not general language generation.
- Improve refusal behaviour: when confidence or provenance signals are weak, the assistant should decline or offer explicit uncertainty and traceable citations rather than assert authenticity.
- Expose provenance metadata: return canonical identifiers, crawl timestamps and confidence scores with answers so users and downstream systems can audit sourcing (one possible record shape is sketched after this list).
- Support independent audits and rolling evaluations: publish reproducible test suites and commit to external monitoring to catch regressions.
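To illustrate what “exposing provenance metadata” could mean in practice, here is one possible record shape written as Python dataclasses. This is an assumption for discussion, not any vendor’s actual API or an established standard; the field names and example values are invented.

```python
# Illustrative only: one possible shape for a provenance record returned
# alongside an assistant's answer, so downstream systems can audit what the
# verdict was based on. Field names are invented for this sketch.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ProvenanceRecord:
    canonical_id: str      # e.g. a publisher's canonical URL or a content hash
    retrieved_at: datetime  # crawl/retrieval timestamp for the cited source
    source_url: str        # where the evidence was fetched from
    confidence: float      # detector or retrieval confidence in [0, 1]
    notes: str = ""        # free-text caveats, e.g. "metadata stripped"

@dataclass
class VerificationAnswer:
    verdict: str                                   # "likely real", "likely synthetic", "uncertain"
    evidence: list[ProvenanceRecord] = field(default_factory=list)

answer = VerificationAnswer(
    verdict="uncertain",
    evidence=[ProvenanceRecord(
        canonical_id="sha256:<content-hash>",
        retrieved_at=datetime.now(timezone.utc),
        source_url="https://example.org/original-report",
        confidence=0.42,
        notes="reverse image search found no earlier instance",
    )],
)
```

Returning a structure like this alongside the prose answer would let users and downstream systems log, audit and re‑check the basis for a verdict instead of trusting the wording alone.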
Policy and regulation: what governments and standards bodies can do
Regulatory frameworks can accelerate safer design choices:
- Require provenance transparency: mandate minimal disclosure when content is autogenerated or when answers lack verifiable retrieval sources.
- Fund public forensic datasets: public‑interest datasets of generator outputs help auditors and vendors continuously evaluate detectors against new generator variants.
- Enforce independent audits for consumer‑facing assistants that summarise or republish news: rolling, multilingual audits detect regressions that one‑off tests miss.
- Protect publisher signal flow: require that assistants surface canonical article identifiers and respect publisher metadata to prevent misattribution and citation hallucinations.
Notable strengths and the hard limits of current tools
- Strengths: Assistants are fast, accessible and useful for discovery. They democratise entry points to OSINT methods and can reduce labour for routine triage tasks. In newsroom workflows, they can surface leads for human investigators — accelerating geolocation, language translation and pattern discovery.
- Limits and risks: No assistant tested so far provides reliable standalone image provenance checks. Generators and detectors are engaged in an arms race; forensic robustness lags generator realism. Product incentives towards confident answers make assistants prone to assertive misclassification, and platform governance shifts away from professional fact‑checking increase the systemic risk. These are structural problems requiring technical, product and regulatory responses.
Flagging unverifiable or time‑sensitive claims
Some numerical findings vary by study window and methodology; for example, exact percentage failure rates reported for Gemini or other assistants differ slightly across summaries. Those discrepancies usually reflect sample selection, language coverage and timing of the tests. Any specific performance percentage should therefore be treated as time‑bounded: models are updated frequently and metrics can change. When reporting or operationalising risk, teams should rely on rolling audits and reproduce tests against the live models they use.

Additionally, attributions about which specific image‑generation tool created a particular viral photo (for example, the use of the Nano Banana frontend or “Gemini” pipeline) are traceable journalistic claims in many cases but are not infallible. Tracing image provenance often involves piecing together metadata, author interviews and pattern matching; where such tracing is impossible to reproduce independently, the safer position is to flag the identification as reported rather than absolute.

A practical roadmap for WindowsForum readers and tech teams
- Operationalise the checklist above and embed it in social media policies for official accounts.
- Log every assistant check: keep searchable records of prompts, timestamps and returned claims so that misclassifications can be audited and corrected quickly (a minimal logging sketch follows this list).
- Train moderation teams in basic OSINT and forensic triage techniques — reverse image search, shadow analysis, and metadata inspection — and couple that training with role‑play exercises that simulate viral misinformation events.
- Evaluate enterprise or private models for sensitive use cases: for regulated or high‑stakes scenarios, prefer solutions offering provenance metadata and audit logs over consumer assistants.
- Demand vendor SLAs that include accuracy and provenance guarantees for news and high‑impact verifications, and insist on external audits.
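As a concrete starting point for the logging item above, here is a minimal append‑only audit log written as JSON Lines, so it stays searchable with standard tools. The file path and field names are illustrative, not a prescribed schema.

```python
# Minimal append-only audit log for assistant verification checks, stored as
# JSON Lines. File path and field names are illustrative, not a standard.
import json
from datetime import datetime, timezone
from pathlib import Path

LOG_PATH = Path("assistant_checks.jsonl")

def log_assistant_check(prompt: str, assistant: str, returned_claim: str,
                        image_ref: str, reviewer: str) -> None:
    """Append one verification interaction to the audit log."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "assistant": assistant,            # which tool was consulted
        "prompt": prompt,                  # exactly what was asked
        "returned_claim": returned_claim,  # the assistant's verbatim verdict
        "image_ref": image_ref,            # file hash or archived URL of the image
        "reviewer": reviewer,              # the human who ran the check
    }
    with LOG_PATH.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")

# Example usage:
# log_assistant_check(
#     prompt="Is this photo authentic?",
#     assistant="consumer chatbot",
#     returned_claim="The image appears genuine.",
#     image_ref="sha256:<content-hash>",
#     reviewer="moderator_01",
# )
```

Teams already running a SIEM or ticketing system can write the same record there instead; what matters is that every assistant verdict is captured verbatim, timestamped and attributable to a responsible human.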
Conclusion
The inability of AI assistants to reliably detect images their own systems — or similar models — generated should not be dismissed as a quirky bug. It is a structural mismatch between how generative systems are trained and the epistemic demands of verification. The practical consequence is clear: when people move from search engines to conversational assistants as their first line of fact‑checking, they risk accepting polished, authoritative‑sounding answers that lack forensic underpinning. Fixing this will require technical investment (dedicated forensic models and provenance APIs), product shifts (conservative refusal behaviour and transparent citations), and institutional changes (human‑in‑the‑loop verification, independent audits and policy guardrails). Until those changes are widely adopted, the safest posture is a hybrid one: use AI to accelerate discovery, but keep human judgment, source checks and documented verification steps at the centre of any high‑impact decision that depends on an image’s authenticity.

Source: Digital Journal, “AI's blind spot: tools fail to detect their own fakes”


