Nine Newspapers Sue OpenAI and Microsoft Over AI Training Copyright

Nine newspapers owned or managed by MediaNews Group have filed a sweeping federal copyright suit against OpenAI and Microsoft, alleging the tech giants built and commercialized their generative AI products by copying and training on copyrighted journalism without permission — a complaint that seeks “in excess of $10 billion” in damages and reopens the larger legal fight over whether and how news publishers are entitled to compensation when their reporting is used to train large language models.

Background

The complaint, filed in the U.S. District Court for the Southern District of New York on November 26, 2025, names the Los Angeles Daily News, The San Diego Union-Tribune, the San Bernardino Sun, the Boston Herald, the Hartford Courant, The Morning Call, the Boulder Daily Camera, the Daily Press and The Virginian‑Pilot as plaintiffs and runs to 119 pages, according to reporting based on the court filing. The newspapers say OpenAI and Microsoft “harvested” millions of copyrighted news articles to train the models that power ChatGPT, Copilot and related AI assistants, then used those models to build commercial products without paying the publishers or even seeking their permission.

This action is the latest salvo in a multi‑front litigation campaign by publishers and authors against AI platform companies. The same legal terrain includes high‑profile suits filed by The New York Times, the Authors Guild and other publishers and authors, as well as a prior MediaNews Group case filed in April 2024 that covered a different set of newspapers; that earlier case is continuing along a separate track. Whether training on copyrighted material without permission constitutes infringement has become the leading test question for copyright law in the age of generative AI.

What the complaint alleges

Core claims and requested relief

According to published excerpts and reporting on the complaint, the newspapers allege that OpenAI and Microsoft used their copyrighted reporting as raw training material for large language models (LLMs) and related systems, then monetized the outputs in subscription and enterprise products. The plaintiffs seek more than $10 billion in damages, asserting that the defendants have not only appropriated content without payment but also siphoned readers and undermined publishers’ business models. Their counsel argues the companies “pay for chips, computers and programmers — but steal the raw material for GAI products — valuable well‑written content — from hard‑working journalists.”

The complaint is likely to assert the causes of action typical in these suits: direct and contributory copyright infringement, violations of state‑law rights (depending on the pleadings), and requests for injunctive relief to restrain further use of the plaintiffs’ works in training and in product outputs. The exact legal theories and statutory invocations will be tested in pretrial motions and discovery.

Who’s suing and why it matters

The newspapers involved are a mix of local and regional titles that rely heavily on subscription revenue, local advertising and referral traffic. Their argument rests on two interconnected harms: (1) the alleged unauthorized copying and use of copyrighted works to build commercial AI models; and (2) the downstream commercial effect — that AI outputs or agent responses can substitute for a reader’s visit to the publisher’s site, thereby reducing subscription conversions and ad impressions. That double harm — copying plus displacement — is now central to publishers’ legal strategy and policy demands.

Legal context and precedent

The broader litigation landscape

Publishers and authors have brought multiple suits against AI companies in recent years. Courts have already begun parsing whether model training and downstream outputs are protected by fair use or amount to infringement.
A landmark decision earlier in 2025 permitted The New York Times and co‑plaintiffs to proceed with copyright claims against OpenAI and Microsoft, with parts of those cases surviving motions to dismiss. That ruling signaled judicial willingness to allow discovery and fact development on how the models were constructed and used.

Separately, MediaNews Group and Tribune‑affiliated publishers filed a related complaint in April 2024 covering other newspapers — a suit that is moving on a different procedural schedule and is already subject to contested discovery battles. That earlier case underscored two realities: (1) publishers are coordinating litigation strategies; and (2) the resulting discovery fights (seeking training records and logging/output information from AI vendors) are becoming central to determining the extent of alleged copying and the economic effects of the outputs.

Recent discovery rulings that matter

Discovery has been a battlefield. In a parallel case brought by the Authors Guild and a roster of bestselling writers, a Manhattan magistrate judge recently ordered OpenAI to produce internal communications about why it deleted certain large book datasets — a ruling that weakens a blanket claim of privilege where a company’s state of mind is central to the case. That decision could have ripple effects: courts are scrutinizing not just whether materials were used, but how and why companies made decisions about dataset retention and deletion — issues that go to willfulness and good‑faith defenses.

The technology: what plaintiffs say and what experts will test

How model training typically works — and why publishers object

Large language models are trained on extremely large corpora assembled from web data, licensed datasets and other sources. Plaintiffs allege that this web‑scale ingestion included copyrighted news content, sometimes taken in bulk and without downstream licensing or attribution. Publishers claim models can reproduce or closely paraphrase article text and that the models’ capacity to answer factual questions or supply summaries provides a commercial substitute for the original reporting.
Defendants typically respond that models are trained on publicly accessible data and that training involves statistical learning rather than verbatim copying — arguments that often lead to complex factual discovery about the datasets, ingestion mechanisms, and whether downstream model outputs can be traced to identifiable copyrighted material. Courts will likely examine whether publishers’ works were directly included in training corpora, whether outputs reproduce protected expression, and whether the companies’ conduct fits within fair‑use factors (purpose, nature, amount, and market effect).

Technical fights to expect in discovery

Expect aggressive discovery requests for:
  • Training corpora inventories, sampling methods, and retention logs.
  • Model output logs and usage telemetry to show whether and how often publisher material appears in assistant outputs.
  • Internal communications explaining dataset choices, deletions and legal counsel’s role — issues already flagged by judicial orders in other cases.
Those technical records are the crux: publishers want to show a direct connection between their journalism and the defendant systems; defendants want to keep training pipelines and provenance opaque or show that use is lawful.

Industry context and the economic argument

Why local newspapers are staking so much on this

Newspapers operate on thin margins. Over the last two decades, advertising and subscription models have been reshaped by digital platforms; publishers argue that AI assistants that provide ready answers and summaries can disintermediate their audiences. That alleged displacement of readership — and the subsequent loss of subscription revenue and ad impressions — is the economic backbone of the damages theories in recent suits. Policymakers and publishers have repeatedly pointed to the structural decline in industry revenue to justify both legal and legislative remedies.

Technical and commercial counterarguments from AI vendors

OpenAI and Microsoft historically have argued that training on broadly available web content is lawful, that tools like ChatGPT can drive traffic to original sources (when citation features are used), and that there are practical obstacles to licensing every web publisher. Microsoft, via its investments in content partnerships and product features (e.g., Copilot integrations and attribution mechanisms), has tried to position itself as a partner to publishers while resisting wholesale licensing obligations. Those public positions are likely to be refined as the litigation develops. At press time, the newspapers reported that spokespeople for OpenAI and Microsoft did not immediately respond to requests for comment.

Strengths and weaknesses of the publishers’ case

Strengths

  • Concrete claim and damages framing: The complaint quantifies harm (seeking in excess of $10 billion) and ties alleged copying to commercial products that generate subscription or enterprise revenue for the defendants, strengthening the economic‑harm prong.
  • Parallel rulings favoring discovery: Recent orders allowing discovery in related cases — including orders compelling internal communications — make it more likely the publishers can probe training datasets and counsel advice, which are pivotal factual elements.
  • Public sympathy and policy momentum: Lawmakers and the public are increasingly attentive to the economics of news production, which could influence the long arc of settlements, licensing markets or regulation.

Weaknesses and legal hurdles

  • Fair use defense complexity: Courts will analyze fair use by weighing purpose, amount and market effect; proving that training and downstream outputs are not transformative enough may be difficult and fact‑specific.
  • Tracing problem: Demonstrating that models were trained with specific copyrighted articles — and that the defendants’ outputs actually reproduce those articles — is a complex technical challenge requiring deep discovery and careful expert testimony.
  • Scale and precedent variability: Appellate outcomes in this area remain unsettled; different courts have reached divergent conclusions on related issues, and the law is still developing.

Risks and broader consequences

For publishers

  • Short term: Litigation is expensive and slow. Even a strong legal win may take years to monetize through damages and injunctive relief.
  • Long term: A favorable ruling could force AI vendors to negotiate licensing deals or change ingestion practices, creating new monetization opportunities for publishers; conversely, an adverse ruling could cement broad reuse without payment.

For AI companies and the public

  • Legal liability and compliance costs could rise sharply if courts adopt a restrictive view of training on copyrighted works or require more granular provenance and opt‑in/opt‑out mechanisms for publishers.
  • Overbroad remedies could chill innovation or impose high transaction costs on model development, particularly for smaller teams and startups.

What to watch next — procedural milestones and tactical playbooks

  1. Service and responses: Expect OpenAI and Microsoft to file motions to dismiss, and to press procedural defenses (e.g., threshold standing or statutory immunity arguments). The defendants may also ask for consolidation with related cases or for transfer to a different venue.
  2. Early discovery battles: Magistrate judges will decide whether AI vendors must produce training inventories, model logs and communications about dataset deletions or usage — those rulings will shape case trajectories. Recent magistrate rulings ordering disclosure of internal communications strengthen the publishers’ discovery prospects.
  3. Expert evidence phase: Both sides will rely heavily on technical experts to explain model training, data provenance and the frequency/extent to which a model’s output mirrors copyrighted text.
  4. Settlement incentives: Given the high cost and risk of trial, expect settlement negotiations or licensing agreements to emerge once the contours of discovery are clearer.

Practical takeaways for publishers, platform engineers and policymakers

  • For publishers: Accelerate efforts to record and document harms (analytics showing traffic loss, instance logs where AI outputs mirrored full reporting) and to pursue technical protections (server‑side gating, machine‑readable licensing APIs). Legislative and commercial strategies — such as creating licensing consortia — are credible complements to litigation.
  • For AI vendors and engineers: Invest in stronger dataset provenance, implement opt‑out respect mechanisms for paywalled content, and consider negotiated licensing frameworks where large‑scale reuse of journalism occurs. Tightening controls on data ingestion and offering transparent audit trails will reduce legal risk and help restore trust.
  • For policymakers: Consider balanced interventions that protect creative labor while preserving innovation — examples include standardized licensing frameworks, incentives for machine‑readable content access agreements, and clearer rules on how automated agents should interact with paid content.
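For engineers, the opt‑out and provenance points above can be made concrete. The sketch below is illustrative only — the crawler name, URLs and record format are hypothetical, and no actual OpenAI or Microsoft pipeline is being described. It shows the two mechanisms in their simplest form: honoring a publisher's robots.txt before fetching (using Python's standard urllib.robotparser), and hashing fetched content so a dataset entry can later be audited back to its source.

```python
# Illustrative sketch of opt-out respect and dataset provenance.
# "ExampleTrainingBot" and the URLs are hypothetical placeholders.
import hashlib
from urllib.robotparser import RobotFileParser

def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the site's robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

def provenance_record(source_url: str, content: bytes) -> dict:
    """Hash fetched content so a dataset entry can be audited back to its source."""
    return {
        "source_url": source_url,
        "sha256": hashlib.sha256(content).hexdigest(),
        "bytes": len(content),
    }

# A publisher opting its whole site out of one specific training crawler:
robots = """User-agent: ExampleTrainingBot
Disallow: /
"""
print(may_crawl(robots, "ExampleTrainingBot", "https://news.example.com/story"))  # False
print(may_crawl(robots, "OtherBot", "https://news.example.com/story"))           # True
```

A publisher that wants to opt out of a given crawler can publish exactly this kind of robots.txt rule; the provenance record is the sort of audit trail that discovery requests for "training corpora inventories and retention logs" would target.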

Why this matters beyond the courtroom

This series of lawsuits is not an abstract copyright fight: it is a test of how 21st‑century content economies will work. Local journalism is a public good that underpins civic accountability; if AI systems can cheaply reproduce reporting and reduce referral traffic, many publishers fear a further erosion of journalistic capacity. Conversely, AI vendors argue that building responsible, authoritative assistants requires broad access to information — including news — and that innovation should not be unduly constrained.
The forthcoming months of discovery and motions will wrestle with technical specifics (what data was used, how it was used, and whether outputs reproduce protected expression) and with deep policy questions about how creative labor is valued and compensated in an automated, data‑hungry economy. The outcome will shape product design, licensing markets and possibly regulation for years to come.

Conclusion

The MediaNews Group lawsuit filed on November 26, 2025 marks the latest escalation in a multi‑year legal confrontation over AI training practices and publishers’ rights. The complaint’s scope — nine newspapers, a 119‑page filing and claims exceeding $10 billion — reflects the economic stakes for local news organizations and the pace at which copyright law is being adapted to new technologies. What happens in discovery, and in parallel rulings across related cases (including orders compelling production of internal communications), will determine whether litigation forces a new commercial relationship between content creators and AI platforms — or whether courts carve out a broad exception that allows large‑scale model training with minimal direct payments to publishers. Either way, the case will be a bellwether for how the law balances creative labor, consumer access, and technological innovation in the decade ahead.
Source: Los Angeles Daily News, “9 more newspapers sue OpenAI, Microsoft, alleging stolen content used in AI apps”