UK Copyright Law and Generative AI: Liability, Scraping and Policy

Kevin Sullivan’s briefing from Insider Media sets out the sharp legal fault lines at the intersection of UK copyright law and generative AI, warning that artists, educators and policymakers are locked in a high-stakes debate over how models are trained, who pays for creative inputs, and how to preserve incentives and accountability in an era of “scraping” and large-scale data aggregation.

Background​

Generative AI tools can produce text, code, images, audio and video within seconds by learning statistical patterns from massive datasets. Their rise has been meteoric and pragmatic: businesses and consumers adopt models such as ChatGPT and Copilot to accelerate work, while artists, academics and legislators raise alarm about the unlicensed use of copyrighted material. Sullivan’s overview captures that tension — innovation on one side, creator concern and potential legal exposure on the other.

Why the law matters now​

Copyright remains the central legal framework for protecting the fruits of creative labour in the UK. Under the Copyright, Designs and Patents Act 1988 (CDPA), infringement occurs when a “restricted act” is performed in relation to the whole or a substantial part of a protected work. Importantly, the statutory concept of a “substantial part” is a qualitative test: copying a small but qualitatively important portion can infringe just as surely as wholesale copying. This legal baseline shapes how courts and policymakers evaluate whether AI training and outputs cross the line into infringement.

What Kevin Sullivan laid out​

  • Generative AI “scrapes” internet content en masse to create new outputs that often incorporate or echo protected works.
  • The UK’s current copyright regime restricts copying unless a licence or statutory exception applies.
  • Certain statutory “permitted acts” — for example non-commercial research exceptions and fair dealing — can protect some uses, but they are narrow and fact-dependent.
  • Government policy (labelled “Tech First” in some public commentary) seeks to accelerate AI adoption, including by training pupils in AI skills, while balancing creators’ rights; this has provoked high-profile condemnation from artists such as Sir Elton John.
These are precise, practical observations: the law as written makes unlicensed commercial scraping risky, but the scale and complexity of model training, and the international distribution of infrastructure, blur where liability begins and ends.

Legal contours: what the law actually says​

Copyright, substantial part and qualitative copying​

Under UK law, infringement is triggered if a restricted act is performed with respect to the whole or a substantial part of a work. Courts have repeatedly emphasised that “substantial” is a qualitative concept: even a short excerpt can be "substantial" if it captures the essence or the key expression of the original. This makes copyright analysis of AI outputs highly case-specific: similarity in tone or theme is not enough; courts ask whether a substantial element of the original expression appears in the output.

Text and data mining (TDM) exceptions: narrow and contentious​

UK law already contains a statutory exception for text and data mining (s.29A CDPA), but it is limited to non-commercial research. That means a commercial AI developer relying on broad web crawls cannot simply point to the TDM exception as cover. The Government has debated widening the exception to commercial uses in order to spur domestic AI development, but those proposals provoked fierce resistance from creative industries and were withdrawn or delayed for further consultation. The Data (Use and Access) Bill and related consultations have been the focal point of that debate, with musicians and authors demanding transparency, opt-outs and remuneration for reuse.

Who can be liable?​

Under current UK frameworks the possible routes to liability are multiple and fact-sensitive:
  • Direct infringement by an AI developer who copies protected works to create or train a model;
  • Secondary or contributory liability for parties who make infringing outputs available to the public;
  • User liability when a person publishes, distributes or commercially exploits AI-generated material that reproduces protected content.
Courts will examine where training occurred, whether the model stores copies, whether outputs reproduce a “substantial part” of a work, and whether statutory exceptions apply. Recent litigation has already begun to probe these issues.

Litigation spotlight: Getty Images v Stability AI and what it means​

The Getty v Stability AI litigation became the most closely watched European case on AI and copyright. Getty alleged that Stability trained Stable Diffusion on millions of its copyrighted photos without permission, and argued that both model training and outputs amounted to widespread infringement. The litigation produced mixed results and significant legal signals:
  • Getty dropped several copyright claims in the UK action during the course of the trial.
  • The court concluded that the model itself did not contain an “infringing copy” in the sense required for secondary infringement under the CDPA, which left important questions unresolved, especially about models trained within the UK or where domestic training data could be proven.
  • The court did find narrower trade mark problems where Getty watermarks reappeared in outputs.
  • The judgment is widely read as a partial win for AI developers, but not a final legal sea-change.
Why that outcome matters: the decision underscores that UK copyright law — as interpreted by judges to date — does not yet treat statistical model weights as literal “copies” of materials used in training. But the ruling left the broader policy question open: should the law evolve to treat large-scale scraping explicitly as an activity that requires a licence, or should creators be protected via transparency, opt-outs and licensing markets?

Stakeholder perspectives: creativity, education and commerce​

Creators and artists​

Prominent artists have publicly attacked the idea of loosening copyright rules to permit broad AI training without consent. Their points are legitimate and practical:
  • Young and mid-career creators may lack the resources to enforce their rights through litigation or protracted negotiations.
  • If major platforms can mine and monetise creative works without compensation, the incentives that fund creative careers could be undermined.
  • Transparency and attribution are core demands: creators want to know what was used, when, and by whom, and to be able to opt out or negotiate licensing terms. Sir Elton John’s public denunciation of government plans that seem to permit opt-out defaults captured this anxiety and mobilised broader industry opposition.

Educators and employers​

Schools, universities and employers fear both deskilling and misuse. There’s anxiety that students and junior employees may rely on generative tools to produce essays or code without learning fundamentals. At the same time, governments and institutions want to teach AI skills to prepare the workforce for future jobs — a tension that policymakers describe as balancing social fears with economic opportunity. The government’s “Tech First”‑style initiatives aim to fold AI training into education, but they do not remove the copyright or ethics questions that creators raise.

Policymakers and the “Tech First” push​

At London Tech Week the Prime Minister framed a vision in which AI delivers community wealth, jobs and public service improvements — and pledged investments and training programmes to mainstream AI skills in schools. That political momentum pushes for a regulatory environment friendly to AI development, while also promising protections. But this political posture has fuelled a backlash among creators who fear default licences and opt-outs that would, in their view, hand Big Tech a one-way benefit. The government has therefore steered toward further consultation and piecemeal measures rather than instant, sweeping statutory change.

Strengths in the current approach​

  • The UK’s legal framework still recognises creators’ rights and uses well-established doctrines (e.g., the qualitative substantial-part test) to prevent simplistic readings that would automatically sanction wholesale copying by AI developers. This preserves legal principles that protect expression and moral rights.
  • Policymakers are holding open consultations, and high-profile litigation (Getty) is illuminating factual and legal gaps; courts and legislatures now have real-world litigation records to inform calibrated reform rather than theoretical guesswork.
  • The focus on transparency, provenance and human oversight in many advisory documents is constructive: requiring model provenance, retention logs and explainability features would make it easier for rights holders and regulators to assess risk and manage opt-outs where appropriate.

Key risks and weaknesses​

  • Legal uncertainty: Courts and statutory rules have not definitively answered whether model training per se is an infringing act when it uses copyrighted material at massive scale, especially when done across borders. Uncertainty raises risk for developers and creators alike. The Getty litigation showed limits to current legal theories and left some pivotal questions unresolved.
  • Power imbalance: Even if an opt-out regime is legislated, smaller creators may be unable to detect or enforce misuse. The reality of resource asymmetry means that theoretical rights can be hollow without enforceable transparency and low-cost remedies. Sir Elton John and others warned that default permissive regimes would disadvantage younger creators.
  • Enforcement friction: Policing large-scale model training — where datasets are international, opaque, and technically complex — will be resource-intensive for regulators and rights holders. Technical forensics that link outputs to training inputs are often difficult to produce.
  • Chilling effect on innovation or creation: Overbroad rights enforcement could impede new research and products, while under-protection could hollow out creative markets. The policy needle is narrow and politically fraught.

Practical guidance for industry and policymakers​

The debate is not only legal; it’s operational. The following pragmatic steps synthesise best practice emerging from courts, firms and regulators:
  • Require provenance and auditable logs: Vendors supplying generative models to enterprises or public bodies should provide machine-readable provenance, dataset inventories and retention records.
  • Adopt human-in-the-loop verification: For any external-facing or legally sensitive product, human review and sign-off remains essential to avoid publishing outputs that replicate protected elements.
  • Negotiate clear contractual safeguards: Commercial contracts must address no‑retrain/no‑use clauses, deletion guarantees, exportable logs and indemnities where appropriate.
  • Support a transparent opt-out registry: If policymakers adopt opt-out systems, a central and searchable registry for creators to assert rights and for models to check compliance is necessary to make any opt‑out meaningful.
  • Fund accessible enforcement and detection tools: Governments should invest in low‑cost forensic tools and dispute-resolution mechanisms that smaller creators can use without prohibitive expense.
  • Educate and certify: Training programmes for schools and workplaces must include ethics, copyright basics and prompt hygiene, so users understand the legal and creative implications of AI outputs.
  • Pilot licensing markets: Public-private experiments that enable creators to licence datasets commercially, possibly through collective management organisations, can create a middle path that compensates creators while enabling innovation.
These steps are operationally feasible and would reduce litigation pressure while laying the groundwork for coherent policy.
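To make the provenance and opt-out points above more concrete, the following minimal Python sketch shows what a machine-readable dataset record and an opt-out registry check might look like. Everything here is an assumption for illustration: the field names, the registry contents and the check itself are hypothetical and do not reflect any existing standard or service.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

# Hypothetical record format: these field names are not drawn from any existing
# standard; they only illustrate what "machine-readable provenance" might capture.
@dataclass
class DatasetProvenanceRecord:
    source_url: str                  # where the item was collected from
    collected_at: str                # ISO 8601 timestamp of collection
    licence: str                     # declared licence, e.g. "CC-BY-4.0" or "unknown"
    rights_holder: Optional[str]     # rights holder, if known
    opt_out_checked: bool = False    # whether an opt-out registry was consulted
    notes: list = field(default_factory=list)

# Hypothetical opt-out registry: in practice this would be an external, searchable
# service; here it is simply an in-memory set of domains that have asserted an opt-out.
OPT_OUT_REGISTRY = {"example-photographer.co.uk", "example-news-site.com"}

def check_opt_out(record: DatasetProvenanceRecord) -> bool:
    """Return True if the source domain has asserted an opt-out."""
    domain = record.source_url.split("/")[2]  # crude host extraction, for illustration only
    record.opt_out_checked = True
    if domain in OPT_OUT_REGISTRY:
        record.notes.append(f"excluded: {domain} appears in the opt-out registry")
        return True
    return False

# Example: log a collected item and decide whether it may enter a training corpus.
record = DatasetProvenanceRecord(
    source_url="https://example-photographer.co.uk/gallery/img123.jpg",
    collected_at=datetime.now(timezone.utc).isoformat(),
    licence="unknown",
    rights_holder=None,
)
if check_opt_out(record):
    print("Item excluded from training corpus:", record.notes)
else:
    print("Item proceeds to licensing review:", record.source_url)
```

The design point is simply that the opt-out check happens at ingestion time and is logged alongside the record, so that rights holders and auditors can later see not only what was collected, but whether and when compliance was checked.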

Policy options for the UK — a constructive menu​

  1. Strengthen transparency obligations without wholesale opt-outs: Require model builders to disclose what categories of material they collected and provide clear mechanisms for creators to query or opt out, rather than assume permissive rights by default. This improves accountability without immediate market disruption.
  2. Establish a graduated TDM regime: Maintain non-commercial research TDM exceptions, but create a regulated commercial TDM regime that requires either opt-in licensing or an industry-negotiated compensation framework for large-scale commercial uses.
  3. Create sector-specific carve-outs and safeguards: Different creative sectors have different economics; music rights and news publishing may need bespoke remedies (for example, statutory licences or collective bargaining) that reflect market realities.
  4. Invest in technical provenance and watermarking standards: Government-backed standards for digital provenance, watermarking and forensically reproducible logging will reduce disputes and enable automated compliance checks.
  5. Fund a creator support and enforcement fund: A modest public fund to help individual creators pursue claims or mediation would rebalance enforcement capacity.
These policy options are not mutually exclusive; a hybrid approach will be necessary to balance innovation with protection.
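Option 4 above imagines automated compliance checks built on provenance standards. The short Python sketch below shows one elementary building block under assumed conditions: a vendor records a hash of a generated asset in a provenance manifest, and anyone can later verify that the asset they hold matches it. The schema and names are invented for illustration; a real government- or industry-backed standard would add cryptographic signatures, watermark detection and links back to training-data inventories.

```python
import hashlib
import json
from datetime import datetime, timezone

def build_manifest(asset_bytes: bytes, model_id: str) -> str:
    """Record the asset's hash and basic provenance metadata as JSON (hypothetical schema)."""
    return json.dumps({
        "asset_sha256": hashlib.sha256(asset_bytes).hexdigest(),
        "model_id": model_id,
        "generated_at": datetime.now(timezone.utc).isoformat(),
    })

def provenance_check(asset_bytes: bytes, manifest_json: str) -> bool:
    """Return True if the asset's hash matches the hash recorded in its manifest."""
    manifest = json.loads(manifest_json)
    return hashlib.sha256(asset_bytes).hexdigest() == manifest["asset_sha256"]

# Example: a model vendor publishes a manifest alongside the asset; a rights holder or
# regulator can later confirm that the asset they received is the one the manifest describes.
asset = b"...bytes of a generated image or document..."
manifest = build_manifest(asset, model_id="example-model-v1")
print("provenance verified" if provenance_check(asset, manifest) else "provenance mismatch")
```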

Conclusion​

The UK stands at an inflection point: the ambition to build an AI-ready economy sits uneasily beside legitimate calls to protect artists, journalists and creators. Kevin Sullivan’s briefing captures the present friction succinctly — artists fear a giveaway to Big Tech, policymakers want skills and investment, and courts are being asked to make novel legal findings based on old doctrines.
Recent litigation shows that courts are cautious about stretching traditional copyright doctrinal categories to fit machine learning; decisions to date have been partial and nuanced rather than sweeping. That limited clarity makes the policy task urgent: the UK must design transparency, enforcement and compensation mechanisms that work in practice, or risk creating winners and losers determined by litigation budgets rather than legal principle.

If the UK pursues a “Tech First” agenda, it must pair it with binding transparency obligations, practical opt-out/compensation mechanisms and investment in low-cost enforcement. Otherwise, creators will continue to see the future as an either/or choice between technological progress and the economic survival of creative professions. The wiser path — and the one most likely to sustain a healthy creative ecosystem and a dynamic AI sector — is a carefully constructed middle ground: regulated, transparent data use; fair compensation frameworks; and robust provenance and auditing standards that make respect for creators a competitive advantage rather than an afterthought.
Additional reading and legal materials are available for those seeking to dig into case law, statutory wording, and recent policy consultations; practical implementation will require cross-sector negotiation and concrete technical standards that convert the debate into enforceable practice.

Source: Insider Media Ltd, Navigating the AI Legal Landscape
 
