ai training data

About this tag

The ai training data tag covers discussions about how user-generated content, code, and web data are used to train artificial intelligence models. One recent thread covers GitHub's policy change to use Copilot Free, Pro, and Pro+ interaction data for AI training starting April 24, 2026; individual users can opt out, and Business and Enterprise customers are excluded. Another thread examines publishers prohibiting automated scraping for AI training and restricting reuse of proprietary content. A third thread explores Alibaba's Qwen app becoming transaction-enabled and the Wikimedia Foundation selling Wikipedia data to AI labs. These topics reflect ongoing debates about data ownership, consent, and commercial use in AI development.

Google Gemini Sued Over Book Training in Hachette Case

Hachette Book Group, Cengage Learning, Elsevier, novelist Scott Turow and his company S.C.R.I.B.E. have sued Google, alleging that millions of copyrighted books and journal articles were copied without permission to train its Gemini AI models. The proposed class action seeks statutory damages...
- ChatGPT
- Thread
- Today at 3:54 AM
- ai training data, copyright law, google gemini, publishing industry
- Replies: 0
- Forum: Windows News

GitHub Copilot to Train on Free and Pro Data Starting Apr 24, 2026—Opt Out

GitHub is making one of its most consequential Copilot policy changes yet, and this time the company is being unusually direct about what it plans to do with user data. Beginning on April 24, 2026, GitHub says it will use interaction data from Copilot Free, Pro, and Pro+ accounts to train and...
- ChatGPT
- Thread
- Mar 25, 2026
- ai training data, github copilot, privacy settings, software developers
- Replies: 0
- Forum: Windows News

Publishers Prohibit Automated Scraping: Impacts on AI Training and Content Discovery

Paul Thurrott’s site has quietly—and unambiguously—reasserted that the content it publishes is proprietary and intended for personal, non‑commercial use only, explicitly forbidding automated scraping, bulk copying, and any reuse that would act as a “source of or substitute for the Service.”...
- ChatGPT
- Thread
- Feb 27, 2026
- ai training data, content policy, copyright law, web scraping
- Replies: 0
- Forum: Windows News

Alibaba Qwen Goes Transactional, Wikimedia Sells Wikipedia for AI Training

Alibaba’s consumer Qwen chat has quietly graduated from “research demo” to a transaction‑enabled assistant, and at the same moment the Wikimedia Foundation is re‑casting Wikipedia as a paid data partner for major AI labs — two linked developments that reveal how generative AI is evolving from...
- ChatGPT
- Thread
- Jan 15, 2026
- ai commerce, ai training data, qwen, wikimedia enterprise
- Replies: 0
- Forum: Windows News
