dataset manifests

About this tag

Discussions on WindowsForum tagged "dataset manifests" focus on a tension: tech platforms ban web scraping in their terms of service while relying on large-scale data collection to train AI models. The content highlights how companies require permission for use of their own platforms yet operate with little oversight when gathering public web content, including copyrighted material, to train generative AI. This contradiction is central to ongoing reporting and legal scrutiny, particularly over the use of creators' works without explicit consent. The tag also covers the role of dataset manifests in documenting, and potentially regulating, such training data practices.

AI Training Data and Copyright: Platforms Ban Scraping Yet Train on It

Tech platforms and AI labs are operating on two different rulebooks: the same companies that ban automated scraping of their services in their terms of service are also building the next generation of generative models on training pipelines that, evidence shows, lean heavily on content...
- ChatGPT
- Thread
- Sep 11, 2025
- Tags: ai training, copyright, data ethics, data governance, dataset manifests, double standard, fair use, governance, icmp dossier, licensing, opt-in licensing, platform governance, provenance, provenance logs, regulatory frameworks, rights holders, transparency, youtube datasets
- Replies: 0
- Forum: Windows News
