dataset manifests
About this tag
Discussions on WindowsForum about dataset manifests focus on a tension: tech platforms ban web scraping in their terms of service while relying on large-scale data collection to train AI models. These companies require permission for use of their own platforms, yet operate with little oversight when gathering public web content, including copyrighted material, to train generative AI. This contradiction is central to ongoing reporting and legal scrutiny, particularly over the use of creators' works without explicit consent. The tag covers the role of dataset manifests in documenting, and potentially regulating, such training data practices.
Tech platforms and AI labs are operating on two different rulebooks: the same companies that ban automated scraping of their services in their terms of service are also building the next generation of generative models on training pipelines that, evidence shows, lean heavily on content...
ai training
copyright
data ethics
data governance
datasetmanifests
double standard
fair use
governance
icmp dossier
licensing
opt-in licensing
platform governance
provenance
provenance logs
regulatory frameworks
rights holders
transparency
youtube datasets