Reading List: Data Governance in the Age of Generative AI

The Sources of LLM Data

In the Wake of Generative AI, Industry-Led Standards for Data Scraping Are a Must
Center for Data Innovation

Generative AI Is Scraping Your Data. So, Now What?
Dark Reading

Revealed: The Authors Whose Pirated Books Are Powering Generative AI
The Atlantic

AI Researchers Uncover Ethical, Legal Risks to Using Popular Data Sets
The Washington Post

AI2 Dolma: 3 Trillion Token Open Corpus for Language Model Pretraining
Medium

The Continuum of Closed and Open LLMs and their Implications for Data Governance

Open-Sourcing Highly Capable Foundation Models
Centre for the Governance of AI

Will Open Source AI Shift Power from ‘Big Tech’? It Depends.
Tech Policy Press

GitHub And Others Call For More Open-Source Support in EU AI Law
The Verge

Opening up ChatGPT: Tracking Openness, Transparency, And Accountability in Instruction-Tuned Text Generators
Association for Computing Machinery

The Open-Source AI Boom is Built on Big Tech’s Handouts. How Long Will it Last?
MIT Technology Review

How to Promote Responsible Open Foundation Models
Stanford University

Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
Cornell University

The Foundation Model Transparency Index
Center for Research on Foundation Models

Evaluating LLMs is a minefield
Princeton University

What the executive order means for openness in AI
AI Snake Oil

Open Sourcing The AI Revolution
Demos UK

Data Openness and Society

The Paradox of Open
Open Future

AI is Tearing Wikipedia Apart
Vice

ChatGPT Stole Your Work. So What Are You Going to Do?
WIRED

My Books Were Used to Train Meta’s Generative AI. Good.
The Atlantic

Right to be Forgotten in The Era of Large Language Models: Implications, Challenges, And Solutions
Cornell University

Are Large Language Models a Threat to Digital Public Goods? Evidence from Activity on Stack Overflow
Cornell University

The Privacy-Bias Trade-Off
Stanford University

Workers Could be the Ones to Regulate AI
Financial Times

What are Governments Doing to Close the Data Governance Gaps?

A law for foundation models: the EU AI Act can improve regulation for fairer competition
OECD.AI

France bets big on open-source AI
Politico

AI Safety Summit: Introduction
GOV.UK

China Bets on Open-Source Technologies to Boost Domestic Innovation
Merics

New Ideas for Shared Data Governance

Data Dysphoria: The Governance Challenge Posed by Large Learning Models
Social Science Research Network

AI_Commons
Open Future

Stewarding The Sum of All Knowledge in The Age of AI
Open Future

Core Considerations for Exploring AI Systems as Digital Public Goods
Digital Public Goods

Generative AI And The Digital Commons
Cornell University

Datasheets for Datasets
Cornell University

Anthropic Thinks ‘Constitutional AI’ is The Best Way to Train Models
TechCrunch

Regulating ChatGPT And Other Large Generative AI Models
Cornell University

“Did ChatGPT Really Say That?”: Provenance in The Age of Generative AI
Harvard University Library Innovation Lab

New Synthetic Data Techniques Could Change The Way AI Models are Trained
Semafor

Speaking in Tongues: Teaching Local Languages to Machines
Digital Impact Alliance

Data Governance in the Age of Large-Scale Data-Driven Language Technology
Cornell University

Data Governance in the Age of Generative AI


Adam Zable
Director of Emerging Technologies – Digital Trade & Data Governance Hub