Reading List: Data Governance in the Age of Generative AI
The Sources of LLM Data
In the Wake of Generative AI, Industry-Led Standards for Data Scraping Are a Must
Center for Data Innovation
Generative AI Is Scraping Your Data. So, Now What?
Dark Reading
Revealed: The Authors Whose Pirated Books Are Powering Generative AI
The Atlantic
AI Researchers Uncover Ethical, Legal Risks to Using Popular Data Sets
The Washington Post
AI2 Dolma: 3 Trillion Token Open Corpus for Language Model Pretraining
Medium
The Continuum of Closed and Open LLMs and their Implications for Data Governance
Open-Sourcing Highly Capable Foundation Models
Centre for the Governance of AI
Will Open Source AI Shift Power from ‘Big Tech’? It Depends.
Tech Policy Press
GitHub And Others Call For More Open-Source Support in EU AI Law
The Verge
Opening up ChatGPT: Tracking Openness, Transparency, And Accountability in Instruction-Tuned Text Generators
Association for Computing Machinery
The Open-Source AI Boom is Built on Big Tech’s Handouts. How Long Will it Last?
MIT Technology Review
How to Promote Responsible Open Foundation Models
Stanford University
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
Cornell University
The Foundation Model Transparency Index
Center for Research on Foundation Models
Evaluating LLMs is a minefield
Princeton University
What the executive order means for openness in AI
AI Snake Oil
Open Sourcing The AI Revolution
Demos UK
Data Openness and Society
The Paradox of Open
Open Future
AI is Tearing Wikipedia Apart
Vice
ChatGPT Stole Your Work. So What Are You Going to Do?
WIRED
My Books Were Used to Train Meta’s Generative AI. Good.
The Atlantic
Right to be Forgotten in The Era of Large Language Models: Implications, Challenges, And Solutions
Cornell University
Are Large Language Models a Threat to Digital Public Goods? Evidence from Activity on Stack Overflow
Cornell University
The Privacy Bias Trade-Off
Stanford University
Workers Could be the Ones to Regulate AI
Financial Times
What are Governments Doing to Close the Data Governance Gaps?
A law for foundation models: the EU AI Act can improve regulation for fairer competition
OECD.AI
France bets big on open-source AI
Politico
AI Safety Summit: Introduction
GOV.UK
China Bets on Open-Source Technologies to Boost Domestic Innovation
Merics
New Ideas for Shared Data Governance
Data Dysphoria: The Governance Challenge Posed by Large Learning Models
Social Science Research Network
AI_Commons
Open Future
Stewarding The Sum of All Knowledge in The Age of AI
Open Future
Core Considerations for Exploring AI Systems as Digital Public Goods
Digital Public Goods
Generative AI And The Digital Commons
Cornell University
Datasheets for Datasets
Cornell University
Anthropic Thinks ‘Constitutional AI’ is The Best Way to Train Models
TechCrunch
Regulating ChatGPT And Other Large Generative AI Models
Cornell University
“Did ChatGPT Really Say That?”: Provenance in The Age of Generative AI
Harvard University Library Innovation Lab
New Synthetic Data Techniques Could Change The Way AI Models are Trained
Semafor
Speaking in Tongues: Teaching Local Languages to Machines
Digital Impact Alliance
Data Governance in the Age of Large-Scale Data-Driven Language Technology
Cornell University
Data Governance in the Age of Generative AI
Adam Zable
Director of Emerging Technologies – Digital Trade & Data Governance Hub