Religious authors allege tech companies used their books to train AI without permission

In this Jan. 31, 2016, photo, former Republican presidential candidate and former Arkansas Gov. Mike Huckabee speaks at Inspired Grounds Cafe in West Des Moines, Iowa. (AP Photo/Kiichiro Sato)

Former Arkansas Gov. Mike Huckabee (R) and four other religious authors are suing several tech companies for allegedly using their books to train artificial intelligence (AI) models without their permission.

The lawsuit accuses Meta, Microsoft, Bloomberg and the EleutherAI Institute of using a dataset known as Books3, which was compiled from a collection of nearly 200,000 pirated books.

“While using books as part of datasets is not inherently problematic, using pirated (or stolen) books does not fairly compensate authors and publishers for their work,” argued the lawsuit, filed in New York federal court on Tuesday.

EleutherAI, a nonprofit AI research group, included Books3 in a larger open-source dataset it created for training AI systems, known as the Pile, according to the lawsuit.

In particular, the Pile was created to train large language models (LLMs), which are designed to understand and generate human language and require vast amounts of training data.

Meta used the Pile and Books3 to train its LLM, called LLaMA, and later partnered with Microsoft to develop an updated version of the model, known as Llama 2. Bloomberg also used Books3 to train its finance-oriented LLM, BloombergGPT.

The lawsuit alleged the companies used the Pile or Books3 “with the full knowledge and understanding that the datasets they were using to train their LLMs were assembled from copyrighted works.”

The legal action is the latest in a series of efforts by authors, actors and artists to protect or receive compensation for their work in the face of rapidly evolving AI technology.

Another group of writers — including “My Sister’s Keeper” author Jodi Picoult and “A Game of Thrones” author George R.R. Martin — similarly filed suit against OpenAI last month, accusing the company of using their copyrighted works to train its AI-powered chatbot technology ChatGPT.

Copyright 2023 Nexstar Media Inc. All rights reserved. This material may not be published, broadcast, rewritten, or redistributed.