In the internet age, copyright infringement lawsuits concentrate public attention on the speed at which technology races ahead of longstanding legal precedent.
The web crossed the artificial intelligence threshold in November 2022, when OpenAI launched ChatGPT, a free-to-the-public “generative AI” tool that answers questions in human language. The latest AI-related copyright infringement lawsuits allege that chatbots, robots, and other learning machines are getting their education from commercially produced works, which are used without permission or compensation.
Large language models like ChatGPT “learn” to create images and texts by being “trained” on enormous amounts of existing works converted to data, the famous “ones and zeroes” of computer code. The BBC has reported that training sources for ChatGPT amounted to 570 GB of data, or approximately 300 billion words.
Beyond commercially published books, journals, and newspapers, AI training data is drawn from a vast online trove of publicly available social media posts and Wikipedia entries, as well as digitized library and museum collections, court proceedings, and government legislation and regulation.
Consumption of public and private individual data on the “open” web marks an important shift in digital evolution. No one is left out. Consequently, we have all become stakeholders.
AI is now forcing us to consider viewing copyright as a public good.
In 2005, at the crest of Web 1.0, the Association of American Publishers and the U.S.-based Authors Guild sued search giant Google for scanning entire library collections of books and other publications without permission. The plaintiffs eventually lost the so-called “Google Books” case, with a federal judge ruling in 2013 that the scanning was a “transformative use” and not infringement.
When ChatGPT launched 18 months ago, OpenAI CEO Sam Altman safely predicted on Twitter that “talking to the computer is going to be a big deal.” An equally safe prediction was that corporate copyright holders would charge OpenAI with copyright infringement.
Best known among the current cases is the New York Times lawsuit brought against OpenAI last December.
The web in 2024 is nothing like it was in 2005. Compensation for use of works to train AI solutions should not go only to those incentivized entirely by profit.
Media actors today include literally everyone, down to the creators of so-called “user-generated content.” Long-lived media conglomerates with roots deep in analog soil are notoriously dwindling in number.
Copyright is automatically ascribed to every “fixed” work, whether a photograph taken on a smartphone or a dance posted on TikTok. Billions of such works are created every day. This “intellectual property” may be owned by an organization or by any individual. You hold the copyright to thousands of these creations.
Like historic monuments, TikTok dances and Instagram posts constitute the cultural capital that a community or nation in 2024 draws on for inspiration. Like natural resources, social media output derives value in the aggregate.
In New Zealand, government, museums, and creative industry representatives have acknowledged a responsibility to respect Māori taonga (treasures) when used in any AI system, and to consider any effects of such use on Māori culture.
In Italy, the Cultural Heritage Code requires authorization and payment for digital uses of art by Leonardo da Vinci and of other national treasures.
In the new, global media ecosystem, AI and user-generated content will interact in a symbiotic cycle of information and transformation. The U.S. should therefore move with urgency to assert a national interest in our communal creative expression. The Constitution gives Congress the necessary authority to update copyright law.
Statutory licensing schemes for copyright-protected works are already applied to cable television systems and music recordings with great success. Fees collected for AI rights-licensing of publicly available works need not be burdensome. The funds can help to underwrite essential public education in digital literacy and civil discourse online.
OpenAI, along with Meta, Apple, Google, Amazon, and others who stand to benefit, must recognize the debt owed to the American people for the data that fuels their AI solutions.
Christopher Kenneally is a Boston-based freelance journalist and podcaster who has covered the intersection of technology and intellectual property law for two decades.