Sarah Silverman, along with authors Christopher Golden and Richard Kadrey, has filed complaints against OpenAI and Meta, alleging copyright infringement. The complaints focus on the use of copyrighted materials to train the companies’ large language models, specifically OpenAI’s ChatGPT and Meta’s LLaMA, without the authors’ consent.
The lawsuits claim that OpenAI and Meta trained their models on datasets containing copyrighted works, including books by Silverman, Golden, and Kadrey, without permission. While OpenAI’s “Books1” dataset is said to be roughly the size of Project Gutenberg, a well-known repository of public-domain books, the plaintiffs’ lawyers argue that the “Books2” dataset is too large to have been derived from legitimate sources and likely includes copyrighted material obtained from shadow libraries such as Library Genesis and Sci-Hub. These shadow libraries distribute copyrighted materials illegally, often in bulk torrent packages.
One exhibit in Silverman’s lawsuit involves an exchange between her lawyers and ChatGPT, where the chatbot was able to summarize her memoir, The Bedwetter, including verbatim passages from the book.
This is not the first time OpenAI has faced legal action over its training data. The company has been served with multiple lawsuits, including a class-action suit alleging violations of federal and state privacy laws related to data scraping for training AI models like ChatGPT and DALL-E.
These legal challenges highlight the ongoing debate surrounding the use of copyrighted materials and the responsibilities of organizations like OpenAI when training large language models. The outcome of these cases may have implications for the future use and development of AI technologies that rely on textual data.