Date: 2023-12-14

A group of content creators, including comedian Sarah Silverman and Pulitzer Prize winner Michael Chabon, has filed a lawsuit against Meta, the parent company of Facebook, alleging that Meta trained its artificial intelligence (AI) models on copyrighted materials.

The lawsuit, which was revised after a California judge dismissed part of it, includes chat logs in which a Meta-affiliated researcher discusses the use of a dataset called “The Pile.” This dataset, compiled by EleutherAI, contained a section called Books3, which included 196,640 books.

According to the complaint, the researchers acknowledged that including copyrighted content in the dataset raised legal concerns. One researcher mentioned that Meta’s lawyers had raised concerns and that they would need legal approval before publishing any information related to the dataset.

The lawsuit also highlights the legal gray area surrounding the use of copyrighted books in the U.S.: the issue has not been litigated extensively, leaving many questions unresolved. This further complicates the use of copyrighted content to train AI models.

Although Meta has not disclosed the training sources for its Llama 2 model, the plaintiffs believe that it was also trained on the Books3 dataset. The conversation about the dataset was later removed from public view, and the dataset itself was taken down from websites due to copyright infringement claims.

Meta has not yet responded to the lawsuit or provided any comment on the matter.

This lawsuit raises important questions about the use of copyrighted content in training AI models. As AI technology continues to advance, it is crucial to address these legal and ethical considerations to ensure that intellectual property rights are protected.