Allegations have emerged that Meta CEO Mark Zuckerberg approved the use of a dataset to train artificial intelligence (AI) models despite knowing it was the subject of copyright controversy.
Reuters, citing a court filing submitted to the U.S. District Court for the Northern District of California on the 9th (local time), reported that CEO Zuckerberg approved the use of 'LibGen', a dataset that appears to contain a large number of pirated books, to train an AI model.
A group of writers, including comedian Sarah Silverman and author Ta-Nehisi Coates, sued Meta in 2023, alleging that the company had used their books without permission to train its large language model (LLM) 'Llama'. Meta countered by invoking the fair use doctrine.
Last year, Meta won an earlier ruling in the case. U.S. District Judge Vince Chhabria rejected the plaintiffs' claims that text generated by Meta's AI infringed their copyrights and that Meta had removed copyright management information (CMI). The authors then asked the court for permission to file an amended complaint.
According to the newly submitted court documents, Meta employees were aware that LibGen was a pirated dataset and were concerned that using it could cause problems.
The authors also claimed that CEO Zuckerberg approved the use of LibGen to train AI models and that Meta wrote a script to strip copyright information from the LibGen data. This could be interpreted as an attempt to conceal copyright infringement.
The plaintiffs argued that the new evidence establishes copyright infringement and justifies reviving the CMI claim and adding a computer fraud claim.
Judge Chhabria allowed the authors to file an amended complaint but reportedly remained skeptical about the validity of the fraud and CMI claims.
Reporter Park Chan cpark@aitimes.com