Copyright watchdog halts distribution of AI training datasets


(Photo = Shutterstock)

A Dutch copyright watchdog says it has halted the distribution of a dataset used to train artificial intelligence (AI). The group, which has been cracking down on piracy for more than twenty years, has now expanded its scope to include AI training data.

Reuters reported on the 13th (local time) that BREIN, which is headquartered in the Netherlands, had stopped the distribution of a dataset for AI model training.

“The dataset comprises tens of thousands of books and news articles, as well as Dutch subtitles for various movies and TV series,” BREIN said in a press release.

The dataset's creator acknowledged that it was illegal, agreed to stop distributing it, and removed it from the website where it had been available for download. BREIN did not disclose the distributor's identity because of Dutch privacy regulations.

“It’s unclear how many AI companies have used this dataset for training,” said Bastiaan van Ramshorst, BREIN’s director.

He added, “It is very difficult to determine the contents of a dataset, but we are trying to take action to avoid possible future litigation.”

(Photo = BREIN)

This isn’t the first time a copyright watchdog has gone after an AI dataset. Last year, a Danish copyright protection group, the Rights Alliance, forced the takedown of a large dataset called Books3.

Books3 is a file known to contain roughly 1.2 million books, and NVIDIA reportedly also used it to train its large language model (LLM) ‘NeMo’. As a result, NVIDIA was sued by three authors.

BREIN became known in 1999, when it began cracking down on illegal video copying, and again in 2010, when it won a lawsuit over illegal torrent sharing. Now it has expanded its scope to AI training data.

BREIN said, “The dataset creator signed a statement promising never to infringe again and provided information about the companies that downloaded the dataset.”

BREIN will investigate AI models that used this dataset and contact the relevant parties.

Reporter Im Dae-jun ydj@aitimes.com
