This is where the data to build AI comes from


Their findings, shared exclusively with , show a worrying trend: AI's data practices risk concentrating power overwhelmingly in the hands of a few dominant technology companies.

In the early 2010s, data sets came from a variety of sources, says Shayne Longpre, a researcher at MIT who is part of the project.

It came not just from encyclopedias and the web, but also from sources such as parliamentary transcripts, earnings calls, and weather reports. Back then, AI data sets were specifically curated and collected from different sources to suit individual tasks, Longpre says.

Then transformers, the architecture underpinning language models, were invented in 2017, and the AI sector began seeing performance improve as models and data sets grew larger. Today, most AI data sets are built by indiscriminately hoovering up material from the internet. Since 2018, the web has been the dominant source for data sets used in all media, such as audio, images, and video, and a gap between scraped data and more curated data sets has emerged and widened.

“In foundation model development, nothing seems to matter more for the capabilities than the scale and heterogeneity of the data and the web,” says Longpre. The need for scale has also massively boosted the use of synthetic data.

The past few years have also seen the rise of multimodal generative AI models, which can generate videos and images. Like large language models, they need as much data as possible, and the best source for that has become YouTube.

For video models, as you can see in this chart, over 70% of the data for both speech and image data sets comes from one source.

This could be a boon for Alphabet, Google’s parent company, which owns YouTube. Whereas text is distributed across the web and controlled by many different websites and platforms, video data is extremely concentrated in one platform.
