Vana is letting users own a chunk of the AI models trained on their data


In February 2024, Reddit struck a $60 million deal with Google to let the search giant use data on the platform to train its artificial intelligence models. Notably absent from the discussions were Reddit users, whose data were being sold.

The deal reflected the reality of the modern internet: Big tech companies own virtually all our online data and get to decide what to do with that data. Unsurprisingly, many platforms monetize their data, and the fastest-growing way to do that today is to sell it to AI companies, which are themselves massive tech companies using the data to train ever more powerful models.

The decentralized platform Vana, which began as a class project at MIT, is on a mission to give power back to the users. The company has created a fully user-owned network that allows individuals to upload their data and govern how they are used. AI developers can pitch users on ideas for new models, and if the users agree to contribute their data for training, they get proportional ownership in the models.

The idea is to give everyone a stake in the AI systems that will increasingly shape our society while also unlocking new pools of data to advance the technology.

“This data is needed to create better AI systems,” says Vana co-founder Anna Kazlauskas ’19. “We’ve created a decentralized system to get better data, which sits inside big tech companies today, while still letting users retain ultimate ownership.”

From economics to the blockchain

A lot of high school students have pictures of pop stars or athletes on their bedroom walls. Kazlauskas had a picture of former U.S. Treasury Secretary Janet Yellen.

Kazlauskas came to MIT sure she’d become an economist, but she ended up being one of five students to join the MIT Bitcoin club in 2015, and that experience led her into the world of blockchains and cryptocurrency.

From her dorm room in MacGregor House, she began mining the cryptocurrency Ethereum. She even occasionally scoured campus dumpsters in search of discarded computer chips.

“It got me interested in everything around computer science and networking,” Kazlauskas says. “That involved, from a blockchain perspective, distributed systems and how they can shift economic power to individuals, as well as artificial intelligence and econometrics.”

Kazlauskas met Art Abal, who was then attending Harvard University, in the former Media Lab class Emergent Ventures, and the pair decided to work on new ways to obtain data to train AI systems.

“Our question was: How could you have a large number of people contributing to these AI systems using more of a distributed network?” Kazlauskas recalls.

Kazlauskas and Abal were trying to address the status quo, where most models are trained by scraping public data on the internet. Big tech companies often also buy large datasets from other companies.

The founders’ approach evolved over the years and was informed by Kazlauskas’ experience working at the financial blockchain company Celo after graduation. But Kazlauskas credits her time at MIT with helping her think about these problems, and the instructor for Emergent Ventures, Ramesh Raskar, still helps Vana think about AI research questions today.

“It was great to have an open-ended opportunity to just build, hack, and explore,” Kazlauskas says. “I think that ethos at MIT is really important. It’s just about building things, seeing what works, and continuing to iterate.”

Today Vana takes advantage of a little-known law that allows users of most big tech platforms to export their data directly. Users can upload that information into encrypted digital wallets in Vana and disburse it to train models as they see fit.

AI engineers can suggest ideas for new open-source models, and people can pool their data to help train the model. In the blockchain world, the data pools are called data DAOs, which stands for decentralized autonomous organization. Data can also be used to create personalized AI models and agents.

In Vana, data are used in a way that preserves user privacy because the system doesn’t expose identifiable information. Once the model is created, users maintain ownership so that every time it’s used, they’re rewarded proportionally based on how much their data helped train it.
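To make the proportional reward idea concrete, here is a minimal sketch of one way such a split could be computed. It is not Vana’s actual mechanism: the article does not say how a contribution is measured, so the integer contribution scores, the function names `ownership_shares` and `distribute_reward`, and the sample users are all hypothetical.

```python
from fractions import Fraction


def ownership_shares(contributions):
    """Return each user's share of a model as an exact fraction.

    `contributions` maps a user name to a hypothetical integer score
    describing how much that user's data helped train the model.
    """
    total = sum(contributions.values())
    if total <= 0:
        raise ValueError("total contribution must be positive")
    return {user: Fraction(score, total) for user, score in contributions.items()}


def distribute_reward(contributions, reward):
    """Split a reward among users in proportion to their shares."""
    shares = ownership_shares(contributions)
    return {user: float(share * reward) for user, share in shares.items()}


# Hypothetical pool: alice's data contributed three times as much as bob's.
pool = {"alice": 3, "bob": 1}
payouts = distribute_reward(pool, 100)
```

Under these assumptions, a user whose data accounts for three-quarters of the total contribution receives three-quarters of every reward paid out when the model is used.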

“From a developer’s perspective, now you can build these hyper-personalized health applications that take into account exactly what you ate, how you slept, how you exercise,” Kazlauskas says. “Those applications aren’t possible today because of those walled gardens of the big tech companies.”

Crowdsourced, user-owned AI

Last year, a machine-learning engineer proposed using Vana user data to train an AI model that could generate Reddit posts. More than 140,000 Vana users contributed their Reddit data, which contained posts, comments, messages, and more. Users decided on the terms under which the model could be used, and they maintained ownership of the model after it was created.

Vana has enabled similar initiatives with user-contributed data from the social media platform X; sleep data from sources like Oura rings; and more. There are also collaborations that combine data pools to create broader AI applications.

“Let’s say users have Spotify data, Reddit data, and fashion data,” Kazlauskas explains. “Usually, Spotify isn’t going to collaborate with those types of companies, and there’s actually regulation against that. But users can do it if they grant access, so these cross-platform datasets can be used to create really powerful models.”

Vana has over 1 million users and over 20 live data DAOs. More than 300 additional data pools have been proposed by users on Vana’s system, and Kazlauskas says many will go into production this year.

“I think there’s a lot of promise in generalized AI models, personalized medicine, and new consumer applications, because it’s tough to combine all that data or get access to it in the first place,” Kazlauskas says.

The data pools are allowing groups of users to accomplish something even the most powerful tech companies struggle with today.

“Today, big tech companies have built these data moats, so the best datasets aren’t available to anyone,” Kazlauskas says. “It’s a collective action problem, where my data on its own isn’t that valuable, but a data pool with tens of thousands or millions of people is really valuable. Vana allows those pools to be built. It’s a win-win: Users get to benefit from the rise of AI because they own the models. Then you don’t end up in a scenario where a single company controls an all-powerful AI model. You get better technology, but everyone benefits.”
