OpenAI’s hunger for data is coming back to bite it


In AI development, the dominant paradigm is that the more training data, the better. OpenAI’s GPT-2 model had a data set consisting of 40 gigabytes of text. GPT-3, which ChatGPT is based on, was trained on 570 GB of data. OpenAI has not shared how big the data set for its latest model, GPT-4, is.

But that hunger for larger models is now coming back to bite the company. In the past few weeks, several Western data protection authorities have started investigations into how OpenAI collects and processes the data powering ChatGPT. They believe it has scraped people’s personal data, such as names or email addresses, and used it without their consent.

The Italian authority has blocked the use of ChatGPT as a precautionary measure, and French, German, Irish, and Canadian data regulators are also investigating how the OpenAI system collects and uses data. The European Data Protection Board, the umbrella organization for data protection authorities, is also setting up an EU-wide task force to coordinate investigations and enforcement around ChatGPT.

Italy has given OpenAI until April 30 to comply with the law. This would mean OpenAI would have to ask people for consent to have their data scraped, or prove that it has a “legitimate interest” in collecting it. OpenAI will also have to explain to people how ChatGPT uses their data and give them the power to correct any mistakes about them that the chatbot spits out, to have their data erased if they want, and to object to letting the computer program use it.

If OpenAI cannot persuade the authorities that its data use practices are legal, it could be banned in specific countries or even the entire European Union. It could also face hefty fines and might even be forced to delete models and the data used to train them, says Alexis Leautier, an AI expert at the French data protection agency CNIL.

OpenAI’s violations are so flagrant that it’s likely that this case will end up in the Court of Justice of the European Union, the EU’s highest court, says Lilian Edwards, an internet law professor at Newcastle University. It could take years before we see an answer to the questions posed by the Italian data regulator.

High-stakes game

The stakes couldn’t be higher for OpenAI. The EU’s General Data Protection Regulation is the world’s strictest data protection regime, and it has been copied widely around the world. Regulators everywhere from Brazil to California will be paying close attention to what happens next, and the outcome could fundamentally change the way AI companies go about collecting data.

In addition to being more transparent about its data practices, OpenAI will have to show it is using one of two possible legal ways to collect training data for its algorithms: consent or “legitimate interest.”

