
On Thursday, Google announced that “commercially motivated” actors have attempted to clone knowledge from its Gemini AI chatbot simply by prompting it. One adversarial session reportedly prompted the model more than 100,000 times across various non-English languages, collecting responses ostensibly to train a cheaper copycat.
Google published the findings in what amounts to a quarterly self-assessment of threats to its own products, one that frames the company as both the victim and the hero, which is not unusual in these self-authored assessments. Google calls the illicit activity “model extraction” and considers it intellectual property theft, a somewhat loaded position given that Google’s LLM was built from materials scraped from the web without permission.
Google is also no stranger to the copycat practice. In 2023, The Information reported that Google’s Bard team had been accused of using ChatGPT outputs from ShareGPT, a public site where users share chatbot conversations, to help train its own chatbot. Senior Google AI researcher Jacob Devlin, who created the influential BERT language model, warned leadership that this violated OpenAI’s terms of service, then resigned and joined OpenAI. Google denied the claim but reportedly stopped using the data.
Even so, Google’s terms of service forbid people from extracting data from its AI models this way, and the report is a window into the world of somewhat shady AI model-cloning tactics. The company believes the culprits are mostly private companies and researchers seeking a competitive edge, and said the attacks have come from around the world. Google declined to name suspects.
The deal with distillation
Typically, the industry calls this practice of training a new model on a previous model’s outputs “distillation,” and it works like this: If you want to build your own large language model (LLM) but lack the billions of dollars and years of work that Google spent training Gemini, you can use a previously trained LLM as a shortcut.
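In rough outline, the shortcut looks something like the sketch below: harvest a large teacher model’s answers to many prompts, then fine-tune a smaller “student” model on those prompt/response pairs. This is only an illustration, not any company’s actual pipeline; `query_teacher` is a hypothetical stand-in for whatever chatbot API is being prompted, and the student here is a small Hugging Face model used purely as an example.

```python
# Minimal sketch of distillation (illustrative assumptions throughout):
# 1) collect a teacher chatbot's replies to many prompts,
# 2) fine-tune a small student model on those prompt/response pairs.
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)


def query_teacher(prompt: str) -> str:
    # Hypothetical helper: in practice this would call the teacher
    # chatbot's API; a canned reply is returned here so the sketch runs.
    return f"Teacher's answer to: {prompt}"


# 1. Harvest prompt/response pairs from the teacher.
prompts = ["Explain photosynthesis simply.", "Translate 'hello' into French."]
pairs = [(p, query_teacher(p)) for p in prompts]

# 2. Turn the pairs into training text for the student.
tokenizer = AutoTokenizer.from_pretrained("gpt2")  # small stand-in student
tokenizer.pad_token = tokenizer.eos_token
texts = [f"Q: {p}\nA: {r}" for p, r in pairs]
enc = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")


class PairDataset(torch.utils.data.Dataset):
    """Wraps the tokenized pairs; labels mirror inputs for causal-LM loss."""

    def __len__(self):
        return enc["input_ids"].shape[0]

    def __getitem__(self, i):
        item = {k: v[i] for k, v in enc.items()}
        item["labels"] = item["input_ids"].clone()
        return item


# 3. Fine-tune the student on the teacher's outputs.
student = AutoModelForCausalLM.from_pretrained("gpt2")
trainer = Trainer(
    model=student,
    args=TrainingArguments(output_dir="student", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=PairDataset(),
)
trainer.train()
```

Scaled up to hundreds of thousands of prompts, as described in Google’s report, the same pattern lets a much cheaper model absorb a sizable share of the teacher’s behavior without the original training cost.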
