In this article, we focus on the privacy risks of large language models (LLMs) with respect to their scaled deployment in enterprises.
We also see a growing (and worrisome) trend where enterprises apply the privacy frameworks and controls they designed for their data science / predictive analytics pipelines as-is to Gen AI / LLM use-cases.
This is clearly inefficient (and dangerous); enterprise privacy frameworks, checklists, and tooling need to be adapted to account for the novel and differentiating privacy facets of LLMs.
Let us first consider the privacy attack scenarios in a traditional supervised ML context [1, 2]. This covers the majority of the AI/ML world today, where machine learning (ML) / deep learning (DL) models are developed to solve a prediction or classification task.
There are two broad categories of inference attacks: membership inference and property inference.
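To make the membership inference idea concrete, here is a minimal sketch of the classic confidence-threshold variant. It assumes the attacker can query the target model and observe its confidence on the true label; the function name, threshold value, and toy scores below are illustrative, not part of any specific attack implementation.

```python
def membership_inference(confidence: float, threshold: float = 0.9) -> bool:
    """Predict 'member of the training set' when the model is unusually confident.

    Models tend to overfit and assign higher confidence to examples they
    were trained on, so high confidence on the true label suggests membership.
    """
    return confidence >= threshold


# Toy confidence scores (illustrative): training examples typically
# score higher than unseen examples.
train_confidences = [0.98, 0.95, 0.97]  # examples seen during training
test_confidences = [0.62, 0.71, 0.55]   # unseen examples

predictions = [membership_inference(c) for c in train_confidences + test_confidences]
print(predictions)
```

In practice the threshold is calibrated on shadow models rather than fixed by hand, but the core signal, a gap in model confidence between seen and unseen data, is the same.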